[ensembl-dev] Prediction of consequence type for novel variants

Tue Dec 14 14:58:57 GMT 2010

Hi Will,

One more question, about start/end positions for indels.

In the API documentation
(http://www.ensembl.org/info/docs/Pdoc/ensembl-variation/modules/Bio/EnsEMBL/Variation/VariationFeature.html),
it says:
    # Variation feature representing a 2bp insertion
    $vf = Bio::EnsEMBL::Variation::VariationFeature->new
       (-start   => 1522,
        -end     => 1521, # end = start-1 for insert
        -strand  => -1,
        -slice   => $slice,
        -allele_string => '-/AA',
        -variation_name => 'rs12111',
        -map_weight  => 1,
        -variation => $v2);

Does the example above apply only to the -1 strand?
How should -start and -end be set in general?
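A minimal sketch of a general rule, assuming Ensembl's usual convention.
Under that convention, -start and -end are always forward-strand
coordinates on the slice with start <= end. An insertion is the one
exception: it has end = start - 1, and the new bases sit between end and
start. -strand only says which strand the -allele_string is written on.
If that holds, the same start/end numbers work for strand 1 and -1.
Switching strand would just mean reverse-complementing the alleles.
vf_coords and revcomp_alleles are hypothetical helpers, not part of the
Ensembl API.

```perl
use strict;
use warnings;

# Hypothetical helper (assumes the convention described above).
# $pos is the 1-based forward-strand position of the first affected base;
# for an insertion it is the base immediately AFTER the insertion point.
sub vf_coords {
    my ($pos, $allele_string) = @_;
    my ($ref) = split m{/}, $allele_string;
    return ($pos, $pos - 1) if $ref eq '-';      # insertion: end = start - 1
    return ($pos, $pos + length($ref) - 1);      # SNV, MNV or deletion
}

# Hypothetical helper: express an allele string on the opposite strand.
sub revcomp_alleles {
    my ($allele_string) = @_;
    my @out;
    for my $allele (split m{/}, $allele_string) {
        if ($allele eq '-') { push @out, '-'; next; }   # gap stays a gap
        my $rc = reverse $allele;
        $rc =~ tr/ACGTacgt/TGCAtgca/;
        push @out, $rc;
    }
    return join '/', @out;
}

my ($start, $end) = vf_coords(1522, '-/AA');
print "start=$start end=$end\n";   # start=1522 end=1521
```

Under these assumptions, the documented example with -strand => 1 would
keep start 1522 and end 1521 and use -allele_string => '-/TT'.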

Cheers,
Sung

On 10 December 2010 11:41, Will McLaren <wm2 at ebi.ac.uk> wrote:
> Hi Sung
>
> The codons() method will work; it returns the codon in a form like:
>
> aGa/aCa
>
> where the base changed is in capital letters.
>
> Will
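Given the codon string format described above (changed bases in
capitals), the 0/1/2 position can be read straight off the string. This
is a small sketch, and parse_codons is a hypothetical helper. It assumes
a same-length substitution within one codon. Indels can produce codon
strings of different lengths, and it does not try to handle those.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a codon string such as 'aGa/aCa' and return
# the reference codon, the alternative codon and an array ref holding the
# 0-based positions of the changed (upper-case) bases within the codon.
sub parse_codons {
    my ($codons) = @_;
    my ($ref, $alt) = split m{/}, $codons;
    my @pos = grep { substr($ref, $_, 1) =~ /[A-Z]/ } 0 .. length($ref) - 1;
    return (uc $ref, uc $alt, \@pos);
}

my ($ref, $alt, $pos) = parse_codons('aGa/aCa');
print "$ref -> $alt at position @$pos\n";   # AGA -> ACA at position 1
```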
>
> On 10 December 2010 11:26, Sung Gong <sung at bio.cc> wrote:
>> Hi Will,
>>
>> Thanks for the paper. I appreciate your work.
>>
>> Before I was aware of your script, I used the core API to get the
>> corresponding codon and the position (0, 1 or 2) within it where a
>> single-base variant occurs.
>> Is there a way to get the same information with the variation API?
>>
>> I found a 'codons' method in 'TranscriptVariation', but is it a
>> method of ConsequenceType?
>>
>> I thought it better to ask you before going further.
>>
>> Cheers,
>> Sung
>>
>> On 9 December 2010 14:02, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>> Hi Sung,
>>>
>>> There is a publication referring to the system, but it does not go
>>> into great detail on the internal workings:
>>>
>>> http://bioinformatics.oxfordjournals.org/content/26/16/2069.abstract
>>>
>>> Here's an approximate flow of what happens in the API. The vast
>>> majority of the code used is in the Core module
>>> Bio::EnsEMBL::Utils::TranscriptAlleles.pm, mainly the methods
>>> type_variation() and apply_aa_change():
>>>
>>> - find overlapping transcripts (using $vf->feature_Slice and
>>> $slice->get_all_Transcripts), then for each transcript:
>>>
>>> - get the transcript mapper and map the variation's coordinates to
>>> cDNA, CDS and peptide coordinates
>>>
>>> - any variants that don't fall in the coding sequence are classified
>>> here (e.g. INTRONIC, UPSTREAM) and the flow ends
>>>
>>> - if the variation falls in the coding part of an exon (i.e. has
>>> defined CDS coordinates), generate the alternative codon(s) and the
>>> resulting translation
>>>
>>> - compare the translation to the reference; classify as e.g.
>>> SYNONYMOUS_CODING, NON_SYNONYMOUS_CODING
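The per-transcript steps of that flow can be illustrated with a toy,
self-contained sketch. This is not the TranscriptAlleles code. The
transcript is a hypothetical hash describing a single forward-strand
transcript: exon coordinates, CDS start/end and the spliced CDS sequence.
The sketch handles only single-base substitutions and uses the standard
genetic code. It ignores the upstream/downstream distance cut-off and
splice-site classes. INTRONIC, UPSTREAM, SYNONYMOUS_CODING and
NON_SYNONYMOUS_CODING come from the message. DOWNSTREAM, 5PRIME_UTR,
3PRIME_UTR, STOP_GAINED and STOP_LOST are assumed to follow the same
naming style.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
my %CODON2AA;
{
    my $aas = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my $i   = 0;
    for my $b1 (qw(T C A G)) {
        for my $b2 (qw(T C A G)) {
            for my $b3 (qw(T C A G)) {
                $CODON2AA{"$b1$b2$b3"} = substr($aas, $i++, 1);
            }
        }
    }
}

# Classify a single-base substitution at genomic $pos (1-based) with
# alternative base $alt against a hypothetical forward-strand transcript.
sub classify_snv {
    my ($tr, $pos, $alt) = @_;
    my @exons = @{ $tr->{exons} };    # [start, end] pairs, sorted by start

    # Positions outside the coding sequence are classified here.
    return 'UPSTREAM'   if $pos < $exons[0][0];
    return 'DOWNSTREAM' if $pos > $exons[-1][1];
    my ($in_exon) = grep { $pos >= $_->[0] && $pos <= $_->[1] } @exons;
    return 'INTRONIC'   unless $in_exon;
    return '5PRIME_UTR' if $pos < $tr->{cds_start};
    return '3PRIME_UTR' if $pos > $tr->{cds_end};

    # Map the genomic position to a 1-based CDS position by summing the
    # coding bases of the exons that precede it.
    my ($offset, $cds_pos) = (0, undef);
    for my $e (@exons) {
        my $s = $e->[0] > $tr->{cds_start} ? $e->[0] : $tr->{cds_start};
        my $t = $e->[1] < $tr->{cds_end}   ? $e->[1] : $tr->{cds_end};
        next if $s > $t;    # exon is entirely UTR
        if ($pos >= $s && $pos <= $t) { $cds_pos = $offset + $pos - $s + 1; last; }
        $offset += $t - $s + 1;
    }

    # Generate the alternative codon, translate both and compare.
    my $codon_start = 3 * int(($cds_pos - 1) / 3);    # 0-based CDS offset
    my $ref_codon   = uc substr($tr->{cds_seq}, $codon_start, 3);
    my $alt_codon   = $ref_codon;
    substr($alt_codon, ($cds_pos - 1) % 3, 1) = uc $alt;
    my $ref_aa = $CODON2AA{$ref_codon};
    my $alt_aa = $CODON2AA{$alt_codon};

    return 'SYNONYMOUS_CODING' if $ref_aa eq $alt_aa;
    return 'STOP_GAINED'       if $alt_aa eq '*';
    return 'STOP_LOST'         if $ref_aa eq '*';
    return 'NON_SYNONYMOUS_CODING';
}

# Toy transcript: exons 100-120 and 200-230, CDS 110-215 (27 bp, 9 codons).
my $tr = {
    exons     => [ [100, 120], [200, 230] ],
    cds_start => 110,
    cds_end   => 215,
    cds_seq   => 'ATGGCTAAAGAGTGGCCCTTACATTAA',
};
print classify_snv($tr, 112, 'A'), "\n";   # NON_SYNONYMOUS_CODING (ATG -> ATA)
print classify_snv($tr, 203, 'A'), "\n";   # STOP_GAINED (TGG -> TGA)
```

A real implementation gets the CDS coordinates from the transcript
mapper, as described above, and also has to handle reverse-strand
transcripts and indels. Those are deliberately left out of this sketch.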
>>>
>>> We are currently working on an overhaul of this system, which should
>>> make it easier to understand by reading the code.
>>>
>>> I would recommend stepping through the code in Perl's debugger,
>>> using the "perl -d" option.
>>>
>>> Hope this helps
>>>
>>> Will McLaren
>>> Ensembl Variation
>>>
>>> On 9 December 2010 13:19, Sung Gong <sung at bio.cc> wrote:
>>>> Hi,
>>>>
>>>> I was thrilled to find that the Ensembl API provides a nice script
>>>> (ftp://ftp.ensembl.org/pub/misc-scripts/) that can predict the
>>>> consequence types of novel variations.
>>>> It was also good to see a clear demonstration of how to use the API
>>>> for that purpose:
>>>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html
>>>>
>>>> Before realising that the variation API can help predict the
>>>> consequence types of novel variants, I used only the core API to map
>>>> the positions of my variants and check whether they fall within a
>>>> coding region, intron, exon and so on.
>>>> Now I wonder how the variation API does this. I looked at the source
>>>> code, but found it somewhat overwhelming.
>>>>
>>>> Can anybody explain how the prediction for novel variants works under the hood?
>>>>
>>>> Cheers,
>>>> Sung
>>>>
>>>> _______________________________________________
>>>> Dev mailing list
>>>> Dev at ensembl.org
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>
>>>
>>
>