[ensembl-dev] Prediction of consequence type for novel variants

Tue Dec 14 15:15:56 GMT 2010

The coordinates for a deletion are those of the reference bases that are deleted:

1 2 3 4 5
A A C T G

A deletion of bases 2, 3 and 4 would have start = 2, end = 4 and an
allele_string of ACT/- (this is the same even for the negative strand).
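
As a rough illustration of the deletion example above, here is a minimal
sketch in plain Perl that derives start, end and allele_string from a
reference string and the 1-based positions that are deleted. It does not use
the Ensembl API, and the helper name deletion_coords is made up for this
example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): given a reference
# sequence and the 1-based first and last deleted positions, return the
# start, end and allele_string describing the deletion.
sub deletion_coords {
    my ($ref_seq, $first, $last) = @_;
    die "first must be <= last\n" if $first > $last;
    die "positions out of range\n"
        if $first < 1 || $last > length $ref_seq;

    # substr is 0-based, so shift the 1-based start down by one
    my $deleted = substr($ref_seq, $first - 1, $last - $first + 1);

    return ($first, $last, "$deleted/-");
}

# The five-base reference from the example: delete bases 2, 3 and 4
my ($start, $end, $allele_string) = deletion_coords('AACTG', 2, 4);
print "start=$start end=$end allele_string=$allele_string\n";
```

For the five-base reference above this gives start = 2, end = 4 and ACT/-.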

Generally in Ensembl if a feature spans some region of DNA, start is always
less than or equal to end (it is equal to end for features of length 1, such
as SNPs).

Start is only greater than end for insertions, since they occur _between_
bases of the reference sequence.
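
To make the start/end rule concrete, here is a small sketch in plain Perl.
It assumes nothing beyond the convention described in this thread; the
helper names insertion_coords and coord_type are hypothetical and are not
part of the Ensembl API.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of the Ensembl API).

# An insertion sits between two adjacent reference bases. Given the 1-based
# position of the base on its left, end is that base and start is the base
# on its right, so start = end + 1.
sub insertion_coords {
    my ($left_base) = @_;
    return ($left_base + 1, $left_base);
}

# Classify a feature from its coordinates alone.
sub coord_type {
    my ($start, $end) = @_;
    return 'insertion'       if $start == $end + 1;  # between two bases
    return 'single base'     if $start == $end;      # e.g. a SNP
    return 'multi-base span' if $start < $end;       # e.g. a deletion
    return 'invalid';
}

# An insertion between bases 1521 and 1522, as in the API documentation
my ($start, $end) = insertion_coords(1521);
print "insertion: start=$start end=$end\n";
print "2..4 is a ", coord_type(2, 4), "\n";
```

The same coordinates apply whichever strand the feature is on.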

Cheers

Will

On 14 December 2010 15:10, Sung Gong <sung at bio.cc> wrote:
> Start 1 smaller than end for a deletion?
>
>
> On 14 December 2010 15:03, Will McLaren <wm2 at ebi.ac.uk> wrote:
>> Hi Sung,
>>
>> The coordinates would be the same regardless of the strand.
>>
>> Start is _always_ 1 greater than end for an insertion, regardless of
>> strand or the size of the insertion.
>>
>> Will
>>
>> On 14 December 2010 14:58, Sung Gong <sung at bio.cc> wrote:
>>> Hi Will,
>>>
>>> One more question about start/end positions in case of indels.
>>>
>>> In the API document
>>> (http://www.ensembl.org/info/docs/Pdoc/ensembl-variation/modules/Bio/EnsEMBL/Variation/VariationFeature.html),
>>> it says:
>>>    # Variation feature representing a 2bp insertion
>>>    $vf = Bio::EnsEMBL::Variation::VariationFeature->new
>>>       (-start   => 1522,
>>>        -end     => 1521, # end = start-1 for insert
>>>        -strand  => -1,
>>>        -slice   => $slice,
>>>        -allele_string => '-/AA',
>>>        -variation_name => 'rs12111',
>>>        -map_weight  => 1,
>>>        -variation => $v2);
>>>
>>> The example above is only for -1 strand?
>>> How can I generalise to set -start and -end?
>>>
>>> Cheers,
>>> Sung
>>>
>>> On 10 December 2010 11:41, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>> Hi Sung
>>>>
>>>> The codons() method will work; it returns the codon something like:
>>>>
>>>> aGa/aCa
>>>>
>>>> where the base changed is in capital letters.
>>>>
>>>> Will
>>>>
>>>> On 10 December 2010 11:26, Sung Gong <sung at bio.cc> wrote:
>>>>> Hi Will,
>>>>>
>>>>> Thanks for the paper. I appreciate your work.
>>>>>
>>>>> Before I was aware of your script, I used to get the corresponding
>>>>> codon and the position (0, 1 or 2) where a single DNA variant occurs
>>>>> using the core API.
>>>>> Any work-around for this?
>>>>>
>>>>> I found a 'codons' method in 'TranscriptVariation', but is it a
>>>>> method of ConsequenceType?
>>>>>
>>>>> Thought better to ask you before going further.
>>>>>
>>>>> Cheers,
>>>>> Sung
>>>>>
>>>>> On 9 December 2010 14:02, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>> Hi Sung,
>>>>>>
>>>>>> There is a publication referring to the system, but it does not go
>>>>>> into great detail on the internal workings:
>>>>>>
>>>>>> http://bioinformatics.oxfordjournals.org/content/26/16/2069.abstract
>>>>>>
>>>>>> Here's an approximate flow of what happens in the API. The vast
>>>>>> majority of the code used is in the Core module
>>>>>> Bio::EnsEMBL::Utils::TranscriptAlleles.pm, mainly the methods
>>>>>> type_variation() and apply_aa_change():
>>>>>>
>>>>>> - find overlapping transcripts (using $vf->feature_Slice and
>>>>>> $slice->get_all_Transcripts), then for each transcript:
>>>>>>
>>>>>> - get transcript mapper and map variation's coordinates to cDNA, CDS
>>>>>> and peptide
>>>>>>
>>>>>> - any variants that don't fall in the coding sequence are classified
>>>>>> here (e.g. INTRONIC, UPSTREAM) and the flow ends
>>>>>>
>>>>>> - if variation falls in exon (i.e. has defined CDS coordinates),
>>>>>> generate alternative codon(s) and resulting translation
>>>>>>
>>>>>> - compare translation to reference; classify as e.g.
>>>>>> SYNONYMOUS_CODING, NON_SYNONYMOUS_CODING
>>>>>>
>>>>>> We are currently working on an overhaul to this system which should
>>>>>> make it easier to comprehend by following the code.
>>>>>>
>>>>>> I would recommend trying to follow through the code in Perl's
>>>>>> debugger, using the "perl -d" option.
>>>>>>
>>>>>> Hope this helps
>>>>>>
>>>>>> Will McLaren
>>>>>> Ensembl Variation
>>>>>>
>>>>>> On 9 December 2010 13:19, Sung Gong <sung at bio.cc> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was thrilled to find that Ensembl API provides a nice script
>>>>>>> (ftp://ftp.ensembl.org/pub/misc-scripts/) which can predict the
>>>>>>> consequence types of novel variations.
>>>>>>> Also, good to see a good demonstration of how to use the API for that
>>>>>>> purpose:
>>>>>>>
>>>>>>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html
>>>>>>>
>>>>>>> Before realising the variation API can help predict the consequence
>>>>>>> type of novel variants, I used only the core API to map the
>>>>>>> position of my variants to see whether they are within a coding
>>>>>>> region, intron, exon and so on.
>>>>>>> Now I wonder how the variation API works for that purpose. I looked
>>>>>>> at the source code, but found it somewhat overwhelming.
>>>>>>>
>>>>>>> Can anybody explain how the novel prediction works internally, under
>>>>>>> the hood?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Sung
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Dev mailing list
>>>>>>> Dev at ensembl.org
>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev