[ensembl-dev] Prediction of consequence type for novel variants

Sung Gong sung at bio.cc
Sat Oct 22 14:55:22 BST 2011


Correction:
POS REF ALT
1      ACG AT
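
As a rough illustration of how a VCF record like the one above might be turned into the start, end and allele_string that a VariationFeature expects, here is a minimal sketch in plain Python (not the Ensembl Perl API). It follows the coordinate rules Will describes in the quoted messages below. Two things are my own assumptions: the function name vcf_to_ensembl is hypothetical, and I assume that when REF and ALT differ in length and share their first base, that base is only the VCF anchor and can be stripped. Whether the API or the consequence script treats complex variants exactly this way is not confirmed in this thread.

```python
def vcf_to_ensembl(pos, ref, alt):
    """Convert one VCF record (1-based POS, REF, ALT) into
    (start, end, allele_string) using the conventions from this thread."""
    start = pos
    # Assumption: for length-changing variants, a shared first base is the
    # VCF anchor base and is not part of the change itself.
    if len(ref) != len(alt) and ref[:1] == alt[:1]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    # end is the last reference base affected. For a pure insertion no
    # reference base is affected, so ref is empty and end = start - 1.
    end = start + len(ref) - 1
    return start, end, f"{ref or '-'}/{alt or '-'}"


# The corrected complex example from this message
print(vcf_to_ensembl(1, "ACG", "AT"))      # (2, 3, 'CG/T')
# Deleting bases 2-4 of AACTG, written in VCF as POS=1 REF=AACT ALT=A
print(vcf_to_ensembl(1, "AACT", "A"))      # (2, 4, 'ACT/-')
# A 2bp insertion; the anchor base G is made up for illustration
print(vcf_to_ensembl(1521, "G", "GAA"))    # (1522, 1521, '-/AA')
```

With this reading, the complex example becomes a substitution of CG by T covering reference bases 2 to 3.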

On 22 October 2011 14:53, Sung Gong <sung at bio.cc> wrote:
> Hi,
>
> I thought it better to follow up on this thread rather than start another.
>
> I am wondering how to express complex variant types in Ensembl API
> terms (esp. VariationFeature).
> For example:
> ATGC
> A-TT
>
> VCF format says:
> POS REF ALT
> 1      AGT AT
>
> Cheers,
> Sung
>
> On 14 December 2010 15:15, Will McLaren <wm2 at ebi.ac.uk> wrote:
>> The coordinates for a deletion reflect the bases of the reference deleted:
>>
>> 1 2 3 4 5
>> A A C T G
>>
>> A deletion of bases 2, 3 and 4 would have start = 2, end = 4 and an
>> allele_string of ACT/- (this is the same even for the negative strand).
>>
>> Generally in Ensembl if a feature spans some region of DNA, start is always
>> less than or equal to end (it is equal to end for features of length 1, such
>> as SNPs).
>>
>> Start is only greater than end for insertions, since they occur _between_
>> bases of the reference sequence.
>>
>> Cheers
>>
>> Will
>>
>> On 14 December 2010 15:10, Sung Gong <sung at bio.cc> wrote:
>>> So is start 1 smaller than end for a deletion?
>>>
>>>
>>> On 14 December 2010 15:03, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>> Hi Sung,
>>>>
>>>> The coordinates would be the same regardless of the strand.
>>>>
>>>> Start is _always_ 1 greater than end for an insertion, regardless of
>>>> strand or the size of the insertion.
>>>>
>>>> Will
>>>>
>>>> On 14 December 2010 14:58, Sung Gong <sung at bio.cc> wrote:
>>>>> Hi Will,
>>>>>
>>>>> One more question about start/end positions in case of indels.
>>>>>
>>>>> In the API document
>>>>>
>>>>> (http://www.ensembl.org/info/docs/Pdoc/ensembl-variation/modules/Bio/EnsEMBL/Variation/VariationFeature.html),
>>>>> it says:
>>>>>    # Variation feature representing a 2bp insertion
>>>>>    $vf = Bio::EnsEMBL::Variation::VariationFeature->new
>>>>>       (-start   => 1522,
>>>>>        -end     => 1521, # end = start-1 for insert
>>>>>        -strand  => -1,
>>>>>        -slice   => $slice,
>>>>>        -allele_string => '-/AA',
>>>>>        -variation_name => 'rs12111',
>>>>>        -map_weight  => 1,
>>>>>        -variation => $v2);
>>>>>
>>>>> Does the example above apply only to the -1 strand?
>>>>> How should I set -start and -end in the general case?
>>>>>
>>>>> Cheers,
>>>>> Sung
>>>>>
>>>>> On 10 December 2010 11:41, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>> Hi Sung
>>>>>>
>>>>>> The codons() method will work; it returns the codon something like:
>>>>>>
>>>>>> aGa/aCa
>>>>>>
>>>>>> where the base changed is in capital letters.
>>>>>>
>>>>>> Will
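
To show how the codon position (0, 1 or 2) could be recovered from a string in the form Will describes, here is a minimal sketch in plain Python. It only relies on the statement above that the changed base is written in capitals. The function name changed_codon_positions is hypothetical, and the sketch assumes both codons have the same length, so it is aimed at single-base substitutions rather than indels.

```python
def changed_codon_positions(codons):
    """Return the 0-based positions of the capitalised (changed) bases
    in a 'ref/alt' codon string such as 'aGa/aCa'."""
    ref, alt = codons.split("/")
    return [
        i
        for i, (r, a) in enumerate(zip(ref, alt))
        if r.isupper() or a.isupper()
    ]


print(changed_codon_positions("aGa/aCa"))  # [1]
```

For aGa/aCa the changed base is the middle one, i.e. position 1 when counting from 0.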
>>>>>>
>>>>>> On 10 December 2010 11:26, Sung Gong <sung at bio.cc> wrote:
>>>>>>> Hi Will,
>>>>>>>
>>>>>>> Thanks for the paper. I appreciate your work.
>>>>>>>
>>>>>>> Before I was aware of your script, I used the core API to get the
>>>>>>> corresponding codon and the position (0, 1 or 2) at which a single
>>>>>>> DNA variant occurs.
>>>>>>> Is there any work-around for this?
>>>>>>>
>>>>>>> I found a 'codons' method in 'TranscriptVariation', but is it a
>>>>>>> method of ConsequenceType?
>>>>>>>
>>>>>>> I thought it better to ask you before going further.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Sung
>>>>>>>
>>>>>>> On 9 December 2010 14:02, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>>>> Hi Sung,
>>>>>>>>
>>>>>>>> There is a publication referring to the system, but it does not go
>>>>>>>> into great detail on the internal workings:
>>>>>>>>
>>>>>>>> http://bioinformatics.oxfordjournals.org/content/26/16/2069.abstract
>>>>>>>>
>>>>>>>> Here's an approximate flow of what happens in the API. The vast
>>>>>>>> majority of the code used is in the Core module
>>>>>>>> Bio::EnsEMBL::Utils::TranscriptAlleles.pm, mainly the methods
>>>>>>>> type_variation() and apply_aa_change():
>>>>>>>>
>>>>>>>> - find overlapping transcripts (using $vf->feature_Slice and
>>>>>>>> $slice->get_all_Transcripts), then for each transcript:
>>>>>>>>
>>>>>>>> - get transcript mapper and map variation's coordinates to cDNA, CDS
>>>>>>>> and peptide
>>>>>>>>
>>>>>>>> - any variants that don't fall in the coding sequence are classified
>>>>>>>> here (e.g. INTRONIC, UPSTREAM) and the flow ends
>>>>>>>>
>>>>>>>> - if variation falls in exon (i.e. has defined CDS coordinates),
>>>>>>>> generate alternative codon(s) and resulting translation
>>>>>>>>
>>>>>>>> - compare translation to reference; classify as e.g.
>>>>>>>> SYNONYMOUS_CODING, NON_SYNONYMOUS_CODING
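
The last two steps of the flow above (build the alternative codon, translate, compare with the reference) can be sketched in plain Python for a single-base substitution. This is a toy illustration and not the code in TranscriptAlleles: it assumes the position within the coding sequence is already known, it skips the transcript mapping, and it only distinguishes the two classes named above (a change to a stop codon would simply be reported as non-synonymous here). The names CODON_TABLE and classify_snp are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_snp(cds, cds_pos, alt_base):
    """Classify a substitution at 1-based position cds_pos of the coding
    sequence cds. Returns (consequence, 'ref_aa/alt_aa')."""
    idx = cds_pos - 1
    offset = idx % 3                      # position within the codon: 0, 1 or 2
    codon_start = idx - offset
    ref_codon = cds[codon_start:codon_start + 3].upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        consequence = "SYNONYMOUS_CODING"
    else:
        consequence = "NON_SYNONYMOUS_CODING"
    return consequence, f"{ref_aa}/{alt_aa}"


# Made-up coding sequence: ATG AGA TAA
print(classify_snp("ATGAGATAA", 5, "C"))  # ('NON_SYNONYMOUS_CODING', 'R/T')
print(classify_snp("ATGAGATAA", 6, "G"))  # ('SYNONYMOUS_CODING', 'R/R')
```

The first call changes AGA to ACA (arginine to threonine), the second changes AGA to AGG (still arginine).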
>>>>>>>>
>>>>>>>> We are currently working on an overhaul to this system which should
>>>>>>>> make it easier to comprehend by following the code.
>>>>>>>>
>>>>>>>> I would recommend trying to follow through the code in Perl's
>>>>>>>> debugger, using the "perl -d" option.
>>>>>>>>
>>>>>>>> Hope this helps
>>>>>>>>
>>>>>>>> Will McLaren
>>>>>>>> Ensembl Variation
>>>>>>>>
>>>>>>>> On 9 December 2010 13:19, Sung Gong <sung at bio.cc> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I was thrilled to find that the Ensembl API provides a nice script
>>>>>>>>> (ftp://ftp.ensembl.org/pub/misc-scripts/) which can predict the
>>>>>>>>> consequence types of novel variations.
>>>>>>>>> It is also good to see a demonstration of how to use the API for
>>>>>>>>> that purpose:
>>>>>>>>>
>>>>>>>>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html
>>>>>>>>>
>>>>>>>>> Before realising that the variation API can help predict the
>>>>>>>>> consequence type of novel variants, I used only the core API to map
>>>>>>>>> the positions of my variants and see whether they fall within a
>>>>>>>>> coding region, intron, exon and so on.
>>>>>>>>> Now I wonder how the variation API works for that purpose. I looked
>>>>>>>>> at the source code, but found it somewhat overwhelming.
>>>>>>>>>
>>>>>>>>> Can anybody explain how the prediction for novel variants works
>>>>>>>>> under the hood?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Sung
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Dev mailing list
>>>>>>>>> Dev at ensembl.org
>>>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
