[ensembl-dev] complex variant effect prediction

Fiona Cunningham fiona at ebi.ac.uk
Fri Dec 17 16:14:16 GMT 2010


While structural variant consequence prediction is on our roadmap,
our immediate priority for 2011 is to modify the database so that we
can attach consequences to specific alleles rather than only to a
variation location. We are also going to extend the short sequence
variation consequences to look at motif changes in regulatory regions
and to accept PolyPhen predictions into our pipeline (hopefully for
release 62). Therefore, I think it will be towards mid-2011 before we
get to structural variant consequence prediction.

Fiona

------------------------------------------------------
Fiona Cunningham
Ensembl Variation Project Leader, EBI
www.ensembl.org
www.lrg-sequence.org
t: 01223 494612 || e: fiona at ebi.ac.uk



On 15 December 2010 10:27, Ewan Birney <birney at ebi.ac.uk> wrote:
>
>
>> The vcf format does also foresee an option to encode structural
>> variations, does anyone want to shed some light on how these structural
>> variations will be encoded in the ensembl API? Presuming there already
>> exists a consensus on that?
>>
>> At the moment Ensembl Variation offers only limited support for structural
>> variations; we store some structural variation locations in our human, mouse
>> and dog databases, but we do not carry any further information, nor do we
>> currently have the ability to predict the effect of structural variations.
>>
>> When we do come to address this, the issues will obviously be a lot more
>> complex, since the size and variation type of structural variations can lead
>> to so many different consequences (exon loss, gene loss, gene duplication,
>> loss of regulatory region etc.)
>>
>
>
> But I think it's worth adding that this is definitely in our roadmap
> (consequences on structural variation); what I don't know
> is how close it is to being done. I think I can speak for Fiona in saying
> that dbSNP releases which have a substantial human
> update (like the current one) are an understandably big time sink and it's
> hard to provide accurate development predictions
> during these times...
>
>
>
>> Cheers
>>
>> Will
>>
>>
>> Kind regards
>>
>> bram
>>
>> On 14-dec-2010, at 17:30, Will McLaren wrote:
>>
>>> Hi Bram,
>>>
>>> You're not alone in wondering about these more complex variation types!
>>>
>>> While the software does support them (provided you get the input format
>>> right!), the results may not always reflect the complexity, as the
>>> consequence types we currently return do not cover all possible events. You
>>> can however get a good sense of what is happening by carefully considering
>>> what is returned by the coordinate methods (cds_start, cds_end, cdna_start,
>>> cdna_end, translation_start, translation_end) and the pep_allele_string()
>>> and codons() methods.
>>>
>>> In determining the input format, you need to consider what region of the
>>> reference sequence is being affected, and what is replacing the reference. I
>>> would input this variant as:
>>>
>>>      1 2 3 4 5 6 7
>>> ref: A C G T A G A
>>> var: A C A - - G A
>>>
>>> You could view this as a SNP and a deletion (two events, as you
>>> describe), or as an unbalanced substitution (one event).
>>>
>>> As two events, this would have input (assuming chromosome 1 and coords as
>>> above for simplicity):
>>>
>>> 1  3  3  G/A  +
>>> 1  4  5  TA/-  +
>>>
>>> As one event, which is how I would input this variation:
>>>
>>> 1  3  5  GTA/A  +
>>>
>>> So we are substituting GTA (bases 3-5 of the reference) with A.
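The single-event encoding described above can be derived mechanically from a gapped alignment. The following is a minimal Python sketch, not part of the Ensembl API: the function name `alignment_to_event` is hypothetical, and it assumes 1-based inclusive coordinates, `-` as the gap/empty-allele character, and the start = end + 1 convention for pure insertions that appears elsewhere in this thread.

```python
def alignment_to_event(chrom, ref_aln, var_aln, strand="+"):
    """Collapse a gapped pairwise alignment into one
    'chr start end ref/var strand' line (hypothetical helper)."""
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned strings must have equal length")
    # Columns where the two rows disagree (gap columns included).
    diff = [i for i, (r, v) in enumerate(zip(ref_aln, var_aln)) if r != v]
    if not diff:
        return None  # identical sequences, nothing to report
    first, last = diff[0], diff[-1]
    # Alleles are the affected columns with gap characters removed.
    ref_allele = ref_aln[first:last + 1].replace("-", "")
    var_allele = var_aln[first:last + 1].replace("-", "")
    # 1-based reference coordinate of the first affected base:
    # count the reference bases (non-gap characters) before it.
    start = len(ref_aln[:first].replace("-", "")) + 1
    # For a pure insertion the reference allele is empty, so end = start - 1.
    end = start + len(ref_allele) - 1
    return f"{chrom} {start} {end} {ref_allele or '-'}/{var_allele or '-'} {strand}"


print(alignment_to_event("1", "ACGTAGA", "ACA--GA"))  # 1 3 5 GTA/A +
```

Everything between the first and last differing column is treated as one event, which is the "what region of the reference is being affected, and what is replacing it" reading given above.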
>>>
>>> Although (and I guess this isn't a real example!), a better alignment
>>> would surely be:
>>>
>>>      1 2 3 4 5 6 7
>>> ref: A C G T A G A
>>> var: A C - - A G A
>>>
>>> where there's just one deletion event of bases 3 & 4. But that's just
>>> nit-picking!
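The two alignments describe the same variant sequence, and one encoding can be reduced to the other by trimming bases that the two alleles share. A minimal Python sketch of that trimming follows; `trim_event` is a hypothetical helper rather than an Ensembl function, and it assumes 1-based inclusive coordinates with `-` standing for an empty allele.

```python
def trim_event(start, ref_allele, var_allele):
    """Strip bases shared by both alleles.

    Returns (start, end, 'ref/var') for the trimmed event.
    """
    ref = "" if ref_allele == "-" else ref_allele
    var = "" if var_allele == "-" else var_allele
    # Trim the shared suffix first; this leaves the start coordinate alone.
    while ref and var and ref[-1] == var[-1]:
        ref, var = ref[:-1], var[:-1]
    # Then trim the shared prefix, moving the start coordinate right.
    while ref and var and ref[0] == var[0]:
        ref, var = ref[1:], var[1:]
        start += 1
    end = start + len(ref) - 1
    return start, end, f"{ref or '-'}/{var or '-'}"


print(trim_event(3, "GTA", "A"))  # (3, 4, 'GT/-')
```

Here the substitution of GTA by A at bases 3-5 loses its shared trailing A and becomes a deletion of GT at bases 3-4, matching the second alignment.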
>>>
>>> Hope this helps anyway
>>>
>>> Cheers
>>>
>>> Will McLaren
>>> Ensembl Variation
>>>
>>>
>>> On 14 December 2010 15:42, Bram De Wilde <gbramdewilde at gmail.com> wrote:
>>> Hi everyone,
>>>
>>> While unraveling the complex variants that can be encoded in the VCF
>>> format
>>> (http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-40)
>>> I came to realize that I don't know how to submit some of these complex
>>> alleles to the Ensembl Variation API for effect prediction.
>>> For simple SNPs and indels the situation is clearly described in the
>>> help pages. The problem I seem to be having is with complex alleles:
>>>
>>> e.g. when a SNP is directly followed by a deletion on a chromosome z
>>> ref: ACGTAGA
>>> var: ACA--GA
>>>
>>> this can be encoded as 2 variants:
>>> chr start stop variant strand
>>> z 3 3 G/A +
>>> z 4 5 TA/- +
>>> but clearly neither of these will have the functional consequence of the
>>> true allele, namely:
>>> z 3 5 GTA/A +
>>> Unfortunately, this kind of allele does not seem to return any response
>>> from the variation API.
>>>
>>> I can think of a similar situation for an insertion:
>>> ref: ACG-TAGA
>>> var: ACACTAGA
>>>
>>> where:
>>> chr start stop variant strand
>>> z 3 3 G/A +
>>> z 4 3 -/C +
>>> will not have the same consequence as
>>> z 3 3 G/AC +
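One way to check that a split and a combined encoding describe the same variant sequence is to apply each to the reference and compare the results. The sketch below is plain Python and does not call the variation API; `apply_events` is a hypothetical helper that assumes non-overlapping events, 1-based inclusive coordinates, and start = end + 1 for pure insertions. It also checks each reference allele against the stated coordinates; under that check, a combined event replacing G with AC covers base 3 only and would be written `3 3 G/AC`.

```python
def apply_events(reference, events):
    """Apply (start, end, 'ref/var') events to an ungapped reference string."""
    seq = reference
    # Work right to left so an edit never shifts the coordinates
    # of the events that are still to be applied.
    for start, end, alleles in sorted(events, key=lambda e: e[0], reverse=True):
        ref, var = (a.replace("-", "") for a in alleles.split("/"))
        # The stated reference allele must match the reference at start..end.
        if seq[start - 1:end] != ref:
            raise ValueError(f"reference allele {ref!r} does not match {start}-{end}")
        seq = seq[:start - 1] + var + seq[end:]
    return seq


# Deletion example: two events versus one event.
print(apply_events("ACGTAGA", [(3, 3, "G/A"), (4, 5, "TA/-")]))  # ACAGA
print(apply_events("ACGTAGA", [(3, 5, "GTA/A")]))                # ACAGA

# Insertion example: two events versus one event.
print(apply_events("ACGTAGA", [(3, 3, "G/A"), (4, 3, "-/C")]))   # ACACTAGA
print(apply_events("ACGTAGA", [(3, 3, "G/AC")]))                 # ACACTAGA
```

Both encodings give the same sequence in each case; what differs is whether a consequence predictor is handed one event or two.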
>>>
>>>
>>> Or do I see this all wrong?
>>> Is there a way to submit alleles like these for effect prediction?
>>>
>>>
>>> Kind regards,
>>>
>>>
>>> Bram De Wilde, MD
>>> Center for Medical Genetics Ghent (CMGG) Ghent University Hospital
>>> Medical Research Building (MRB), 2nd floor, room 120.050 De Pintelaan 185,
>>> B-9000 Ghent, Belgium +32 9 332 4812 (phone) | +32 9 332 6549 (fax)
>>> http://medgen.ugent.be/ Bram.DeWilde at UGent.be
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list
>>> Dev at ensembl.org
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>
>>>