[ensembl-dev] complex variant effect prediction

Ewan Birney birney at ebi.ac.uk
Wed Dec 15 10:27:53 GMT 2010



> The VCF format also provides a way to encode structural
> variations. Could anyone shed some light on how these
> structural variations will be encoded in the Ensembl API,
> presuming a consensus on that already exists?
>
> At the moment Ensembl Variation offers only limited support for  
> structural variations; we store some structural variation locations  
> in our human, mouse and dog databases, but we do not carry any  
> further information, nor do we currently have the ability to predict  
> the effect of structural variations.
>
> When we do come to address this, the issues will obviously be a lot  
> more complex, since the size and variation type of structural  
> variations can lead to so many different consequences (exon loss,  
> gene loss, gene duplication, loss of regulatory region etc.)
>


But I think it's worth adding that this (consequences of structural
variation) is definitely on our roadmap; what I don't know
is how close it is to being done. I think I can speak for Fiona in
saying that dbSNP releases with a substantial human
update (like the current one) are an understandably big time sink, and
it's hard to provide accurate development predictions
during these times...



> Cheers
>
> Will
>
>
> Kind regards
>
> bram
>
> On 14-dec-2010, at 17:30, Will McLaren wrote:
>
>> Hi Bram,
>>
>> You're not alone in wondering about these more complex variation  
>> types!
>>
>> While the software does support them (provided you get the input  
>> format right!), the results may not always reflect the complexity,  
>> as the consequence types we currently return do not cover all  
>> possible events. You can however get a good sense of what is  
>> happening by carefully considering what is returned by the  
>> coordinate methods (cds_start, cds_end, cdna_start, cdna_end,  
>> translation_start, translation_end) and the pep_allele_string() and  
>> codons() methods.
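
As a rough illustration of how such coordinates can be read, here is a minimal Python sketch. It is not the Ensembl API: the function names are hypothetical, and it assumes 1-based, inclusive CDS coordinates. It derives the range of codons touched by a variant and checks whether the net length change would shift the reading frame.

```python
def affected_codons(cds_start, cds_end):
    """Return the 1-based (first, last) codon numbers touched by a CDS range.

    Assumes 1-based, inclusive CDS coordinates. For an insertion written
    with start = end + 1, the two flanking positions are used.
    """
    lo, hi = min(cds_start, cds_end), max(cds_start, cds_end)
    return (lo - 1) // 3 + 1, (hi - 1) // 3 + 1


def shifts_frame(ref_allele, var_allele):
    """Return True if the net length change is not a multiple of three."""
    ref_len = len(ref_allele.replace("-", ""))
    var_len = len(var_allele.replace("-", ""))
    return (var_len - ref_len) % 3 != 0
```

For example, if GTA/A happened to sit at CDS positions 3 to 5, `affected_codons(3, 5)` gives codons 1 and 2, and `shifts_frame("GTA", "A")` is true because two bases are lost overall.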
>>
>> In determining the input format, you need to consider what region  
>> of the reference sequence is being affected, and what is replacing  
>> the reference. I would input this variant as:
>>
>>      1 2 3 4 5 6 7
>> ref: A C G T A G A
>> var: A C A - - G A
>>
>> You could view this as a SNP and a deletion (two events, as you  
>> describe), or as an unbalanced substitution (one event)
>>
>> As two events, this would have input (assuming chromosome 1 and  
>> coords as above for simplicity):
>>
>> 1  3  3  G/A  +
>> 1  4  5  TA/-  +
>>
>> As one event, which is how I would input this variation:
>>
>> 1  3  5  GTA/A  +
>>
>> So we are substituting GTA (bases 3-5 of the reference) with A.
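
The step from a gapped alignment to a single input line can be sketched in Python as below. This is only an illustration under the conventions described in this thread (start and end span the affected reference bases; a pure insertion has start = end + 1 and a reference allele of "-"). The function name is hypothetical and is not part of any Ensembl tool.

```python
def alignment_to_event(chrom, ref_aln, var_aln, strand="+"):
    """Collapse all differences in a gapped alignment into one input line.

    ref_aln and var_aln are equal-length strings using '-' for gaps.
    Returns None when the two sequences are identical.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have the same length")

    # Columns (0-based) where reference and variant disagree.
    diff_cols = [i for i, (r, v) in enumerate(zip(ref_aln, var_aln)) if r != v]
    if not diff_cols:
        return None
    first, last = diff_cols[0], diff_cols[-1]

    # Reference bases (not gaps) to the left of the first difference.
    bases_before = sum(1 for c in ref_aln[:first] if c != "-")

    ref_allele = ref_aln[first:last + 1].replace("-", "")
    var_allele = var_aln[first:last + 1].replace("-", "")

    start = bases_before + 1
    end = bases_before + len(ref_allele)  # equals start - 1 for an insertion
    return f"{chrom} {start} {end} {ref_allele or '-'}/{var_allele or '-'} {strand}"
```

Calling `alignment_to_event("1", "ACGTAGA", "ACA--GA")` yields `1 3 5 GTA/A +`, the single-event form given above.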
>>
>> Although (and I guess this isn't a real example!), a better  
>> alignment would surely be:
>>
>>      1 2 3 4 5 6 7
>> ref: A C G T A G A
>> var: A C - - A G A
>>
>> where there's just one deletion event of bases 3 & 4. But that's  
>> just nit-picking!
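
The equivalence of the two alignments can be checked by trimming bases that the reference and variant alleles share at their ends. The following Python sketch is an assumption-laden illustration (hypothetical function name, trimming the shared suffix before the shared prefix, which is one of several possible normalisation choices), not a description of what Ensembl does internally.

```python
def trim_shared_flanks(start, ref, alt):
    """Trim bases shared by both alleles and return (start, end, ref, alt).

    start is the 1-based position of the first reference base.
    Empty alleles are reported as '-'; an insertion has end = start - 1.
    """
    # Drop the common suffix first; coordinates on the left are unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop the common prefix, moving the start to the right.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    end = start + len(ref) - 1
    return start, end, ref or "-", alt or "-"
```

`trim_shared_flanks(3, "GTA", "A")` returns `(3, 4, "GT", "-")`, i.e. the single deletion of bases 3 and 4 shown in the second alignment.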
>>
>> Hope this helps anyway
>>
>> Cheers
>>
>> Will McLaren
>> Ensembl Variation
>>
>>
>> On 14 December 2010 15:42, Bram De Wilde <gbramdewilde at gmail.com>  
>> wrote:
>> Hi everyone,
>>
>> While unraveling the complex variants that can be encoded in the
>> VCF format
>> (http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-40)
>> I came to realize that I don't know how to submit some of these
>> complex alleles to the Ensembl variation API for effect prediction.
>> For simple SNPs and indels the situation is clearly described in
>> the help pages. The problem I seem to be having is with complex
>> alleles:
>>
>> e.g. when a SNP is directly followed by a deletion on a chromosome z
>> ref: ACGTAGA
>> var: ACA--GA
>>
>> this can be encoded as 2 variants:
>> chr start stop variant strand
>> z 3 3 G/A +
>> z 4 5 TA/- +
>> but clearly neither of these will have the functional consequence of
>> the true allele, namely:
>> z 3 5 GTA/A +
>> Unfortunately this kind of allele does not seem to return any
>> response from the variation API.
>>
>> I can think of a similar situation for an insertion:
>> ref: ACG-TAGA
>> var: ACACTAGA
>>
>> where:
>> chr start stop variant strand
>> z 3 3 G/A +
>> z 4 3 -/C +
>> will not have the same consequence as
>> z 3 3 G/AC +
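
Merging two directly adjacent events into one combined allele can be sketched as follows. This is an illustration in Python rather than anything from the Ensembl API: the function name is hypothetical, and it assumes that start and end always span the reference allele, so an insertion is written with start = end + 1 and a reference allele of "-".

```python
def merge_adjacent_events(events):
    """Merge directly adjacent events into one (start, end, ref, alt) tuple.

    Each event is (start, end, ref_allele, var_allele) with 1-based,
    inclusive coordinates and '-' standing for an empty allele.
    """
    if not events:
        raise ValueError("no events to merge")

    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    start = ordered[0][0]
    end = start - 1  # nothing of the reference consumed yet
    ref_parts, alt_parts = [], []

    for ev_start, ev_end, ref, alt in ordered:
        ref = "" if ref == "-" else ref
        alt = "" if alt == "-" else alt
        if ev_start != end + 1:
            raise ValueError("events are not directly adjacent")
        if ev_end - ev_start + 1 != len(ref):
            raise ValueError("coordinates do not match reference allele length")
        ref_parts.append(ref)
        alt_parts.append(alt)
        end = ev_end

    return start, end, "".join(ref_parts) or "-", "".join(alt_parts) or "-"
```

Under that assumption, merging `(3, 3, "G", "A")` with `(4, 5, "TA", "-")` gives `(3, 5, "GTA", "A")`, and merging `(3, 3, "G", "A")` with `(4, 3, "-", "C")` gives `(3, 3, "G", "AC")`: the combined allele replaces only reference base 3.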
>>
>>
>> Or do I see this all wrong?
>> Is there a way to submit alleles like these for effect prediction?
>>
>>
>> Kind regards,
>>
>>
>> Bram De Wilde, MD
>> Center for Medical Genetics Ghent (CMGG) Ghent University Hospital  
>> Medical Research Building (MRB), 2nd floor, room 120.050 De  
>> Pintelaan 185, B-9000 Ghent, Belgium +32 9 332 4812 (phone) | +32 9  
>> 332 6549 (fax) http://medgen.ugent.be/ Bram.DeWilde at UGent.be
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev
>>
>>
>
>