[ensembl-dev] complex variant effect prediction
Ewan Birney
birney at ebi.ac.uk
Wed Dec 15 10:27:53 GMT 2010
> The vcf format does also foresee an option to encode structural
> variations, does anyone want to shed some light on how these
> structural variations will be encoded in the ensembl API? Presuming
> there already exists a consensus on that?
>
> At the moment Ensembl Variation offers only limited support for
> structural variations; we store some structural variation locations
> in our human, mouse and dog databases, but we do not carry any
> further information, nor do we currently have the ability to predict
> the effect of structural variations.
>
> When we do come to address this, the issues will obviously be a lot
> more complex, since the size and variation type of structural
> variations can lead to so many different consequences (exon loss,
> gene loss, gene duplication, loss of regulatory region etc.)
>
But I think it's worth adding that this (consequences of structural
variation) is definitely on our roadmap; what I don't know
is how close it is to being done. I think I can speak for Fiona in
saying that dbSNP releases which include a substantial human
update (like the current one) are an understandably big time sink, and
it's hard to provide accurate development predictions
during these times...
> Cheers
>
> Will
>
>
> Kind regards
>
> bram
>
> On 14-dec-2010, at 17:30, Will McLaren wrote:
>
>> Hi Bram,
>>
>> You're not alone in wondering about these more complex variation
>> types!
>>
>> While the software does support them (provided you get the input
>> format right!), the results may not always reflect the complexity,
>> as the consequence types we currently return do not cover all
>> possible events. You can however get a good sense of what is
>> happening by carefully considering what is returned by the
>> coordinate methods (cds_start, cds_end, cdna_start, cdna_end,
>> translation_start, translation_end) and the pep_allele_string() and
>> codons() methods.
>>
>> In determining the input format, you need to consider what region
>> of the reference sequence is being affected, and what is replacing
>> the reference. I would input this variant as:
>>
>>      1 2 3 4 5 6 7
>> ref: A C G T A G A
>> var: A C A - - G A
>>
>> You could view this as a SNP and a deletion (two events, as you
>> describe), or as an unbalanced substitution (one event).
>>
>> As two events, this would have input (assuming chromosome 1 and
>> coords as above for simplicity):
>>
>> 1 3 3 G/A +
>> 1 4 5 TA/- +
>>
>> As one event, which is how I would input this variation:
>>
>> 1 3 5 GTA/A +
>>
>> So we are substituting GTA (bases 3-5 of the reference) with A.
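
The single-event form above can be derived mechanically from the alignment. The following is only a sketch, not part of the Ensembl API: the function name `alignment_to_input_line` is hypothetical, and it assumes the two sequences are already aligned with `-` as the gap character and that reference coordinates are 1-based. It spans the first to the last differing column and emits one whitespace-separated line in the layout used in this thread (chromosome, start, end, alleles, strand).

```python
def alignment_to_input_line(chrom, ref_aln, var_aln, strand="+"):
    """Collapse every difference in a pairwise alignment into one input line.

    Hypothetical helper: assumes equal-length aligned strings, '-' for gaps,
    and 1-based reference coordinates.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")

    # Columns where the reference and the variant disagree.
    diff_cols = [i for i, (r, v) in enumerate(zip(ref_aln, var_aln)) if r != v]
    if not diff_cols:
        raise ValueError("sequences are identical; nothing to report")
    first, last = diff_cols[0], diff_cols[-1]

    # Reference position of the first differing column: count the
    # non-gap reference bases that precede it, then add one.
    start = len(ref_aln[:first].replace("-", "")) + 1

    # Alleles are the differing span with gap characters removed.
    ref_allele = ref_aln[first:last + 1].replace("-", "")
    var_allele = var_aln[first:last + 1].replace("-", "")

    # For a pure insertion the reference allele is empty, so end = start - 1.
    end = start + len(ref_allele) - 1

    return f"{chrom} {start} {end} {ref_allele or '-'}/{var_allele or '-'} {strand}"


print(alignment_to_input_line("1", "ACGTAGA", "ACA--GA"))
```

For the alignment discussed here this prints `1 3 5 GTA/A +`. An empty allele is written as `-`, so a pure insertion comes out with start equal to end plus one.
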
>>
>> Although (and I guess this isn't a real example!), a better
>> alignment would surely be:
>>
>>      1 2 3 4 5 6 7
>> ref: A C G T A G A
>> var: A C - - A G A
>>
>> where there's just one deletion event of bases 3 & 4. But that's
>> just nit-picking!
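
The simpler alignment can also be reached by trimming bases that the two alleles share. This is a minimal sketch under stated assumptions: `trim_shared_bases` is a hypothetical name, not an Ensembl function, and trimming the shared suffix before the shared prefix is an arbitrary choice that can place an event differently inside repeats.

```python
def trim_shared_bases(start, ref, alt):
    """Reduce a ref/alt allele pair to its smallest differing core.

    Hypothetical helper: `start` is the 1-based reference position of the
    first base of `ref`. Returns (start, end, allele_string).
    """
    # Drop trailing bases common to both alleles; start is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]

    # Drop leading bases common to both alleles; each one shifts start right.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1

    # An empty reference allele (pure insertion) gives end = start - 1.
    end = start + len(ref) - 1
    return start, end, f"{ref or '-'}/{alt or '-'}"


print(trim_shared_bases(3, "GTA", "A"))
```

Applied to `GTA/A` at position 3, the shared trailing `A` is removed and the result is `(3, 4, 'GT/-')`, i.e. a single deletion of bases 3 and 4.
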
>>
>> Hope this helps anyway
>>
>> Cheers
>>
>> Will McLaren
>> Ensembl Variation
>>
>>
>> On 14 December 2010 15:42, Bram De Wilde <gbramdewilde at gmail.com>
>> wrote:
>> Hi everyone,
>>
>> While unraveling the complex variants that can be encoded in the
>> vcf format (http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-40)
>> I came to realize that I don't know how to submit some of these
>> complex alleles to the ensembl variation API for effect prediction.
>> For simple SNPs and indels the situation is clearly described in
>> the help pages. The problem I seem to be having is with complex
>> alleles:
>>
>> e.g. when a SNP is directly followed by a deletion on a chromosome z
>> ref: ACGTAGA
>> var: ACA--GA
>>
>> this can be encoded as 2 variants:
>> chr start stop variant strand
>> z 3 3 G/A +
>> z 4 5 TA/- +
>> but clearly neither of these will have the functional consequence of
>> the true allele, namely:
>> z 3 5 GTA/A +
>> Unfortunately, this kind of allele does not seem to return any
>> response from the variation API.
>>
>> I can think of a similar situation for an insertion:
>> ref: ACG-TAGA
>> var: ACACTAGA
>>
>> where:
>> chr start stop variant strand
>> z 3 3 G/A +
>> z 4 3 -/C +
>> will not have the same consequence as
>> z 3 3 G/AC +
>>
>>
>> Or do I see this all wrong?
>> Is there a way to submit alleles like these for effect prediction?
>>
>>
>> Kind regards,
>>
>>
>> Bram De Wilde, MD
>> Center for Medical Genetics Ghent (CMGG) Ghent University Hospital
>> Medical Research Building (MRB), 2nd floor, room 120.050 De
>> Pintelaan 185, B-9000 Ghent, Belgium +32 9 332 4812 (phone) | +32 9
>> 332 6549 (fax) http://medgen.ugent.be/ Bram.DeWilde at UGent.be
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev
>>
>>
>
>