[ensembl-dev] complex variant effect prediction

Will McLaren wm2 at ebi.ac.uk
Tue Dec 14 16:30:46 GMT 2010


Hi Bram,

You're not alone in wondering about these more complex variation types!

While the software does support them (provided you get the input format
right!), the results may not always reflect the complexity, as the
consequence types we currently return do not cover all possible events. You
can however get a good sense of what is happening by carefully considering
what is returned by the coordinate methods (cds_start, cds_end, cdna_start,
cdna_end, translation_start, translation_end) and the pep_allele_string()
and codons() methods.

In determining the input format, you need to consider what region of the
reference sequence is being affected, and what is replacing the reference. I
would input this variant as:

     1 2 3 4 5 6 7
ref: A C G T A G A
var: A C A - - G A

You could view this as a SNP and a deletion (two events, as you describe),
or as an unbalanced substitution (one event).

As two events, this would have input (assuming chromosome 1 and coords as
above for simplicity):

1  3  3  G/A  +
1  4  5  TA/-  +

As one event, which is how I would input this variation:

1  3  5  GTA/A  +

So we are substituting GTA (bases 3-5 of the reference) with A.
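
To make the single-event encoding concrete, here is a minimal Python sketch that collapses a gapped alignment into one input line. It is not part of the Ensembl API: the function name aligned_to_input_line and its parameters are hypothetical, and it assumes the two aligned strings have equal length, use '-' for gaps, and that the first reference base sits at coordinate 1 unless told otherwise.

```python
def aligned_to_input_line(chrom, ref_aln, var_aln, strand="+", offset=1):
    """Collapse a gapped pairwise alignment into a single input line.

    Hypothetical helper, for illustration only.
    ref_aln / var_aln: equal-length strings, '-' marks a gap.
    offset: reference coordinate of the first reference base.
    Returns None when the two sequences are identical.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned strings must have equal length")

    # Columns where reference and variant disagree (including gaps).
    diff = [i for i, (r, v) in enumerate(zip(ref_aln, var_aln)) if r != v]
    if not diff:
        return None
    first, last = diff[0], diff[-1]

    # Count real reference bases before the first differing column.
    start = offset + sum(1 for c in ref_aln[:first] if c != "-")

    # Everything between the first and last difference is one event.
    ref_allele = ref_aln[first:last + 1].replace("-", "")
    var_allele = var_aln[first:last + 1].replace("-", "")

    # An empty reference allele gives end = start - 1, i.e. an insertion.
    end = start + len(ref_allele) - 1

    fields = [chrom, str(start), str(end),
              "{}/{}".format(ref_allele or "-", var_allele or "-"), strand]
    return "  ".join(fields)


print(aligned_to_input_line("1", "ACGTAGA", "ACA--GA"))
```

For the alignment above this prints the same line as the one-event input, with GTA at bases 3-5 replaced by A.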

Although (and I guess this isn't a real example!), a better alignment would
surely be:

     1 2 3 4 5 6 7
ref: A C G T A G A
var: A C - - A G A

where there's just one deletion event of bases 3 & 4. But that's just
nit-picking!
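
The alternative alignment amounts to trimming the bases that both alleles share. A rough Python sketch of that idea follows. The function trim_shared_bases is hypothetical, not an Ensembl method, and it assumes plain allele strings without gap characters. It trims the shared suffix first and then the shared prefix; in repetitive sequence the other order can place the event at a different position.

```python
def trim_shared_bases(start, ref_allele, var_allele):
    """Strip bases common to both alleles and adjust the coordinates.

    Hypothetical helper, for illustration only.
    start: reference coordinate of the first base of ref_allele.
    Returns (start, end, ref_allele, var_allele), with '-' for an
    empty allele.
    """
    ref, var = ref_allele, var_allele

    # Shared trailing bases: dropping them does not move the start.
    while ref and var and ref[-1] == var[-1]:
        ref, var = ref[:-1], var[:-1]

    # Shared leading bases: each one dropped shifts the start right.
    while ref and var and ref[0] == var[0]:
        ref, var = ref[1:], var[1:]
        start += 1

    # An empty reference allele gives end = start - 1, i.e. an insertion.
    end = start + len(ref) - 1
    return start, end, ref or "-", var or "-"


print(trim_shared_bases(3, "GTA", "A"))
```

Applied to GTA/A starting at base 3, the trailing A is shared, which leaves GT/- covering bases 3 and 4, the single deletion described above.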

Hope this helps anyway

Cheers

Will McLaren
Ensembl Variation


On 14 December 2010 15:42, Bram De Wilde <gbramdewilde at gmail.com> wrote:

> Hi everyone,
>
> While unraveling the complex variants that can be encoded in the VCF format
> (http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-40),
> I came to realize that I don't know how to submit some of these complex
> alleles to the Ensembl variation API for effect prediction.
> For simple SNPs and indels the situation is clearly described in the help
> pages. The problem I seem to be having is with complex alleles:
>
> e.g. when a SNP is directly followed by a deletion on a chromosome z
> ref: ACGTAGA
> var: ACA--GA
>
> this can be encoded as 2 variants:
> chr start stop variant strand
> z 3 3 G/A +
> z 4 5 TA/- +
> but clearly neither of these will have the functional consequence of the
> true allele, namely:
> z 3 5 GTA/A +
> Unfortunately this kind of allele does not seem to return any response
> from the variation API.
>
> I can think of a similar situation for an insertion:
> ref: ACG-TAGA
> var: ACACTAGA
>
> where:
> chr start stop variant strand
> z 3 3 G/A +
> z 4 3 -/C +
> will not have the same consequence as
> z 4 3 G/AC +
>
>
> Or do I see this all wrong?
> Is there a way to submit alleles like these for effect prediction?
>
>
> Kind regards,
>
>
> Bram De Wilde, MD
> Center for Medical Genetics Ghent (CMGG) Ghent University Hospital Medical
> Research Building (MRB), 2nd floor, room 120.050 De Pintelaan 185, B-9000
> Ghent, Belgium +32 9 332 4812 (phone) | +32 9 332 6549 (fax)
> http://medgen.ugent.be/ Bram.DeWilde at UGent.be