[ensembl-dev] complex variant effect prediction

Will McLaren wm2 at ebi.ac.uk
Wed Dec 15 10:18:17 GMT 2010


Hi Bram (copying the dev list back in for others' benefit),

On 14 December 2010 22:13, Bram De Wilde <gbramdewilde at gmail.com> wrote:

> Thanks, Will, for the quick response.
>
> Following your (and I presume the Ensembl team's) logic, my assumption for
> an insertion following a substitution would then be wrong, since in that
> case we just replace a single base with multiple bases, resulting in
> z 3 3 G/AC +
>

Yes, this is the correct format for this variation.
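
One way to see why "z 3 3 G/AC +" is consistent while "z 4 3 G/AC +" is not:
the number of reference bases named before the slash should equal
end - start + 1, with an insertion written as "-" and start = end + 1. The
Python sketch below only illustrates that rule of thumb. The function name
check_input_line is made up for this example and is not part of the Ensembl
API, and the script may well apply different or additional checks.

```python
def check_input_line(line):
    """Return True if the reference allele length matches the coordinates.

    Expects whitespace-separated fields: chr start end ref/var strand.
    Hypothetical helper for illustration only.
    """
    chrom, start, end, alleles, strand = line.split()
    start, end = int(start), int(end)
    # The reference allele is the part before the first slash
    ref = alleles.split("/")[0]
    # "-" stands for "no reference bases" (a pure insertion)
    ref_len = 0 if ref == "-" else len(ref)
    # Number of reference bases covered by the coordinates;
    # this is 0 when start = end + 1
    span = end - start + 1
    return ref_len == span


print(check_input_line("z 3 3 G/AC +"))  # True
print(check_input_line("z 4 3 G/AC +"))  # False
print(check_input_line("z 4 3 -/C +"))   # True
```

The second line fails because coordinates 4 to 3 describe a gap between two
bases, yet the allele string claims that the reference base G is replaced.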


> instead of my suggestion of
> z 4 3 G/AC +
>
> for this situation:
> pos: 1 2 3 - 4 5 6 7
> ref: A C G - T A C A
> var: A C A C T A C A
>
> The VCF format also provides a way to encode structural variations. Can
> anyone shed some light on how these structural variations will be encoded
> in the Ensembl API, presuming a consensus on that already exists?
>

At the moment Ensembl Variation offers only limited support for structural
variations; we store some structural variation locations in our human, mouse
and dog databases, but we do not carry any further information, nor do we
currently have the ability to predict the effect of structural variations.

When we do come to address this, the issues will obviously be a lot more
complex, since the size and variation type of structural variations can lead
to so many different consequences (exon loss, gene loss, gene duplication,
loss of regulatory region, etc.).

Cheers

Will


>
> Kind regards
>
> bram
>
> On 14-dec-2010, at 17:30, Will McLaren wrote:
>
> Hi Bram,
>
> You're not alone in wondering about these more complex variation types!
>
> While the software does support them (provided you get the input format
> right!), the results may not always reflect the complexity, as the
> consequence types we currently return do not cover all possible events. You
> can however get a good sense of what is happening by carefully considering
> what is returned by the coordinate methods (cds_start, cds_end, cdna_start,
> cdna_end, translation_start, translation_end) and the pep_allele_string()
> and codons() methods.
>
> In determining the input format, you need to consider what region of the
> reference sequence is being affected, and what is replacing the reference. I
> would input this variant as:
>
> pos: 1 2 3 4 5 6 7
> ref: A C G T A G A
> var: A C A - - G A
>
> You could view this as a SNP and a deletion (two events, as you describe),
> or as an unbalanced substitution (one event).
>
> As two events, this would have input (assuming chromosome 1 and coords as
> above for simplicity):
>
> 1  3  3  G/A  +
> 1  4  5  TA/-  +
>
> As one event, which is how I would input this variation:
>
> 1  3  5  GTA/A  +
>
> So we are substituting GTA (bases 3-5 of the reference) with A.
>
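
The rule described above (find the affected stretch of reference, then state
what replaces it) can be sketched in a few lines of Python. This is only an
illustration, not Ensembl code: the function name alignment_to_input_line is
made up, and it assumes the two sequences are supplied as equal-length gapped
strings using "-" for gaps, with reference coordinates starting at 1.

```python
def alignment_to_input_line(chrom, ref_aln, var_aln, strand="+"):
    """Turn a gapped ref/var alignment into 'chr start end ref/var strand'.

    Hypothetical helper for illustration only.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned strings must have equal length")
    # Alignment columns where reference and variant disagree
    diff = [i for i, (r, v) in enumerate(zip(ref_aln, var_aln)) if r != v]
    if not diff:
        raise ValueError("reference and variant are identical")
    first, last = diff[0], diff[-1]
    # Reference bases to the left of the changed span (gaps do not count)
    bases_before = len(ref_aln[:first].replace("-", ""))
    # Bases inside the changed span, with gap characters removed
    ref_allele = ref_aln[first:last + 1].replace("-", "")
    var_allele = var_aln[first:last + 1].replace("-", "")
    start = bases_before + 1
    # For a pure insertion ref_allele is empty, so end = start - 1
    end = bases_before + len(ref_allele)
    alleles = f"{ref_allele or '-'}/{var_allele or '-'}"
    return " ".join([chrom, str(start), str(end), alleles, strand])


print(alignment_to_input_line("1", "ACGTAGA", "ACA--GA"))    # 1 3 5 GTA/A +
print(alignment_to_input_line("z", "ACG-TAGA", "ACACTAGA"))  # z 3 3 G/AC +
```

Everything between the first and last differing column is treated as a single
event, which is the "one event" reading of the variant.
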
> Although (and I guess this isn't a real example!), a better alignment would
> surely be:
>
> pos: 1 2 3 4 5 6 7
> ref: A C G T A G A
> var: A C - - A G A
>
> where there's just one deletion event of bases 3 & 4. But that's just
> nit-picking!
>
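
The point about the better alignment amounts to trimming bases that the two
alleles share at their ends. A minimal Python sketch follows, assuming the
shared suffix is trimmed before the shared prefix (the other order can give a
different but equivalent placement). The function name trim_shared_flanks is
made up for this example, and nothing here is claimed to be what the Ensembl
code does internally.

```python
def trim_shared_flanks(start, ref, alt):
    """Trim bases common to both alleles; return (start, end, 'ref/alt').

    Hypothetical helper for illustration only. "-" means an empty allele.
    """
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt
    # Drop matching bases from the right-hand end first
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop matching bases from the left, moving the start along
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    end = start + len(ref) - 1
    return start, end, f"{ref or '-'}/{alt or '-'}"


print(trim_shared_flanks(3, "GTA", "A"))  # (3, 4, 'GT/-')
print(trim_shared_flanks(3, "G", "AC"))   # (3, 3, 'G/AC')
```

The first call turns the substitution GTA/A at 3-5 into a deletion of bases 3
and 4, while G/AC has nothing in common to trim and stays as it is.
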
> Hope this helps anyway
>
> Cheers
>
> Will McLaren
> Ensembl Variation
>
>
> On 14 December 2010 15:42, Bram De Wilde <gbramdewilde at gmail.com> wrote:
>
>> Hi everyone,
>>
>> While unraveling the complex variants that can be encoded in the VCF
>> format (
>> http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-40)
>> I came to realize that I don't know how to submit some of these complex
>> alleles to the Ensembl variation API for effect prediction.
>> For simple SNPs and indels the situation is clearly described in the help
>> pages. The problem I seem to be having is with complex alleles:
>>
>> e.g. when a SNP is directly followed by a deletion on a chromosome z:
>> ref: ACGTAGA
>> var: ACA--GA
>>
>> This can be encoded as two variants:
>> chr start stop variant strand
>> z 3 3 G/A +
>> z 4 5 TA/- +
>> but clearly neither of these will have the functional consequence of the
>> true allele, namely:
>> z 3 5 GTA/A +
>> Unfortunately, this kind of allele does not seem to return any response
>> from the variation API.
>>
>> I can think of a similar situation for an insertion:
>> ref: ACG-TAGA
>> var: ACACTAGA
>>
>> where:
>> chr start stop variant strand
>> z 3 3 G/A +
>> z 4 3 -/C +
>> will not have the same consequence as
>> z 4 3 G/AC +
>>
>>
>> Or do I see this all wrong?
>> Is there a way to submit alleles like these for effect prediction?
>>
>>
>> Kind regards,
>>
>>
>> Bram De Wilde, MD
>>  Center for Medical Genetics Ghent (CMGG) Ghent University Hospital
>> Medical Research Building (MRB), 2nd floor, room 120.050 De Pintelaan 185,
>> B-9000 Ghent, Belgium +32 9 332 4812 (phone) | +32 9 332 6549 (fax)
>> http://medgen.ugent.be/ Bram.DeWilde at UGent.be
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev
>>
>>
>
>