[ensembl-dev] Exon info missing for indels

Will McLaren wm2 at ebi.ac.uk
Thu Dec 4 14:35:52 GMT 2014


Hi Konrad,

This is an interesting case. In VCF notation the insertion starts at
position 141459861. When the VEP converts this internally to Ensembl
notation (see
http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf), the
shared leading base is dropped and the variant becomes an insertion
between positions 141459861 and 141459862. The Ensembl convention
represents this with start=141459862 and end=141459861, i.e. start is
one greater than end.
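
To make the conversion concrete, here is a rough Python sketch of the
coordinate arithmetic described above. It is not the VEP's own code (which
is Perl); the function name vcf_to_ensembl is made up for illustration, and
it only handles the simple case where REF and ALT share their first base.

```python
def vcf_to_ensembl(pos, ref, alt):
    """Convert a VCF (pos, ref, alt) to Ensembl-style (start, end, ref, alt).

    Hypothetical helper: a simplified sketch, not the VEP implementation.
    """
    # For indels, VCF repeats the base before the event in both alleles.
    # Drop that shared base and shift the position one to the right.
    if len(ref) != len(alt) and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    start = pos
    # With an empty reference allele (an insertion), end is start - 1.
    end = pos + len(ref) - 1
    return start, end, ref or "-", alt or "-"


snv = vcf_to_ensembl(141459861, "C", "A")
ins = vcf_to_ensembl(141459861, "C", "CATAAGTATTTGAGT")
print(snv)
print(ins)
```

The SNP keeps start == end == 141459861, while the insertion comes out with
start=141459862 and end=141459861.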

The exon ends at position 141459861 and the following intron starts at
141459862. The ALT sequence is therefore inserted exactly at the boundary
between them, so according to the reference coordinates and the algorithm
the API uses to determine overlap, it falls in neither the exon nor the
intron.
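
The effect can be reproduced with a standard closed-interval overlap test.
The sketch below assumes that the overlap check is of this common form; I
have not copied it from the API, so treat it as an illustration only. The
exon start and intron end used here are made-up placeholder values; only
the exon end (141459861) and intron start (141459862) come from this case.

```python
def overlaps(feat_start, feat_end, var_start, var_end):
    """Assumed closed-interval overlap test (illustrative, not the API code)."""
    return feat_start <= var_end and feat_end >= var_start


# Placeholder boundaries: EXON_START and INTRON_END are invented values.
EXON_START, EXON_END = 141459700, 141459861
INTRON_START, INTRON_END = 141459862, 141460000

# SNP at 141459861: start == end.
snv_in_exon = overlaps(EXON_START, EXON_END, 141459861, 141459861)

# Insertion between 141459861 and 141459862: start = end + 1.
ins_in_exon = overlaps(EXON_START, EXON_END, 141459862, 141459861)
ins_in_intron = overlaps(INTRON_START, INTRON_END, 141459862, 141459861)

print(snv_in_exon, ins_in_exon, ins_in_intron)
```

Under this test the SNP overlaps the exon, but the insertion overlaps
neither feature: the exon fails because its end (141459861) is less than
the variant start (141459862), and the intron fails because its start
(141459862) is greater than the variant end (141459861).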

I've illustrated this (very badly) here
http://www.ebi.ac.uk/~wm2/insertion.png

I don't know what people would consider to be the right approach here.
Depending on the inserted sequence, it could biologically "join" either
the exon or the intron, so perhaps we should report that it falls in
both? Happy to take feedback anyway.
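
One way the "report both" idea might look, purely as a sketch and not as
something the API does today: for insertions, test overlap against the two
flanking bases rather than the inverted start/end pair. The function name
and the exon start / intron end values below are hypothetical.

```python
def overlaps_with_flanks(feat_start, feat_end, var_start, var_end):
    """Overlap test that treats an insertion as touching both flanking bases.

    Hypothetical alternative, sketched for discussion only.
    """
    if var_start == var_end + 1:
        # Insertion: widen to the base on either side of the insertion point.
        var_start, var_end = var_end, var_start
    return feat_start <= var_end and feat_end >= var_start


# Placeholder boundaries: only 141459861 and 141459862 come from this case.
in_exon = overlaps_with_flanks(141459700, 141459861, 141459862, 141459861)
in_intron = overlaps_with_flanks(141459862, 141460000, 141459862, 141459861)
print(in_exon, in_intron)
```

With this variant of the test the insertion is reported as overlapping both
the exon and the intron, while ordinary substitutions and deletions are
unaffected because their start is never end + 1.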

Regards

Will



On 4 December 2014 at 02:29, Konrad Karczewski <konradk at broadinstitute.org>
wrote:

> Hi dev team,
>
> I'm getting some strange behavior when using VEP to annotate certain
> indels. In particular, the EXON field doesn't get populated for some
> reason. See below: a SNP at the same position looks fine (exon 39/91),
> while an indel doesn't get the same annotation. Any ideas about this?
>
> Thanks!
> -Konrad
>
> 2       141459861       .       C       A       2
>  VQSRTrancheINDEL99.50to99.90
> AC=1;CSQ=A|ENSG00000168702|ENST00000389484|Transcript|stop_gained&splice_region_variant|7123|6151|2051|E/*|Gaa/Taa||1||-1|LRP1B|HGNC|6693|protein_coding|YES|CCDS2182.1|ENSP00000374135|LRP1B_HUMAN|Q8WY27_HUMAN&Q8WY26_HUMAN&Q580W7_HUMAN&Q53TB8_HUMAN&Q53S76_HUMAN&Q53S73_HUMAN&Q53S26_HUMAN&Q53RL0_HUMAN&Q53RG4_HUMAN&Q53RA0_HUMAN&Q53QP5_HUMAN&Q53QM8_HUMAN&Q4ZG53_HUMAN&Q4ZFV5_HUMAN|UPI00001B045B|||39/91||PROSITE_profiles:PS51120&SMART_domains:SM00135&Superfamily_domains:SSF63825|ENST00000389484.3:c.6151G>T|ENSP00000374135.3:p.Glu2051Ter|||||||||||||||POSITION:0.445724637681159|||HC
> 2       141459861       .       C       CATAAGTATTTGAGT 363.89
> VQSRTrancheINDEL99.50to99.90
> AC=1;CSQ=ATAAGTATTTGAGT|ENSG00000168702|ENST00000389484|Transcript|frameshift_variant&splice_region_variant&feature_elongation|7122-7123|6150-6151|2050-2051|-/TQILX|-/ACTCAAATACTTAT||1||-1|LRP1B|HGNC|6693|protein_coding|YES|CCDS2182.1|ENSP00000374135|LRP1B_HUMAN|Q8WY27_HUMAN&Q8WY26_HUMAN&Q580W7_HUMAN&Q53TB8_HUMAN&Q53S76_HUMAN&Q53S73_HUMAN&Q53S26_HUMAN&Q53RL0_HUMAN&Q53RG4_HUMAN&Q53RA0_HUMAN&Q53QP5_HUMAN&Q53QM8_HUMAN&Q4ZG53_HUMAN&Q4ZFV5_HUMAN|UPI00001B045B|||||PROSITE_profiles:PS51120&SMART_domains:SM00135&Superfamily_domains:SSF63825|ENST00000389484.3:c.6151-1_6151insACTCAAATACTTAT|ENSP00000374135.3:p.Glu2051ThrfsTer19|||||||||||||||POSITION:0.445652173913043||EXON_INTRON_UNDEF|LC
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>