[ensembl-dev] [VEP] Bogus annotation in variant from Cosmic

Will McLaren wm2 at ebi.ac.uk
Mon Jan 30 14:06:32 GMT 2017


Hi Joao,

The variant you describe overlaps one base of an intron and four bases of
an exon. In the transcripts you describe, this exon is not part of the
coding sequence. It lies upstream of the start codon and so forms part of
the 5' UTR, which is why the variant is annotated as a 5' UTR variant.
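To make those counts concrete, here is a minimal Python sketch. The helper functions are hypothetical and are not VEP code. The sketch recovers the deleted bases from the COSMIC record chr1 2193996 ACCTGT>A and splits them between the exon [2193998, 2194133] and the neighbouring intron. It assumes the usual VCF convention that a deletion is padded with one shared preceding base. It also treats the splice donor as the first two intronic bases past the exon's 3' end. On the reverse strand, those are the positions just below the exon's lowest coordinate.

```python
# Hypothetical helpers (not part of VEP): recover the bases removed by a
# simple VCF deletion and split them between an exon and its flanking intron.
# Coordinates are 1-based and inclusive, as in VCF and the exon lists.


def deleted_span(pos, ref, alt):
    """Return (first, last) genomic position removed by a left-padded deletion."""
    # VCF pads deletions with the preceding base, so REF and ALT share it.
    if len(alt) != 1 or not ref.startswith(alt):
        raise ValueError("expected a simple left-padded deletion")
    return pos + 1, pos + len(ref) - 1


def split_by_exon(span, exon):
    """Return (exonic positions, non-exonic positions) covered by span."""
    lo, hi = min(exon), max(exon)
    positions = range(span[0], span[1] + 1)
    exonic = [p for p in positions if lo <= p <= hi]
    other = [p for p in positions if not lo <= p <= hi]
    return exonic, other


def donor_site(exon, strand):
    """Return the two intronic positions forming the exon's splice donor.

    Assumes the exon is not the transcript's last exon. The donor sits just
    past the exon's 3' end: above it on the + strand, below it on the - strand.
    """
    lo, hi = min(exon), max(exon)
    return (hi + 1, hi + 2) if strand == 1 else (lo - 1, lo - 2)


span = deleted_span(2193996, "ACCTGT", "A")
exonic, intronic = split_by_exon(span, (2193998, 2194133))
donor = donor_site((2193998, 2194133), strand=-1)

print(span)      # (2193997, 2194001)
print(exonic)    # [2193998, 2193999, 2194000, 2194001]
print(intronic)  # [2193997]
print([p for p in intronic if p in donor])  # [2193997]
```

The single intronic base, 2193997, falls in the donor site. That accounts for the splice_donor_variant term. The remaining four bases are exonic.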

No part of the variant overlaps the protein-coding part of the
transcript, so it would not be accurate to describe it as a
coding_sequence_variant.

If you look at the Ensembl browser, you can see that in some of the
alternate isoforms this exon *does* form part of the coding sequence, so
perhaps this is where the confusion arises?

http://dec2016.archive.ensembl.org/Homo_sapiens/Share/5c4d097313cd2f5cff94c020cbf62f2e?redirect=no;mobileredirect=no
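Whether those four exonic bases count as coding or as 5' UTR depends only on where the start codon lies. The minimal Python sketch below shows this for a reverse-strand transcript. The helper is hypothetical and is not VEP's actual code. The CDS bounds are made-up illustrative values, not the real coordinates of any FAAP20/C1orf86 transcript. In the first case the start codon sits in a downstream exon, so the bases are 5' UTR. In the second it sits inside the affected exon, so they are coding, as in the alternate isoforms mentioned above.

```python
# Hypothetical helper (not VEP's implementation): classify an exonic base
# relative to a transcript's coding sequence (CDS).


def exonic_region(pos, cds_lo, cds_hi, strand):
    """Return "coding", "5_prime_UTR" or "3_prime_UTR" for an exonic base.

    cds_lo and cds_hi are the lowest and highest genomic coordinates of the
    CDS, start and stop codons included.
    """
    if cds_lo <= pos <= cds_hi:
        return "coding"
    # 5' UTR = upstream of the start codon in transcript orientation:
    # lower coordinates on the + strand, higher coordinates on the - strand.
    upstream = pos < cds_lo if strand == 1 else pos > cds_hi
    return "5_prime_UTR" if upstream else "3_prime_UTR"


# The four deleted exonic bases (2193998-2194001).
deleted_exonic = range(2193998, 2194002)

# ILLUSTRATIVE CDS bounds only, not real FAAP20 coordinates.
# Isoform A: start codon in a downstream exon, below the deleted bases.
# Isoform B: start codon inside the affected exon, above the deleted bases.
isoform_a = {exonic_region(p, 2185000, 2193900, -1) for p in deleted_exonic}
isoform_b = {exonic_region(p, 2185000, 2194100, -1) for p in deleted_exonic}

print(isoform_a)  # {'5_prime_UTR'}
print(isoform_b)  # {'coding'}
```

The exon structure is identical in both cases. Only the CDS position changes the classification, which is why the same deletion can be reported as a 5' UTR variant in one transcript and as a coding variant in another.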

Hope that's clear

Will McLaren
Ensembl Variation

On 29 January 2017 at 17:37, João Eiras <joao.eiras at gmail.com> wrote:

> Hi.
>
> I have the following variant from the COSMIC [1] VCF files:
>
> chr1    2193996 .       ACCTGT  A
>
> I'm using VEP 87 with the merged Ensembl plus RefSeq reference.
>
> I get 16 annotations. Two of them caught my eye. But before getting
> into them, a quick breakdown of the variant.
>
> The chunk between 2193997 and 2194001 is deleted. This location is
> shared by many transcripts of the gene
> ENSG00000162585/FAAP20/C1orf86. The gene (and its transcripts) is
> on the reverse strand. The variant crosses over from the exon into the
> intron. The exon has the range [2193998, 2194133]. So this variant
> deletes 4 bases at the end of the exon, plus one nucleotide of the
> splice donor site (note again, reverse strand).
>
> The JSON for the two annotations that seem bogus is:
> {"cdna_end": 221,
> "cdna_start": 221,
> "consequence_terms": ["splice_donor_variant","5_prime_UTR_variant"],
> "exons": [
> [2194688, 2194775],
> [2193998, 2194133], # <- variant starts here
> [2193639, 2193910],
> [2192845, 2192975],
> [2189713, 2189781],
> [2186838, 2187206],
> [2186004, 2186249],
> [2184460, 2185513]
> ],
> "gene_id": 199990,
> "gene_name": "'C1orf86",
> "impact": "HIGH",
> "refseq_match": "rseq_mrna_nonmatch,rseq_5p_mismatch",
> "source": "RefSeq",
> "strand": -1,
> "transcript_biotype": "protein_coding",
> "transcript_id": "NM_001282671.1"}
>
> {"cdna_end": 221,
> "cdna_start": 221,
> "consequence_terms": ["splice_donor_variant","5_prime_UTR_variant"],
> "exons": [
> [2194688, 2194775],
> [2193998, 2194133], # <- variant starts here
> [2193639, 2193910],
> [2189713, 2189781],
> [2186838, 2187206],
> [2186004, 2186249],
> [2184460, 2185513]
> ],
> "gene_id": 199990,
> "gene_name": "'C1orf86",
> "impact": "HIGH",
> "refseq_match": "rseq_mrna_nonmatch,rseq_5p_mismatch",
> "source": "RefSeq",
> "strand": -1,
> "transcript_biotype": "protein_coding",
> "transcript_id": "NM_001282672.1"}
>
> The transcript_biotype and exons fields are something I added with a plugin
> ($tva->transcript->get_all_Exons()). The only difference between these
> two transcripts is the extra exon (2192845, 2192975).
>
> As you can see, the variant goes from the 2nd exon into the
> intron, which is then followed by a 3rd exon and so forth. As such, the
> consequence terms should be ["splice_donor_variant",
> "coding_sequence_variant"], as they are for all the other non-intronic
> annotations (in transcripts that preserve the affected exon). For
> instance, this is the annotation obtained for transcript
> NM_001256946.1.
>
> I see that both these annotations have the flag rseq_5p_mismatch, and
> none of the other ones do.
>
> Why do the consequence terms include "5_prime_UTR_variant"? That doesn't
> seem to make sense, given that the variant is not in the 5' UTR and this
> is a simple protein_coding transcript.
>
> Thank you for your time.
>
> [1] https://cancer.sanger.ac.uk/cosmic/download