[ensembl-dev] VEP workings

Will McLaren wm2 at ebi.ac.uk
Wed May 8 15:56:16 BST 2013


Hi Venu,

On 8 May 2013 14:51, Venugopal Valmeekam <vvalmeekam at yahoo.com> wrote:

> Hi,
> We are using VEP in our organization to build a comprehensive database of
> human mutation consequences.  I am using the refseq and ensembl cache files
> to run VEP.  I would really appreciate if you could answer the following
> questions:
> 1. Is VEP using the reference genome (vs mRNA sequence) to derive the
> amino acid sequence for a particular transcript? I do see several examples
> of refseq proteins, where the amino acid sequence from VEP interpretation
> is different compared to the refseq protein sequence.
>

The source of the RefSeq transcripts is the Ensembl otherfeatures database
- the transcripts in this DB consist of sequences that have been aligned
to the genome. Because of this, the underlying reference genome may differ
from the original RefSeq sequence that was aligned. The VEP uses the
reference genome sequence (the Ensembl transcripts are built on the
reference sequence directly), so this is a possible source of the
discrepancies that you are seeing.

See http://www.ensembl.org/info/docs/variation/vep/vep_script.html#refseq
for a bit more detail.
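
To make the effect concrete, here is a minimal Python sketch (not VEP code)
of how a coding sequence read from the reference genome and the original
RefSeq mRNA can translate to different proteins. The function names and the
toy sequences are made up for illustration; only the standard genetic code
(NCBI translation table 1) is real.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons run TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}


def translate(cds):
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def protein_differences(genome_cds, refseq_cds):
    """List (position, genome_aa, refseq_aa) for every mismatching residue."""
    genome_protein = translate(genome_cds)
    refseq_protein = translate(refseq_cds)
    return [
        (i + 1, g, r)
        for i, (g, r) in enumerate(zip(genome_protein, refseq_protein))
        if g != r
    ]


# Hypothetical toy sequences: one base differs between genome and mRNA.
refseq_cds = "ATGGCTTGGAAATAA"   # translates to MAWK*
genome_cds = "ATGGCTTGAAAATAA"   # TGG -> TGA introduces an early stop
print(protein_differences(genome_cds, refseq_cds))
```

A single-base difference in the genome is enough to turn a tryptophan codon
into a stop here, which is the kind of discrepancy that can appear when the
aligned RefSeq sequence and the reference genome disagree.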


> 2. There are several cases of RefSeq proteins with special amino acids
> such as selenocysteine or "U" coded by a stop codon "UGA".  I see that VEP
> makes accurate calls at these positions.  Is VEP somehow using the protein
> sequence to make these calls?
>

The Ensembl API accounts for selenocysteines. I'm afraid I don't know the
details of how this works; if you resend the question to the list without
the VEP context, someone in our core or genebuild team should be able to
answer it.


> 3. For some refseq proteins, e.g. NM_020469 (NP_065202), the VEP
> interpretation has pre-terminal stop codons.  These seem to correspond to
> indels in the reference genome.  However, I do not see such instances in
> the Ensembl collection.  Could you please let me know if VEP is using a
> different approach for these two collections?
>

This is probably explained by the same cause as in 1: the reference genome
can differ from the original RefSeq sequence that was aligned to it.


> 4. What cutoffs does VEP use to establish "downstream"/"upstream"
> variants?  for e.g. 2kb upstream/2kb downstream ?
>

We use 5kb both up and downstream.

The definitions of our consequence types come from the Sequence Ontology.
You can see them all here:

http://www.ensembl.org/info/docs/variation/predicted_data.html#consequences
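
As a rough illustration of the 5kb rule, the Python sketch below labels a
variant position relative to a transcript. It is not VEP code: the function
name, the inclusive boundary handling and the 1/-1 strand convention are
assumptions for the example. The two labels are the Sequence Ontology terms
upstream_gene_variant and downstream_gene_variant.

```python
FLANK_WINDOW = 5000  # 5kb up- and downstream, as stated above


def flank_consequence(variant_pos, tx_start, tx_end, strand, window=FLANK_WINDOW):
    """Return the flanking consequence for a position, or None.

    Coordinates are genomic with tx_start <= tx_end; strand is 1 or -1.
    Positions inside the transcript, or further away than the window,
    return None in this sketch.
    """
    if tx_start <= variant_pos <= tx_end:
        return None  # overlaps the transcript, so not a flanking variant

    if variant_pos < tx_start:
        distance = tx_start - variant_pos
        # Lower coordinates are upstream only for forward-strand transcripts.
        label = "upstream_gene_variant" if strand == 1 else "downstream_gene_variant"
    else:
        distance = variant_pos - tx_end
        label = "downstream_gene_variant" if strand == 1 else "upstream_gene_variant"

    return label if distance <= window else None


# Hypothetical transcript at 10000-20000
print(flank_consequence(9000, 10000, 20000, 1))    # forward strand
print(flank_consequence(9000, 10000, 20000, -1))   # reverse strand
print(flank_consequence(4000, 10000, 20000, 1))    # 6kb away
```

Note that upstream and downstream swap for reverse-strand transcripts, since
they are defined relative to the direction of transcription.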


> 5. Since the mitochondrial codons for several amino acids are different
> from the nuclear codons, does VEP use the mitochondrial codon table
> to translate mitochondrial transcripts?
>

Yes, the VEP uses the correct codon table depending on the source of the
transcript.
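
For illustration, the Python sketch below shows why the choice of table
matters. The two tables are NCBI translation tables 1 (standard) and 2
(vertebrate mitochondrial). Picking the table from the sequence region name
is an assumption made for this example, not a description of how the VEP or
the Ensembl API selects it.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # TTT, TTC, TTA, ...

# Amino acids in NCBI codon order for each translation table.
NCBI_TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}
CODON_TABLES = {tid: dict(zip(CODONS, aas)) for tid, aas in NCBI_TABLES.items()}


def codon_table_for(seq_region_name):
    """Pick a table id from the region name (assumed convention: 'MT')."""
    return 2 if seq_region_name in ("MT", "M", "chrM") else 1


def translate(cds, table_id):
    """Translate a coding sequence with the given NCBI table; '*' is stop."""
    table = CODON_TABLES[table_id]
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(table.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


# Codons whose meaning differs between the two tables.
differing = sorted(c for c in CODONS if CODON_TABLES[1][c] != CODON_TABLES[2][c])
print(differing)

toy_cds = "ATGTGAATA"  # hypothetical sequence
print(translate(toy_cds, codon_table_for("1")))   # nuclear: TGA is a stop
print(translate(toy_cds, codon_table_for("MT")))  # mitochondrial: TGA is Trp
```

Only four codons differ between the two tables, but one of them (TGA) is a
stop in the standard code, so using the wrong table on a mitochondrial
transcript would produce spurious premature stops.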


> Thanks for providing such a wonderful resource.
>

Thanks for your questions, hope this has helped you.

Will McLaren
Ensembl Variation



> Venu
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>