[ensembl-dev] VEP perl script vs web

Graham Ritchie grsr at ebi.ac.uk
Sun Jul 10 22:40:50 BST 2011


Hi Sung,

Can you give us some more details about the settings you are using when running the two tools, in particular which input file format you are selecting? If I paste your example input line into the web VEP and leave all settings at their defaults, except for unchecking "Get regulatory region consequences", I get 14 lines in the result, with the same consequences called for the same transcripts as in your results from the script version. If I run the script version with your example as the only line in the input file, my output matches yours exactly.

I note, though, that in the web output you supply, the Location column implies the variant is located at 1:6264301-6264302, which does not match your input. I also note that the format of your data does not exactly match any of the input formats supported by the VEP. The script version is interpreting your data as Ensembl format, but you do not supply a strand for the variant in the final column, so it is assumed to lie on the forward strand. From the odd results you get from the web version, I wonder if you have specified that your data is in VCF format, which it does not much resemble either. If you ensure that your input data is in one of the formats described here:

http://www.ensembl.org/info/docs/variation/vep/index.html

you may find you get more consistent results.
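
To illustrate the strand point, here is a minimal Python sketch that appends an explicit forward strand to whitespace-separated lines carrying only chromosome, start, end and allele string. It is not part of the VEP: the function name `to_ensembl_default` is hypothetical, and the sketch assumes your variants really are reported on the forward strand.

```python
def to_ensembl_default(line, default_strand="+"):
    """Return `line` with an explicit strand column appended if it is missing.

    Expects whitespace-separated fields: chromosome, start, end, alleles
    (e.g. "A/G"), optionally followed by a strand and an identifier.
    The default strand is an assumption, not something read from the data.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 columns: %r" % line)
    if len(fields) == 4:
        # No strand supplied: make the forward-strand assumption explicit.
        fields.append(default_strand)
    return "\t".join(fields)


# The example line from the original message, with the strand added.
print(to_ensembl_default("1       6264301 6264301 A/G"))
```

Lines that already carry a strand column are passed through unchanged apart from normalising the separators to tabs.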

With regard to your question about the genes found in the output, we call a consequence of UPSTREAM or DOWNSTREAM for variants that lie within 5kb upstream or downstream of a transcript, not just for those that directly overlap a transcript. So, while the location in your example only directly overlaps the introns of the 2 transcripts of ENSG00000116251, this position is also upstream or downstream of the transcripts of the 2 other genes specified in the output.
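
The 5kb rule can be sketched roughly as follows. This is an illustration only, not the VEP's actual code: the function name `classify`, the inclusive handling of the window boundary, and the transcript coordinates in the usage lines are all assumptions made up for the example.

```python
FLANK = 5000  # the 5kb window described above


def classify(pos, tx_start, tx_end, tx_strand, flank=FLANK):
    """Classify a single-base position relative to one transcript.

    tx_start <= tx_end are forward-strand coordinates; tx_strand is 1 or -1.
    Returns "OVERLAPS", "UPSTREAM", "DOWNSTREAM", or None when the position
    is further than `flank` bases from the transcript.
    """
    if tx_start <= pos <= tx_end:
        return "OVERLAPS"
    if tx_start - flank <= pos < tx_start:
        # Lower coordinates are 5' of a forward-strand transcript,
        # but 3' of a reverse-strand one.
        return "UPSTREAM" if tx_strand == 1 else "DOWNSTREAM"
    if tx_end < pos <= tx_end + flank:
        return "DOWNSTREAM" if tx_strand == 1 else "UPSTREAM"
    return None


# Invented transcript coordinates, for illustration only.
print(classify(6264301, 6250000, 6300000, 1))   # position inside the transcript
print(classify(6264301, 6266000, 6270000, 1))   # within 5kb, 5' side of a forward-strand transcript
print(classify(6264301, 6266000, 6270000, -1))  # same span, reverse-strand transcript
print(classify(6264301, 6280000, 6290000, 1))   # more than 5kb away
```

The point of the sketch is that a gene whose own coordinates do not contain the variant can still appear in the output, provided one of its transcripts starts or ends within the window.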

Cheers,

Graham

Ensembl Variation


On 8 Jul 2011, at 18:11, Sung Gong wrote:

> Hi,
> 
> Thanks for developing VEP and making it available to the public.
> 
> I found some discrepancies between the web version of VEP and the perl
> script for the data shown below:
> Chr Start End Allele
> 1       6264301 6264301 A/G
> 
> From the web version, it returns eight entries which are shown below:
> 
> Uploaded Variation	Location	Allele	Gene	Feature	Feature type	Consequence	Position in cDNA	Position in CDS	Position in protein	Amino acid change	Codon change	Co-located Variation	Extra
> 1_6264301_A/G	1:6264301-6264302	A	ENSG00000116251	ENST00000462296	Transcript	UPSTREAM	-	-	-	-	-	-	-
> 1_6264301_A/G	1:6264301-6264302	A	ENSG00000158286	ENST00000377948	Transcript	UPSTREAM	-	-	-	-	-	-	-
> 1_6264301_A/G	1:6264301-6264302	A	ENSG00000116251	ENST00000471204	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	-	-
> 1_6264301_A/G	1:6264301-6264302	G	ENSG00000158286	ENST00000485539	Transcript	UPSTREAM	-	-	-	-	-	-	-
> 1_6264301_A/G	1:6264301-6264302	A	ENSG00000158286	ENST00000466994	Transcript	UPSTREAM	-	-	-	-	-	-	-
> 1_6264301_A/G	1:6264301-6264302	G	ENSG00000116251	ENST00000234875	Transcript	UPSTREAM	-	-	-	-	-	-	-
> 1_6264301_A/G	1:6264301-6264302	G	ENSG00000158286	ENST00000377948	Transcript	UPSTREAM	-	-	-	-	-	-	-
> 1_6264301_A/G	1:6264301-6264302	A	ENSG00000116251	ENST00000234875	Transcript	UPSTREAM	-	-	-	-	-	-	-
> 
> However, using the perl script, the same position is mapped onto 14
> Ensembl transcripts as shown below:
> 
> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000465387 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000226944 ENST00000455744 Transcript  DOWNSTREAM  -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000234875 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000377939 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000497965 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000485539 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000484435 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000480661 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000377948 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000462296 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000466994 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000471204 Transcript  WITHIN_NON_CODING_GENE,INTRONIC -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000496676 Transcript  UPSTREAM    -   -   -   -   -   -   -
> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000465335 Transcript  WITHIN_NON_CODING_GENE,INTRONIC -   -   -   -   -   -   -
> 
> Strangely, the Ensembl gene ENSG00000226944, from the second entry
> above, is not shown at all in the web version of VEP. Also, the web
> version reports two alleles (A and G), whereas the perl script
> reports only G.
> In addition, the position which I queried (chr1:6264301-6264301) lies
> within the chromosomal location of only one of the three Ensembl gene
> identifiers (ENSG00000116251, ENSG00000226944, and ENSG00000158286),
> namely ENSG00000116251 - see below:
> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=1;q=ENSG00000116251
> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=1;q=ENSG00000226944
> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=2;q=ENSG00000158286
> 
> Did I miss something?
> Any help?
> 
> Cheers,
> Sung
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




