[ensembl-dev] VEP perl script vs web

Sung Gong gong.sungsam at gmail.com
Mon Jul 11 09:57:30 BST 2011


Hi Graham,


On 10 July 2011 22:40, Graham Ritchie <grsr at ebi.ac.uk> wrote:
> Hi Sung,
>
> Can you give us some more details about the settings you are using when running the two tools, in particular which input file format are you selecting? If I paste in your example input line to the web VEP and leave all settings to their defaults except unchecking "Get regulatory region consequences" I get 14 lines in the result, with the same consequences called for the same transcripts as your results from running the script version. If I run the script version using the example you provide as the only line in the input file, my output matches yours exactly.
>

I just found that the 'txt' output format gives only 8 lines,
whereas the 'html' format gives 14. The 8 lines are also different
from those I got in my previous run.
The 'html' output looks the same as the result from the script.
Could you check the 'txt' and 'html' results?
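
In case it helps to pin down the difference, here is a rough sketch of how the two result sets could be compared by (feature, allele) pair. The function name and the column positions are my own assumptions based on the tabular output quoted further down in this thread, and the sample rows are copied from that output; this is not part of VEP itself.

```python
def feature_allele_pairs(text):
    """Collect (feature, allele) pairs from VEP-style tabular output.

    Assumes whitespace-separated columns in the order shown in this thread:
    variation, location, allele, gene, feature, feature type, consequence.
    """
    pairs = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and the "Uploaded Variation" header.
        if len(fields) < 7 or fields[0].startswith("#") or fields[0] == "Uploaded":
            continue
        pairs.add((fields[4], fields[2]))
    return pairs


# A few rows copied from the two outputs quoted in this thread.
web_txt = """\
1_6264301_A/G 1:6264301-6264302 A ENSG00000116251 ENST00000462296 Transcript UPSTREAM - - - - - - -
1_6264301_A/G 1:6264301-6264302 G ENSG00000158286 ENST00000485539 Transcript UPSTREAM - - - - - - -
"""
script_out = """\
1_6264301_A/G 1:6264301 G ENSG00000116251 ENST00000462296 Transcript UPSTREAM - - - - - - -
1_6264301_A/G 1:6264301 G ENSG00000158286 ENST00000485539 Transcript UPSTREAM - - - - - - -
1_6264301_A/G 1:6264301 G ENSG00000226944 ENST00000455744 Transcript DOWNSTREAM - - - - - - -
"""

web_pairs = feature_allele_pairs(web_txt)
script_pairs = feature_allele_pairs(script_out)

print("only in script:", sorted(script_pairs - web_pairs))
print("only in web txt:", sorted(web_pairs - script_pairs))
```

Comparing on the pair rather than on the transcript alone should also surface rows where only the reported allele differs.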


> I note though that in the output from the web version you supply, the location column implies the variant is located at 1:6264301-6264302, which does not match up with your input. I also note that the format of your data does not exactly match up with any of the input formats supported by the VEP. The script version is interpreting your data as ensembl format, but you do not supply a strand for the variant in the final column and so it is assumed to lie on the forward strand. From the odd results you get from the web version I wonder if you have specified that your data is in VCF format, which it doesn't look much like either. If you ensure that your input data is in one of the formats described here:
>
> http://www.ensembl.org/info/docs/variation/vep/index.html
>
> you may find you get more consistent results.
>
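
To make the strand explicit before re-running, something like the sketch below might do. It assumes the Ensembl default format is whitespace-separated with the columns chromosome, start, end, allele and strand, and that "+", "-", "1" and "-1" are the accepted strand values; both assumptions should be checked against the documentation linked above. The function name is hypothetical.

```python
# Strand values assumed to be accepted; verify against the VEP documentation.
VALID_STRANDS = {"+", "-", "1", "-1"}


def with_explicit_strand(line, default_strand="+"):
    """Return an input line with a strand column, adding one if it is missing."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}: {line!r}")
    chrom, start, end, allele = fields[:4]
    # int() raises ValueError for non-numeric coordinates (e.g. a header row).
    int(start)
    int(end)
    if "/" not in allele:
        raise ValueError(f"allele should look like REF/ALT, got {allele!r}")
    strand = fields[4] if len(fields) > 4 else default_strand
    if strand not in VALID_STRANDS:
        raise ValueError(f"unrecognised strand {strand!r}")
    # Keep any trailing columns (such as an identifier) unchanged.
    return "\t".join([chrom, start, end, allele, strand] + fields[5:])


print(with_explicit_strand("1       6264301 6264301 A/G"))
```

A header row such as "Chr Start End Allele" is rejected by the coordinate check, so it would need to be dropped from the input file first.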
> With regard to your question about the genes found in the output, we call a consequence of UPSTREAM or DOWNSTREAM for variants that lie within 5kb upstream or downstream of a transcript, and not just for those that directly overlap a transcript. So, while the location in your example only directly overlaps the introns of the 2 transcripts of ENSG00000116251, this position is also upstream or downstream of the transcripts of the 2 other genes specified in the output.
>

Thanks for this information.
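
For my own notes, this is how I understand the 5kb rule, written as a minimal sketch. The function name, the "OVERLAPS" label and the transcript coordinates in the usage lines are hypothetical (they are not the real coordinates of any transcript in this thread), and treating a distance of exactly 5000 bp as inside the window is my assumption.

```python
def flank_consequence(pos, tx_start, tx_end, tx_strand, flank=5000):
    """Classify a position relative to a transcript.

    tx_start <= tx_end are genomic coordinates; tx_strand is 1 or -1.
    Returns "OVERLAPS", "UPSTREAM", "DOWNSTREAM", or None if outside the window.
    """
    if tx_start <= pos <= tx_end:
        return "OVERLAPS"
    if pos < tx_start:
        distance = tx_start - pos
        # Lower coordinates are 5' of a forward-strand transcript.
        side = "UPSTREAM" if tx_strand == 1 else "DOWNSTREAM"
    else:
        distance = pos - tx_end
        # Higher coordinates are 5' of a reverse-strand transcript.
        side = "DOWNSTREAM" if tx_strand == 1 else "UPSTREAM"
    return side if distance <= flank else None


# Hypothetical transcripts around the queried position 1:6264301.
print(flank_consequence(6264301, 6265000, 6270000, 1))
print(flank_consequence(6264301, 6260000, 6264000, -1))
print(flank_consequence(6264301, 6270000, 6280000, 1))
```

So a position can be reported against a gene without lying inside that gene's own coordinates, which would explain the extra genes in my output.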


> Cheers,
>
> Graham
>
> Ensembl Variation
>
>
> On 8 Jul 2011, at 18:11, Sung Gong wrote:
>
>> Hi,
>>
>> Thanks for developing VEP and making it available to the public.
>>
>> I found some discrepancies between the web version of VEP and the perl
>> script for the data shown below:
>> Chr Start End Allele
>> 1       6264301 6264301 A/G
>>
>> The web version returns eight entries, which are shown below:
>>
>> Uploaded Variation    Location        Allele  Gene    Feature Feature type    Consequence     Position in cDNA        Position in CDS Position in protein     Amino acid change       Codon change    Co-located Variation    Extra
>> 1_6264301_A/G 1:6264301-6264302       A       ENSG00000116251 ENST00000462296 Transcript      UPSTREAM        -       -       -       -       -       -       -
>> 1_6264301_A/G 1:6264301-6264302       A       ENSG00000158286 ENST00000377948 Transcript      UPSTREAM        -       -       -       -       -       -       -
>> 1_6264301_A/G 1:6264301-6264302       A       ENSG00000116251 ENST00000471204 Transcript      WITHIN_NON_CODING_GENE,INTRONIC -       -       -       -       -       -       -
>> 1_6264301_A/G 1:6264301-6264302       G       ENSG00000158286 ENST00000485539 Transcript      UPSTREAM        -       -       -       -       -       -       -
>> 1_6264301_A/G 1:6264301-6264302       A       ENSG00000158286 ENST00000466994 Transcript      UPSTREAM        -       -       -       -       -       -       -
>> 1_6264301_A/G 1:6264301-6264302       G       ENSG00000116251 ENST00000234875 Transcript      UPSTREAM        -       -       -       -       -       -       -
>> 1_6264301_A/G 1:6264301-6264302       G       ENSG00000158286 ENST00000377948 Transcript      UPSTREAM        -       -       -       -       -       -       -
>> 1_6264301_A/G 1:6264301-6264302       A       ENSG00000116251 ENST00000234875 Transcript      UPSTREAM        -       -       -       -       -       -       -
>>
>> However, using the perl script, the same position is mapped onto 14
>> Ensembl transcripts, as shown below:
>>
>> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000465387 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000226944 ENST00000455744 Transcript  DOWNSTREAM  -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000234875 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000377939 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000497965 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000485539 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000484435 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000480661 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000377948 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000462296 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000466994 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000471204 Transcript  WITHIN_NON_CODING_GENE,INTRONIC -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000158286 ENST00000496676 Transcript  UPSTREAM    -   -   -   -   -   -   -
>> 1_6264301_A/G   1:6264301   G   ENSG00000116251 ENST00000465335 Transcript  WITHIN_NON_CODING_GENE,INTRONIC -   -   -   -   -   -   -
>>
>> Strangely, the Ensembl gene ENSG00000226944, from the second entry
>> above, is not shown at all in the web version of VEP. Also, the web
>> version reports two alleles (A and G), whereas the perl script reports
>> only G.
>> In addition, the position I queried (chr1:6264301-6264301) lies within
>> the chromosomal location of only one of the three Ensembl genes
>> (ENSG00000116251, ENSG00000226944, and ENSG00000158286), namely
>> ENSG00000116251 - see below:
>> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=1;q=ENSG00000116251
>> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=1;q=ENSG00000226944
>> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=2;q=ENSG00000158286
>>
>> Did I miss something?
>> Any help?
>>
>> Cheers,
>> Sung
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
>



