[ensembl-dev] VEP perl script vs web
Sung Gong
gong.sungsam at gmail.com
Mon Jul 11 11:34:06 BST 2011
You're right - I should have checked more carefully.
My apologies.
-Sung
On 11 July 2011 10:42, Graham Ritchie <grsr at ebi.ac.uk> wrote:
> Hi Sung,
>
> On the website we only show the first 10 lines of output in the text view. If you click on the link to the "Data.txt" file above the results, you can download the full set of results. Does that include all the results you are expecting?
>
> Cheers,
>
> Graham
>
>
> On 11 Jul 2011, at 09:57, Sung Gong wrote:
>
>> Hi Graham,
>>
>>
>> On 10 July 2011 22:40, Graham Ritchie <grsr at ebi.ac.uk> wrote:
>>> Hi Sung,
>>>
>>> Can you give us some more details about the settings you are using when running the two tools, in particular which input file format are you selecting? If I paste in your example input line to the web VEP and leave all settings to their defaults except unchecking "Get regulatory region consequences" I get 14 lines in the result, with the same consequences called for the same transcripts as your results from running the script version. If I run the script version using the example you provide as the only line in the input file, my output matches yours exactly.
>>>
>>
>> I just found that the 'txt' output format gives only 8 lines, whereas
>> the 'html' format gives 14. The 8 lines also differ from those that
>> I got in the previous run.
>> The 'html' output looks the same as the result from the script.
>> Can you check the 'txt' and 'html' results?
>>
>>
>>> I note though that in the output from the web version you supply, the location column implies the variant is located at 1:6264301-6264302, which does not match up with your input. I also note that the format of your data does not exactly match up with any of the input formats supported by the VEP. The script version is interpreting your data as ensembl format, but you do not supply a strand for the variant in the final column and so it is assumed to lie on the forward strand. From the odd results you get from the web version I wonder if you have specified that your data is in VCF format, which it doesn't look much like either. If you ensure that your input data is in one of the formats described here:
>>>
>>> http://www.ensembl.org/info/docs/variation/vep/index.html
>>>
>>> you may find you get more consistent results.
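
For reference, the default (Ensembl) input format mentioned above is whitespace-separated: chromosome, start, end, alleles, and strand. The Python sketch below only illustrates how such a line could be checked before submission. `Variant` and `parse_ensembl_line` are hypothetical names, not part of the VEP, and the fallback to the forward strand mirrors the behaviour described in this message rather than the script's actual code.

```python
from typing import NamedTuple, Tuple

# Accepted spellings of the strand column (an assumption for this sketch).
STRAND_VALUES = {"+": 1, "1": 1, "+1": 1, "-": -1, "-1": -1}


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    alleles: Tuple[str, ...]
    strand: int


def parse_ensembl_line(line: str) -> Variant:
    """Parse 'chrom start end ref/alt [strand]' (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 columns, got %d" % len(fields))
    chrom, start, end, allele_string = fields[:4]
    if len(fields) > 4:
        if fields[4] not in STRAND_VALUES:
            raise ValueError("unrecognised strand: %r" % fields[4])
        strand = STRAND_VALUES[fields[4]]
    else:
        # No strand column: assume the forward strand, as described above.
        strand = 1
    return Variant(chrom, int(start), int(end),
                   tuple(allele_string.split("/")), strand)


# The line from the original report has no strand column.
variant = parse_ensembl_line("1 6264301 6264301 A/G")
print(variant)
```

Adding an explicit `+` as a fifth column to the input line removes the need for the fallback.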
>>>
>>> With regard to your question about the genes found in the output, we call a consequence of UPSTREAM or DOWNSTREAM for variants that lie within 5kb up or down-stream of a transcript, and not just those that directly overlap a transcript. So, while the location in your example only directly overlaps the introns of the 2 transcripts of ENSG00000116251, this position is also upstream or downstream of the transcripts of the 2 other genes specified in the output.
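
The 5kb rule described above can be illustrated with a short Python sketch. It is not the VEP's implementation: `flank_consequence` is a hypothetical helper, the transcript coordinates in the usage lines are invented, treating a distance of exactly 5000 bp as inside the flank is an assumption, and "OVERLAPS" is a placeholder for the more specific consequences the VEP reports for overlapping variants.

```python
from typing import Optional

FLANK_BP = 5000  # the 5kb window mentioned above


def flank_consequence(pos: int, tx_start: int, tx_end: int,
                      tx_strand: int, flank: int = FLANK_BP) -> Optional[str]:
    """Classify a position relative to one transcript (hypothetical helper).

    Coordinates are 1-based and inclusive, with tx_start <= tx_end.
    tx_strand is 1 (forward) or -1 (reverse).
    """
    if tx_start <= pos <= tx_end:
        return "OVERLAPS"  # placeholder label, not a VEP consequence term
    if pos < tx_start:
        distance = tx_start - pos
        # Lower coordinates are upstream only for forward-strand transcripts.
        side = "UPSTREAM" if tx_strand == 1 else "DOWNSTREAM"
    else:
        distance = pos - tx_end
        side = "DOWNSTREAM" if tx_strand == 1 else "UPSTREAM"
    return side if distance <= flank else None


# Invented transcript coordinates, for illustration only.
print(flank_consequence(6264301, 6265000, 6270000, 1))   # 699 bp before a forward transcript
print(flank_consequence(6264301, 6265000, 6270000, -1))  # same distance, reverse transcript
print(flank_consequence(6264301, 6270000, 6280000, 1))   # 5699 bp away, outside the window
```

This is why a single position can be reported against transcripts of genes it does not overlap: each transcript within the window produces its own output line.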
>>>
>>
>> Thanks for this information.
>>
>>
>>> Cheers,
>>>
>>> Graham
>>>
>>> Ensembl Variation
>>>
>>>
>>> On 8 Jul 2011, at 18:11, Sung Gong wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for developing VEP and making it available to the public.
>>>>
>>>> I found some discrepancies between the web version of VEP and the perl
>>>> script for the data shown below:
>>>> Chr Start End Allele
>>>> 1 6264301 6264301 A/G
>>>>
>>>> The web version returns eight entries, which are shown below:
>>>>
>>>> Uploaded Variation  Location  Allele  Gene  Feature  Feature type  Consequence  Position in cDNA  Position in CDS  Position in protein  Amino acid change  Codon change  Co-located Variation  Extra
>>>> 1_6264301_A/G 1:6264301-6264302 A ENSG00000116251 ENST00000462296 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301-6264302 A ENSG00000158286 ENST00000377948 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301-6264302 A ENSG00000116251 ENST00000471204 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
>>>> 1_6264301_A/G 1:6264301-6264302 G ENSG00000158286 ENST00000485539 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301-6264302 A ENSG00000158286 ENST00000466994 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301-6264302 G ENSG00000116251 ENST00000234875 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301-6264302 G ENSG00000158286 ENST00000377948 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301-6264302 A ENSG00000116251 ENST00000234875 Transcript UPSTREAM - - - - - - -
>>>>
>>>> However, using the perl script, the same position is mapped onto 14
>>>> Ensembl transcripts, as shown below:
>>>>
>>>> 1_6264301_A/G 1:6264301 G ENSG00000116251 ENST00000465387 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000226944 ENST00000455744 Transcript DOWNSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000116251 ENST00000234875 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000158286 ENST00000377939 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000116251 ENST00000497965 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000158286 ENST00000485539 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000158286 ENST00000484435 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000116251 ENST00000480661 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000158286 ENST00000377948 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000116251 ENST00000462296 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000158286 ENST00000466994 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000116251 ENST00000471204 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000158286 ENST00000496676 Transcript UPSTREAM - - - - - - -
>>>> 1_6264301_A/G 1:6264301 G ENSG00000116251 ENST00000465335 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
>>>>
>>>> Strangely, the Ensembl gene ENSG00000226944, from the second entry
>>>> above, is not shown at all in the web version of VEP. Also, the web
>>>> version reports two alleles (A and G), whereas the perl script
>>>> reports only G.
>>>> In addition, the position which I queried (chr1:6264301-6264301)
>>>> falls within the chromosomal location of only ENSG00000116251 among
>>>> the three Ensembl gene identifiers (ENSG00000116251, ENSG00000226944,
>>>> and ENSG00000158286) - see below:
>>>> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=1;q=ENSG00000116251
>>>> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=1;q=ENSG00000226944
>>>> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=2;q=ENSG00000158286
>>>>
>>>> Did I miss something?
>>>> Any help?
>>>>
>>>> Cheers,
>>>> Sung