[ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP

Mahmut Uludag mahmut.uludag at kaust.edu.sa
Tue Oct 29 14:28:00 GMT 2019


I tried writing an example UniProt API query [1] to retrieve the
Ensembl transcript references held in UniProt/Swiss-Prot:

https://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606+AND+insulin&columns=database(Ensembl)&format=tab
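
For scripting, the query URL shown above could be assembled and its tab-separated reply parsed roughly as follows. This is only a sketch: `build_query_url` and `parse_ensembl_column` are hypothetical helper names, and the cell layout (semicolon-separated transcript IDs, each optionally followed by an isoform in square brackets) is an assumption about the reply, not something taken from the UniProt documentation.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://www.uniprot.org/uniprot/"


def build_query_url(query, columns="database(Ensembl)", fmt="tab"):
    """Assemble the query URL from its parts."""
    params = {"query": query, "columns": columns, "format": fmt}
    # Keep ':' and parentheses literal so the result matches the hand-written URL.
    return BASE_URL + "?" + urlencode(params, safe=":()")


# Assumed cell layout: "ENST00000295868 [Q96MT7-1];ENST00000393845 [Q96MT7-2];"
_ENTRY = re.compile(r"(ENST\d+)(?:\s*\[([^\]]+)\])?")


def parse_ensembl_column(cell):
    """Return (transcript_id, isoform_or_None) pairs found in one cell."""
    return [(m.group(1), m.group(2)) for m in _ENTRY.finditer(cell)]


if __name__ == "__main__":
    print(build_query_url("reviewed:yes AND organism:9606 AND insulin"))
    print(parse_ensembl_column("ENST00000295868 [Q96MT7-1];ENST00000393845 [Q96MT7-2];"))
```

The parser keeps the isoform as `None` when a cell carries no bracketed suffix, so both cases can be handled by the same downstream code.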

Regards,

--mahmut

[1] https://www.uniprot.org/help/api_queries

On 10/29/19 4:47 PM, Anja Thormann wrote:
> Hi Dayana,
>
> you could fetch the XML file from uniprot, e.g. 
> https://www.uniprot.org/uniprot/Q96MT7.xml
>
> Then find the <dbReference> element whose type attribute is "Ensembl". The 
> section looks like this:
>
> <dbReference type="Ensembl" id="ENST00000295868">
>  <molecule id="Q96MT7-1"/>
>  <property type="protein sequence ID" value="ENSP00000295868"/>
>  <property type="gene ID" value="ENSG00000206530"/>
> </dbReference>
>
> A txt version is also available https://www.uniprot.org/uniprot/Q96MT7.txt
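
A minimal Python sketch of the XML lookup described above, in case it is useful. It assumes the downloaded file may declare a default XML namespace, so elements are matched by local name only; `ensembl_xrefs` is a hypothetical helper name, and the sample text is the fragment quoted above wrapped in a placeholder `<entry>` element.

```python
import xml.etree.ElementTree as ET


def ensembl_xrefs(xml_text):
    """Yield one dict per <dbReference type="Ensembl"> element."""
    root = ET.fromstring(xml_text)
    # "{*}" matches any (or no) namespace; this needs Python 3.8 or later.
    for ref in root.findall(".//{*}dbReference"):
        # The attribute value is compared case-sensitively.
        if ref.get("type") != "Ensembl":
            continue
        molecule = ref.find("{*}molecule")
        props = {p.get("type"): p.get("value") for p in ref.findall("{*}property")}
        yield {
            "transcript": ref.get("id"),
            "isoform": molecule.get("id") if molecule is not None else None,
            "protein": props.get("protein sequence ID"),
            "gene": props.get("gene ID"),
        }


SAMPLE = """<entry>
<dbReference type="Ensembl" id="ENST00000295868">
 <molecule id="Q96MT7-1"/>
 <property type="protein sequence ID" value="ENSP00000295868"/>
 <property type="gene ID" value="ENSG00000206530"/>
</dbReference>
</entry>"""

if __name__ == "__main__":
    for xref in ensembl_xrefs(SAMPLE):
        print(xref)
```

The isoform suffix comes from the `<molecule>` child, which is why it is looked up separately from the `<property>` elements.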
>
> Best wishes,
> Anja
>
>> On 29 Oct 2019, at 12:38, Dayana Yahalomi 
>> <dayana.yahalomi at weizmann.ac.il> wrote:
>>
>> Hi Anja,
>> This is great, thanks for all the explanations and the examples.
>> Now I understand.
>> For my purposes, and maybe for others too, I do need to know the isoform 
>> suffix. Is there a way to retrieve, for each Ensembl transcript, its 
>> cross-reference in UniProt/Swiss-Prot including the isoform suffix?
>> Thanks,
>> Dayana
>> *From:* Dev <dev-bounces at ensembl.org> *On Behalf Of* Anja Thormann
>> *Sent:* Tuesday, 29 October 2019 14:12
>> *To:* Ensembl developers list <dev at ensembl.org>
>> *Subject:*Re: [ensembl-dev] output I get for my input file different 
>> when I use the web VEP and command line VEP
>> Hi Dayana,
>> we import our protein cross-references from UniProt. As part of the 
>> import we do alignments to provide identity scores. They can be seen 
>> on the website, and are usually 100% or close to it.
>> There can be two different Ensembl protein sequences pointing to 
>> the same Swiss-Prot ID.
>> For example:
>> https://www.uniprot.org/uniprot/Q96MT7
>> You can see that UniProt associates both isoforms with one gene, but 
>> different proteins. In the source data we see:
>> DR   Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]
>> DR   Ensembl; ENST00000393845; ENSP00000377428; ENSG00000206530. [Q96MT7-2]
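
The isoform suffix can be pulled out of these DR records with a small parser. The sketch below assumes each DR record sits on a single line in the downloaded .txt file (the records quoted above may be wrapped by the mail client) and tolerates an optional version suffix on each identifier; `transcript_to_isoform` is a hypothetical helper name.

```python
import re

# One "DR   Ensembl; ENST...; ENSP...; ENSG.... [isoform]" record per line.
DR_ENSEMBL = re.compile(r"""
    ^DR\s+Ensembl;\s*
    (?P<transcript>ENST\d+)(?:\.\d+)?;\s*
    (?P<protein>ENSP\d+)(?:\.\d+)?;\s*
    (?P<gene>ENSG\d+)(?:\.\d+)?\.
    (?:\s*\[(?P<isoform>[^\]]+)\])?
""", re.VERBOSE)


def transcript_to_isoform(flat_text):
    """Map each Ensembl transcript ID to its isoform ID (None if absent)."""
    mapping = {}
    for line in flat_text.splitlines():
        match = DR_ENSEMBL.match(line)
        if match:
            mapping[match.group("transcript")] = match.group("isoform")
    return mapping


SAMPLE = (
    "DR   Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]\n"
    "DR   Ensembl; ENST00000393845; ENSP00000377428; ENSG00000206530. [Q96MT7-2]\n"
)

if __name__ == "__main__":
    print(transcript_to_isoform(SAMPLE))
```

Joining this mapping to VEP output on the transcript ID would restore the suffix that is not imported into Ensembl.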
>> In Ensembl we see that the isoform suffix is not imported:
>> https://www.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000206530;r=3:113362865-113441610;t=ENST00000295868
>> Please let me know if you have any further questions.
>> Best wishes,
>> Anja
>>
>>
>>     On 29 Oct 2019, at 06:57, Dayana Yahalomi
>>     <dayana.yahalomi at weizmann.ac.il> wrote:
>>     Hi Anja,
>>     Thanks for the response and the information.
>>     I am so sorry! I just noticed that the problem is in my script,
>>     which parses the two files.
>>     But it was very helpful to find out that the coordinates are for
>>     the ENSP... and not for Swiss-Prot. Thanks again for this information.
>>     I hope it is OK to bother you with one last question. Isn't the
>>     Ensembl protein supposed to be a 100% match to the Swiss-Prot one? Can
>>     there be two different Ensembl protein sequences, with two different
>>     ENSP... IDs (maybe isoforms), pointing to the same Swiss-Prot ID?
>>     All the best,
>>     Dayana
>>     *From:* Dev <dev-bounces at ensembl.org> *On Behalf Of* Anja Thormann
>>     *Sent:* Monday, 28 October 2019 16:19
>>     *To:* Ensembl developers list <dev at ensembl.org>
>>     *Subject:*Re: [ensembl-dev] output I get for my input file
>>     different when I use the web VEP and command line VEP
>>     Hi Dayana,
>>     thank you for the input file and the web tool command line. I
>>     annotated your input file with both the web tool and the
>>     command-line VEP, and grepping for SWISSPROT gives 5519
>>     annotations in both output files. This is not the most in-depth
>>     comparison. Could you please point me to some examples
>>     where you see a difference between the two annotation options?
>>     The coordinates are always given for the Ensembl protein
>>     (ENSP...) which means you cannot use the protein position to look
>>     up the position in the SwissProt protein.
>>     However, you shouldn’t see any differences between running with
>>     the VEP command line tool or the web online tool.
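
For a finer-grained comparison than a grep count, the SWISSPROT values could be collected per variant and transcript from each VCF output and the two sets compared. This is a rough sketch that assumes the standard VEP VCF layout, where the CSQ sub-field names are listed after "Format:" in the `##INFO=<ID=CSQ,...>` header line; `parse_csq_format` and `swissprot_annotations` are hypothetical helper names.

```python
import re


def parse_csq_format(line):
    """Extract the CSQ sub-field names from the '##INFO=<ID=CSQ,...' header line."""
    match = re.search(r'Format: ([^"]+)"', line)
    if match is None:
        raise ValueError("CSQ header has no 'Format:' description")
    return match.group(1).strip().split("|")


def swissprot_annotations(vcf_lines):
    """Return a set of (chrom, pos, allele, feature, swissprot) tuples."""
    fields = None
    found = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            fields = parse_csq_format(line)
        if not line or line.startswith("#"):
            continue
        if fields is None:
            raise ValueError("no CSQ header seen before the first record")
        cols = line.split("\t")
        chrom, pos, info = cols[0], cols[1], cols[7]
        for item in info.split(";"):
            if not item.startswith("CSQ="):
                continue
            # One comma-separated entry per allele/transcript combination.
            for entry in item[len("CSQ="):].split(","):
                csq = dict(zip(fields, entry.split("|")))
                if csq.get("SWISSPROT"):
                    found.add((chrom, pos, csq.get("Allele"),
                               csq.get("Feature"), csq["SWISSPROT"]))
    return found
```

With two output files open as `web` and `cli` (placeholder names), `swissprot_annotations(web) - swissprot_annotations(cli)` would list the annotations present only in the web output, and the reverse difference those present only in the command-line output.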
>>     Thanks,
>>     Anja
>>
>>
>>
>>         On 28 Oct 2019, at 11:18, Dayana Yahalomi
>>         <dayana.yahalomi at weizmann.ac.il> wrote:
>>         Hi Anja,
>>         This is the command from the web:
>>
>>         ./vep --af --appris --biotype --buffer_size 500
>>         --check_existing --distance 5000 --mane --polyphen b --pubmed
>>         --regulatory --sift b --species homo_sapiens --symbol
>>         --transcript_version --tsl --uniprot --cache --input_file
>>         [input_data] --output_file [output_file]
>>
>>         Attached is a vcf file.
>>         Thanks,
>>         Dayana
>>         *From:* Dev <dev-bounces at ensembl.org> *On Behalf Of* Anja Thormann
>>         *Sent:* Monday, 28 October 2019 11:01
>>         *To:* Ensembl developers list <dev at ensembl.org>
>>         *Subject:*Re: [ensembl-dev] output I get for my input file
>>         different when I use the web VEP and command line VEP
>>         Dear Dayana,
>>         could you please share which options you are using for the
>>         web tool? You can copy the command line equivalent from the
>>         job details section. Could you also please share an example
>>         for which you are seeing different annotations?
>>         Thank you,
>>         Anja
>>
>>
>>
>>
>>             On 28 Oct 2019, at 07:06, Dayana Yahalomi
>>             <dayana.yahalomi at weizmann.ac.il> wrote:
>>             Dear Ensembl dev,
>>             I have installed the following vep program
>>             Versions:
>>             ensembl              : 98.e98e194
>>             ensembl-funcgen      : 98.36eef94
>>             ensembl-io           : 98.052d23b
>>             ensembl-variation    : 98.7b96c96
>>             ensembl-vep          : 98.2
>>             When I run the following command (offline):
>>             ./vep --verbose --species homo_sapiens --assembly GRCh38
>>             --offline --dir_cache=/bio/db/vep98 --input_file ex2.vcf
>>             --format vcf --output_file
>>             outputfile_uniprot.vep98.2_dayana2.vcf --vcf --uniprot
>>             I do not get the same results as when I run the same
>>             example file on your website.
>>             I am interested in the protein changes, so I look at the
>>             SWISSPROT field: I take the protein name from the SWISSPROT
>>             field, go to the indicated position (Protein_position)
>>             and check whether it is correct.
>>             With the command line I get fewer protein changes and 30% are
>>             incorrect, compared to the web output file, where I get twice
>>             as many protein changes and only 10% are incorrect (this
>>             is probably due to isoforms that differ from the one in
>>             Swiss-Prot).
>>             Do you know why I see these differences in the VCF output file?
>>             Thanks in advance,
>>             Dayana
>>             _______________________________________________
>>             Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>             Posting guidelines and subscribe/unsubscribe
>>             info:https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>             Ensembl Blog:http://www.ensembl.info/
>>
>>         <ex2m.vcf>