[ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP

Dayana Yahalomi dayana.yahalomi at weizmann.ac.il
Wed Oct 30 06:31:03 GMT 2019


Thanks.
It is great.
Best regards,
Dayana

From: Dev <dev-bounces at ensembl.org> On Behalf Of Mahmut Uludag
Sent: Tuesday, 29 October 2019 16:28
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP


I have tried writing an example UniProt API query [1] to retrieve the Ensembl transcript references in UniProt/Swiss-Prot:

  - https://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606+AND+insulin&columns=database(Ensembl)&format=tab
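
A minimal sketch of how a query URL like the one above could be assembled from its parts. The endpoint, column name and format are taken from the URL in this message as written in 2019; the helper name build_query_url is hypothetical, and the sketch only builds the URL without checking that the service still answers it in this form.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint exactly as it appears in the message above (2019).
BASE_URL = "https://www.uniprot.org/uniprot/"


def build_query_url(terms, columns, fmt="tab"):
    """Join search terms with AND and encode them as URL parameters."""
    params = {
        "query": " AND ".join(terms),
        "columns": ",".join(columns),
        "format": fmt,
    }
    # urlencode percent-encodes ':' and '(' and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


url = build_query_url(
    ["reviewed:yes", "organism:9606", "insulin"],
    ["database(Ensembl)"],
)

# Decode the query string again to confirm the parameters round-trip.
decoded = parse_qs(urlsplit(url).query)
print(url)
```

The encoded URL differs cosmetically from the hand-written one (for example ':' becomes '%3A'), but it decodes to the same three parameters.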

Regards,

--mahmut

[1] https://www.uniprot.org/help/api_queries
On 10/29/19 4:47 PM, Anja Thormann wrote:
Hi Dayana,

you could fetch the XML file from UniProt, e.g. https://www.uniprot.org/uniprot/Q96MT7.xml

Then find the <dbReference> tag with the type="Ensembl" attribute. The section looks like this:

<dbReference type="Ensembl" id="ENST00000295868">
 <molecule id="Q96MT7-1"/>
 <property type="protein sequence ID" value="ENSP00000295868"/>
 <property type="gene ID" value="ENSG00000206530"/>
</dbReference>

A text version is also available: https://www.uniprot.org/uniprot/Q96MT7.txt
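
A minimal sketch of the <dbReference> lookup described above, using Python's standard xml.etree.ElementTree. The fragment is the one quoted in this message, wrapped in a placeholder <entry> element so that it parses on its own; the function names are hypothetical. A complete UniProt XML file declares a default namespace, so the sketch compares local tag names rather than assuming a particular namespace URI.

```python
import xml.etree.ElementTree as ET

# Fragment quoted in the message above, wrapped in a placeholder root element.
SAMPLE_XML = """<entry>
<dbReference type="Ensembl" id="ENST00000295868">
 <molecule id="Q96MT7-1"/>
 <property type="protein sequence ID" value="ENSP00000295868"/>
 <property type="gene ID" value="ENSG00000206530"/>
</dbReference>
</entry>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def ensembl_xrefs(xml_text):
    """Return one dict per Ensembl dbReference element in the XML text."""
    root = ET.fromstring(xml_text)
    xrefs = []
    for elem in root.iter():
        if local_name(elem.tag) != "dbReference" or elem.get("type") != "Ensembl":
            continue
        # The isoform stays None when no <molecule> child is present.
        record = {"transcript": elem.get("id"), "isoform": None}
        for child in elem:
            name = local_name(child.tag)
            if name == "molecule":
                record["isoform"] = child.get("id")
            elif name == "property":
                record[child.get("type")] = child.get("value")
        xrefs.append(record)
    return xrefs


xrefs = ensembl_xrefs(SAMPLE_XML)
for xref in xrefs:
    print(xref["transcript"], xref["isoform"])
```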

Best wishes,
Anja


On 29 Oct 2019, at 12:38, Dayana Yahalomi <dayana.yahalomi at weizmann.ac.il> wrote:

Hi Anja,
This is great, thanks for all the explanations and the examples.
Now I understand.
Specifically for me, and maybe for others, I do need to know the isoform suffix. Is there a way to find or import, for each Ensembl transcript, the cross-reference in UniProt/Swiss-Prot including the isoform suffix?

Thanks,
Dayana

From: Dev <dev-bounces at ensembl.org> On Behalf Of Anja Thormann
Sent: Tuesday, 29 October 2019 14:12
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP

Hi Dayana,

we import our protein cross-references from UniProt. As part of the import we do alignments to provide identity scores. They can be seen on the website, and are usually 100% or close to it.

There can be two different Ensembl protein sequences pointing to the same Swiss-Prot ID.

For example:

https://www.uniprot.org/uniprot/Q96MT7

You can see that UniProt associates both isoforms with one Ensembl gene, but with different Ensembl proteins. In the source data we see:

DR   Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]
DR   Ensembl; ENST00000393845; ENSP00000377428; ENSG00000206530. [Q96MT7-2]
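
A minimal sketch of how DR lines like the two above could be turned into a transcript-to-isoform table. It assumes the layout shown here (three identifiers separated by semicolons, a closing full stop, and an optional isoform in square brackets); the function name is hypothetical.

```python
# The two DR lines quoted in the message above.
SAMPLE_DR_LINES = [
    "DR   Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]",
    "DR   Ensembl; ENST00000393845; ENSP00000377428; ENSG00000206530. [Q96MT7-2]",
]


def parse_dr_ensembl(line):
    """Parse one 'DR   Ensembl; ...' line; return None for any other line."""
    parts = line.split(None, 1)
    if len(parts) != 2 or parts[0] != "DR" or not parts[1].startswith("Ensembl;"):
        return None
    body = parts[1].strip()
    isoform = None
    # The isoform, when given, is the trailing '[...]' group.
    if body.endswith("]") and "[" in body:
        body, _, bracketed = body.rpartition("[")
        isoform = bracketed[:-1].strip()
    # Drop the full stop that terminates the identifier list.
    body = body.strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 4:
        return None
    _, transcript, protein, gene = fields
    return {
        "transcript": transcript,
        "protein": protein,
        "gene": gene,
        "isoform": isoform,
    }


records = [r for r in map(parse_dr_ensembl, SAMPLE_DR_LINES) if r is not None]
isoform_by_transcript = {r["transcript"]: r["isoform"] for r in records}
print(isoform_by_transcript)
```

Keying the table on the transcript ID keeps the isoform suffix that, as described next, is not carried over into Ensembl.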

In Ensembl we see that the isoform suffix is not imported:

https://www.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000206530;r=3:113362865-113441610;t=ENST00000295868


Please let me know if you have any further questions.

Best wishes,
Anja




On 29 Oct 2019, at 06:57, Dayana Yahalomi <dayana.yahalomi at weizmann.ac.il> wrote:

Hi Anja,
Thanks for the response and the information.
I am so sorry!!! I just noticed that the problem is in my script while parsing the two files.

But it was very helpful to find out that the coordinates are for the Ensembl protein (ENSP...) and not for the Swiss-Prot protein. Thanks again for this information.

I hope it is OK to bother you with one last question. Isn’t the Ensembl protein supposed to be a 100% match to the Swiss-Prot protein? Can there be two different Ensembl protein sequences, with two different ENSP IDs (maybe isoforms), pointing to the same Swiss-Prot ID?

All the best,
Dayana
From: Dev <dev-bounces at ensembl.org> On Behalf Of Anja Thormann
Sent: Monday, 28 October 2019 16:19
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP

Hi Dayana,

thank you for the input file and the web tool command line. I annotated your input file with both the web tool and the command line VEP, and both output files contain 5519 SWISSPROT annotations when I grep for SWISSPROT. This is not the most in-depth comparison. Could you please point me to some examples where you see a difference between the two?

The coordinates are always given for the Ensembl protein (ENSP...) which means you cannot use the protein position to look up the position in the SwissProt protein.

However, you shouldn’t see any differences between running the VEP command line tool and running the web tool.
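
A minimal sketch of a comparison that goes one step beyond grepping for SWISSPROT: it collects the SWISSPROT annotations from a VEP-annotated VCF as a set, so that the sets from two runs can be subtracted. It assumes the CSQ header line lists the field names after "Format:", separated by '|'. The two-record VCF, the transcript names and the accession P00001 are invented for illustration, and the function names are hypothetical.

```python
import re

# Invented sample: real VEP output carries many more CSQ fields.
SAMPLE_VCF = '''##fileformat=VCFv4.2
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|Feature|Protein_position|SWISSPROT">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t100\t.\tA\tG\t.\t.\tCSQ=G|missense_variant|ENST_A|12|P00001,G|missense_variant|ENST_B|12|
1\t200\t.\tC\tT\t.\t.\tCSQ=T|intron_variant|ENST_C||
'''


def csq_fields(header_lines):
    """Read the CSQ field names from the '##INFO=<ID=CSQ' header line."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no CSQ header line found")


def swissprot_annotations(vcf_text):
    """Return (chrom, pos, feature, swissprot, protein_position) tuples."""
    lines = vcf_text.splitlines()
    fields = csq_fields(line for line in lines if line.startswith("##"))
    found = set()
    for line in lines:
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        chrom, pos, info = cols[0], cols[1], cols[7]
        for item in info.split(";"):
            if not item.startswith("CSQ="):
                continue
            # One comma-separated entry per allele/transcript pair.
            for entry in item[4:].split(","):
                record = dict(zip(fields, entry.split("|")))
                if record.get("SWISSPROT"):
                    found.add((chrom, pos, record.get("Feature"),
                               record["SWISSPROT"],
                               record.get("Protein_position")))
    return found


annotations = swissprot_annotations(SAMPLE_VCF)
print(sorted(annotations))
```

With one such set per output file, for example web and cli, the expressions web - cli and cli - web list the annotations that appear in only one of the two runs.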

Thanks,
Anja




On 28 Oct 2019, at 11:18, Dayana Yahalomi <dayana.yahalomi at weizmann.ac.il> wrote:

Hi Anja,
This is the command from the web:
./vep --af --appris --biotype --buffer_size 500 --check_existing --distance 5000 --mane --polyphen b --pubmed --regulatory --sift b --species homo_sapiens --symbol --transcript_version --tsl --uniprot --cache --input_file [input_data] --output_file [output_file]

Attached is a vcf file.

Thanks,
Dayana

From: Dev <dev-bounces at ensembl.org> On Behalf Of Anja Thormann
Sent: Monday, 28 October 2019 11:01
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP

Dear Dayana,

could you please share which options you are using for the web tool? You can copy the command line equivalent from the job details section. Could you also please share an example for which you are seeing different annotations?

Thank you,
Anja





On 28 Oct 2019, at 07:06, Dayana Yahalomi <dayana.yahalomi at weizmann.ac.il> wrote:

Dear Ensembl dev,
I have installed the following version of VEP:
Versions:
  ensembl              : 98.e98e194
  ensembl-funcgen      : 98.36eef94
  ensembl-io           : 98.052d23b
  ensembl-variation    : 98.7b96c96
  ensembl-vep          : 98.2

When I run the following command (offline):
./vep --verbose --species homo_sapiens --assembly GRCh38 --offline --dir_cache=/bio/db/vep98 --input_file ex2.vcf --format vcf --output_file outputfile_uniprot.vep98.2_dayana2.vcf  --vcf --uniprot

I don’t get the same results as when I run the same example file in your web tool.
I am interested in the protein changes, so I look at the SWISSPROT field: I take the protein name from the SWISSPROT field, go to the position indicated (Protein_position), and check whether it is correct.
With the command line I get fewer protein changes and 30% are incorrect, compared to the web output file, where I get twice as many protein changes and only 10% are incorrect (this is probably due to isoforms that differ from the one in Swiss-Prot).
Do you know why I see these differences in the VCF output file?

Thanks in advance,
Dayana

_______________________________________________
Dev mailing list    Dev at ensembl.org<mailto:Dev at ensembl.org>
Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
Ensembl Blog: http://www.ensembl.info/
