[ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP

Anja Thormann anja at ebi.ac.uk
Tue Oct 29 13:47:12 GMT 2019


Hi Dayana,

you could fetch the XML file from UniProt, e.g. https://www.uniprot.org/uniprot/Q96MT7.xml

Then find the <dbReference> tag with the type="Ensembl" attribute. The <molecule> element inside it carries the isoform ID, including the suffix. The section looks like this:

<dbReference type="Ensembl" id="ENST00000295868">
 <molecule id="Q96MT7-1"/>
 <property type="protein sequence ID" value="ENSP00000295868"/>
 <property type="gene ID" value="ENSG00000206530"/>
</dbReference>
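
A minimal Python sketch of that lookup, using only the standard library. The function name ensembl_isoform_map and the wrapper elements and namespace in SAMPLE are assumptions for illustration; only the <dbReference> section is copied from the entry above. Tags are matched by local name, so the sketch does not depend on the exact namespace the real file declares.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def ensembl_isoform_map(xml_text):
    """Map each Ensembl transcript ID to its UniProt isoform, protein and gene IDs.

    Hypothetical helper: it reads every <dbReference type="Ensembl"> element
    and collects the <molecule> and <property> children shown in the thread.
    """
    root = ET.fromstring(xml_text)
    mapping = {}
    for ref in root.iter():
        if local(ref.tag) != "dbReference" or ref.get("type") != "Ensembl":
            continue
        isoform = None
        props = {}
        for child in ref:
            name = local(child.tag)
            if name == "molecule":
                isoform = child.get("id")  # e.g. Q96MT7-1, suffix included
            elif name == "property":
                props[child.get("type")] = child.get("value")
        mapping[ref.get("id")] = {
            "isoform": isoform,
            "protein": props.get("protein sequence ID"),
            "gene": props.get("gene ID"),
        }
    return mapping


# Assumed wrapper; the dbReference section is the one quoted above.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot"><entry>
<dbReference type="Ensembl" id="ENST00000295868">
 <molecule id="Q96MT7-1"/>
 <property type="protein sequence ID" value="ENSP00000295868"/>
 <property type="gene ID" value="ENSG00000206530"/>
</dbReference>
</entry></uniprot>"""

if __name__ == "__main__":
    for transcript, info in ensembl_isoform_map(SAMPLE).items():
        print(transcript, info["isoform"], info["protein"], info["gene"])
```

To run it on a real entry, pass the text of the downloaded XML file instead of SAMPLE.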

A txt version is also available: https://www.uniprot.org/uniprot/Q96MT7.txt
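
If the txt version is easier to work with, the "DR   Ensembl;" lines can be picked out with a regular expression. This is only a sketch: the pattern is inferred from the two DR lines for Q96MT7 quoted further down in this thread, not from a formal description of the format, and parse_dr_ensembl is a hypothetical name.

```python
import re

# Pattern inferred from lines such as:
# DR   Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]
# The trailing [isoform] part is treated as optional (an assumption).
DR_ENSEMBL = re.compile(
    r"^DR\s+Ensembl;\s*(?P<transcript>[^;\s]+);\s*(?P<protein>[^;\s]+);"
    r"\s*(?P<gene>[^;\s]+?)\.(?:\s*\[(?P<isoform>[^\]]+)\])?\s*$"
)


def parse_dr_ensembl(text):
    """Return one dict per 'DR   Ensembl;' line; all other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = DR_ENSEMBL.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


# The two lines quoted in this thread for Q96MT7.
SAMPLE_TXT = """\
DR   Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]
DR   Ensembl; ENST00000393845; ENSP00000377428; ENSG00000206530. [Q96MT7-2]
"""

if __name__ == "__main__":
    for row in parse_dr_ensembl(SAMPLE_TXT):
        print(row["transcript"], row["protein"], row["isoform"])
```

Each returned row pairs an Ensembl transcript and protein ID with the UniProt isoform ID, so it can be joined to VEP output on the transcript ID.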

Best wishes,
Anja

> On 29 Oct 2019, at 12:38, Dayana Yahalomi <dayana.yahalomi at weizmann.ac.il> wrote:
> 
> Hi Anja,
> This is great, thanks for all the explanations and the examples.
> Now I understand.
> Specifically for me, and maybe for others, I do need to know the isoform suffix. Is there a way to look up or import, for each Ensembl transcript, the UniProt/SwissProt cross-reference including the isoform suffix?
>  
> Thanks,
> Dayana
>  
> From: Dev <dev-bounces at ensembl.org> On Behalf Of Anja Thormann
> Sent: Tuesday, 29 October 2019 14:12
> To: Ensembl developers list <dev at ensembl.org>
> Subject: Re: [ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP
>  
> Hi Dayana,
>  
> we import our protein cross-references from UniProt. As part of the import we do alignments to provide identity scores. They can be seen on the website, and are usually 100% or close to it.
>  
> There can be two different Ensembl protein sequences pointing to the same SwissProt ID.
>  
> For example:
>  
> https://www.uniprot.org/uniprot/Q96MT7
>  
> You can see that UniProt associates both isoforms with one gene, but different proteins. In the source data we see:
>  
> DR   Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]
> DR   Ensembl; ENST00000393845; ENSP00000377428; ENSG00000206530. [Q96MT7-2]
>  
> In Ensembl we see that the isoform suffix is not imported:
>  
> https://www.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000206530;r=3:113362865-113441610;t=ENST00000295868
>  
>  
> Please let me know if you have any further questions.
>  
> Best wishes,
> Anja
>  
> 
> 
> On 29 Oct 2019, at 06:57, Dayana Yahalomi <dayana.yahalomi at weizmann.ac.il> wrote:
>  
> Hi Anja,
> Thanks for the response and the information.
> I am so sorry!!! I just noticed that the problem is in my script while parsing the two files.
>  
> But it was very helpful to find out that the coordinates are for the ENSP.. and not swissprot. Thanks again for this information.
>  
> I hope it is O.K. to bother you with one last question. Isn’t the Ensembl protein supposed to be a 100% match to the SwissProt protein? Can there be two different Ensembl protein sequences, with two different ENSP... IDs (maybe isoforms), pointing to the same SwissProt ID?
>  
> All the best,
> Dayana
> From: Dev <dev-bounces at ensembl.org> On Behalf Of Anja Thormann
> Sent: Monday, 28 October 2019 16:19
> To: Ensembl developers list <dev at ensembl.org>
> Subject: Re: [ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP
>  
> Hi Dayana,
>  
> thank you for the input file and the web tool command line. I annotated your input file with both the web tool and the command line VEP, and in both output files I get 5519 SWISSPROT annotations if I grep for SWISSPROT. This is not the most in-depth comparison. Could you please point me to some examples where you see a difference between the two annotation options?
>  
> The coordinates are always given for the Ensembl protein (ENSP...) which means you cannot use the protein position to look up the position in the SwissProt protein.
>  
> However, you shouldn’t see any differences between running with the VEP command line tool or the web online tool.
>  
> Thanks,
> Anja
> 
> 
> 
> On 28 Oct 2019, at 11:18, Dayana Yahalomi <dayana.yahalomi at weizmann.ac.il> wrote:
>  
> Hi Anja,
> This is the command from the web:
> ./vep --af --appris --biotype --buffer_size 500 --check_existing --distance 5000 --mane --polyphen b --pubmed --regulatory --sift b --species homo_sapiens --symbol --transcript_version --tsl --uniprot --cache --input_file [input_data] --output_file [output_file]
>  
> Attached is a vcf file.
>  
> Thanks,
> Dayana
>  
> From: Dev <dev-bounces at ensembl.org> On Behalf Of Anja Thormann
> Sent: Monday, 28 October 2019 11:01
> To: Ensembl developers list <dev at ensembl.org>
> Subject: Re: [ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP
>  
> Dear Dayana,
>  
> could you please share which options you are using for the web tool? You can copy the command line equivalent from the job details section. Could you also please share an example for which you are seeing different annotations?
>  
> Thank you,
> Anja
> 
> 
> 
> 
> On 28 Oct 2019, at 07:06, Dayana Yahalomi <dayana.yahalomi at weizmann.ac.il> wrote:
>  
> Dear Ensembl dev,
> I have installed the following vep program
> Versions:
>   ensembl              : 98.e98e194
>   ensembl-funcgen      : 98.36eef94
>   ensembl-io           : 98.052d23b
>   ensembl-variation    : 98.7b96c96
>   ensembl-vep          : 98.2
>  
> When I run the following command (offline):
> ./vep --verbose --species homo_sapiens --assembly GRCh38 --offline --dir_cache=/bio/db/vep98 --input_file ex2.vcf --format vcf --output_file outputfile_uniprot.vep98.2_dayana2.vcf  --vcf --uniprot
>  
> I don’t get the same results as when I run the same example file in your web tool.
> I am interested in the protein changes, and I look at the SWISSPROT field. I take the protein name from the SWISSPROT field, go to the position indicated (Protein_position) and check whether it is correct.
> With the command line I get fewer protein changes and 30% are incorrect, compared to the web output file, where I get twice as many protein changes and only 10% are incorrect (this is probably due to isoforms that differ from the one in SwissProt).
> Do you know why I see these differences in the VCF output file?
>  
> Thanks in advance,
> Dayana
>  
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/
