[ensembl-dev] output I get for my input file different when I use the web VEP and command line VEP
Mahmut Uludag
mahmut.uludag at kaust.edu.sa
Tue Oct 29 14:28:00 GMT 2019
I have tried writing an example UniProt API query [1] to retrieve the
Ensembl transcript references in UniProt/Swiss-Prot entries:

https://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606+AND+insulin&columns=database(Ensembl)&format=tab
Regards,
--mahmut
[1] https://www.uniprot.org/help/api_queries
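A minimal Python sketch of how the query above could be assembled programmatically, so that the taxon ID and search term can be swapped. The base URL and the `columns`/`format` parameters are copied from the 2019-era URL above; the UniProt API may have changed since, so treat the endpoint as an assumption to check against the API documentation [1]. The function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL taken from the 2019-era query above (assumption: it may have changed).
LEGACY_SEARCH_URL = "https://www.uniprot.org/uniprot/"


def build_ensembl_xref_query(taxon_id: int, term: str) -> str:
    """Build a search URL for reviewed (Swiss-Prot) entries of one organism
    that match `term`, asking only for the Ensembl column as tab-separated text."""
    params = {
        "query": f"reviewed:yes AND organism:{taxon_id} AND {term}",
        "columns": "database(Ensembl)",
        "format": "tab",
    }
    # urlencode percent-escapes ':', '(' and ')' and turns spaces into '+'.
    return LEGACY_SEARCH_URL + "?" + urlencode(params)


def decoded_params(url: str) -> dict:
    """Decode a URL's query string back to plain values, e.g. for checking."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


if __name__ == "__main__":
    url = build_ensembl_xref_query(9606, "insulin")
    print(url)
    print(decoded_params(url)["query"])
```

The escaped URL looks different from the hand-written one but decodes to the same query string.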
On 10/29/19 4:47 PM, Anja Thormann wrote:
> Hi Dayana,
>
> you could fetch the XML file from uniprot, e.g.
> https://www.uniprot.org/uniprot/Q96MT7.xml
>
> Then find the <dbReference> tag with the type="Ensembl" attribute. The
> section looks like this:
>
> <dbReference type="Ensembl" id="ENST00000295868">
> <molecule id="Q96MT7-1"/>
> <property type="protein sequence ID" value="ENSP00000295868"/>
> <property type="gene ID" value="ENSG00000206530"/>
> </dbReference>
>
> A txt version is also available https://www.uniprot.org/uniprot/Q96MT7.txt
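The XML lookup described above can be sketched in Python with the standard-library `xml.etree.ElementTree`. This is a hypothetical helper, not an Ensembl or UniProt tool: it assumes the `<dbReference>`/`<molecule>`/`<property>` layout shown in the example, and it strips XML namespaces because the full UniProt file may declare one. The `<molecule>` id is what carries the isoform suffix (e.g. `Q96MT7-1`).

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a namespace prefix such as '{http://uniprot.org/uniprot}' from a tag."""
    return tag.rsplit("}", 1)[-1]


def ensembl_xrefs_from_uniprot_xml(xml_text: str) -> list:
    """Collect Ensembl cross-references, including the isoform id, from UniProt XML."""
    root = ET.fromstring(xml_text)
    xrefs = []
    for el in root.iter():
        if local_name(el.tag) != "dbReference" or el.get("type") != "Ensembl":
            continue
        record = {"transcript": el.get("id"), "isoform": None,
                  "protein": None, "gene": None}
        for child in el:
            name = local_name(child.tag)
            if name == "molecule":
                record["isoform"] = child.get("id")  # e.g. Q96MT7-1
            elif name == "property" and child.get("type") == "protein sequence ID":
                record["protein"] = child.get("value")
            elif name == "property" and child.get("type") == "gene ID":
                record["gene"] = child.get("value")
        xrefs.append(record)
    return xrefs


# The example section from above, wrapped in a minimal <entry> element.
SAMPLE = """<entry>
<dbReference type="Ensembl" id="ENST00000295868">
<molecule id="Q96MT7-1"/>
<property type="protein sequence ID" value="ENSP00000295868"/>
<property type="gene ID" value="ENSG00000206530"/>
</dbReference>
</entry>"""

if __name__ == "__main__":
    for xref in ensembl_xrefs_from_uniprot_xml(SAMPLE):
        print(xref)
```

If an entry has no `<molecule>` child, `isoform` stays `None`.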
>
> Best wishes,
> Anja
>
>> On 29 Oct 2019, at 12:38, Dayana Yahalomi
>> <dayana.yahalomi at weizmann.ac.il> wrote:
>>
>> Hi Anja,
>> This is great thanks for all the explanations and the examples.
>> Now I understand.
>> Specifically for me, and maybe others, I do need to know the isoform
>> suffix. Is there a way to find or import, for each Ensembl transcript,
>> the UniProt/Swiss-Prot cross-reference including the isoform suffix?
>> Thanks,
>> Dayana
>> *From:* Dev <dev-bounces at ensembl.org> *On Behalf Of* Anja Thormann
>> *Sent:* Tuesday, 29 October 2019 14:12
>> *To:* Ensembl developers list <dev at ensembl.org>
>> *Subject:* Re: [ensembl-dev] output I get for my input file different
>> when I use the web VEP and command line VEP
>> Hi Dayana,
>> we import our protein cross-references from UniProt. As part of the
>> import we run alignments to provide identity scores. These can be seen
>> on the website and are usually 100% or close to it.
>> There can be two different Ensembl protein sequences pointing to
>> the same Swiss-Prot ID.
>> For example:
>> https://www.uniprot.org/uniprot/Q96MT7
>> You can see that UniProt associates both isoforms with one gene, but
>> different proteins. In the source data we see:
>> DR Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]
>> DR Ensembl; ENST00000393845; ENSP00000377428; ENSG00000206530. [Q96MT7-2]
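The isoform suffixes in these DR lines can also be read from the text flat file. A minimal sketch, assuming the DR line layout of the two lines quoted above (optional version suffixes such as `.6` on the IDs are also accepted); the regex and function name are hypothetical, not part of any Ensembl or UniProt tool.

```python
import re

# Layout assumed from the quoted lines, e.g.
# "DR   Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]"
DR_ENSEMBL = re.compile(
    r"^DR\s+Ensembl;\s*(ENST\d+(?:\.\d+)?);\s*(ENSP\d+(?:\.\d+)?);"
    r"\s*(ENSG\d+(?:\.\d+)?)\.(?:\s*\[([^\]]+)\])?"
)


def ensembl_isoform_map(flatfile_text: str) -> dict:
    """Map each Ensembl transcript to (protein, gene, isoform); isoform is None
    when the DR line carries no [ACCESSION-N] suffix."""
    mapping = {}
    for line in flatfile_text.splitlines():
        match = DR_ENSEMBL.match(line)
        if match:
            transcript, protein, gene, isoform = match.groups()
            mapping[transcript] = (protein, gene, isoform)
    return mapping


SAMPLE_DR = """\
DR   Ensembl; ENST00000295868; ENSP00000295868; ENSG00000206530. [Q96MT7-1]
DR   Ensembl; ENST00000393845; ENSP00000377428; ENSG00000206530. [Q96MT7-2]"""

if __name__ == "__main__":
    for transcript, (protein, gene, isoform) in ensembl_isoform_map(SAMPLE_DR).items():
        print(transcript, protein, gene, isoform)
```

Both transcripts map to the same gene but to different isoforms, which is the situation described above.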
>> In Ensembl we see that the isoform suffix is not imported:
>> https://www.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000206530;r=3:113362865-113441610;t=ENST00000295868
>> Please let me know if you have any further questions.
>> Best wishes,
>> Anja
>>
>>
>> On 29 Oct 2019, at 06:57, Dayana Yahalomi
>> <dayana.yahalomi at weizmann.ac.il> wrote:
>> Hi Anja,
>> Thanks for the response and the information.
>> I am so sorry!!! I just noticed that the problem is in my script
>> while parsing the two files.
>> But it was very helpful to find out that the coordinates are for
>> the ENSP... and not Swiss-Prot. Thanks again for this information.
>> I hope it is OK to bother you with one last question: isn't the
>> Ensembl protein supposed to be a 100% match to the Swiss-Prot one? Can
>> there be two different Ensembl protein sequences, with two different
>> ENSP... IDs (maybe isoforms), pointing to the same Swiss-Prot ID?
>> All the best,
>> Dayana
>> *From:* Dev <dev-bounces at ensembl.org> *On Behalf Of* Anja Thormann
>> *Sent:* Monday, 28 October 2019 16:19
>> *To:* Ensembl developers list <dev at ensembl.org>
>> *Subject:* Re: [ensembl-dev] output I get for my input file
>> different when I use the web VEP and command line VEP
>> Hi Dayana,
>> thank you for the input file and the web tool command line. I
>> annotated your input file with both the web tool and the command line
>> VEP and get 5519 SWISSPROT annotations in both output files when I
>> grep for SWISSPROT. This is not the most in-depth comparison. Could
>> you please point me to some examples where you see a difference
>> between the two annotation options?
>> The coordinates are always given for the Ensembl protein
>> (ENSP...) which means you cannot use the protein position to look
>> up the position in the SwissProt protein.
>> However, you shouldn’t see any differences between running with
>> the VEP command line tool or the web online tool.
>> Thanks,
>> Anja
>>
>>
>>
>> On 28 Oct 2019, at 11:18, Dayana Yahalomi
>> <dayana.yahalomi at weizmann.ac.il> wrote:
>> Hi Anja,
>> This is the command from the web:
>>
>> ./vep --af --appris --biotype --buffer_size 500 \
>>   --check_existing --distance 5000 --mane --polyphen b --pubmed \
>>   --regulatory --sift b --species homo_sapiens --symbol \
>>   --transcript_version --tsl --uniprot --cache --input_file \
>>   [input_data] --output_file [output_file]
>>
>> Attached is a vcf file.
>> Thanks,
>> Dayana
>> *From:* Dev <dev-bounces at ensembl.org> *On Behalf Of* Anja Thormann
>> *Sent:* Monday, 28 October 2019 11:01
>> *To:* Ensembl developers list <dev at ensembl.org>
>> *Subject:* Re: [ensembl-dev] output I get for my input file
>> different when I use the web VEP and command line VEP
>> Dear Dayana,
>> could you please share which options you are using for the
>> web tool? You can copy the command line equivalent from the
>> job details section. Could you also please share an example
>> for which you are seeing different annotations?
>> Thank you,
>> Anja
>>
>>
>>
>>
>> On 28 Oct 2019, at 07:06, Dayana Yahalomi
>> <dayana.yahalomi at weizmann.ac.il> wrote:
>> Dear Ensembl dev,
>> I have installed VEP with the following versions:
>> ensembl : 98.e98e194
>> ensembl-funcgen : 98.36eef94
>> ensembl-io : 98.052d23b
>> ensembl-variation : 98.7b96c96
>> ensembl-vep : 98.2
>> When I run the following command (offline):
>> ./vep --verbose --species homo_sapiens --assembly GRCh38 \
>>   --offline --dir_cache=/bio/db/vep98 --input_file ex2.vcf \
>>   --format vcf --output_file \
>>   outputfile_uniprot.vep98.2_dayana2.vcf --vcf --uniprot
>> I don't get the same results as when running the same example
>> file in your web tool.
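A quick first check when two VEP runs disagree is to compare the options each one used. The sketch below diffs the option names in the web-derived command and the offline command quoted in this thread; `option_names` is a hypothetical helper, and an option appearing in only one run does not by itself prove it caused a particular output difference.

```python
import shlex

# The two invocations quoted in this thread (paths and placeholders as given).
WEB_CMD = (
    "./vep --af --appris --biotype --buffer_size 500 --check_existing "
    "--distance 5000 --mane --polyphen b --pubmed --regulatory --sift b "
    "--species homo_sapiens --symbol --transcript_version --tsl --uniprot "
    "--cache --input_file [input_data] --output_file [output_file]"
)
OFFLINE_CMD = (
    "./vep --verbose --species homo_sapiens --assembly GRCh38 --offline "
    "--dir_cache=/bio/db/vep98 --input_file ex2.vcf --format vcf "
    "--output_file outputfile_uniprot.vep98.2_dayana2.vcf --vcf --uniprot"
)


def option_names(cmd: str) -> set:
    """Return the set of --option names used in a command line, ignoring values."""
    names = set()
    for token in shlex.split(cmd):
        if token.startswith("--"):
            # Handle both "--opt value" and "--opt=value" spellings.
            names.add(token[2:].split("=", 1)[0])
    return names


if __name__ == "__main__":
    web, offline = option_names(WEB_CMD), option_names(OFFLINE_CMD)
    print("only in web run:    ", sorted(web - offline))
    print("only in offline run:", sorted(offline - web))
```

Only option names are compared; values such as `--sift b` are ignored, so a changed value for a shared option would not show up.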
>> I am interested in the protein changes, so I look at the
>> SWISSPROT field. I take the protein accession from the SWISSPROT
>> field, go to the position given in Protein_position
>> and check whether it is correct.
>> With the offline run I get fewer protein changes and 30% are
>> incorrect, compared to the web output file, where I get twice
>> as many protein changes and only 10% are incorrect (this
>> is probably due to isoforms that differ from the one in
>> Swiss-Prot).
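A check like the one described above can be scripted by reading the CSQ field from VEP's VCF output. A minimal sketch, assuming the usual layout where the `##INFO=<ID=CSQ,...>` header lists the field order after `Format:` and multiple consequences are separated by commas; the function names are hypothetical and the sample record is made up for illustration. As explained elsewhere in this thread, `Protein_position` refers to the Ensembl protein (ENSP) of the `Feature` transcript, not to the SwissProt sequence, so a position only transfers directly when the two sequences are identical.

```python
import re


def csq_fields(header_lines):
    """Return the CSQ sub-field names listed after 'Format:' in the VEP header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).split("|")
    raise ValueError("no ##INFO=<ID=CSQ ...> header with a Format: list found")


def swissprot_positions(vcf_text):
    """Yield (SWISSPROT, Feature, Protein_position, Amino_acids) for every
    consequence that has both a SWISSPROT value and a protein position.
    Protein_position is on the Ensembl protein of Feature, not on SwissProt."""
    lines = vcf_text.splitlines()
    fields = csq_fields(line for line in lines if line.startswith("##"))
    for line in lines:
        if not line or line.startswith("#"):
            continue
        info = line.split("\t")[7]  # INFO is the 8th VCF column
        for item in info.split(";"):
            if not item.startswith("CSQ="):
                continue
            for consequence in item[len("CSQ="):].split(","):
                ann = dict(zip(fields, consequence.split("|")))
                if ann.get("SWISSPROT") and ann.get("Protein_position"):
                    yield (ann["SWISSPROT"], ann.get("Feature", ""),
                           ann["Protein_position"], ann.get("Amino_acids", ""))


# Made-up record for illustration only; real VEP output has many more fields.
SAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|Feature|Protein_position|'
    'Amino_acids|SWISSPROT">',
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["3", "100", ".", "G", "A", ".", ".",
               "CSQ=A|missense_variant|ENST00000295868|10|A/V|Q96MT7,"
               "A|intron_variant|ENST00000393845|||Q96MT7"]),
])

if __name__ == "__main__":
    for row in swissprot_positions(SAMPLE_VCF):
        print(row)
```

Reading the field order from the header, rather than hard-coding column indices, keeps the parser working when different VEP options add or remove CSQ sub-fields.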
>> Do you know why I see these differences in the vcf outfile?
>> Thanks in advance,
>> Dayana
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe
>> info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/
>>
>> <ex2m.vcf>
>