[ensembl-dev] GRCH37 biomart very slow
Wolf Beat
Beat.Wolf at hefr.ch
Sun Sep 5 14:28:43 BST 2021
Hi there,
I have an issue when submitting requests to BioMart on GRCh37. The requests take a very long time, to the point where it's not really usable in some circumstances.
One example is getting the Swiss-Prot ID for all transcripts on a particular chromosome.
Here is the query:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
<Dataset name = "hsapiens_gene_ensembl" interface = "default" >
<Filter name = "with_uniprotswissprot" excluded = "0"/>
<Filter name = "chromosome_name" value = "16"/>
<Attribute name = "uniprotswissprot" />
<Attribute name = "ensembl_transcript_id" />
<Attribute name = "ensembl_gene_id" />
<Attribute name = "ensembl_gene_id_version" />
</Dataset>
</Query>
Here is the GRCh37 URL for this query:
http://grch37.ensembl.org/biomart/martservice?query=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%3C%21DOCTYPE+Query%3E%3CQuery++virtualSchemaName+%3D+%22default%22+formatter+%3D+%22TSV%22+header+%3D+%220%22+uniqueRows+%3D+%220%22+count+%3D+%22%22+datasetConfigVersion+%3D+%220.6%22+%3E%3CDataset+name+%3D+%22hsapiens_gene_ensembl%22+interface+%3D+%22default%22+%3E%3CFilter+name+%3D+%22chromosome_name%22+value+%3D+%2216%22%2F%3E%3CFilter+name+%3D+%22with_uniprotswissprot%22+excluded+%3D+%220%22%2F%3E%3CAttribute+name+%3D+%22ensembl_gene_id%22+%2F%3E%3CAttribute+name+%3D+%22ensembl_transcript_id%22+%2F%3E%3CAttribute+name+%3D+%22uniprotswissprot%22+%2F%3E%3C%2FDataset%3E%3C%2FQuery%3E
And here is the one using GRCh38:
http://ensembl.org/biomart/martservice?query=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%3C%21DOCTYPE+Query%3E%3CQuery++virtualSchemaName+%3D+%22default%22+formatter+%3D+%22TSV%22+header+%3D+%220%22+uniqueRows+%3D+%220%22+count+%3D+%22%22+datasetConfigVersion+%3D+%220.6%22+%3E%3CDataset+name+%3D+%22hsapiens_gene_ensembl%22+interface+%3D+%22default%22+%3E%3CFilter+name+%3D+%22chromosome_name%22+value+%3D+%2216%22%2F%3E%3CFilter+name+%3D+%22with_uniprotswissprot%22+excluded+%3D+%220%22%2F%3E%3CAttribute+name+%3D+%22ensembl_gene_id%22+%2F%3E%3CAttribute+name+%3D+%22ensembl_transcript_id%22+%2F%3E%3CAttribute+name+%3D+%22uniprotswissprot%22+%2F%3E%3C%2FDataset%3E%3C%2FQuery%3E
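For reference, the two URLs above are just the XML query form-encoded into the `query` parameter of `martservice`. The following is a minimal sketch of that encoding using only the Python standard library; the helper name `build_martservice_url` is made up for illustration, and the XML is written on one line with the filters and attributes in the order used in the URLs above.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# The query from this mail, flattened to a single string.
QUERY_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE Query>'
    '<Query virtualSchemaName = "default" formatter = "TSV" header = "0" '
    'uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >'
    '<Dataset name = "hsapiens_gene_ensembl" interface = "default" >'
    '<Filter name = "chromosome_name" value = "16"/>'
    '<Filter name = "with_uniprotswissprot" excluded = "0"/>'
    '<Attribute name = "ensembl_gene_id" />'
    '<Attribute name = "ensembl_transcript_id" />'
    '<Attribute name = "uniprotswissprot" />'
    '</Dataset>'
    '</Query>'
)


def build_martservice_url(host: str, query_xml: str) -> str:
    """Return a martservice URL with the XML form-encoded (spaces become '+')."""
    return f"http://{host}/biomart/martservice?" + urlencode({"query": query_xml})


if __name__ == "__main__":
    # Hosts as used in this mail.
    for host in ("grch37.ensembl.org", "ensembl.org"):
        print(build_martservice_url(host, QUERY_XML))
```

Only the host differs between the two generated URLs, so any difference in response time comes from the server side and not from the query itself.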
The difference is massive: 4 seconds on GRCh38 versus 130 seconds on GRCh37 (when it doesn't time out).
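To reproduce the comparison, a small timing script along these lines could be used. This is only a sketch: the function name `time_fetch` and the 300-second timeout are assumptions for illustration, and the URLs to test are passed on the command line.

```python
import sys
import time
import urllib.request


def time_fetch(url, timeout=300.0, opener=urllib.request.urlopen):
    """Fetch a URL and return (elapsed_seconds, response_size_in_bytes).

    The timeout value is an arbitrary assumption; `opener` can be swapped
    out to test the function without network access.
    """
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        body = response.read()
    return time.perf_counter() - start, len(body)


if __name__ == "__main__":
    # Usage: python time_biomart.py <grch37-url> <grch38-url>
    for url in sys.argv[1:]:
        elapsed, size = time_fetch(url)
        print(f"{elapsed:8.1f} s  {size:10d} bytes  {url[:60]}")
```

Reading the full body before stopping the clock matters here, since the server may start streaming results well before the query finishes.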
Am I doing something wrong? Is there any fix for this problem?
Kind regards
Beat Wolf
Dr Beat Wolf, PhD ▪ Assistant Professor ▪ Member of iCoSys Institute
University of Applied Sciences and Arts Western Switzerland ▪ Pérolles 80 ▪ CH-1700 Fribourg
Member of University of Applied Sciences of Western Switzerland
https://www.heia-fr.ch ▪ https://icosys.ch/