[ensembl-dev] GRCH37 biomart very slow

Wolf Beat Beat.Wolf at hefr.ch
Sun Sep 5 14:28:43 BST 2021


Hi there,

I have an issue when submitting requests to BioMart on GRCh37. The requests take a very long time, to the point where it's not really usable in some circumstances.

One example is getting the Swiss-Prot ID for all transcripts on a particular chromosome.

Here is the query:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >

        <Dataset name = "hsapiens_gene_ensembl" interface = "default" >
                <Filter name = "with_uniprotswissprot" excluded = "0"/>
                <Filter name = "chromosome_name" value = "16"/>
                <Attribute name = "uniprotswissprot" />
                <Attribute name = "ensembl_transcript_id" />
                <Attribute name = "ensembl_gene_id" />
                <Attribute name = "ensembl_gene_id_version" />
        </Dataset>
</Query>

Here is the GRCh37 URL for this query:
 http://grch37.ensembl.org/biomart/martservice?query=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%3C%21DOCTYPE+Query%3E%3CQuery++virtualSchemaName+%3D+%22default%22+formatter+%3D+%22TSV%22+header+%3D+%220%22+uniqueRows+%3D+%220%22+count+%3D+%22%22+datasetConfigVersion+%3D+%220.6%22+%3E%3CDataset+name+%3D+%22hsapiens_gene_ensembl%22+interface+%3D+%22default%22+%3E%3CFilter+name+%3D+%22chromosome_name%22+value+%3D+%2216%22%2F%3E%3CFilter+name+%3D+%22with_uniprotswissprot%22+excluded+%3D+%220%22%2F%3E%3CAttribute+name+%3D+%22ensembl_gene_id%22+%2F%3E%3CAttribute+name+%3D+%22ensembl_transcript_id%22+%2F%3E%3CAttribute+name+%3D+%22uniprotswissprot%22+%2F%3E%3C%2FDataset%3E%3C%2FQuery%3E

And here is the one using GRCh38:
 http://ensembl.org/biomart/martservice?query=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%3C%21DOCTYPE+Query%3E%3CQuery++virtualSchemaName+%3D+%22default%22+formatter+%3D+%22TSV%22+header+%3D+%220%22+uniqueRows+%3D+%220%22+count+%3D+%22%22+datasetConfigVersion+%3D+%220.6%22+%3E%3CDataset+name+%3D+%22hsapiens_gene_ensembl%22+interface+%3D+%22default%22+%3E%3CFilter+name+%3D+%22chromosome_name%22+value+%3D+%2216%22%2F%3E%3CFilter+name+%3D+%22with_uniprotswissprot%22+excluded+%3D+%220%22%2F%3E%3CAttribute+name+%3D+%22ensembl_gene_id%22+%2F%3E%3CAttribute+name+%3D+%22ensembl_transcript_id%22+%2F%3E%3CAttribute+name+%3D+%22uniprotswissprot%22+%2F%3E%3C%2FDataset%3E%3C%2FQuery%3E

The difference is massive: 4 seconds (GRCh38) vs 130 seconds (GRCh37), if it doesn't time out.
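
For anyone who wants to reproduce the comparison, here is a minimal sketch in Python using only the standard library. It assumes the same three-attribute query that is encoded in the two URLs above. The names build_martservice_url, time_request and compare_hosts are made up for this illustration and are not part of BioMart or any Ensembl client. The 300 second timeout is an arbitrary choice.

```python
import time
import urllib.parse
import urllib.request

# The query as it appears (URL-encoded) in the two links above.
QUERY_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE Query>'
    '<Query virtualSchemaName="default" formatter="TSV" header="0" '
    'uniqueRows="0" count="" datasetConfigVersion="0.6">'
    '<Dataset name="hsapiens_gene_ensembl" interface="default">'
    '<Filter name="chromosome_name" value="16"/>'
    '<Filter name="with_uniprotswissprot" excluded="0"/>'
    '<Attribute name="ensembl_gene_id"/>'
    '<Attribute name="ensembl_transcript_id"/>'
    '<Attribute name="uniprotswissprot"/>'
    '</Dataset></Query>'
)


def build_martservice_url(host, query_xml=QUERY_XML):
    """Return the martservice URL for `host` with the XML passed as `query`."""
    params = urllib.parse.urlencode({"query": query_xml})
    return f"http://{host}/biomart/martservice?{params}"


def time_request(url, fetch=None, timeout=300):
    """Fetch `url` and return (elapsed_seconds, response_body).

    `fetch` is an optional callable taking the URL and returning the body;
    it defaults to a plain urllib download. The timeout is an assumption.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=timeout) as resp:
                return resp.read()
    start = time.perf_counter()
    body = fetch(url)
    return time.perf_counter() - start, body


def compare_hosts(hosts=("grch37.ensembl.org", "ensembl.org")):
    """Run the same query against each host and print the wall-clock time."""
    for host in hosts:
        elapsed, body = time_request(build_martservice_url(host))
        print(f"{host}: {elapsed:.1f} s, {len(body.splitlines())} lines")
```

Calling compare_hosts() sends the same request to both servers one after the other and prints the elapsed time and the number of lines returned for each.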

Am I doing something wrong? Is there any fix for this problem?

Kind regards

Beat Wolf




Dr Beat Wolf, PhD ▪ Assistant Professor ▪ Member of iCoSys Institute
University of Applied Sciences and Arts Western Switzerland ▪ Pérolles 80 ▪ CH-1700 Fribourg
Member of University of Applied Sciences of Western Switzerland
https://www.heia-fr.ch ▪ https://icosys.ch/
