[ensembl-dev] Biomart downloading inconsistencies

Venetia Pliatsika Venetia.Pliatsika at jefferson.edu
Tue Feb 3 19:55:45 GMT 2015


Hello,

I'm writing because I found a discrepancy between the http://useast.ensembl.org/biomart/martview/ and http://www.ensembl.org/biomart/martview web services, and I would like to confirm that the data I have are complete.

I submitted the following search on both the UK and the US East servers:

Dataset
Homo sapiens genes (GRCh38)
Filters
With protein(Genbank) ID(s): Excluded
Attributes
Ensembl Gene ID
Ensembl Transcript ID
Chromosome Name
Strand
Unspliced (Transcript)
Associated Gene Name
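For reference, the same search can be written as BioMart query XML, which is the format the martservice endpoint accepts. The sketch below builds that XML in Python. The dataset name hsapiens_gene_ensembl and the internal names with_protein_id, transcript_exon_intron and external_gene_name are assumptions standing in for the web-form labels above. Please check them against the dataset's filter and attribute listing before relying on them.

```python
import xml.etree.ElementTree as ET

def build_biomart_query(dataset, excluded_filters, attributes, formatter="FASTA"):
    """Return a BioMart query XML string (a sketch, not a verified query)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,
        "header": "0",
        "uniqueRows": "0",
        "count": "",
        "datasetConfigVersion": "0.6",
        # Ask the server to append "[success]" once all results are sent.
        "completionStamp": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # "Excluded" in the web form corresponds to excluded="1" on the filter.
    for name in excluded_filters:
        ET.SubElement(ds, "Filter", {"name": name, "excluded": "1"})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")

# Internal names below are assumed mappings of the labels in the search above.
xml = build_biomart_query(
    "hsapiens_gene_ensembl",
    ["with_protein_id"],                 # With protein (Genbank) ID(s)
    ["ensembl_gene_id", "ensembl_transcript_id", "chromosome_name",
     "strand", "transcript_exon_intron",  # Unspliced (Transcript)
     "external_gene_name"],               # Associated Gene Name
)
print(xml)
```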

However, I got different results, and the difference is large: the useast.ensembl.org file contains 67,482 sequences and the ensembl.org file contains 78,820. As far as I could tell, the ensembl.org file contained every entry from the useast.ensembl.org file, plus some additional ones.

Here are some genes that were missing from the useast.ensembl.org results but present in the ensembl.org results:
ENSG00000001084
ENSG00000090989
ENSG00000104723
ENSG00000108846
ENSG00000109158
ENSG00000109171
ENSG00000109180
ENSG00000109182
ENSG00000109184
ENSG00000109452
ENSG00000118564
ENSG00000118579
ENSG00000120708
ENSG00000123415
ENSG00000129187
ENSG00000133835
ENSG00000138678
ENSG00000145216
ENSG00000145868
ENSG00000151466
ENSG00000163138
ENSG00000170365
ENSG00000180104
ENSG00000196353
ENSG00000213347
ENSG00000234492
ENSG00000245526
ENSG00000250328
ENSG00000278610
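A comparison like the one above (record counts, a subset check, and the genes present in only one file) can be sketched in Python. This assumes each FASTA header begins with the Ensembl gene ID followed by "|", as when the gene ID is the first selected attribute; adjust the parsing if the header layout differs. The sample records are placeholders, not real data.

```python
def gene_ids(fasta_text):
    """Collect gene IDs, assuming headers look like '>GENE|TRANSCRIPT|...'."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            ids.add(line[1:].split("|", 1)[0].strip())
    return ids

def count_records(fasta_text):
    """Count FASTA records (one per unspliced transcript in this query)."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))

def compare(uk_text, us_text):
    """Summarise how two downloads of the same query differ."""
    uk, us = gene_ids(uk_text), gene_ids(us_text)
    return {
        "records": (count_records(uk_text), count_records(us_text)),
        "us_subset_of_uk": us <= uk,
        "only_uk": sorted(uk - us),
        "only_us": sorted(us - uk),
    }

# Placeholder records; in practice pass open(path).read() for each file.
uk_sample = ">GENE_A|TX_A1\nACGT\n>GENE_A|TX_A2\nACGA\n>GENE_B|TX_B1\nTTGA\n"
us_sample = ">GENE_A|TX_A1\nACGT\n"
result = compare(uk_sample, us_sample)
print(result)
```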

After noticing this, I re-downloaded the results from useast.ensembl.org several times. Each time the file had a different size, and none of the files matched the size of the ensembl.org file.
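Varying file sizes across repeated downloads suggest truncated transfers. If a query is sent to martservice with completionStamp="1" in the Query element, the server should end a complete result with a "[success]" line, which gives a simple completeness check. The sketch below assumes that stamp; without it, this check cannot tell a truncated file from a complete one.

```python
def looks_complete(text, stamp="[success]"):
    """True if the last non-blank line is BioMart's completion stamp.

    Only meaningful for queries sent with completionStamp="1"."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == stamp

print(looks_complete(">G|T\nACGT\n[success]\n"))  # complete download
print(looks_complete(">G|T\nAC"))                  # cut off mid-sequence
```

Remember to drop the stamp line before parsing the FASTA records.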

I would like to know whether the data I downloaded from ensembl.org include all results, or whether you would suggest retrieving them again in a different way.
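One alternative to the web interface is to send the query XML directly to the martservice endpoint and stream the response to disk. The sketch below uses only the Python standard library and assumes that martservice accepts the XML in a form field named "query" via POST. The endpoint default is the www.ensembl.org service; a mirror's martservice URL could be passed instead. The download function makes a network call and is not exercised here.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import Request, urlopen

MARTSERVICE = "http://www.ensembl.org/biomart/martservice"

def make_request(query_xml, endpoint=MARTSERVICE):
    """Build a POST request carrying the XML in the 'query' form field."""
    data = urlencode({"query": query_xml}).encode("ascii")
    return Request(endpoint, data=data, method="POST")

def download(query_xml, out_path, endpoint=MARTSERVICE, timeout=600):
    """Stream the server's response to out_path without holding it in memory."""
    with urlopen(make_request(query_xml, endpoint), timeout=timeout) as resp, \
            open(out_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Combined with a completion-stamp check on the saved file, this makes it easier to tell a short download from a genuinely smaller result.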

Thank you in advance,
Venetia

