[ensembl-dev] different figures between ensembl and biomart

Patrick Meidl pmeidl at cemm.oeaw.ac.at
Fri Nov 19 10:10:09 GMT 2010


On Thu, Nov 18 2010, Andrea Edwards <edwardsa at cs.man.ac.uk> wrote:

> I have some code (see below) to get all of the exons in ensembl for
> cow database release 58. I got 225838 using this method. However a
> colleague of mine accessed all of the cow  exons using biomart and
> obtained 257029.
[...]
> I can't imagine where an extra 25000 exons can come from unless (and I
> know this is speculation) the biomart script has duplicates for an
> exon when it appears in multiple transcripts whereas i only get unique
> exons.

BioMart gives you a denormalised view of the data, so you will get
duplicates in some situations due to the different database model.
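
To check whether this explains the gap, one could compare the number of
rows in the BioMart export with the number of distinct exon IDs in it. The
snippet below is only a rough sketch: it assumes a tab-separated export
with a column named exon_id, and the column names and IDs are made up for
illustration (they are not real BioMart attribute names or cow
identifiers).

```python
import csv
import io

# Hypothetical BioMart-style export: one row per (transcript, exon) pair.
# All IDs are invented for illustration; E1 is shared by transcripts T1 and T2.
SAMPLE_TSV = """gene_id\ttranscript_id\texon_id
G1\tT1\tE1
G1\tT1\tE2
G1\tT2\tE1
G1\tT2\tE3
G2\tT3\tE4
"""


def count_exons(tsv_text):
    """Return (row_count, unique_exon_count) for a tab-separated export."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    exon_ids = [row["exon_id"] for row in reader]
    return len(exon_ids), len(set(exon_ids))


rows, unique = count_exons(SAMPLE_TSV)
print(f"rows: {rows}, unique exons: {unique}")
```

If the unique count from the real export came out close to the number from
the API script, duplicates from the denormalised view would be the likely
explanation; if not, something else is going on.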

Just a guess: does cow have a pseudoautosomal region (PAR)? If so, I would
expect genes (and therefore also exons) from this region to show up twice
in BioMart (experts, please correct me if I'm wrong), whereas your Ensembl
script will not duplicate them.

Without knowing the script which was used to get the data from BioMart,
one can only speculate, though...

cheers

    patrick

-- 
Patrick Meidl, Mag.
Bioinformatician

Ce-M-M-
Research Centre for Molecular Medicine
of the Austrian Academy of Science

Lazarettgasse 14 / AKH BT 25.3
Vienna, Austria

room 02.205
phone +43 1 40160 70016
email pmeidl at cemm.oeaw.ac.at
web http://www.cemm.at/




