[ensembl-dev] different figures between ensembl and biomart
Patrick Meidl
pmeidl at cemm.oeaw.ac.at
Fri Nov 19 10:10:09 GMT 2010
On Thu, Nov 18 2010, Andrea Edwards <edwardsa at cs.man.ac.uk> wrote:
> I have some code (see below) to get all of the exons in ensembl for
> cow database release 58. I got 225838 using this method. However a
> colleague of mine accessed all of the cow exons using biomart and
> obtained 257029.
[...]
> I can't imagine where an extra 25000 exons can come from unless (and I
> know this is speculation) the biomart script has duplicates for an
> exon when it appears in multiple transcripts whereas i only get unique
> exons.
BioMart gives you a denormalised view of the data, so you will get
duplicates in some situations due to the different database model.
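To make the denormalisation point concrete, here is a minimal Python sketch. It assumes a BioMart-style export with one row per transcript/exon pair; the identifiers and the helper name are made up for illustration, and this is not the query that was actually run. An exon shared by several transcripts appears once per transcript, so the row count exceeds the number of unique exons.

```python
# Toy stand-in for a denormalised export: one row per (transcript, exon) pair.
# All identifiers are hypothetical, not real Ensembl stable IDs.
rows = [
    ("T_A", "E1"),
    ("T_A", "E2"),
    ("T_B", "E1"),  # E1 is shared by transcripts T_A and T_B
    ("T_B", "E2"),  # so is E2
    ("T_B", "E3"),
]


def count_rows_and_unique_exons(rows):
    """Return (number of rows, number of distinct exon IDs)."""
    total_rows = len(rows)
    unique_exons = len({exon_id for _transcript_id, exon_id in rows})
    return total_rows, unique_exons


total_rows, unique_exons = count_rows_and_unique_exons(rows)
print(f"rows: {total_rows}, unique exons: {unique_exons}")
```

If the two counts in the real export differ by roughly the gap between the BioMart and API figures, shared exons would be a plausible explanation.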
Just a guess: does cow have a pseudoautosomal region (PAR)? If so, I
would expect you to get the genes (and therefore also the exons) from
this region twice from BioMart (experts, please correct me if I'm
wrong), whereas your Ensembl script will not duplicate them.
Without knowing the script that was used to get the data from BioMart,
one can only speculate, though...
cheers
patrick
--
Patrick Meidl, Mag.
Bioinformatician
Ce-M-M-
Research Centre for Molecular Medicine
of the Austrian Academy of Sciences
Lazarettgasse 14 / AKH BT 25.3
Vienna, Austria
room 02.205
phone +43 1 40160 70016
email pmeidl at cemm.oeaw.ac.at
web http://www.cemm.at/