[ensembl-dev] Why is there a 3-fold size difference between Ensembl and RefSeq human transcriptome?
holgerbrandl at gmx.net
Mon Jul 9 16:29:21 BST 2012
I've compared the number of human transcripts in Ensembl and RefSeq. Ensembl contains ~195k transcripts, of which 150k are tagged as KNOWN. In comparison, the current RefSeq human transcriptome contains only around 43k transcripts. What is causing this huge difference in transcriptome size? If it were just a matter of different cutoffs in the annotation pipelines, I would not expect such a dramatic difference.
Gene numbers also seem to differ about 2-fold (54k in Ensembl; 23k in RefSeq). To some extent this seems to be due to Ensembl containing more non-coding, putative, and predicted entries, but I'm still surprised by the size of the difference.
Is there any way to filter the Ensembl transcripts (in addition to transcript status) to get a more conservative set of transcripts, similar to the one from NCBI?
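
To make the question concrete, here is a minimal sketch of the kind of filter I have in mind. It assumes a GTF-like export whose attribute column carries `transcript_biotype` and `transcript_status` keys. Those key names, the default whitelists, and the function names are my own assumptions for illustration; I have not confirmed that the Ensembl dumps expose them in this form.

```python
import re
from typing import Dict, Iterable, Set

# Matches GTF-style attribute pairs such as: transcript_id "ENST00000000001";
_ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn the 9th GTF column into a {key: value} dictionary."""
    return {key: value for key, value in _ATTR_RE.findall(field)}


def conservative_transcript_ids(
    lines: Iterable[str],
    keep_biotypes: frozenset = frozenset({"protein_coding"}),  # assumed whitelist
    keep_status: frozenset = frozenset({"KNOWN"}),             # assumed whitelist
) -> Set[str]:
    """Collect transcript IDs whose biotype AND status are both whitelisted.

    The attribute keys 'transcript_biotype' and 'transcript_status' are
    assumptions; adjust them to whatever the actual file provides.
    """
    kept: Set[str] = set()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        attrs = parse_attributes(cols[8])
        if attrs.get("transcript_biotype") not in keep_biotypes:
            continue
        if attrs.get("transcript_status") not in keep_status:
            continue
        transcript_id = attrs.get("transcript_id")
        if transcript_id:
            kept.add(transcript_id)
    return kept
```

Passing an open file handle (for example `conservative_transcript_ids(open("some_export.gtf"))`, with a hypothetical file name) would return the set of transcript IDs that survive both filters. Whether such a filter gets anywhere close to the RefSeq set is exactly what I am unsure about.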
Dr. Holger Brandl
Max Planck Institute of Molecular Cell Biology and Genetics
01307 Dresden, Germany
Fax: +49 351 210 2000