[ensembl-dev] Question: why can't I find GERP score in emf file?

Stephen Fitzgerald stephenf at ebi.ac.uk
Tue Oct 7 10:07:50 BST 2014


Hi Haiming, we no longer produce GERP scores for the alignments of the 
smaller set of mammals (16 species in e76, 17 in e77).
We do produce GERP scores for the larger mammal alignments (38 species in 
e76, 39 in e77). These contain all of the species from the smaller set, 
which have high-quality assemblies, plus extra species with lower-quality 
(2X) assemblies. The e76 EMF files for epo_16_eutherian contain no extra 
information compared with the MAF files for the same alignments, so EMF 
files for the smaller set have not been generated for e77.

ftp://ftp.ensembl.org/pub/release-77/emf/ensembl-compara/

If you need GERP scores for mammals in e76, use the scores from the 
38-mammal alignments:

ftp://ftp.ensembl.org/pub/release-76/emf/ensembl-compara/epo_38_eutherian/

The scores are generated from the same blocks that are present in the 
16-species alignments, but should give a more accurate value for the 
constraint at any position because of the extra information provided by 
the 22 additional species in the larger alignments. Constrained elements 
(regions under constraint) defined from the 38 mammals will be very 
similar to those that would have been defined from the 16 mammals, but 
their boundaries should be more fine-grained.
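
To check whether a given EMF file carries scores at all, something like 
the following rough sketch may help. It assumes the usual EMF block 
layout (header lines starting with SEQ, TREE, ID and, when scores are 
present, SCORE; then a DATA line; then the alignment columns; then "//" 
closing the block). The function name is made up for illustration, and 
the layout should be checked against the files you actually download.

```python
def summarize_emf_blocks(lines):
    """Yield (n_sequences, has_score) for each alignment block.

    Assumed layout: header lines starting with SEQ / TREE / ID / SCORE,
    then a DATA line, alignment columns, and "//" ending the block.
    """
    n_seq, has_score, in_data = 0, False, False
    for raw in lines:
        line = raw.strip()
        if line.startswith("//"):
            # End of block: report it only if it actually had sequences.
            if n_seq:
                yield n_seq, has_score
            n_seq, has_score, in_data = 0, False, False
        elif in_data or not line or line.startswith("#"):
            continue  # alignment columns, blank lines, file-level comments
        elif line == "DATA":
            in_data = True
        elif line.startswith("SEQ "):
            n_seq += 1
        elif line.startswith("SCORE"):
            has_score = True
```

The function takes any iterable of text lines, so for a compressed 
download you could pass it the handle returned by gzip.open(path, "rt"). 
Blocks from a file without scores would be reported with has_score set 
to False.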

Hope that helps,
Stephen.




On Mon, 6 Oct 2014, Tang, Haiming wrote:

> Dear group,
> 
> I tried to get the conservation scores (GERP scores) for nucleotides in whole genome alignments.
> 
> So I followed "http://uswest.ensembl.org/Help/Faq?id=221" to download the EMF files for multiple species alignments
> from the FTP site: ftp://ftp.ensembl.org/pub/release-76/emf/ensembl-compara/epo_16_eutherian/
> 
> But why can't I find GERP scores in these EMF files?
> 
> Thanks
> Haiming
> 
>

