[ensembl-dev] Question: why can't I find GERP score in emf file?
Stephen Fitzgerald
stephenf at ebi.ac.uk
Tue Oct 7 10:07:50 BST 2014
Hi Haiming,
We no longer produce GERP scores for the alignments containing the
smaller set of mammals (16 species in e76, 17 in e77).
We do produce GERP scores for the larger set of mammal alignments
(38 species in e76, 39 in e77), which contain all of the high-quality
assembly species present in the smaller set plus extra lower-quality
(2X) assembly species. The EMF files for the epo_16_eutherian mammals
(e76) contain no extra information compared with the MAF files for the
same alignments, so they have not been generated for e77:
ftp://ftp.ensembl.org/pub/release-77/emf/ensembl-compara/
If you need GERP scores for mammals in e76, use the scores from the
38-mammal alignments:
ftp://ftp.ensembl.org/pub/release-76/emf/ensembl-compara/epo_38_eutherian/
The scores are generated from the same blocks present in the 16-species
alignments, but should give a more accurate value for the constraint at
any position because of the extra information provided by the 22
additional species in the larger alignments. Regions under constraint
(constrained elements) defined by the 38 mammals will be very similar to
those that would have been defined for the 16 mammals, but their
boundaries should be more finely resolved.
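In case it is useful, here is a minimal Python sketch of pulling per-column
scores out of an EMF file once it is downloaded and uncompressed. It rests on
assumptions that should be checked against a real file: that each block lists
its sequences on SEQ lines, carries a SCORE line when scores are present, and
then has a DATA section with one line per alignment column (one character per
sequence, followed by whitespace and the score), closed by "//". The function
name parse_emf_blocks and the toy SAMPLE data are made up for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Column = Tuple[str, Optional[float]]
Block = Tuple[List[str], List[Column]]


def parse_emf_blocks(lines: Iterable[str]) -> Iterator[Block]:
    """Yield (species_names, columns) for each alignment block.

    Assumed layout (verify against a real file): SEQ lines, an optional
    SCORE line, DATA, then one line per column, terminated by '//'.
    """
    species: List[str] = []
    columns: List[Column] = []
    in_data = False
    has_score = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '##' header comments
        if line == "//":
            # End of block: emit what was collected and reset the state.
            if species or columns:
                yield species, columns
            species, columns = [], []
            in_data = has_score = False
        elif in_data:
            parts = line.split()
            score: Optional[float] = None
            if has_score and len(parts) > 1:
                try:
                    score = float(parts[1])
                except ValueError:
                    score = None  # non-numeric placeholder, treat as missing
            columns.append((parts[0], score))
        elif line.startswith("SEQ "):
            species.append(line.split()[1])  # second field: species name
        elif line.startswith("SCORE"):
            has_score = True
        elif line == "DATA":
            in_data = True
    if species or columns:
        yield species, columns  # file did not end with '//'


# Toy data, invented for illustration only (not taken from a real EMF file).
SAMPLE = """
##FORMAT (compara)
SEQ homo_sapiens 1 100 102 1 (chr_length=1000)
SEQ mus_musculus 2 200 202 1 (chr_length=2000)
SCORE toy conservation score
DATA
AA 1.50
C- 0.00
GT -0.75
//
SEQ homo_sapiens 1 300 300 1 (chr_length=1000)
DATA
A
//
"""

blocks = list(parse_emf_blocks(SAMPLE.splitlines()))
for names, cols in blocks:
    scored = [s for _, s in cols if s is not None]
    print(names, len(cols), "columns,", len(scored), "with a score")
```

A block without a SCORE line (as in the 16-species files) simply comes back
with None for every column, which makes the absence of scores easy to spot.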
Hope that helps,
Stephen.
On Mon, 6 Oct 2014, Tang, Haiming wrote:
> Dear group,
>
> I tried to get the conservation scores (GERP scores) for nucleotides in whole genome alignments.
>
> So I followed "http://uswest.ensembl.org/Help/Faq?id=221" to download the emf files for multiple species alignments
> from ftp site: ftp://ftp.ensembl.org/pub/release-76/emf/ensembl-compara/epo_16_eutherian/
>
> But why can't I find GERP score in these emf files?
>
> Thanks
> Haiming
>
>