[ensembl-dev] where to find GERP scores? How to read EMF files?

Anja Thormann anja at ebi.ac.uk
Mon Jul 6 19:12:10 BST 2020

Dear Julie,

The GERP score on the variant summary page is retrieved from this BigWig file: ftp://ftp.ensembl.org/pub/current_compara/conservation_scores/103_mammals.gerp_conservation_score/gerp_conservation_scores.homo_sapiens.GRCh38.bw. We use a parser that was written for internal project purposes and is not well documented for use outside the project. However, I would be happy to send you a small Perl script that uses Ensembl's APIs to retrieve GERP scores by location from the BigWig file. Alternatively, you can use any BigWig parser of your choice.
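If you go the "BigWig parser of your choice" route, a minimal Python sketch might look like the one below. It is only an illustration under stated assumptions: it assumes the third-party pyBigWig package (not mentioned in this thread), a locally downloaded copy of the .bw file, and Ensembl-style chromosome names such as "1" rather than "chr1". `gerp_score` is a hypothetical helper name. BigWig queries use 0-based, half-open coordinates, so the 1-based position shown on the web page is converted first.

```python
import math


def gerp_score(bw, chrom, pos):
    """Return the GERP score at a 1-based position, or None if there is no score.

    `bw` is assumed to be an object with a pyBigWig-style
    `values(chrom, start, end)` method taking 0-based, half-open coordinates.
    """
    # 1-based position `pos` covers the 0-based half-open interval [pos - 1, pos).
    value = bw.values(chrom, pos - 1, pos)[0]
    # pyBigWig reports positions without data as NaN.
    return None if math.isnan(value) else value
```

Usage, assuming pyBigWig is installed and the file has been downloaded: open it with `bw = pyBigWig.open("gerp_conservation_scores.homo_sapiens.GRCh38.bw")` and query the variant from the question with `gerp_score(bw, "1", 31789941)`. Since the variant page reads its GERP value from this file, the result should correspond to the value displayed there.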

You can also use this script, which is provided with Ensembl's Compara API, to retrieve GERP scores: <https://github.com/Ensembl/ensembl-compara/blob/release/100/scripts/examples/dna_getConservationScores.pl>. You just need to modify the region parameters at the top of the script.

The EMF file format is explained in more detail here: ftp://ftp.ensembl.org/pub/release-100/emf/ensembl-compara/multiple_alignments/103_mammals.epo_low_coverage/README.emf
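As a rough illustration of how genomic positions can be recovered from an EMF alignment dump, here is a Python sketch. It assumes the layout described in README.emf as we understand it: each block has one `SEQ species chromosome start end strand (chr_length=...)` line per sequence, a `DATA` line, then one line per alignment column (one character per SEQ line, in SEQ-line order, followed by the GERP score when a SCORE line is present), and `//` closing the block. Under that assumption, a score belongs to an alignment column rather than to a line number. The human coordinate is obtained by counting non-gap human characters from the start of the human SEQ line. `emf_scores` is a hypothetical helper; it uses only the first SEQ line for the species and treats only "-" as a gap, so please check these assumptions against the README before relying on it.

```python
def emf_scores(lines, species="homo_sapiens"):
    """Yield (chromosome, position, score) for each alignment column in which
    `species` has a base (i.e. is not a gap '-').

    Assumed layout per block: SEQ lines, optional SCORE/ID/TREE lines, a DATA
    line, one line per alignment column, and '//' at the end of the block.
    """
    seqs = []        # (species, chrom, start, end, strand) in column order
    ref = None       # column index of the first SEQ line for `species`
    in_data = False
    chrom, pos, step = None, 0, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # header and comment lines
        if line == "//":
            # End of an alignment block: reset per-block state.
            seqs, ref, in_data = [], None, False
        elif line.startswith("SEQ "):
            # SEQ species chromosome start end strand (chr_length=...)
            f = line.split()
            seqs.append((f[1], f[2], int(f[3]), int(f[4]), int(f[5])))
        elif line == "DATA":
            in_data = True
            ref = next((i for i, s in enumerate(seqs) if s[0] == species), None)
            if ref is not None:
                _, chrom, start, end, step = seqs[ref]
                # On the reverse strand (-1) the first aligned base is at `end`.
                pos = start if step == 1 else end
        elif in_data and ref is not None:
            fields = line.split()
            column = fields[0]
            score = float(fields[1]) if len(fields) > 1 else None
            if column[ref] != "-":
                yield chrom, pos, score
                pos += step  # advance only when the species has a base
```

For the gzipped file from the question, one could iterate with `gzip.open(path, "rt")` and pass the file handle to `emf_scores`. Columns where human has a gap still carry a score, but they have no human coordinate, so this sketch skips them.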

Best regards,

> On 6 Jul 2020, at 15:09, Julie Sullivan <julie.sullivan at gmail.com> wrote:
> http://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=1:31789941-31789941;v=rs1286699429;vdb=variation;vf=502464213
> On that page there is a GERP score, GERP: 1.27. Where can I download that GERP score?
> I found this: https://www.ensembl.org/Help/Faq?id=221
> It says to get the GERP scores from an EMF file. I am unfamiliar with this file type (apparently EMF is also an image file type, so Google didn't help!):
> ftp://ftp.ensembl.org/pub/current_emf/ensembl-compara/multiple_alignments/103_mammals.epo_low_coverage/103_mammals.epo_low_coverage.1_10.emf.gz
> The EMF file looks like this:
> ##FORMAT (compara)
> ##DATE Fri Feb 21 15:26:12 2020
> ##RELEASE 100
> # Alignments: 103 eutherian mammals EPO-Low-Coverage
> # Region: Homo sapiens chromosome:GRCh38:10:1:133797422:1
> # File 1
> SCORE 103 eutherian mammals GERP Conservation Scores
> ID 18230001887772
> aAAaaaaaaAaAAGGGGA-AAA---A--AAA-AA---CC-A -1.03 <-- which position is this score for? 
> cCCccccccCcCCCCCCC-CCC---T--CCC-TC---TT-C -1.97
> 1. Is this file the right place to get the GERP scores listed on the website?
> 2. Does each line represent a position? So would "-1.03" be the score for position chr10:1? Is there a way to find out which positions are being described, or am I reading this file wrong? Position 1 doesn't make much sense, but I don't see another number.
> Thank you!
> Julie
> =====
> I also tried the Perl API and this file but was not successful:
> ftp://ftp.ensembl.org/pub/current_compara/conservation_scores/103_mammals.gerp_conservation_score/gerp_conservation_scores.homo_sapiens.GRCh38.bw
