[ensembl-dev] Fetch GERP scores of certain genes

Stephen Fitzgerald stephenf at ebi.ac.uk
Wed Aug 6 12:00:08 BST 2014


Hi Eva, we produce GERP scores for mammals, fish, and sauropsids.
I guess you are interested in scores for human (GRCh37).

See the perl script (and output) below as an example for fetching the 
scores from the 38 eutherian mammals alignment for the human region 
chr12:52696077-52696082. This is from ensembl release 75.

Please note that release 76 is due out Today and will have a new human 
reference assembly (GRCh38). The example script should work, but will give 
you different results.

If you wish to use GRCh37, you will need to specify the release number in 
the registry configuration, ie.

#Auto-configure the registry
Bio::EnsEMBL::Registry->load_registry_from_db(
         -host=>"ensembldb.ensembl.org", -user=>"anonymous",
         -port=>'5306', -db_version => 75);

Cheers,
Stephen.



##################### script begin

use strict;
use warnings;

use Bio::EnsEMBL::Registry;

#Auto-configure the registry
Bio::EnsEMBL::Registry->load_registry_from_db(
         -host=>"ensembldb.ensembl.org", -user=>"anonymous",
         -port=>'5306');


# Get the Compara Adaptor for MethodLinkSpeciesSet
my $method_link_species_set_adaptor =
     Bio::EnsEMBL::Registry->get_adaptor(
       "Multi", "compara", "MethodLinkSpeciesSet");

# Get the method_link_species_set for the conservation scores
my $cs_methodLinkSpeciseSet = $method_link_species_set_adaptor->
    fetch_by_method_link_type_species_set_name("GERP_CONSERVATION_SCORE", "mammals");

# Define the start and end positions for the alignment
my ($human_start, $human_end) = (52696077,52696082);

# Get the human *core* Adaptor for Slices
my $human_slice_adaptor =
     Bio::EnsEMBL::Registry->get_adaptor(
       "human", "core", "Slice");

# Get the slice corresponding to the region of interest
my $human_slice = $human_slice_adaptor->fetch_by_region(
     "chromosome", "12", $human_start, $human_end);

# Get the Compara Adaptor for Conservation Score
my $cs_adaptor =
     Bio::EnsEMBL::Registry->get_adaptor(
       "Multi", "compara", "ConservationScore");

# The fetch_all_by_MethodLinkSpeciesSet_Slice() returns a ref.
# to an array of Conservation score objects (human is the reference species)
my $conservationScores = $cs_adaptor->
     fetch_all_by_MethodLinkSpeciesSet_Slice(
         $cs_methodLinkSpeciseSet, $human_slice);

# print the conservation scores
foreach my $cs( @{ $conservationScores }) {
  print join(" ", $human_start++,$cs->diff_score), "\n";
}

##################### script end

Output:

52696077 -2.5
52696078 3.02999997138977
52696079 4.05999994277954
52696080 0.703000009059906
52696081 -0.223000004887581
52696082 2.83999991416931



On Wed, 6 Aug 2014, Eva Goncalves Serra wrote:

> Hi,
> 
> I am interested in getting GERP scores for bases within the coding sequence of certain genes. 
> Is there a straightforward way get this directly via the API? I.e., instead of having to map the GERP raw file against coordinates of
> genes myself.
> 
> Cheers,
> Eva 
> 
> 
>


More information about the Dev mailing list