[ensembl-dev] Getting all variations associated with gene

Graham Ritchie grsr at ebi.ac.uk
Thu Aug 25 09:47:00 BST 2011


Hi Gaurav, 

If by 'associated' you mean that the variations overlap your gene of interest or lie within 5kb upstream or downstream of it, the following code should help you out. Essentially, it fetches a gene from the core database using an external name you supply, and then fetches from the variation database all transcript variations (which is how we model the overlap of a variation feature and a transcript) that overlap any of the transcripts of this gene. The code also removes duplicates, which can occur when the same variation lies in multiple transcripts, and prints out the name of each variation. Hopefully you can adapt this to your requirements.

Cheers,

Graham

-----

use strict;
use warnings;

use Bio::EnsEMBL::Registry;

my $reg = 'Bio::EnsEMBL::Registry';

$reg->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

# fetch the adaptors we will need

my $ga = $reg->get_adaptor('human', 'core', 'gene');

my $tva = $reg->get_adaptor('human', 'variation', 'transcriptvariation');

# fetch the gene by the name passed on the command line

my $gene_name = $ARGV[0] or die "Usage: $0 <gene_name>\n";

my $genes = $ga->fetch_all_by_external_name($gene_name);

warn "Found multiple genes for '$gene_name', using first" if @$genes > 1;

die "No gene found for '$gene_name'" unless @$genes > 0;

# fetch all transcript variations associated with the transcripts of this gene

my $tvs = $tva->fetch_all_by_Transcripts($genes->[0]->get_all_Transcripts);

# a variation feature may lie in multiple transcripts, so store the names
# in a hash to ensure uniqueness

my %associated_variations;

for my $tv (@$tvs) {
    $associated_variations{$tv->variation_feature->variation_name}++;
}

# print out the names of all variations found to overlap this gene

for my $v (sort keys %associated_variations) {
    print "$v\n";
}
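
The hash in the script above stores a count for each name as well as making the names unique. As a minimal sketch of that step on its own, the following uses made-up stand-in data in place of real transcript variation objects; the variation names, transcript labels and the count_by_name helper are all hypothetical and not part of the Ensembl API. It prints each unique name with the number of rows seen for it, which in the real script would roughly correspond to the number of transcripts the variation was found in.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: each row is [variation name, transcript label],
# mimicking one transcript variation per row. These identifiers are made up.
my @transcript_variations = (
    [ 'rs0001', 'TRANSCRIPT_A' ],
    [ 'rs0002', 'TRANSCRIPT_A' ],
    [ 'rs0001', 'TRANSCRIPT_B' ],
    [ 'rs0003', 'TRANSCRIPT_B' ],
    [ 'rs0001', 'TRANSCRIPT_C' ],
);

# Count how many rows carry each variation name. The hash keys are the
# unique names; the values are the number of rows seen per name.
sub count_by_name {
    my ($rows) = @_;
    my %count;
    $count{ $_->[0] }++ for @$rows;
    return \%count;
}

my $counts = count_by_name(\@transcript_variations);

# Print each unique name followed by its row count, tab-separated
for my $name (sort keys %$counts) {
    print "$name\t$counts->{$name}\n";
}
```

In the original script the same information is already available as $associated_variations{$v} inside the final loop.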

 
On 25 Aug 2011, at 07:52, gaurav thareja wrote:

> Hi all,
>  
> I am trying to get all variations associated with a gene name. As there is no direct way to do so using the Perl API, I am using the Perl API to get the Ensembl gene ID for a given common gene name, and then using that Ensembl gene ID as a BioMart filter to get all variations.
>  
> 1) Is there any simpler way to do this?
> 2) Will this provide a complete list of variations, or could I still miss some variations for that gene?
>  
> 
> Regards
> 
> Gaurav Thareja
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




