[ensembl-dev] getting gene exons and transcripts that overlap only the original slice

Alison Meynert alison.meynert at hgu.mrc.ac.uk
Wed Jan 12 09:50:06 GMT 2011


Hi Andrea,

I have the same issue with mapping lots of SNPs to Ensembl features. My 
solution, at least for exons or other reasonably small sets, is to 
invert the problem by iterating over all features and asking which of my 
SNPs overlap a given feature. Also, I have a local installation of some 
Ensembl databases, which really helps.

To do the comparison, I load all my SNPs into a MySQL database table 
indexed on the sequence region/chromosome and position fields, and query 
that table with given feature coordinates. Example Perl code below.

I'd be interested to hear if anyone has more efficient/faster solutions 
as this is something that seems good enough for my purposes at the 
moment but may not scale well.

Cheers,
Alison

# connect to the database
my $dbh = DBI->connect("DBI:mysql:database=$dbname;host=$hostname", 
$user, $pass) or die "Cannot connect to database\n$!";

# set up the select statement
my $sth = $dbh->prepare(SELECT * FROM snp WHERE seq_region = ? AND pos 
 >= ? AND pos <= ?);

# iterate over exons or other features
my $exons = $exon_adaptor->fetch_all ... ;
while (my $exon = shift @{ $exons })
{
     # select SNPs
     $sth->execute($exon->seq_region_name, $exon->start, $exon->end);
     while (my $ref = $sth->fetchrow_hashref())
     {
         # do something with SNPs
         my $snp_id = $ref->{'snp_id'};
         ...
     }
}

On 11/01/2011 21:14, Andrea Edwards wrote:
> Sorry, missed the last bit off my last message. This was what i did but
> was wondering if there was more efficient way as I have such a lot of data.
> I have just given an example for one locus as i'm sure you can imagine
> this code will be looped millions of times
>
> $slice = $slice_adaptor->fetch_by_region( 'chromosome', '9', 21816758,
> 21816758 );
> $exons = $slice->get_all_Exons;
> while (my $exon = shift @{$exons}) {
>
> my $gene = $gene_adaptor->fetch_by_exon_stable_id($exon->stable_id);
> #process gene
> my $transcripts = $transcript_adaptor->fetch_all_by_exon_stable_id
> ($exon->stable_id);
>
> while (my $transcript = shift @{$transcripts}) {
> #process transcript
> }
>
> }
>
>
>
>
> On 11/01/2011 19:38, Andrea Edwards wrote:
>> Hello
>>
>> i have this code below taken from the core api tutorial which gets me
>> all the exons and transcripts for the gene(s) that overlap a slice.
>>
>> I was hoping for an easy way to get those features of the gene that
>> only overlap the original one bp slice; this code gets all exons and
>> transcripts
>> associated with the gene
>>
>> I thought you might be able to call 'get_all_Object' methods with a
>> parameter which represents a region of sequence overlap but it seems not.
>> I also thought they might be filtered automatically based on the
>> underlying slice but it seems not.
>>
>> Naturally i can filter the features in the list based on their start
>> and end positions but for speed it would be easier not to retrieve
>> them all at.
>> I have a lot of data so speed is important. Please can you advise the
>> best way to do this.
>>
>> $slice = $slice_adaptor->fetch_by_region( 'chromosome', '9', 21816758,
>> 21816758 );
>>
>> my $genes = $slice->get_all_Genes();
>> while ( my $gene = shift @{$genes} ) {
>> my $gstring = feature2string($gene);
>> print "$gstring\n";
>>
>> my $transcripts = $gene->get_all_Transcripts();
>> while ( my $transcript = shift @{$transcripts} ) {
>> my $tstring = feature2string($transcript);
>> print "\t$tstring\n";
>>
>> foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
>> my $estring = feature2string($exon);
>> print "\t\t$estring\n";
>> }
>> }
>> }
>>
>> print "done\n";
>>
>> Many thanks
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev
>
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev

-- 
Alison Meynert
MRC Human Genetics Unit, Edinburgh
alison.meynert at hgu.mrc.ac.uk




More information about the Dev mailing list