[ensembl-dev] getting gene exons and transcripts that overlap only the original slice

Andrea Edwards edwardsa at cs.man.ac.uk
Wed Jan 12 16:20:49 GMT 2011


Hello

For general questions you can try biostar.stackexchange.com and 
seqanswers (i've never used the this forum but i've heard of it)

For issues relating to ensembl you can email their mailing lists
http://www.ensembl.org/info/about/contact/mailing.html

Other people might know some others?

andrea

On 12/01/2011 16:10, shaabana abdo wrote:
> Dear collage
>
> I am a PhD.d student in university de Montreal Canada in biomedical 
> science program ,
>  I would like to know some scientific Internet site to discuss in 
> techniques and the problem of experiments.
> like , if i have technical problem i will propose my problem and then 
> i will get discussion with other, which they are expert more than me
>
> i wish you help me,
>
> with my best wishes
>
> shoby
>
> On Wed, Jan 12, 2011 at 9:50 AM, Andrea Edwards <edwardsa at cs.man.ac.uk 
> <mailto:edwardsa at cs.man.ac.uk>> wrote:
>
>     Hello
>
>     When i was looking at just exons I used to use exactly the same
>     approach as Alison. Now i want to annotate my snps to store their
>     relationships to the exons / genes / transcripts they affect I
>     have decided to approach the problem from the other side as it were.
>
>     Pablo, the idea about flattening the data once per release is
>     brilliant. I shall defininitely adopt that approach in the long
>     term. Would you be willing to post your script to the group? I'm
>     glad I asked now. I bet flattening the data takes hours off the
>     run time?
>
>     Many thanks :)
>
>
>
>
>
>     On 12/01/2011 10:36, Pablo Marin-Garcia wrote:
>
>         On Wed, 12 Jan 2011, Alison Meynert wrote:
>
>             Hi Andrea,
>
>             I have the same issue with mapping lots of SNPs to Ensembl
>             features. My solution, at least for exons or other
>             reasonably small sets, is to invert the problem by
>             iterating over all features and asking which of my SNPs
>             overlap a given feature. Also, I have a local installation
>             of some Ensembl databases, which really helps.
>
>             To do the comparison, I load all my SNPs into a MySQL
>             database table indexed on the sequence region/chromosome
>             and position fields, and query that table with given
>             feature coordinates. Example Perl code below.
>
>             I'd be interested to hear if anyone has more
>             efficient/faster solutions as this is something that seems
>             good enough for my purposes at the moment but may not
>             scale well.
>
>
>
>         Hello Alison,
>
>         I follow a similar approach when I want to describe all snps
>         in a gene, but also when my starting point, like Andrea, are
>         SNPs (usually a sparse set in my case) not genes I do the
>         inverse than you, because a SNP could lay in several exons or
>         genes so is more natural for me going the other way around.
>
>         For each ensembl release I run a script that loop all
>         genes-transcript-exons and flattens the data by exons and then
>         load the tab file to mysql into a table [exon_id exon_start
>         exon_end transcript_id gen_id gen_name] being the
>         exon_id-transcript_id the uniq key, and exon_start, exon_end,
>         and gene_id being indexed.
>
>         That way I can quickly find which genes, transcript, exons
>         overlap my SNP. And also I am able to find exons/genes n-bases
>         nearby my SNP so I can look for other LD SNPs in this genes.
>
>         So my scripts and SQL queries are basically like yours but
>         starting the problem from the other end (looking which exons
>         overlap the SNP).
>
>         I think that for a whole genome approach this is faster than
>         connecting directly to ensembl each time, and instantiating
>         all the time ensembl objects. The drawback is that this is a
>         specific solution, so you loose flexibility in order to gain
>         efficiency, but this is how life is. Finally, as you said,
>         hearing about other approaches is welcome.
>
>
>           -Pablo
>
>
>
>             Cheers,
>             Alison
>
>             # connect to the database
>             my $dbh =
>             DBI->connect("DBI:mysql:database=$dbname;host=$hostname",
>             $user, $pass) or die "Cannot connect to database\n$!";
>
>             # set up the select statement
>             my $sth = $dbh->prepare(SELECT * FROM snp WHERE seq_region
>             = ? AND pos >= ? AND pos <= ?);
>
>             # iterate over exons or other features
>             my $exons = $exon_adaptor->fetch_all ... ;
>             while (my $exon = shift @{ $exons })
>             {
>               # select SNPs
>               $sth->execute($exon->seq_region_name, $exon->start,
>             $exon->end);
>               while (my $ref = $sth->fetchrow_hashref())
>               {
>                   # do something with SNPs
>                   my $snp_id = $ref->{'snp_id'};
>                   ...
>               }
>             }
>
>             On 11/01/2011 21:14, Andrea Edwards wrote:
>
>                 Sorry, missed the last bit off my last message. This
>                 was what i did but
>                 was wondering if there was more efficient way as I
>                 have such a lot of data.
>                 I have just given an example for one locus as i'm sure
>                 you can imagine
>                 this code will be looped millions of times
>
>                 $slice = $slice_adaptor->fetch_by_region(
>                 'chromosome', '9', 21816758,
>                 21816758 );
>                 $exons = $slice->get_all_Exons;
>                 while (my $exon = shift @{$exons}) {
>
>                 my $gene =
>                 $gene_adaptor->fetch_by_exon_stable_id($exon->stable_id);
>                 #process gene
>                 my $transcripts =
>                 $transcript_adaptor->fetch_all_by_exon_stable_id
>                 ($exon->stable_id);
>
>                 while (my $transcript = shift @{$transcripts}) {
>                 #process transcript
>                 }
>
>                 }
>
>
>
>
>                 On 11/01/2011 19:38, Andrea Edwards wrote:
>
>                     Hello
>
>                     i have this code below taken from the core api
>                     tutorial which gets me
>                     all the exons and transcripts for the gene(s) that
>                     overlap a slice.
>
>                     I was hoping for an easy way to get those features
>                     of the gene that
>                     only overlap the original one bp slice; this code
>                     gets all exons and
>                     transcripts
>                     associated with the gene
>
>                     I thought you might be able to call
>                     'get_all_Object' methods with a
>                     parameter which represents a region of sequence
>                     overlap but it seems not.
>                     I also thought they might be filtered
>                     automatically based on the
>                     underlying slice but it seems not.
>
>                     Naturally i can filter the features in the list
>                     based on their start
>                     and end positions but for speed it would be easier
>                     not to retrieve
>                     them all at.
>                     I have a lot of data so speed is important. Please
>                     can you advise the
>                     best way to do this.
>
>                     $slice = $slice_adaptor->fetch_by_region(
>                     'chromosome', '9', 21816758,
>                     21816758 );
>
>                     my $genes = $slice->get_all_Genes();
>                     while ( my $gene = shift @{$genes} ) {
>                     my $gstring = feature2string($gene);
>                     print "$gstring\n";
>
>                     my $transcripts = $gene->get_all_Transcripts();
>                     while ( my $transcript = shift @{$transcripts} ) {
>                     my $tstring = feature2string($transcript);
>                     print "\t$tstring\n";
>
>                     foreach my $exon ( @{ $transcript->get_all_Exons()
>                     } ) {
>                     my $estring = feature2string($exon);
>                     print "\t\t$estring\n";
>                     }
>                     }
>                     }
>
>                     print "done\n";
>
>                     Many thanks
>
>                     _______________________________________________
>                     Dev mailing list
>                     Dev at ensembl.org <mailto:Dev at ensembl.org>
>                     http://lists.ensembl.org/mailman/listinfo/dev
>
>
>
>                 _______________________________________________
>                 Dev mailing list
>                 Dev at ensembl.org <mailto:Dev at ensembl.org>
>                 http://lists.ensembl.org/mailman/listinfo/dev
>
>
>             -- 
>             Alison Meynert
>             MRC Human Genetics Unit, Edinburgh
>             alison.meynert at hgu.mrc.ac.uk
>             <mailto:alison.meynert at hgu.mrc.ac.uk>
>
>             _______________________________________________
>             Dev mailing list
>             Dev at ensembl.org <mailto:Dev at ensembl.org>
>             http://lists.ensembl.org/mailman/listinfo/dev
>
>
>
>         -----
>
>          Pablo Marin-Garcia
>
>
>         _______________________________________________
>         Dev mailing list
>         Dev at ensembl.org <mailto:Dev at ensembl.org>
>         http://lists.ensembl.org/mailman/listinfo/dev
>
>
>
>     _______________________________________________
>     Dev mailing list
>     Dev at ensembl.org <mailto:Dev at ensembl.org>
>     http://lists.ensembl.org/mailman/listinfo/dev
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20110112/f9ca4dcc/attachment.html>


More information about the Dev mailing list