[ensembl-dev] Getting antisense transcript of gene using sql / biomart

Jose M. Gonzalez jmg at sanger.ac.uk
Tue Dec 3 15:20:28 GMT 2013


On 03/12/13 13:44, mag wrote:
> Hi Henrikki,
>
> On 28/11/2013 19:34, Henrikki Almusa wrote:
>> On 2013-11-28 16:45, mag wrote:
>>> Hi Henrikki,
>>>
>>> Exons can be shared across transcripts and genes, hence they do not have
>>> a specific gene object attached to them.
>>
>> True, they are not directly connected. However, they are connected by
>> a couple of steps. Exon is connected to transcript by a many-to-many
>> relationship via the exon_transcript table. Gene is connected to
>> transcript by a one-to-many relationship. I was hoping that the Perl API
>> would allow backtracking this somehow: from an exon to a list of
>> transcripts to a list of genes.
>>
>> Method like $exon->get_all_Transcripts() is the missing link for me. 
>> There is a $transcript->get_Gene() for the second step.
>
> It is true, there is no method currently provided in the API to 
> address this issue, probably because we had not realised the need for 
> it existed.
> We will attempt to correct this in future releases of the API.
>

Hi Henrikki,

Just to point out that there is a different approach that will give you 
the same result:

  $transcript_adaptor->fetch_all_by_exon_stable_id
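
A minimal sketch of how that call might be chained to reach the gene, assuming $ta and $ga are the core transcript and gene adaptors from the Registry. The helper name genes_for_exon is hypothetical and not part of the API; the gene lookup uses fetch_by_transcript_stable_id on the gene adaptor:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): walk from an exon
# stable ID to its transcripts, and from each transcript to its gene.
# $ta and $ga are assumed to be core adaptors, e.g. obtained with
#   Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'transcript');
#   Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'gene');
sub genes_for_exon {
  my ($ta, $ga, $exon_stable_id) = @_;
  my @rows;
  foreach my $tr (@{ $ta->fetch_all_by_exon_stable_id($exon_stable_id) }) {
    my $gene = $ga->fetch_by_transcript_stable_id($tr->stable_id());
    next unless defined $gene;    # skip transcripts with no gene found
    # One row per transcript: gene ID, transcript ID, gene strand
    push @rows, [ $gene->stable_id(), $tr->stable_id(), $gene->strand() ];
  }
  return @rows;
}
```

In the script quoted below, this could be called with $ex->stable_id() at the "# get gene and transcript for this exon" comment, keeping only rows whose strand differs from that of the antisense gene.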

Cheers,
Jose


>>
>>> There is however a method called get_nearest_Gene available.
>>> This will return the closest overlapping gene in that region.
>>
>> I don't think that would always give me correct answer.
> The get_nearest_Gene method will return all the genes which overlap a 
> particular exon.
> In your example case, it returns both the gene you are looking for and 
> the one you started on.
> If you filter on strand, you should get only the gene you want.
>
>
> Hope that helps,
> Magali
>
>>
>> I could ask the overlapping exons to give me all overlapping
>> transcripts, then get the list of exons in those transcripts. This would
>> allow me to map the two exon lists together using exon identifiers.
>> However, that feels like a bit of a hacky trick.
>>
>> Alternatively I could query a list of all genes, transcripts and 
>> exons and do the mapping separately using the exons. But I'm hoping I 
>> wouldn't need to use extra scripts to do this.
>>
>>>
>>> Hope that helps,
>>> Magali
>>>
>>> On 28/11/2013 14:26, Henrikki Almusa wrote:
>>>> First off, sorry for a bit slow reply as I didn't notice the mail at
>>>> first. However I'm happy that it got a reply.
>>>>
>>>> On 2013-11-20 16:11, Denise Carvalho-Silva wrote:
>>>>> Dear Henrikki,
>>>>>
>>>>> Thanks for reporting the slightly discrepant definitions of antisense
>>>>> on the Vega and GENCODE websites.
>>>>>
>>>>> The GENCODE description is in fact speculative and it should be
>>>>> updated and consistent with the Vega definition.
>>>>>
>>>>> Our colleagues in the Havana team at the WTSI have agreed to change
>>>>> this. Both sources should give the following definition:
>>>>>
>>>>> Antisense. Has transcripts that overlap the genomic span (i.e. exon
>>>>> or introns) of a protein-coding locus on the opposite strand.
>>>>>
>>>>> I should clarify that Ensembl does not make any inferences on whether
>>>>> or not a given gene is regulated by its antisense locus and therefore
>>>>> we are not claiming that the antisense transcript regulates the
>>>>> expression of the gene on the opposite strand.
>>>>
>>>> Thanks for clearing that up.
>>>>
>>>>> You can get the coordinates of the antisense transcript and use them
>>>>> to get any overlapping genes on the same or opposite strand.
>>>>>
>>>>> If this is something that you would be interested in getting, we
>>>>> would strongly recommend that you use the APIs (REST or Perl) rather
>>>>> than SQL queries.
>>>>
>>>> We have a local copy of ensembl core and variation for human and using
>>>> SQL with that is easier. Thus I went with that. However looking at the
>>>> two, I decided to try with perl API and modified #2 option to suit the
>>>> aim better. I did hit a problem though.
>>>>
>>>> How do I get transcripts and gene for a given exon? The script that I
>>>> use is below. It currently, for testing, just outputs first gene
>>>> result. Does it have something to do with the project_to_slice()?
>>>>
>>>> #!/usr/bin/env perl
>>>>
>>>> use strict;
>>>> use warnings;
>>>>
>>>> use Bio::EnsEMBL::Registry;
>>>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>>>   '-HOST' => 'ensembldb.ensembl.org',
>>>>   '-PORT' => 3306,
>>>>   '-USER' => 'anonymous',
>>>> );
>>>>
>>>> sub get_overlap {
>>>>   my @exons = @_;
>>>>   local $_;
>>>>   my %tmp = ();
>>>>   my @ret = ();
>>>>   foreach my $ex (@exons) {
>>>>     my @overlap_exons = get_overlapping_Exons($ex);
>>>>     my $antisense_stable_id = $ex->stable_id();
>>>>     foreach my $oex (@overlap_exons) {
>>>>       my $overlapping_stable_id = $oex->stable_id();
>>>>       $antisense_stable_id eq $overlapping_stable_id && next;
>>>>       exists($tmp{$overlapping_stable_id}) && next;
>>>>       $tmp{$overlapping_stable_id} = 1;
>>>>       push(@ret,$oex);
>>>>     }
>>>>   }
>>>>   return @ret;
>>>> }
>>>>
>>>> sub get_overlapping_Exons{
>>>>   my $slice = $_[0]->feature_Slice;
>>>>   return @{$slice->get_all_Exons()};
>>>> }
>>>>
>>>> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>>>> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>>>> my $overlap = 0;
>>>> my $last = 0;
>>>> foreach my $ag (@{$antisense_genes}) {
>>>>   my @exons = @{ $ag->get_all_Exons };
>>>>   my @overlap_exons = get_overlap(@exons);
>>>>   foreach my $ex (@overlap_exons) {
>>>>     my @print = (
>>>>       $ag->stable_id(), $ex->stable_id(), $ex->seq_region_name(),
>>>>       $ex->seq_region_start(), $ex->seq_region_end()
>>>>     );
>>>>     # get gene and transcript for this exon
>>>>     print join("\t", @print),"\n";
>>>>     $last=1;
>>>>   }
>>>>   $last && exit;
>>>> }
>>>>
>>>>> Please see below Andy Yates' suggestions:
>>>>>
>>>>> 1) Using the REST API:
>>>>> <snip>
>>>>>
>>>>> 2) Using the Perl API.
>>>>> This would be the most efficient way to go as you can get all the
>>>>> antisense genes and their coordinates at once.
>>>>> (If you have not used the Core Perl API, please have a look at
>>>>> http://www.ensembl.org/info/docs/api/core/index.html#api):
>>>>>
>>>>> #!/usr/bin/env perl
>>>>>
>>>>> use strict;
>>>>> use warnings;
>>>>>
>>>>> use Bio::EnsEMBL::Registry;
>>>>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>>>> -HOST => 'ensembldb.ensembl.org',
>>>>> -PORT => 3306,
>>>>> -USER => 'anonymous',
>>>>> );
>>>>>
>>>>> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>>>>> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>>>>> my $overlap = 0;
>>>>> foreach my $ag (@{$antisense_genes}) {
>>>>>   my $antisense_stable_id = $ag->stable_id();
>>>>>   my $overlapping_genes = $ag->get_overlapping_Genes();
>>>>>   foreach my $og (@{$overlapping_genes}) {
>>>>>     my $overlapping_stable_id = $og->stable_id();
>>>>>     next if $antisense_stable_id eq $overlapping_stable_id;
>>>>>     printf "%s (%s) overlaps our antisense gene %s\n",
>>>>>       $overlapping_stable_id, $og->feature_Slice()->name(),
>>>>>       $antisense_stable_id;
>>>>>   }
>>>>> }
>>>>>
>>>>> Hope it helps.
>>>>>
>>>>> Regards,
>>>>> Denise
>>>>>
>>>>>
>>>>> On 8 Nov 2013, at 08:59, Henrikki Almusa wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm delving into the antisense transcript world. I've found two
>>>>>> slightly different descriptions for this.
>>>>>>
>>>>>> First from
>>>>>> http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html
>>>>>>
>>>>>> Gene
>>>>>> Antisense. Has transcripts that overlap any coding exon of a locus
>>>>>> on the opposite strand, or for published instances of antisense
>>>>>> regulation of a coding gene.
>>>>>> Transcript
>>>>>> Antisense. Transcripts that overlap any coding exon of a locus on
>>>>>> the opposite strand, or for published instances of antisense
>>>>>> regulation of a coding gene.
>>>>>>
>>>>>> Second from:
>>>>>> http://www.gencodegenes.org/gencode_biotypes.html
>>>>>> antisense Transcript believed to be an antisense product used in the
>>>>>> regulation of the gene to which it belongs.
>>>>>>
>>>>>> Now the second implies that the gene connected to the antisense
>>>>>> transcript would be the gene that it blocks. But this conflicts with
>>>>>> the other page's description.
>>>>>>
>>>>>> So onto the main thing. I was asked to get a list of exons which
>>>>>> have an antisense transcript. The output would have gene name,
>>>>>> transcript name, antisense transcript name, and exon name and
>>>>>> coordinates. I wrote the following SQL to retrieve this, but it does
>>>>>> not give me the gene which is "blocked", only the "gene" of the
>>>>>> antisense transcript. Any help in getting the blocked transcript as well?
>>>>>>
>>>>>> SELECT g.stable_id gene, tr.stable_id transcript, e.stable_id exon,
>>>>>> e.seq_region_start, e.seq_region_end, ext.rank
>>>>>> FROM transcript tr
>>>>>>   JOIN exon_transcript ext USING(transcript_id)
>>>>>>   JOIN exon e USING(exon_id)
>>>>>>   JOIN gene g USING(gene_id)
>>>>>>   JOIN seq_region sr ON e.seq_region_id=sr.seq_region_id
>>>>>>   JOIN coord_system coord USING (coord_system_id)
>>>>>> WHERE coord.version = 'GRCh37' AND tr.biotype = 'antisense'
>>>>>>
>>>>>> Thanks,
>>>>>> -- 
>>>>>> Henrikki Almusa
>>>>>> Bioinformatician
>>>>>> Institute for Molecular Medicine Finland FIMM
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>
>>>>
>>>
>>
>>
>
>