[ensembl-dev] Getting antisense transcript of gene using sql / biomart

Henrikki Almusa henrikki.almusa at helsinki.fi
Thu Nov 28 19:34:04 GMT 2013


On 2013-11-28 16:45, mag wrote:
> Hi Henrikki,
>
> Exons can be shared across transcripts and genes, hence they do not have
> a specific gene object attached to them.

True, they are not directly connected. However, they are connected
through a couple of steps. An exon is connected to transcripts by a
many-to-many relationship via the exon_transcript table. A gene is
connected to its transcripts by a one-to-many relationship. I was hoping
that the Perl API would allow backtracking this somehow: from an exon to
a list of transcripts to a list of genes.
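
To make the path I mean concrete, here is a minimal sketch in plain Perl
that does not use the Ensembl API at all. The ids (E1, T1, G1, ...) and
the helper genes_for_exon() are hypothetical; the two structures only
mimic the shape of the exon_transcript link table and the
transcript-to-gene relation.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical rows shaped like the exon_transcript link table:
# one [exon id, transcript id] pair per row (many-to-many).
my @exon_transcript = (
  [ 'E1', 'T1' ],
  [ 'E1', 'T2' ],
  [ 'E2', 'T2' ],
  [ 'E3', 'T3' ],
);

# Hypothetical transcript -> gene relation (each transcript has one gene).
my %gene_of_transcript = ( T1 => 'G1', T2 => 'G1', T3 => 'G2' );

# Build the reverse index: exon -> list of transcripts.
my %transcripts_of_exon;
foreach my $row (@exon_transcript) {
  my ($exon, $transcript) = @{$row};
  push @{ $transcripts_of_exon{$exon} }, $transcript;
}

# Walk exon -> transcripts -> genes and return the unique gene ids, sorted.
sub genes_for_exon {
  my ($exon) = @_;
  my %seen;
  foreach my $transcript (@{ $transcripts_of_exon{$exon} || [] }) {
    $seen{ $gene_of_transcript{$transcript} } = 1;
  }
  my @genes = sort keys %seen;
  return @genes;
}

# Print one line per exon: exon, its transcripts, its genes.
foreach my $exon (sort keys %transcripts_of_exon) {
  print join("\t", $exon,
    join(',', @{ $transcripts_of_exon{$exon} }),
    join(',', genes_for_exon($exon))), "\n";
}
```

With real data the two structures would be filled from the
exon_transcript and transcript tables instead of being typed in by hand.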

A method like $exon->get_all_Transcripts() is the missing link for me.
There is a $transcript->get_Gene() for the second step.

> There is however a method called get_nearest_Gene available.
> This will return the closest overlapping gene in that region.

I don't think that would always give me the correct answer.

I could ask the overlapping exons' region for all overlapping
transcripts, then get the list of exons in those transcripts. This would
allow me to map the two exon lists together using exon identifiers.
However, that feels like a bit of a hacky trick.
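
As a rough sketch of that trick, again in plain Perl with made-up data:
the ids and the helper transcripts_by_exon() are hypothetical, and the
input lists stand in for what the API calls would return.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical inputs: stable ids of the exons that overlap an antisense
# exon, and, for each overlapping transcript, the stable ids of its exons.
my @overlapping_exons = ('E10', 'E11');
my %exons_of_transcript = (
  T5 => [ 'E9',  'E10' ],
  T6 => [ 'E10', 'E11', 'E12' ],
  T7 => [ 'E13' ],
);

# Map each wanted exon id to the transcripts whose exon list contains it.
sub transcripts_by_exon {
  my ($exon_ids, $exons_of) = @_;
  my %wanted = map { $_ => 1 } @{$exon_ids};
  my %hits;
  foreach my $transcript (sort keys %{$exons_of}) {
    foreach my $exon (@{ $exons_of->{$transcript} }) {
      push @{ $hits{$exon} }, $transcript if $wanted{$exon};
    }
  }
  return \%hits;
}

# Print one line per matched exon: exon id, then its transcripts.
my $hits = transcripts_by_exon(\@overlapping_exons, \%exons_of_transcript);
foreach my $exon (sort keys %{$hits}) {
  print join("\t", $exon, join(',', @{ $hits->{$exon} })), "\n";
}
```

The match is purely on identifiers, so it only works if both lists use
the same kind of id (stable id in both, or internal id in both).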

Alternatively, I could query a list of all genes, transcripts and exons
and do the mapping separately using the exons. But I'm hoping I won't
need extra scripts to do this.

>
> Hope that helps,
> Magali
>
> On 28/11/2013 14:26, Henrikki Almusa wrote:
>> First off, sorry for the slightly slow reply; I didn't notice the mail
>> at first. However, I'm happy that it got a reply.
>>
>> On 2013-11-20 16:11, Denise Carvalho-Silva wrote:
>>> Dear Henrikki,
>>>
>>> Thanks for reporting the slightly discrepant definitions of antisense
>>> on the Vega and GENCODE websites.
>>>
>>> The GENCODE description is in fact speculative and it should be
>>> updated and consistent with the Vega definition.
>>>
>>> Our colleagues in the Havana team at the WTSI have agreed to change
>>> this. Both sources should give the following definition:
>>>
>>> Antisense. Has transcripts that overlap the genomic span (i.e. exons
>>> or introns) of a protein-coding locus on the opposite strand.
>>>
>>> I should clarify that Ensembl does not make any inferences on whether
>>> or not a given gene is regulated by its antisense locus and therefore
>>> we are not claiming that the antisense transcript regulates the
>>> expression of the gene on the opposite strand.
>>
>> Thanks for clearing that up.
>>
>>> You can get the coordinates of the antisense transcript and use them
>>> to get any overlapping genes on the same or opposite strand.
>>>
>>> If this is something that you would be interested in getting, we
>>> would strongly recommend that you use the APIs (REST or Perl) rather
>>> than SQL queries.
>>
>> We have a local copy of the Ensembl core and variation databases for
>> human, and using SQL with that is easier, so I went with that. However,
>> after looking at the two, I decided to try the Perl API and modified
>> option #2 to suit the aim better. I did hit a problem though.
>>
>> How do I get the transcripts and gene for a given exon? The script
>> that I use is below. For testing, it currently just outputs the result
>> for the first gene. Does it have something to do with
>> project_to_slice()?
>>
>> #!/usr/bin/env perl
>>
>> use strict;
>> use warnings;
>>
>> use Bio::EnsEMBL::Registry;
>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>   '-HOST' => 'ensembldb.ensembl.org',
>>   '-PORT' => 3306,
>>   '-USER' => 'anonymous',
>> );
>>
>> sub get_overlap {
>>   my @exons = @_;
>>   local $_;
>>   my %tmp = ();
>>   my @ret = ();
>>   foreach my $ex (@exons) {
>>     my @overlap_exons = get_overlapping_Exons($ex);
>>     my $antisense_stable_id = $ex->stable_id();
>>     foreach my $oex (@overlap_exons) {
>>       my $overlapping_stable_id = $oex->stable_id();
>>       $antisense_stable_id eq $overlapping_stable_id && next;
>>       exists($tmp{$overlapping_stable_id}) && next;
>>       $tmp{$overlapping_stable_id} = 1;
>>       push(@ret,$oex);
>>     }
>>   }
>>   return @ret;
>> }
>>
>> sub get_overlapping_Exons{
>>   my $slice = $_[0]->feature_Slice;
>>   return @{$slice->get_all_Exons()};
>> }
>>
>> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>> my $overlap = 0;
>> my $last = 0;
>> foreach my $ag (@{$antisense_genes}) {
>>   my @exons = @{ $ag->get_all_Exons };
>>   my @overlap_exons = get_overlap(@exons);
>>   foreach my $ex (@overlap_exons) {
>>     my @print = (
>>       $ag->stable_id(), $ex->stable_id(), $ex->seq_region_name(),
>>       $ex->seq_region_start(), $ex->seq_region_end()
>>     );
>>     # get gene and transcript for this exon
>>     print join("\t", @print), "\n";
>>     $last=1;
>>   }
>>   $last && exit;
>> }
>>
>>> Please see below Andy Yates' suggestions:
>>>
>>> 1) Using the REST API:
>>> <snip>
>>>
>>> 2) Using the Perl API.
>>> This would be the most efficient way to go as you can get all the
>>> antisense genes and their coordinates at once.
>>> (If you have not used the Core Perl API, please have a look at
>>> http://www.ensembl.org/info/docs/api/core/index.html#api):
>>>
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use warnings;
>>>
>>> use Bio::EnsEMBL::Registry;
>>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>>   -HOST => 'ensembldb.ensembl.org',
>>>   -PORT => 3306,
>>>   -USER => 'anonymous',
>>> );
>>>
>>> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>>> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>>> my $overlap = 0;
>>> foreach my $ag (@{$antisense_genes}) {
>>>   my $antisense_stable_id = $ag->stable_id();
>>>   my $overlapping_genes = $ag->get_overlapping_Genes();
>>>   foreach my $og (@{$overlapping_genes}) {
>>>     my $overlapping_stable_id = $og->stable_id();
>>>     next if $antisense_stable_id eq $overlapping_stable_id;
>>>     printf "%s (%s) overlaps our antisense gene %s\n",
>>>       $overlapping_stable_id, $og->feature_Slice()->name(),
>>>       $antisense_stable_id;
>>>   }
>>> }
>>>
>>> Hope it helps.
>>>
>>> Regards,
>>> Denise
>>>
>>>
>>> On 8 Nov 2013, at 08:59, Henrikki Almusa wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm delving into the antisense transcript world. I've found two
>>>> slightly different descriptions for this.
>>>>
>>>> First from
>>>> http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html
>>>>
>>>> Gene
>>>> Antisense. Has transcripts that overlap any coding exon of a locus
>>>> on the opposite strand, or for published instances of antisense
>>>> regulation of a coding gene.
>>>> Transcript
>>>> Antisense. Transcripts that overlap any coding exon of a locus on
>>>> the opposite strand, or for published instances of antisense
>>>> regulation of a coding gene.
>>>>
>>>> Second from:
>>>> http://www.gencodegenes.org/gencode_biotypes.html
>>>> antisense Transcript believed to be an antisense product used in the
>>>> regulation of the gene to which it belongs.
>>>>
>>>> Now the second implies that the gene connected to an antisense
>>>> transcript would be the gene that it blocks. But this conflicts with
>>>> the other page's description.
>>>>
>>>> So on to the main thing. I was asked to get a list of exons which
>>>> have an antisense transcript. The output would have gene name,
>>>> transcript name, antisense transcript name, and exon name and
>>>> coordinates. I wrote the following SQL to retrieve this, but it does
>>>> not give me the gene which is "blocked", only the "gene" of the
>>>> antisense transcript. Any help in getting the blocked transcript as
>>>> well would be appreciated.
>>>>
>>>> SELECT g.stable_id gene, tr.stable_id transcript, e.stable_id exon,
>>>> e.seq_region_start, e.seq_region_end, ext.rank
>>>> FROM transcript tr
>>>>   JOIN exon_transcript ext USING(transcript_id)
>>>>   JOIN exon e USING(exon_id)
>>>>   JOIN gene g USING(gene_id)
>>>>   JOIN seq_region sr ON e.seq_region_id=sr.seq_region_id
>>>>   JOIN coord_system coord USING (coord_system_id)
>>>> WHERE coord.version = 'GRCh37' AND tr.biotype = 'antisense'
>>>>
>>>> Thanks,
>>>> --
>>>> Henrikki Almusa
>>>> Bioinformatician
>>>> Institute for Molecular Medicine Finland FIMM
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>
>>
>


-- 
Henrikki Almusa



