[ensembl-dev] Getting antisense transcript of gene using sql / biomart

mag mr6 at ebi.ac.uk
Thu Nov 28 14:45:25 GMT 2013


Hi Henrikki,

Exons can be shared across transcripts and genes, hence they do not have 
a specific gene object attached to them.

There is, however, a method called get_nearest_Gene available.
This will return the closest overlapping gene in that region.
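
For the "transcripts and gene for a given exon" part of the question, one possible route is to go through the transcripts that contain the exon and then look up each transcript's gene. The following is only a minimal sketch under stated assumptions: parents_of_exon is a hypothetical helper name (not part of the API), it assumes the Core API calls fetch_all_by_exon_stable_id (transcript adaptor) and fetch_by_transcript_stable_id (gene adaptor) are available in your release, and the get_nearest_Gene call is used as named above, so please check its exact behaviour in the API documentation for your version. The database is only contacted when an exon stable ID is given on the command line.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: return [gene_stable_id, transcript_stable_id] pairs
# for one exon stable ID. An exon can be shared by several transcripts,
# so more than one row may come back.
sub parents_of_exon {
  my ($exon_stable_id, $transcript_adaptor, $gene_adaptor) = @_;
  my @rows;
  my $transcripts =
    $transcript_adaptor->fetch_all_by_exon_stable_id($exon_stable_id);
  foreach my $tr (@{$transcripts}) {
    my $gene = $gene_adaptor->fetch_by_transcript_stable_id($tr->stable_id());
    # 'NA' marks a transcript for which no gene was returned
    my $gene_id = defined $gene ? $gene->stable_id() : 'NA';
    push @rows, [ $gene_id, $tr->stable_id() ];
  }
  return @rows;
}

# Only connect when an exon stable ID is passed as the first argument.
if (@ARGV) {
  require Bio::EnsEMBL::Registry;
  Bio::EnsEMBL::Registry->load_registry_from_db(
    -HOST => 'ensembldb.ensembl.org',
    -PORT => 3306,
    -USER => 'anonymous',
  );
  my $ta = Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'transcript');
  my $ga = Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'gene');
  my $ea = Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'exon');

  # One line per (exon, gene, transcript) combination
  foreach my $row (parents_of_exon($ARGV[0], $ta, $ga)) {
    print join("\t", $ARGV[0], @{$row}), "\n";
  }

  # Alternative mentioned above: ask the exon feature for its nearest gene
  my $exon = $ea->fetch_by_stable_id($ARGV[0]);
  my $nearest = defined $exon ? $exon->get_nearest_Gene() : undef;
  print "nearest gene: ", $nearest->stable_id(), "\n" if defined $nearest;
}
```

Run it with an exon stable ID as the only argument. Inside the loop of the script quoted below, the helper could be called with $ex->stable_id() in place of the command-line argument.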


Hope that helps,
Magali

On 28/11/2013 14:26, Henrikki Almusa wrote:
> First off, sorry for a bit slow reply as I didn't notice the mail at 
> first. However I'm happy that it got a reply.
>
> On 2013-11-20 16:11, Denise Carvalho-Silva wrote:
>> Dear Henrikki,
>>
>> Thanks for reporting the slightly discrepant definitions of antisense 
>> on the Vega and GENCODE websites.
>>
>> The GENCODE description is in fact speculative and it should be 
>> updated and consistent with the Vega definition.
>>
>> Our colleagues in the Havana team at the WTSI have agreed to change 
>> this. Both sources should give the following definition:
>>
>> Antisense. Has transcripts that overlap the genomic span (i.e. exon 
>> or introns) of a protein-coding locus on the opposite strand.
>>
>> I should clarify that Ensembl does not make any inferences on whether 
>> or not a given gene is regulated by its antisense locus and therefore 
>> we are not claiming that the antisense transcript regulates the 
>> expression of the gene on the opposite strand.
>
> Thanks for clearing that up.
>
>> You can get the coordinates of the antisense transcript and use them 
>> to get any overlapping genes on the same or opposite strand.
>>
>> If this is something that you would be interested in getting, we 
>> would strongly recommend that you use the APIs (REST or Perl) rather 
>> than SQL queries.
>
> We have a local copy of ensembl core and variation for human and using 
> SQL with that is easier. Thus I went with that. However looking at the 
> two, I decided to try with perl API and modified #2 option to suit the 
> aim better. I did hit a problem though.
>
> How do I get transcripts and gene for a given exon? The script that I 
> use is below. It currently, for testing, just outputs first gene 
> result. Does it have something to do with the project_to_slice()?
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> use Bio::EnsEMBL::Registry;
> Bio::EnsEMBL::Registry->load_registry_from_db(
>   '-HOST' => 'ensembldb.ensembl.org',
>   '-PORT' => 3306,
>   '-USER' => 'anonymous',
> );
>
> sub get_overlap {
>   my @exons = @_;
>   local $_;
>   my %tmp = ();
>   my @ret = ();
>   foreach my $ex (@exons) {
>     my @overlap_exons = get_overlapping_Exons($ex);
>     my $antisense_stable_id = $ex->stable_id();
>     foreach my $oex (@overlap_exons) {
>       my $overlapping_stable_id = $oex->stable_id();
>       $antisense_stable_id eq $overlapping_stable_id && next;
>       exists($tmp{$overlapping_stable_id}) && next;
>       $tmp{$overlapping_stable_id} = 1;
>       push(@ret,$oex);
>     }
>   }
>   return @ret;
> }
>
> sub get_overlapping_Exons{
>   my $slice = $_[0]->feature_Slice;
>   return @{$slice->get_all_Exons()};
> }
>
> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
> my $overlap = 0;
> my $last = 0;
> foreach my $ag (@{$antisense_genes}) {
>   my @exons = @{ $ag->get_all_Exons };
>   my @overlap_exons = get_overlap(@exons);
>   foreach my $ex (@overlap_exons) {
>     my @print = (
>       $ag->stable_id(), $ex->stable_id(), $ex->seq_region_name(),
>       $ex->seq_region_start(), $ex->seq_region_end()
>     );
>     # get gene and transcript for this exon
>     print join("\t", @print),"\n";
>     $last=1;
>   }
>   $last && exit;
> }
>
>> Please see below Andy Yates' suggestions:
>>
>> 1) Using the REST API:
>> <snip>
>>
>> 2) Using the Perl API.
>> This would be the most efficient way to go as you can get all the 
>> antisense genes and their coordinates at once.
>> (If you have not used the Core Perl API, please have a look at 
>> http://www.ensembl.org/info/docs/api/core/index.html#api):
>>
>> #!/usr/bin/env perl
>>
>> use strict;
>> use warnings;
>>
>> use Bio::EnsEMBL::Registry;
>> Bio::EnsEMBL::Registry->load_registry_from_db(
>> -HOST => 'ensembldb.ensembl.org',
>> -PORT => 3306,
>> -USER => 'anonymous',
>> );
>>
>> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>> my $overlap = 0;
>> foreach my $ag (@{$antisense_genes}) {
>>   my $antisense_stable_id = $ag->stable_id();
>>   my $overlapping_genes = $ag->get_overlapping_Genes();
>>   foreach my $og (@{$overlapping_genes}) {
>>     my $overlapping_stable_id = $og->stable_id();
>>     next if $antisense_stable_id eq $overlapping_stable_id;
>>     printf "%s (%s) overlaps our antisense gene %s\n", 
>> $overlapping_stable_id, $og->feature_Slice()->name(), 
>> $antisense_stable_id;
>>   }
>> }
>>
>> Hope it helps.
>>
>> Regards,
>> Denise
>>
>>
>> On 8 Nov 2013, at 08:59, Henrikki Almusa wrote:
>>
>>> Hi all,
>>>
>>> I'm delving into the antisense transcript world. I've found two 
>>> slightly different descriptions for this.
>>>
>>> First from
>>> http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html
>>>
>>> Gene
>>> Antisense. Has transcripts that overlap any coding exon of a locus 
>>> on the opposite strand, or for published instances of antisense 
>>> regulation of a coding gene.
>>> Transcript
>>> Antisense. Transcripts that overlap any coding exon of a locus on 
>>> the opposite strand, or for published instances of antisense 
>>> regulation of a coding gene.
>>>
>>> Second from:
>>> http://www.gencodegenes.org/gencode_biotypes.html
>>> antisense Transcript believed to be an antisense product used in the 
>>> regulation of the gene to which it belongs.
>>>
>>> Now the second implies that the gene connected to the antisense 
>>> transcript would be the gene that it blocks. But this conflicts with 
>>> the other page's description.
>>>
>>> So onto the main thing. I was asked to get a list of exons which 
>>> have an antisense transcript. The output would have the gene name, 
>>> transcript name, antisense transcript name, and exon name and 
>>> coordinates. I wrote the following SQL to retrieve this, but it gives 
>>> me the "gene" of the antisense transcript rather than the gene which 
>>> is "blocked". Any help in getting the blocked transcript as well 
>>> would be appreciated.
>>>
>>> SELECT g.stable_id gene, tr.stable_id transcript, e.stable_id exon, 
>>> e.seq_region_start, e.seq_region_end, ext.rank
>>> FROM transcript tr
>>>   JOIN exon_transcript ext USING(transcript_id)
>>>   JOIN exon e USING(exon_id)
>>>   JOIN gene g USING(gene_id)
>>>   JOIN seq_region sr ON e.seq_region_id=sr.seq_region_id
>>>   JOIN coord_system coord USING (coord_system_id)
>>> WHERE coord.version = 'GRCh37' AND tr.biotype = 'antisense'
>>>
>>> Thanks,
>>> -- 
>>> Henrikki Almusa
>>> Bioinformatician
>>> Institute for Molecular Medicine Finland FIMM
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: 
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>
>




