[ensembl-dev] Getting antisense transcript of gene using sql / biomart

Henrikki Almusa henrikki.almusa at helsinki.fi
Thu Nov 28 14:26:54 GMT 2013


First off, sorry for the slightly slow reply; I didn't notice the mail at 
first. I'm happy that it got a reply, though.

On 2013-11-20 16:11, Denise Carvalho-Silva wrote:
> Dear Henrikki,
>
> Thanks for reporting the slightly discrepant definitions of antisense on the Vega and GENCODE websites.
>
> The GENCODE description is in fact speculative and it should be updated and consistent with the Vega definition.
>
> Our colleagues in the Havana team at the WTSI have agreed to change this. Both sources should give the following definition:
>
> Antisense. Has transcripts that overlap the genomic span (i.e. exons or introns) of a protein-coding locus on the opposite strand.
>
> I should clarify that Ensembl does not make any inferences on whether or not a given gene is regulated by its antisense locus and therefore we are not claiming that the antisense transcript regulates the expression of the gene on the opposite strand.

Thanks for clearing that up.

> You can get the coordinates of the antisense transcript and use them to get any overlapping genes on the same or opposite strand.
>
> If this is something that you would be interested in getting, we would strongly recommend that you use the APIs (REST or Perl) rather than SQL queries.

We have a local copy of the Ensembl core and variation databases for 
human, and using SQL with that is easier, so I went with that. However, 
after comparing the two, I decided to try the Perl API and modified 
option #2 to suit the aim better. I did hit a problem, though.

How do I get the transcripts and gene for a given exon? The script that 
I use is below. For testing, it currently outputs only the results for 
the first gene. Does it have something to do with project_to_slice()?

#!/usr/bin/env perl

use strict;
use warnings;

use Bio::EnsEMBL::Registry;
Bio::EnsEMBL::Registry->load_registry_from_db(
   '-HOST' => 'ensembldb.ensembl.org',
   '-PORT' => 3306,
   '-USER' => 'anonymous',
);

sub get_overlap {
   my @exons = @_;
   local $_;
   my %tmp = ();
   my @ret = ();
   foreach my $ex (@exons) {
     my @overlap_exons = get_overlapping_Exons($ex);
     my $antisense_stable_id = $ex->stable_id();
     foreach my $oex (@overlap_exons) {
       my $overlapping_stable_id = $oex->stable_id();
       $antisense_stable_id eq $overlapping_stable_id && next;
       exists($tmp{$overlapping_stable_id}) && next;
       $tmp{$overlapping_stable_id} = 1;
       push(@ret,$oex);
     }
   }
   return @ret;
}

sub get_overlapping_Exons{
   my $slice = $_[0]->feature_Slice;
   return @{$slice->get_all_Exons()};
}

my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
my $overlap = 0;
my $last = 0;
foreach my $ag (@{$antisense_genes}) {
   my @exons = @{ $ag->get_all_Exons };
   my @overlap_exons = get_overlap(@exons);
   foreach my $ex (@overlap_exons) {
     my @print = (
       $ag->stable_id(), $ex->stable_id(), $ex->seq_region_name(),
       $ex->seq_region_start(), $ex->seq_region_end()
     );
     # get gene and transcript for this exon
     print join("\t", @print),"\n";
     $last=1;
   }
   $last && exit;
}
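
One possible way to fill in the "get gene and transcript" step, as a
minimal sketch rather than a confirmed answer. It assumes that the gene
adaptor's fetch_by_exon_stable_id() and the transcript adaptor's
fetch_all_by_exon_stable_id() are available in the core API release in
use. The helper name print_exon_parents and reading exon stable IDs from
the command line are only for illustration.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Bio::EnsEMBL::Registry;
Bio::EnsEMBL::Registry->load_registry_from_db(
   -HOST => 'ensembldb.ensembl.org',
   -PORT => 3306,
   -USER => 'anonymous',
);

my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
my $ta = Bio::EnsEMBL::Registry->get_adaptor('human','core','transcript');

# Hypothetical helper: print one line per transcript containing the exon,
# together with the gene that the exon belongs to.
sub print_exon_parents {
   my ($exon_stable_id) = @_;
   # Look up the parent gene and all parent transcripts by exon stable ID.
   my $gene        = $ga->fetch_by_exon_stable_id($exon_stable_id);
   my $transcripts = $ta->fetch_all_by_exon_stable_id($exon_stable_id);
   my $gene_id     = defined $gene ? $gene->stable_id() : 'NA';
   foreach my $tr (@{$transcripts}) {
     print join("\t", $exon_stable_id, $tr->stable_id(), $gene_id),"\n";
   }
}

# Exon stable IDs are taken from the command line.
print_exon_parents($_) foreach @ARGV;
```

In the script above, the same two lookups could be called with
$ex->stable_id() at the "get gene and transcript for this exon" comment,
which sidesteps any coordinate changes caused by fetching the exons from
a feature slice.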

> Please see below Andy Yates' suggestions:
>
> 1) Using the REST API:
><snip>
>
> 2) Using the Perl API.
> This would be the most efficient way to go as you can get all the antisense genes and their coordinates at once.
> (If you have not used the Core Perl API, please have a look at http://www.ensembl.org/info/docs/api/core/index.html#api):
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> use Bio::EnsEMBL::Registry;
> Bio::EnsEMBL::Registry->load_registry_from_db(
> -HOST => 'ensembldb.ensembl.org',
> -PORT => 3306,
> -USER => 'anonymous',
> );
>
> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
> my $overlap = 0;
> foreach my $ag (@{$antisense_genes}) {
>   my $antisense_stable_id = $ag->stable_id();
>   my $overlapping_genes = $ag->get_overlapping_Genes();
>   foreach my $og (@{$overlapping_genes}) {
>     my $overlapping_stable_id = $og->stable_id();
>     next if $antisense_stable_id eq $overlapping_stable_id;
>     printf "%s (%s) overlaps our antisense gene %s\n", $overlapping_stable_id, $og->feature_Slice()->name(), $antisense_stable_id;
>   }
> }
>
> Hope it helps.
>
> Regards,
> Denise
>
>
> On 8 Nov 2013, at 08:59, Henrikki Almusa wrote:
>
>> Hi all,
>>
>> I'm delving into the antisense transcript world. I've found two slightly different descriptions for this.
>>
>> First from
>> http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html
>>
>> Gene
>> Antisense. Has transcripts that overlap any coding exon of a locus on the opposite strand, or for published instances of antisense regulation of a coding gene.
>> Transcript
>> Antisense. Transcripts that overlap any coding exon of a locus on the opposite strand, or for published instances of antisense regulation of a coding gene.
>>
>> Second from:
>> http://www.gencodegenes.org/gencode_biotypes.html
>> antisense Transcript believed to be an antisense product used in the regulation of the gene to which it belongs.
>>
>> Now the second implies that the gene connected to an antisense transcript would be the gene that it blocks. But this conflicts with the other page's description.
>>
>> So onto the main thing. I was asked to get a list of exons which have an antisense transcript. The output would have gene name, transcript name, antisense transcript name, and exon name and coordinates. I wrote the following SQL to retrieve this, but it does not give me the gene which is "blocked", only the "gene" of the antisense transcript. Any help in getting the blocked transcript as well would be appreciated.
>>
>> SELECT g.stable_id gene, tr.stable_id transcript, e.stable_id exon, e.seq_region_start, e.seq_region_end, ext.rank
>> FROM transcript tr
>>   JOIN exon_transcript ext USING(transcript_id)
>>   JOIN exon e USING(exon_id)
>>   JOIN gene g USING(gene_id)
>>   JOIN seq_region sr ON e.seq_region_id=sr.seq_region_id
>>   JOIN coord_system coord USING (coord_system_id)
>> WHERE coord.version = 'GRCh37' AND tr.biotype = 'antisense'
>>
>> Thanks,
>> --
>> Henrikki Almusa
>> Bioinformatician
>> Institute for Molecular Medicine Finland FIMM
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/


-- 
Henrikki Almusa



