[ensembl-dev] Getting antisense transcript of gene using sql / biomart

Denise Carvalho-Silva denise at ebi.ac.uk
Wed Nov 20 14:11:44 GMT 2013


Dear Henrikki,

Thanks for reporting the slightly discrepant definitions of antisense on the Vega and GENCODE websites.

The GENCODE description is in fact speculative; it should be updated to be consistent with the Vega definition.

Our colleagues in the Havana team at the WTSI have agreed to change this. Both sources should give the following definition:

Antisense. Has transcripts that overlap the genomic span (i.e. exons or introns) of a protein-coding locus on the opposite strand.
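As a rough illustration, this definition comes down to an interval test plus a strand and biotype check. The sketch below is hypothetical and not part of any Ensembl API: it assumes plain hash references with start, end, strand and biotype keys (1-based inclusive coordinates, strand 1 or -1).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not an Ensembl API call): true if two features
# overlap, using 1-based, inclusive coordinates.
sub spans_overlap {
  my ($x, $y) = @_;
  return $x->{start} <= $y->{end} && $y->{start} <= $x->{end};
}

# Hypothetical helper: the transcript overlaps the genomic span (exons or
# introns) of a protein-coding locus lying on the opposite strand.
sub is_antisense_to {
  my ($transcript, $locus) = @_;
  return $locus->{biotype} eq 'protein_coding'
      && $transcript->{strand} == -$locus->{strand}
      && spans_overlap($transcript, $locus);
}

# Example using the coordinates of the two genes in the REST output below
my $as  = { start => 182935, end => 194180, strand => -1, biotype => 'antisense' };
my $fam = { start => 192969, end => 300711, strand =>  1, biotype => 'protein_coding' };
printf "antisense to FAM20C: %s\n", is_antisense_to($as, $fam) ? 'yes' : 'no';
```

Because the test compares whole genomic spans, a transcript lying entirely within an intron of the coding locus also qualifies, which matches the reference to introns in the definition.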

I should clarify that Ensembl does not make any inferences about whether a given gene is regulated by its antisense locus; we are therefore not claiming that an antisense transcript regulates the expression of the gene on the opposite strand.

You can get the coordinates of the antisense transcript and use them to get any overlapping genes on the same or opposite strand.

If this is something that you would be interested in getting, we would strongly recommend that you use the APIs (REST or Perl) rather than SQL queries.

Please see Andy Yates' suggestions below:

1) Using the REST API:

http://beta.rest.ensembl.org/feature/region/human/7:1..5000000?feature=gene;biotype=antisense

--- 
- 
 ID: ENSG00000240093
 biotype: antisense
 description: ~
 end: 194180
 external_name: AC093627.12
 feature_type: gene
 logic_name: havana
 seq_region_name: 7
 source: havana
 start: 182935
 strand: -1

(output truncated; there are more antisense genes in this region)

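To see which genes overlap one of these antisense genes, query it by ID (the response includes the antisense gene itself):

http://beta.rest.ensembl.org/feature/id/ENSG00000240093?feature=gene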

--- 
- 
 ID: ENSG00000240093
 biotype: antisense
 description: ~
 end: 194180
 external_name: AC093627.12
 feature_type: gene
 logic_name: havana
 seq_region_name: 7
 source: havana
 start: 182935
 strand: -1
- 
 ID: ENSG00000177706
 biotype: protein_coding
 description: family with sequence similarity 20, member C [Source:HGNC Symbol;Acc:22140]
 end: 300711
 external_name: FAM20C
 feature_type: gene
 logic_name: ensembl_havana_gene
 seq_region_name: 7
 source: ensembl
 start: 192969
 strand: 1
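The feature/id response lists the antisense gene itself alongside the genes overlapping it, regardless of strand, so the opposite-strand partner has to be picked out on the client side. A minimal sketch, assuming the response has been retrieved as JSON rather than the YAML shown above (the HTTP request itself is omitted); the JSON string restates the two records above, trimmed to the fields used, and the helper name opposite_strand_partners is hypothetical. JSON::PP has shipped with core Perl since 5.14.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# The two records from the feature/id response above, restated as JSON
# (only the fields used here are kept).
my $json = <<'END';
[
  {"ID":"ENSG00000240093","biotype":"antisense","external_name":"AC093627.12",
   "seq_region_name":"7","start":182935,"end":194180,"strand":-1},
  {"ID":"ENSG00000177706","biotype":"protein_coding","external_name":"FAM20C",
   "seq_region_name":"7","start":192969,"end":300711,"strand":1}
]
END

# Hypothetical helper: return the features on the opposite strand to the
# query gene. The endpoint already guarantees overlap, so no interval test
# is needed; the query gene itself is skipped because it is also returned.
sub opposite_strand_partners {
  my ($query_id, $features) = @_;
  my ($query) = grep { $_->{ID} eq $query_id } @$features;
  die "query gene $query_id not in response\n" unless $query;
  return grep { $_->{ID} ne $query_id && $_->{strand} == -$query->{strand} } @$features;
}

my @partners = opposite_strand_partners('ENSG00000240093', decode_json($json));
printf "%s (%s) %s:%d-%d strand %d\n",
  $_->{ID}, $_->{external_name}, $_->{seq_region_name}, $_->{start}, $_->{end}, $_->{strand}
  for @partners;
```

With the two records above, this prints the single partner, FAM20C (ENSG00000177706) on chromosome 7, forward strand.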

2) Using the Perl API:
This is the most efficient approach, as you can fetch all the antisense genes and their coordinates at once.
(If you have not used the Core Perl API before, please have a look at http://www.ensembl.org/info/docs/api/core/index.html#api):

#!/usr/bin/env perl

use strict;
use warnings;

use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl MySQL server (anonymous access)
Bio::EnsEMBL::Registry->load_registry_from_db(
  -HOST => 'ensembldb.ensembl.org',
  -PORT => 3306,
  -USER => 'anonymous',
);

# Fetch every human gene annotated with the 'antisense' biotype
my $ga = Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'gene');
my $antisense_genes = $ga->fetch_all_by_biotype('antisense');

foreach my $ag (@{$antisense_genes}) {
  my $antisense_stable_id = $ag->stable_id();
  # Genes overlapping the antisense gene on either strand; the list also
  # contains the antisense gene itself, which is skipped below
  my $overlapping_genes = $ag->get_overlapping_Genes();
  foreach my $og (@{$overlapping_genes}) {
    my $overlapping_stable_id = $og->stable_id();
    next if $antisense_stable_id eq $overlapping_stable_id;
    # Report the overlapping gene, its location and the antisense gene
    printf "%s (%s) overlaps our antisense gene %s\n",
      $overlapping_stable_id, $og->feature_Slice()->name(), $antisense_stable_id;
  }
}
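The original question asked for exon-level output (gene, transcript, antisense transcript, exon and coordinates). One possible way to extend the loop above is sketched here, using only the standard Gene, Transcript and Exon accessors (get_all_Transcripts, get_all_Exons, seq_region_start, seq_region_end, seq_region_strand, stable_id); the helper name antisense_exon_rows is hypothetical. It keeps only opposite-strand partners, in line with the antisense definition, and reports each exon of the overlapping gene that falls within the span of an antisense transcript.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: for an antisense gene $ag and a gene $og overlapping it
# (Bio::EnsEMBL::Gene objects, e.g. from get_overlapping_Genes() above),
# return one row per exon of $og that overlaps an antisense transcript of $ag:
# [gene, transcript, antisense transcript, exon, exon start, exon end]
sub antisense_exon_rows {
  my ($ag, $og) = @_;
  my @rows;
  # Only loci on the opposite strand match the antisense definition.
  return @rows if $ag->seq_region_strand() == $og->seq_region_strand();
  foreach my $at (@{ $ag->get_all_Transcripts() }) {
    foreach my $tr (@{ $og->get_all_Transcripts() }) {
      foreach my $exon (@{ $tr->get_all_Exons() }) {
        # Skip exons lying wholly outside the antisense transcript's span.
        next if $exon->seq_region_end() < $at->seq_region_start()
             || $exon->seq_region_start() > $at->seq_region_end();
        push @rows, [ $og->stable_id(), $tr->stable_id(), $at->stable_id(),
                      $exon->stable_id(), $exon->seq_region_start(), $exon->seq_region_end() ];
      }
    }
  }
  return @rows;
}

# Usage inside the inner loop of the script above, for example:
#   print join("\t", @$_), "\n" for antisense_exon_rows($ag, $og);
```

Comparing against each antisense transcript rather than the whole antisense gene keeps the antisense transcript name in the output, as the original request asked.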

Hope it helps.

Regards,
Denise


On 8 Nov 2013, at 08:59, Henrikki Almusa wrote:

> Hi all,
> 
> I'm delving into the antisense transcript world. I've found two slightly different descriptions for this.
> 
> First from
> http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html
> 
> Gene
> Antisense. Has transcripts that overlap any coding exon of a locus on the opposite strand, or for published instances of antisense regulation of a coding gene.
> Transcript
> Antisense. Transcripts that overlap any coding exon of a locus on the opposite strand, or for published instances of antisense regulation of a coding gene.
> 
> Second from:
> http://www.gencodegenes.org/gencode_biotypes.html
> antisense Transcript believed to be an antisense product used in the regulation of the gene to which it belongs.
> 
> Now the second implies that the gene connected to an antisense transcript would be the gene that it blocks. But this conflicts with the other page's description.
> 
> So, on to the main thing. I was asked to get a list of exons that have an antisense transcript. The output would contain the gene name, transcript name, antisense transcript name, exon name and coordinates. I wrote the following SQL to retrieve this, but it gives me the "gene" of the antisense transcript rather than the gene that is "blocked". Any help in getting the blocked transcript as well would be appreciated.
> 
> SELECT g.stable_id gene, tr.stable_id transcript, e.stable_id exon, e.seq_region_start, e.seq_region_end, ext.rank
> FROM transcript tr
>  JOIN exon_transcript ext USING(transcript_id)
>  JOIN exon e USING(exon_id)
>  JOIN gene g USING(gene_id)
>  JOIN seq_region sr ON e.seq_region_id=sr.seq_region_id
>  JOIN coord_system coord USING (coord_system_id)
> WHERE coord.version = 'GRCh37' AND tr.biotype = 'antisense'
> 
> Thanks,
> -- 
> Henrikki Almusa
> Bioinformatician
> Institute for Molecular Medicine Finland FIMM
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




