[ensembl-dev] Getting antisense transcript of gene using sql / biomart

Henrikki Almusa henrikki.almusa at helsinki.fi
Tue Dec 3 09:36:59 GMT 2013


On 2013-11-29 15:47, Brett Thomas wrote:
> Is it true that an exon can be in multiple genes? I thought the
> definition of a gene was the group of transcripts with overlapping
> coding sequences:
> http://useast.ensembl.org/info/genome/genebuild/genome_annotation.html

From a technical point of view, nothing in the database seems to prevent
it: an exon can be linked to any transcript, and through that transcript
it is linked to a gene.

Whether the Ensembl gene build system is allowed to do this, I don't
know. Whether any such cases actually exist, I don't know either. I was
just looking for a way to retrieve exon -> transcript and gene information
from the database using the Perl API.
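
To make the exon -> transcript -> gene path concrete, here is a minimal
sketch using Python's sqlite3 module with a toy in-memory copy of the
tables involved. The table and column names follow the core schema as
used in the SQL query quoted further down; the rows and stable IDs (G1,
T1, E1, ...) are invented for illustration, and the exon shared between
two genes is purely hypothetical, not a claim that such a case exists in
Ensembl.

```python
import sqlite3

# Toy in-memory stand-in for four core tables. Column names follow the
# query quoted later in this thread; every row here is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, stable_id TEXT);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY,
                         gene_id INTEGER, stable_id TEXT);
CREATE TABLE exon (exon_id INTEGER PRIMARY KEY, stable_id TEXT);
CREATE TABLE exon_transcript (exon_id INTEGER, transcript_id INTEGER,
                              rank INTEGER);
INSERT INTO gene VALUES (1, 'G1'), (2, 'G2');
INSERT INTO transcript VALUES (10, 1, 'T1'), (11, 1, 'T2'), (12, 2, 'T3');
INSERT INTO exon VALUES (100, 'E1'), (101, 'E2');
-- Hypothetical: E1 is used by T1 and T2 (gene G1) and also by T3 (gene G2).
INSERT INTO exon_transcript VALUES
  (100, 10, 1), (100, 11, 1), (100, 12, 2), (101, 12, 1);
""")


def genes_and_transcripts_for_exon(exon_stable_id):
    """Walk exon -> exon_transcript -> transcript -> gene for one exon."""
    return conn.execute(
        """
        SELECT g.stable_id, tr.stable_id, ext.rank
        FROM exon e
          JOIN exon_transcript ext USING (exon_id)
          JOIN transcript tr USING (transcript_id)
          JOIN gene g USING (gene_id)
        WHERE e.stable_id = ?
        ORDER BY g.stable_id, tr.stable_id
        """,
        (exon_stable_id,),
    ).fetchall()


# One line per (gene, transcript, rank) that uses the exon.
for row in genes_and_transcripts_for_exon("E1"):
    print("\t".join(str(value) for value in row))
```

With these toy rows, E1 resolves to transcripts in both G1 and G2, which
is the situation the schema permits; whether the real database ever
contains such a row is a separate question.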

>
> On Thu, Nov 28, 2013 at 2:34 PM, Henrikki Almusa
> <henrikki.almusa at helsinki.fi> wrote:
>
>     On 2013-11-28 16:45, mag wrote:
>
>         Hi Henrikki,
>
>         Exons can be shared across transcripts and genes, hence they do
>         not have a specific gene object attached to them.
>
>
>     True, they are not directly connected. However, they are connected by
>     a couple of steps. An exon is connected to transcripts through a
>     many-to-many relationship via the exon_transcript table. A gene is
>     connected to its transcripts through a one-to-many relationship. I was
>     hoping that the Perl API would allow backtracking this somehow: from an
>     exon to a list of transcripts to a list of genes.
>
>     A method like $exon->get_all_Transcripts() is the missing link for me.
>     There is a $transcript->get_Gene() for the second step.
>
>
>         There is however a method called get_nearest_Gene available.
>         This will return the closest overlapping gene in that region.
>
>
>     I don't think that would always give me the correct answer.
>
>     I could ask the overlapping exons to give me all overlapping
>     transcripts, then get the list of exons in those transcripts. This
>     would allow me to map the two exon lists together using exon
>     identifiers. However, that feels like a bit of a hack.
>
>     Alternatively, I could query a list of all genes, transcripts and
>     exons and do the mapping separately using the exons. But I'm hoping
>     I won't need extra scripts to do this.
>
>
>
>         Hope that helps,
>         Magali
>
>         On 28/11/2013 14:26, Henrikki Almusa wrote:
>
>             First off, sorry for the slightly slow reply; I didn't notice
>             the mail at first. However, I'm happy that it got a reply.
>
>             On 2013-11-20 16:11, Denise Carvalho-Silva wrote:
>
>                 Dear Henrikki,
>
>                 Thanks for reporting the slightly discrepant definitions
>                 of antisense on the Vega and GENCODE websites.
>
>                 The GENCODE description is in fact speculative and it
>                 should be updated and consistent with the Vega definition.
>
>                 Our colleagues in the Havana team at the WTSI have agreed
>                 to change this. Both sources should give the following
>                 definition:
>
>                 Antisense. Has transcripts that overlap the genomic span
>                 (i.e. exon or introns) of a protein-coding locus on the
>                 opposite strand.
>
>                 I should clarify that Ensembl does not make any inferences
>                 on whether or not a given gene is regulated by its antisense
>                 locus and therefore we are not claiming that the antisense
>                 transcript regulates the expression of the gene on the
>                 opposite strand.
>
>
>             Thanks for clearing that up.
>
>                 You can get the coordinates of the antisense transcript
>                 and use them to get any overlapping genes on the same or
>                 opposite strand.
>
>                 If this is something that you would be interested in
>                 getting, we would strongly recommend that you use the APIs
>                 (REST or Perl) rather than SQL queries.
>
>
>             We have a local copy of the Ensembl core and variation
>             databases for human, and using SQL with that is easier, so I
>             went with that. However, after looking at the two options, I
>             decided to try the Perl API and modified option #2 to suit
>             the aim better. I did hit a problem though.
>
>             How do I get the transcripts and gene for a given exon? The
>             script that I use is below. Currently, for testing, it just
>             outputs the result for the first gene. Does it have something
>             to do with project_to_slice()?
>
>             #!/usr/bin/env perl
>
>             use strict;
>             use warnings;
>
>             use Bio::EnsEMBL::Registry;
>             Bio::EnsEMBL::Registry->load_registry_from_db(
>                '-HOST' => 'ensembldb.ensembl.org',
>                '-PORT' => 3306,
>                '-USER' => 'anonymous',
>             );
>
>             sub get_overlap {
>                my @exons = @_;
>                local $_;
>                my %tmp = ();
>                my @ret = ();
>                foreach my $ex (@exons) {
>                  my @overlap_exons = get_overlapping_Exons($ex);
>                  my $antisense_stable_id = $ex->stable_id();
>                  foreach my $oex (@overlap_exons) {
>                    my $overlapping_stable_id = $oex->stable_id();
>                    $antisense_stable_id eq $overlapping_stable_id && next;
>                    exists($tmp{$overlapping_stable_id}) && next;
>                    $tmp{$overlapping_stable_id} = 1;
>                    push(@ret,$oex);
>                  }
>                }
>                return @ret;
>             }
>
>             sub get_overlapping_Exons{
>                my $slice = $_[0]->feature_Slice;
>                return @{$slice->get_all_Exons()};
>             }
>
>             my $ga =
>               Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>             my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>             my $overlap = 0;
>             my $last = 0;
>             foreach my $ag (@{$antisense_genes}) {
>                my @exons = @{ $ag->get_all_Exons };
>                my @overlap_exons = get_overlap(@exons);
>                foreach my $ex (@overlap_exons) {
>                  my @print = (
>                    $ag->stable_id(), $ex->stable_id(),
>                    $ex->seq_region_name(),
>                    $ex->seq_region_start(), $ex->seq_region_end()
>                  );
>                  # get gene and transcript for this exon
>                  print join("\t", @print),"\n";
>                  $last=1;
>                }
>                $last && exit;
>             }
>
>                 Please see below Andy Yates' suggestions:
>
>                 1) Using the REST API:
>                 <snip>
>
>                 2) Using the Perl API.
>                 This would be the most efficient way to go as you can get
>                 all the antisense genes and their coordinates at once.
>                 (If you have not used the Core Perl API, please have a look at
>                 http://www.ensembl.org/info/docs/api/core/index.html#api):
>
>                 #!/usr/bin/env perl
>
>                 use strict;
>                 use warnings;
>
>                 use Bio::EnsEMBL::Registry;
>                 Bio::EnsEMBL::Registry->load_registry_from_db(
>                 -HOST => 'ensembldb.ensembl.org',
>                 -PORT => 3306,
>                 -USER => 'anonymous',
>                 );
>
>                 my $ga =
>                   Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>                 my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>                 my $overlap = 0;
>                 foreach my $ag (@{$antisense_genes}) {
>                    my $antisense_stable_id = $ag->stable_id();
>                    my $overlapping_genes = $ag->get_overlapping_Genes();
>                    foreach my $og (@{$overlapping_genes}) {
>                      my $overlapping_stable_id = $og->stable_id();
>                      next if $antisense_stable_id eq $overlapping_stable_id;
>                      printf "%s (%s) overlaps our antisense gene %s\n",
>                        $overlapping_stable_id, $og->feature_Slice()->name(),
>                        $antisense_stable_id;
>                    }
>                 }
>
>                 Hope it helps.
>
>                 Regards,
>                 Denise
>
>
>                 On 8 Nov 2013, at 08:59, Henrikki Almusa wrote:
>
>                     Hi all,
>
>                     I'm delving into the antisense transcript world. I've
>                     found two slightly different descriptions for this.
>
>                     First from
>                     http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html
>
>                     Gene
>                     Antisense. Has transcripts that overlap any coding exon
>                     of a locus on the opposite strand, or for published
>                     instances of antisense regulation of a coding gene.
>                     Transcript
>                     Antisense. Transcripts that overlap any coding exon of
>                     a locus on the opposite strand, or for published
>                     instances of antisense regulation of a coding gene.
>
>                     Second from:
>                     http://www.gencodegenes.org/gencode_biotypes.html
>                     antisense Transcript believed to be an antisense
>                     product used in the regulation of the gene to which it
>                     belongs.
>
>                     Now the second implies that the gene connected to an
>                     antisense transcript would be the gene that it blocks.
>                     But this conflicts with the other page's description.
>
>                     So onto the main thing. I was asked to get a list of
>                     exons which have an antisense transcript. The output
>                     would have gene name, transcript name, antisense
>                     transcript name, and exon name and coordinates. I wrote
>                     the following SQL to retrieve this, but it does not give
>                     me the gene which is "blocked", only the "gene" of the
>                     antisense transcript. Any help in getting the blocked
>                     transcript as well would be appreciated.
>
>                     SELECT g.stable_id gene, tr.stable_id transcript,
>                            e.stable_id exon,
>                            e.seq_region_start, e.seq_region_end, ext.rank
>                     FROM transcript tr
>                        JOIN exon_transcript ext USING(transcript_id)
>                        JOIN exon e USING(exon_id)
>                        JOIN gene g USING(gene_id)
>                        JOIN seq_region sr ON
>                          e.seq_region_id = sr.seq_region_id
>                        JOIN coord_system coord USING (coord_system_id)
>                     WHERE coord.version = 'GRCh37'
>                       AND tr.biotype = 'antisense'
>
>                     Thanks,
>                     --
>                     Henrikki Almusa
>                     Bioinformatician
>                     Institute for Molecular Medicine Finland FIMM
>
>                     _________________________________________________
>                     Dev mailing list Dev at ensembl.org
>                     Posting guidelines and subscribe/unsubscribe info:
>                     http://lists.ensembl.org/mailman/listinfo/dev
>                     Ensembl Blog: http://www.ensembl.info/
>
>
>     --
>     Henrikki Almusa


-- 
Henrikki Almusa



