[ensembl-dev] Getting antisense transcript of gene using sql / biomart

Brett Thomas bthomas at atgu.mgh.harvard.edu
Fri Nov 29 13:47:28 GMT 2013


Is it true that an exon can be in multiple genes? I thought the definition
of a gene was the group of transcripts with overlapping coding sequences:
http://useast.ensembl.org/info/genome/genebuild/genome_annotation.html


On Thu, Nov 28, 2013 at 2:34 PM, Henrikki Almusa <
henrikki.almusa at helsinki.fi> wrote:

> On 2013-11-28 16:45, mag wrote:
>
>> Hi Henrikki,
>>
>> Exons can be shared across transcripts and genes, hence they do not have
>> a specific gene object attached to them.
>>
>
> True, they are not directly connected. However, they are connected by a
> couple of steps. An exon is connected to transcripts by a many-to-many
> relationship via the exon_transcript table. A gene is connected to its
> transcripts by a one-to-many relationship. I was hoping that the Perl API
> would allow backtracking this somehow: from an exon to a list of
> transcripts to a list of genes.
>
> A method like $exon->get_all_Transcripts() is the missing link for me.
> There is a $transcript->get_Gene() for the second step.
>
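The two-step chain described above (exon to transcripts through exon_transcript, then transcript to gene) can be sketched in plain Perl without the API. This is only a rough sketch, assuming the join rows have already been pulled from a local core database; the identifiers and the helper name genes_for_exon are made up for illustration and are not part of the Ensembl API.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical rows shaped like the exon_transcript join table
# (many-to-many): [exon id, transcript id].
my @exon_transcript = (
  [ 'E1', 'T1' ],
  [ 'E1', 'T2' ],
  [ 'E2', 'T2' ],
  [ 'E3', 'T3' ],
);

# Hypothetical transcript -> gene lookup (each transcript has one gene).
my %gene_of_transcript = ( T1 => 'G1', T2 => 'G1', T3 => 'G2' );

# Index exon -> list of transcripts once, so later lookups are cheap.
my %transcripts_of_exon;
foreach my $row (@exon_transcript) {
  my ($exon, $transcript) = @{$row};
  push @{ $transcripts_of_exon{$exon} }, $transcript;
}

# Backtrack from an exon to its transcripts, then to the distinct genes.
sub genes_for_exon {
  my ($exon) = @_;
  my %seen;
  my @genes;
  foreach my $transcript (@{ $transcripts_of_exon{$exon} || [] }) {
    my $gene = $gene_of_transcript{$transcript};
    next unless defined $gene;
    push @genes, $gene unless $seen{$gene}++;
  }
  return @genes;
}

# One line per exon: exon, its transcripts, its genes.
foreach my $exon (sort keys %transcripts_of_exon) {
  print join("\t", $exon,
    join(',', @{ $transcripts_of_exon{$exon} }),
    join(',', genes_for_exon($exon))), "\n";
}
```

Any exon for which genes_for_exon returns more than one gene would be an exon shared across genes.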
>
>> There is however a method called get_nearest_Gene available.
>> This will return the closest overlapping gene in that region.
>>
>
> I don't think that would always give me the correct answer.
>
> I could ask the overlapping exons to give me all overlapping transcripts,
> then get the list of exons in those transcripts. This would allow me to map
> the two exon lists together using exon identifiers. However, that feels
> like a bit of a hacky trick.
>
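Matching two exon lists by identifier comes down to a hash lookup. A minimal sketch, assuming the stable IDs have already been collected into two arrays; the IDs and the helper name shared_exon_ids are hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the identifiers that occur in both lists, in the order of the
# first list, each reported once.
sub shared_exon_ids {
  my ($first, $second) = @_;
  my %in_second = map { $_ => 1 } @{$second};
  my %seen;
  return grep { $in_second{$_} && !$seen{$_}++ } @{$first};
}

# Hypothetical: exons found on the slice of an antisense exon.
my @overlapping_exons = qw(E10 E11 E12);
# Hypothetical: exons of one candidate overlapping transcript.
my @transcript_exons = qw(E09 E11 E12 E13);

print join(',', shared_exon_ids(\@overlapping_exons, \@transcript_exons)), "\n";
```

With real data, the two lists would be the stable_id() values of the overlapping exons and of each candidate transcript's exons.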
> Alternatively, I could query a list of all genes, transcripts and exons and
> do the mapping separately using the exons. But I'm hoping I won't need
> extra scripts to do this.
>
>
>
>> Hope that helps,
>> Magali
>>
>> On 28/11/2013 14:26, Henrikki Almusa wrote:
>>
>>> First off, sorry for the slightly slow reply, as I didn't notice the mail
>>> at first. However, I'm happy that it got a reply.
>>>
>>> On 2013-11-20 16:11, Denise Carvalho-Silva wrote:
>>>
>>>> Dear Henrikki,
>>>>
>>>> Thanks for reporting the slightly discrepant definitions of antisense
>>>> on the Vega and GENCODE websites.
>>>>
>>>> The GENCODE description is in fact speculative and it should be
>>>> updated and consistent with the Vega definition.
>>>>
>>>> Our colleagues in the Havana team at the WTSI have agreed to change
>>>> this. Both sources should give the following definition:
>>>>
>>>> Antisense. Has transcripts that overlap the genomic span (i.e. exon
>>>> or introns) of a protein-coding locus on the opposite strand.
>>>>
>>>> I should clarify that Ensembl does not make any inferences on whether
>>>> or not a given gene is regulated by its antisense locus and therefore
>>>> we are not claiming that the antisense transcript regulates the
>>>> expression of the gene on the opposite strand.
>>>>
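The definition above reduces to two checks: the genomic spans overlap, and the strands are opposite. A minimal sketch in plain Perl, assuming 1-based inclusive coordinates and strands coded as 1 / -1; the hash layout and the helper names spans_overlap and is_antisense_overlap are assumptions for illustration, not Ensembl API calls:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Two spans on the same sequence region overlap when neither one ends
# before the other starts (1-based, inclusive coordinates).
sub spans_overlap {
  my ($x, $y) = @_;
  return 0 if $x->{seq_region} ne $y->{seq_region};
  return ($x->{start} <= $y->{end} && $y->{start} <= $x->{end}) ? 1 : 0;
}

# Antisense in the sense of the definition: overlapping genomic span
# (exons or introns) on the opposite strand.
sub is_antisense_overlap {
  my ($antisense, $coding) = @_;
  return 0 unless spans_overlap($antisense, $coding);
  return ($antisense->{strand} == -$coding->{strand}) ? 1 : 0;
}

# Hypothetical loci, described by their whole genomic span.
my %coding    = (seq_region => '1', start => 1000, end => 5000, strand => 1);
my %antisense = (seq_region => '1', start => 4500, end => 6000, strand => -1);
my %same      = (seq_region => '1', start => 4500, end => 6000, strand => 1);

printf "opposite strand: %d\n", is_antisense_overlap(\%antisense, \%coding);
printf "same strand: %d\n",     is_antisense_overlap(\%same, \%coding);
```

The comparison uses the whole locus span rather than individual exons, since the definition counts intronic overlap as well.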
>>>
>>> Thanks for clearing that up.
>>>
>>>> You can get the coordinates of the antisense transcript and use them
>>>> to get any overlapping genes on the same or opposite strand.
>>>>
>>>> If this is something that you would be interested in getting, we
>>>> would strongly recommend that you use the APIs (REST or Perl) rather
>>>> than SQL queries.
>>>>
>>>
>>> We have a local copy of the Ensembl core and variation databases for
>>> human, and using SQL with that is easier, so I went with that. However,
>>> after looking at the two, I decided to try the Perl API and modified
>>> option #2 to suit the aim better. I did hit a problem though.
>>>
>>> How do I get the transcripts and gene for a given exon? The script that I
>>> use is below. For testing, it currently outputs only the first gene's
>>> result. Does it have something to do with project_to_slice()?
>>>
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use warnings;
>>>
>>> use Bio::EnsEMBL::Registry;
>>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>>   '-HOST' => 'ensembldb.ensembl.org',
>>>   '-PORT' => 3306,
>>>   '-USER' => 'anonymous',
>>> );
>>>
>>> sub get_overlap {
>>>   my @exons = @_;
>>>   local $_;
>>>   my %tmp = ();
>>>   my @ret = ();
>>>   foreach my $ex (@exons) {
>>>     my @overlap_exons = get_overlapping_Exons($ex);
>>>     my $antisense_stable_id = $ex->stable_id();
>>>     foreach my $oex (@overlap_exons) {
>>>       my $overlapping_stable_id = $oex->stable_id();
>>>       $antisense_stable_id eq $overlapping_stable_id && next;
>>>       exists($tmp{$overlapping_stable_id}) && next;
>>>       $tmp{$overlapping_stable_id} = 1;
>>>       push(@ret,$oex);
>>>     }
>>>   }
>>>   return @ret;
>>> }
>>>
>>> sub get_overlapping_Exons{
>>>   my $slice = $_[0]->feature_Slice;
>>>   return @{$slice->get_all_Exons()};
>>> }
>>>
>>> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>>> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>>> my $overlap = 0;
>>> my $last = 0;
>>> foreach my $ag (@{$antisense_genes}) {
>>>   my @exons = @{ $ag->get_all_Exons };
>>>   my @overlap_exons = get_overlap(@exons);
>>>   foreach my $ex (@overlap_exons) {
>>>     my @print = (
>>>       $ag->stable_id(), $ex->stable_id(), $ex->seq_region_name(),
>>>       $ex->seq_region_start(), $ex->seq_region_end()
>>>     );
>>>     # get gene and transcript for this exon
>>>     print join("\t", @print),"\n";
>>>     $last=1;
>>>   }
>>>   $last && exit;
>>> }
>>>
>>>> Please see below Andy Yates' suggestions:
>>>>
>>>> 1) Using the REST API:
>>>> <snip>
>>>>
>>>> 2) Using the Perl API.
>>>> This would be the most efficient way to go as you can get all the
>>>> antisense genes and their coordinates at once.
>>>> (If you have not used the Core Perl API, please have a look at
>>>> http://www.ensembl.org/info/docs/api/core/index.html#api):
>>>>
>>>> #!/usr/bin/env perl
>>>>
>>>> use strict;
>>>> use warnings;
>>>>
>>>> use Bio::EnsEMBL::Registry;
>>>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>>> -HOST => 'ensembldb.ensembl.org',
>>>> -PORT => 3306,
>>>> -USER => 'anonymous',
>>>> );
>>>>
>>>> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>>>> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>>>> my $overlap = 0;
>>>> foreach my $ag (@{$antisense_genes}) {
>>>>   my $antisense_stable_id = $ag->stable_id();
>>>>   my $overlapping_genes = $ag->get_overlapping_Genes();
>>>>   foreach my $og (@{$overlapping_genes}) {
>>>>     my $overlapping_stable_id = $og->stable_id();
>>>>     next if $antisense_stable_id eq $overlapping_stable_id;
>>>>     printf "%s (%s) overlaps our antisense gene %s\n",
>>>> $overlapping_stable_id, $og->feature_Slice()->name(),
>>>> $antisense_stable_id;
>>>>   }
>>>> }
>>>>
>>>> Hope it helps.
>>>>
>>>> Regards,
>>>> Denise
>>>>
>>>>
>>>> On 8 Nov 2013, at 08:59, Henrikki Almusa wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm delving into the antisense transcript world. I've found two
>>>>> slightly different descriptions for this.
>>>>>
>>>>> First from
>>>>> http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html
>>>>>
>>>>> Gene
>>>>> Antisense. Has transcripts that overlap any coding exon of a locus
>>>>> on the opposite strand, or for published instances of antisense
>>>>> regulation of a coding gene.
>>>>> Transcript
>>>>> Antisense. Transcripts that overlap any coding exon of a locus on
>>>>> the opposite strand, or for published instances of antisense
>>>>> regulation of a coding gene.
>>>>>
>>>>> Second from:
>>>>> http://www.gencodegenes.org/gencode_biotypes.html
>>>>> antisense Transcript believed to be an antisense product used in the
>>>>> regulation of the gene to which it belongs.
>>>>>
>>>>> Now the second implies that the gene connected to an antisense
>>>>> transcript would be the gene that it blocks. But this conflicts with
>>>>> the other page's description.
>>>>>
>>>>> So onto the main thing. I was asked to get a list of exons which
>>>>> have an antisense transcript. The output would have gene name,
>>>>> transcript name, antisense transcript name, and exon name and
>>>>> coordinates. I wrote the following SQL to retrieve this, but it does
>>>>> not give me the gene which is "blocked", only the "gene" of the
>>>>> antisense transcript. Any help in getting the blocked transcript as
>>>>> well would be appreciated.
>>>>>
>>>>> SELECT g.stable_id gene, tr.stable_id transcript, e.stable_id exon,
>>>>> e.seq_region_start, e.seq_region_end, ext.rank
>>>>> FROM transcript tr
>>>>>   JOIN exon_transcript ext USING(transcript_id)
>>>>>   JOIN exon e USING(exon_id)
>>>>>   JOIN gene g USING(gene_id)
>>>>>   JOIN seq_region sr ON e.seq_region_id=sr.seq_region_id
>>>>>   JOIN coord_system coord USING (coord_system_id)
>>>>> WHERE coord.version = 'GRCh37' AND tr.biotype = 'antisense'
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> Henrikki Almusa
>>>>> Bioinformatician
>>>>> Institute for Molecular Medicine Finland FIMM
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
> --
> Henrikki Almusa
>
>
>