[ensembl-dev] Getting antisense transcript of gene using sql / biomart
mag
mr6 at ebi.ac.uk
Thu Nov 28 14:45:25 GMT 2013
Hi Henrikki,
Exons can be shared across transcripts and genes, hence they do not have
a specific gene object attached to them.
There is however a method called get_nearest_Gene available.
This will return the closest overlapping gene in that region.
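
Because an exon can belong to several transcripts, another route is to look up its parents by stable ID. What follows is only a rough sketch, not something tested against a particular release: it assumes that fetch_all_by_exon_stable_id on the transcript adaptor and fetch_by_transcript_stable_id on the gene adaptor behave as described in the API documentation. The helper name parents_of_exon and the command-line handling are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return one [gene_stable_id, transcript_stable_id]
# pair for every transcript that contains the exon with this stable ID.
sub parents_of_exon {
    my ($exon_stable_id, $transcript_adaptor, $gene_adaptor) = @_;
    my @pairs;
    # Assumed to return a list reference of transcripts (possibly empty).
    my $transcripts =
        $transcript_adaptor->fetch_all_by_exon_stable_id($exon_stable_id);
    foreach my $tr (@{ $transcripts || [] }) {
        my $gene =
            $gene_adaptor->fetch_by_transcript_stable_id($tr->stable_id());
        # Leave the gene column empty if no gene is found.
        push @pairs, [ $gene ? $gene->stable_id() : '', $tr->stable_id() ];
    }
    return @pairs;
}

# Exon stable IDs are taken from the command line. With none given the
# script does nothing, so the API is only loaded when it is needed.
if (@ARGV) {
    require Bio::EnsEMBL::Registry;
    Bio::EnsEMBL::Registry->load_registry_from_db(
        -HOST => 'ensembldb.ensembl.org',
        -PORT => 3306,
        -USER => 'anonymous',
    );
    my $ta = Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'transcript');
    my $ga = Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'gene');
    foreach my $exon_id (@ARGV) {
        foreach my $pair (parents_of_exon($exon_id, $ta, $ga)) {
            # Columns: exon, gene, transcript
            print join("\t", $exon_id, @{$pair}), "\n";
        }
    }
}
```

In a script like the one quoted below, the same helper could be called with $ex->stable_id() at the point where the gene and transcript are needed. An exon shared by several transcripts gives one output line per transcript.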
Hope that helps,
Magali
On 28/11/2013 14:26, Henrikki Almusa wrote:
> First off, sorry for the rather slow reply, as I didn't notice the mail at
> first. However, I'm happy that it got a reply.
>
> On 2013-11-20 16:11, Denise Carvalho-Silva wrote:
>> Dear Henrikki,
>>
>> Thanks for reporting the slightly discrepant definitions of antisense
>> on the Vega and GENCODE websites.
>>
>> The GENCODE description is in fact speculative and it should be
>> updated and consistent with the Vega definition.
>>
>> Our colleagues in the Havana team at the WTSI have agreed to change
>> this. Both sources should give the following definition:
>>
>> Antisense. Has transcripts that overlap the genomic span (i.e. exons
>> or introns) of a protein-coding locus on the opposite strand.
>>
>> I should clarify that Ensembl does not make any inferences on whether
>> or not a given gene is regulated by its antisense locus and therefore
>> we are not claiming that the antisense transcript regulates the
>> expression of the gene on the opposite strand.
>
> Thanks for clearing that up.
>
>> You can get the coordinates of the antisense transcript and use them
>> to get any overlapping genes on the same or opposite strand.
>>
>> If this is something that you would be interested in getting, we
>> would strongly recommend that you use the APIs (REST or Perl) rather
>> than SQL queries.
>
> We have a local copy of the Ensembl core and variation databases for
> human, and using SQL with that is easier, so I went with that. However,
> after looking at the two, I decided to try the Perl API and modified
> option #2 to suit the aim better. I did hit a problem, though.
>
> How do I get the transcripts and gene for a given exon? The script that
> I use is below. For testing, it currently outputs only the results for
> the first gene. Does it have something to do with project_to_slice()?
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> use Bio::EnsEMBL::Registry;
> Bio::EnsEMBL::Registry->load_registry_from_db(
>     '-HOST' => 'ensembldb.ensembl.org',
>     '-PORT' => 3306,
>     '-USER' => 'anonymous',
> );
>
> sub get_overlap {
>     my @exons = @_;
>     local $_;
>     my %tmp = ();
>     my @ret = ();
>     foreach my $ex (@exons) {
>         my @overlap_exons = get_overlapping_Exons($ex);
>         my $antisense_stable_id = $ex->stable_id();
>         foreach my $oex (@overlap_exons) {
>             my $overlapping_stable_id = $oex->stable_id();
>             $antisense_stable_id eq $overlapping_stable_id && next;
>             exists($tmp{$overlapping_stable_id}) && next;
>             $tmp{$overlapping_stable_id} = 1;
>             push(@ret,$oex);
>         }
>     }
>     return @ret;
> }
>
> sub get_overlapping_Exons{
>     my $slice = $_[0]->feature_Slice;
>     return @{$slice->get_all_Exons()};
> }
>
> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
> my $overlap = 0;
> my $last = 0;
> foreach my $ag (@{$antisense_genes}) {
>     my @exons = @{ $ag->get_all_Exons };
>     my @overlap_exons = get_overlap(@exons);
>     foreach my $ex (@overlap_exons) {
>         my @print = (
>             $ag->stable_id(), $ex->stable_id(), $ex->seq_region_name(),
>             $ex->seq_region_start(), $ex->seq_region_end()
>         );
>         # get gene and transcript for this exon
>         print join("\t", @print),"\n";
>         $last=1;
>     }
>     $last && exit;
> }
>
>> Please see below Andy Yates' suggestions:
>>
>> 1) Using the REST API:
>> <snip>
>>
>> 2) Using the Perl API.
>> This would be the most efficient way to go as you can get all the
>> antisense genes and their coordinates at once.
>> (If you have not used the Core Perl API, please have a look at
>> http://www.ensembl.org/info/docs/api/core/index.html#api):
>>
>> #!/usr/bin/env perl
>>
>> use strict;
>> use warnings;
>>
>> use Bio::EnsEMBL::Registry;
>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>     -HOST => 'ensembldb.ensembl.org',
>>     -PORT => 3306,
>>     -USER => 'anonymous',
>> );
>>
>> my $ga = Bio::EnsEMBL::Registry->get_adaptor('human','core','gene');
>> my $antisense_genes = $ga->fetch_all_by_biotype('antisense');
>> my $overlap = 0;
>> foreach my $ag (@{$antisense_genes}) {
>>     my $antisense_stable_id = $ag->stable_id();
>>     my $overlapping_genes = $ag->get_overlapping_Genes();
>>     foreach my $og (@{$overlapping_genes}) {
>>         my $overlapping_stable_id = $og->stable_id();
>>         next if $antisense_stable_id eq $overlapping_stable_id;
>>         printf "%s (%s) overlaps our antisense gene %s\n",
>>             $overlapping_stable_id, $og->feature_Slice()->name(),
>>             $antisense_stable_id;
>>     }
>> }
>>
>> Hope it helps.
>>
>> Regards,
>> Denise
>>
>>
>> On 8 Nov 2013, at 08:59, Henrikki Almusa wrote:
>>
>>> Hi all,
>>>
>>> I'm delving into the antisense transcript world. I've found two
>>> slightly different descriptions for this.
>>>
>>> First from
>>> http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html
>>>
>>> Gene
>>> Antisense. Has transcripts that overlap any coding exon of a locus
>>> on the opposite strand, or for published instances of antisense
>>> regulation of a coding gene.
>>> Transcript
>>> Antisense. Transcripts that overlap any coding exon of a locus on
>>> the opposite strand, or for published instances of antisense
>>> regulation of a coding gene.
>>>
>>> Second from:
>>> http://www.gencodegenes.org/gencode_biotypes.html
>>> antisense Transcript believed to be an antisense product used in the
>>> regulation of the gene to which it belongs.
>>>
>>> Now the second implies that the gene connected to an antisense
>>> transcript would be the gene that it blocks. But this conflicts with
>>> the other page's description.
>>>
>>> So on to the main thing. I was asked to get a list of exons which
>>> have an antisense transcript. The output would have gene name,
>>> transcript name, antisense transcript name, and exon name and
>>> coordinates. I wrote the following SQL to retrieve this, but it does
>>> not give me the gene which is "blocked", only the "gene" of the
>>> antisense transcript. Any help in getting the blocked transcript as
>>> well would be appreciated.
>>>
>>> SELECT g.stable_id gene, tr.stable_id transcript, e.stable_id exon,
>>> e.seq_region_start, e.seq_region_end, ext.rank
>>> FROM transcript tr
>>> JOIN exon_transcript ext USING(transcript_id)
>>> JOIN exon e USING(exon_id)
>>> JOIN gene g USING(gene_id)
>>> JOIN seq_region sr ON e.seq_region_id=sr.seq_region_id
>>> JOIN coord_system coord USING (coord_system_id)
>>> WHERE coord.version = 'GRCh37' AND tr.biotype = 'antisense'
>>>
>>> Thanks,
>>> --
>>> Henrikki Almusa
>>> Bioinformatician
>>> Institute for Molecular Medicine Finland FIMM
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/