[ensembl-dev] Question regarding refseq transcript selector

Kieron Taylor ktaylor at ebi.ac.uk
Tue Jul 11 10:14:11 BST 2017


Hi Duarte,

CCDS data is a big part of how Ensembl currently chooses its canonical transcripts, in that we want our decisions to reflect the consensus of several resources. It is not essential to the process and you will still get reasonable decisions without it, but they might differ from what Ensembl publishes.

You can reduce the number of warnings by instantiating TranscriptSelector only once, rather than for every feature or set of features. You then get a single warning at setup. If you really need quiet output, you can always delete line 81 from TranscriptSelector. There should be no consequences for you doing this.
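
As a rough sketch of the "instantiate once" idea: the helper below takes a selector that the caller has already built and reuses it for every gene. The name set_canonical_transcripts is hypothetical and not part of the Ensembl API. The only API calls are the ones already in your snippet, select_canonical_transcript_for_Gene and canonical_transcript.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): reuse one selector
# for a whole list of genes instead of constructing one per gene.
sub set_canonical_transcripts {
    my ($selector, $genes) = @_;
    foreach my $gene (@{$genes}) {
        # Same two calls as in the original script, but the selector
        # comes from the caller and is never re-created here.
        my $canonical = $selector->select_canonical_transcript_for_Gene($gene);
        $gene->canonical_transcript($canonical);
    }
    return scalar @{$genes};    # number of genes processed
}

# Intended usage, assuming the Ensembl API is on @INC and @genes holds
# the genes you have already filtered down to NM_ transcripts:
#
#   use Bio::EnsEMBL::Utils::TranscriptSelector;
#   my $selector = Bio::EnsEMBL::Utils::TranscriptSelector->new();  # warns once, here
#   set_canonical_transcripts($selector, \@genes);
```

Building the selector outside the helper is what keeps the CCDS warning to a single occurrence at setup.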

You can learn more about CCDS here: http://www.ensembl.org/info/genome/genebuild/ccds.html
We use CCDS data to populate a core-like database and then create a DBAdaptor, which is used by the TranscriptSelector.

I hope that covers the salient points.

Kieron



Kieron Taylor PhD.
Ensembl Developer

EMBL, European Bioinformatics Institute






> On 10 Jul 2017, at 09:29, Duarte Molha <duartemolha at gmail.com> wrote:
> 
> Dear Devs.
> 
> In a discussion some time ago here in the forum ( http://lists.ensembl.org/pipermail/dev/2016-July/012031.html ) about how I could use Ensembl's canonical transcript selection logic to modify a RefSeq gene, removing its XM_ transcripts and selecting the canonical transcript from the NM_ transcripts only, Andy suggested that I could use something like this:
> 
> my $selector = Bio::EnsEMBL::Utils::TranscriptSelector->new();
> # get a gene from somewhere and modify to remove the XMs
> my $canonical_transcript = $selector->select_canonical_transcript_for_Gene($gene);
> 
> I followed his advice and integrated it into my code as follows:
> my $mod_gene = $gene;
> my $selector = Bio::EnsEMBL::Utils::TranscriptSelector->new();
> foreach my $transcript (@{$mod_gene->get_all_Transcripts()}){
> 	if ($options->{query} =~ /refseq/ && $transcript->stable_id() !~ /^NM_/){
> 		$mod_gene->remove_Transcript($transcript);
> 	}
> }
> my $canonical_transcript = $selector->select_canonical_transcript_for_Gene($mod_gene);
> $mod_gene->canonical_transcript($canonical_transcript);
> 	
> $gene=$mod_gene;
> 
> 
> This seems to do the job; however, my script keeps issuing warnings:
> 
> -------------------- WARNING ----------------------
> MSG: Running without CCDS DB
> FILE: EnsEMBL/Utils/TranscriptSelector.pm LINE: 80
> CALLED BY: getFeatures.pl  LINE: 985
> Date (localtime)    = Fri Jul  7 17:32:54 2017
> Ensembl API version = 83
> 
> ---------------------------------------------------
> 
> I believe the problem is that I am not providing a CCDS DB on the line
> my $selector = Bio::EnsEMBL::Utils::TranscriptSelector->new();
> I read the documentation, and that seems to be an optional parameter.
> 
> Can you tell me whether this is a problem? If it is, how can I set the CCDS DB? If not, how can I stop these warnings from being issued?
> 
> Many thanks
> 
> Duarte
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/



