[ensembl-dev] Filtering ncRNAs from a list of Member objects

José Afonso Guerra Assunção afonsoguerra at gmail.com
Tue Jul 10 19:12:33 BST 2012


Both Gene and Transcript objects have a method called biotype.
You can keep only those whose biotype is "protein_coding".
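
A minimal sketch of how that could look for a list of Compara members. It assumes each member offers get_Gene() returning the core Gene object (or undef when there is none, e.g. for Uniprot members); please check this against the API version you are using. The name filter_protein_coding is just an illustrative helper, not part of the API.

```perl
use strict;
use warnings;

# Keep only members whose underlying core gene is protein-coding.
# Assumption: each member responds to get_Gene(), and the returned
# object responds to biotype(). filter_protein_coding is a
# hypothetical helper name, not an Ensembl API function.
sub filter_protein_coding {
    my @members = @_;
    my @kept;
    foreach my $member (@members) {
        my $gene = $member->get_Gene;
        next unless defined $gene;    # no core gene behind this member

        my $biotype = $gene->biotype;
        push @kept, $member
            if defined $biotype && $biotype eq 'protein_coding';
    }
    return @kept;
}
```

In the script below this would be called once the members have been fetched, e.g. @member_list = filter_protein_coding(@member_list);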

HTH,
Jose

On Tue, Jul 10, 2012 at 7:06 PM, Christopher Kelly <cpjkelly at gmail.com> wrote:
> Hello all,
>
> I am using a script to fetch all Members associated with a given genome db id, and to fetch the protein family (if any) each member belongs to.
>
> This works fine. However, the comparative analysis program that uses the output of this script is producing less reliable results than would be desirable, since the human annotation seems to contain many ncRNA genes whose orthologues have not yet been identified in the annotations of many other species.
>
> In order to improve the accuracy of the analysis program, I would like to be able to filter out all ncRNA genes from my script output.
>
> The script usually fetches from ENSEMBLGENE. I have tried fetching from ENSEMBLPEP in order to filter out ncRNAs, but this still reduces the quality of the output for analysis purposes.
>
> Having sifted through a good deal of the Ensembl and Ensembl Compara Doxygen documentation, I have yet to find an accurate method that would do this for me.
>
> Is there an accurate API function/method for filtering ncRNA-genes from a list of member or gene objects?
>
>
> Thanks in advance,
>
> Chris Kelly
>
>
>
> Here is the relevant section of the script code:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> $registry->load_registry_from_db(
>     -host => 'ensembldb.ensembl.org',
>     -user => 'anonymous',
>     -port => '5306'); # add -verbose => '1' for more verbose output
>                       # add -db_version => 'version' for a specific ensembl db version, otherwise script will search most current version
>
> #get member adaptor
> my $member_adaptor = $registry->get_adaptor('Multi', 'compara', 'Member');
>
> my $family_adaptor = $registry->get_adaptor('Multi', 'compara', 'Family');
>
> my @member_list;
>
> #outside for-loop to iterate body of program for each id specified on the command line
> foreach my $species_genome_db_id (@genome_db_id_list){
>
>     @member_list = ();
>     my $file_path = "$current_directory"."/"."$species_genome_db_id";
>     mkdir $file_path, 0777;
>
>     #Fetch all members of given species (specified by genome_db_id) from given source.
>     #Source options are: 'ENSEMBLGENE', 'ENSEMBLPEP', 'Uniprot/SPTREMBL',
>     #'Uniprot/SWISSPROT', 'ENSEMBLTRANS', 'EXTERNALCDS'.
>     #Each species has a unique genome_db_id in the current ensembl compara db version.
>     sub get_members_list {
>
>         my($source, $genome_db_id, @members) = @_;
>
>         #fetch members list - returns listref of members
>         my $new_members_ref = $member_adaptor->fetch_all_by_source_genome_db_id("$source", "$genome_db_id");
>
>         #dereference members list ref
>         my @new_members = @$new_members_ref;
>
>         #join new_members list to the list of members
>         push(@members, @new_members);
>         @members;
>     }
>
>     #
>     #Get members from all sources for the given genome_db_id (denoting a specific species)
>     #
>     #@member_list = get_members_list('ENSEMBLGENE', $species_genome_db_id, @member_list);
>     #print "ENSEMBLGENE members fetched\n";
>     @member_list = get_members_list('ENSEMBLPEP', $species_genome_db_id, @member_list);
>     print "ENSEMBLPEP members fetched\n";
>     #@member_list = get_members_list('Uniprot/SPTREMBL', $species_genome_db_id, @member_list);
>     #print "Uniprot/SPTREMBL members fetched\n";
>     #@member_list = get_members_list('Uniprot/SWISSPROT', $species_genome_db_id, @member_list);
>     #print "Uniprot/SWISSPROT members fetched\n";
>     #@member_list = get_members_list('ENSEMBLTRANS', $species_genome_db_id, @member_list);
>     #print "ENSEMBLTRANS members fetched\n";
>     #@member_list = get_members_list('EXTERNALCDS', $species_genome_db_id, @member_list);
>     #print "EXTERNALCDS members fetched\n";
> }
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/



