[ensembl-dev] Data differences between api query and website

Emily Wong emily.wong at sydney.edu.au
Thu Jan 6 00:40:50 GMT 2011


Dear All,

I've extracted data using a script that takes a gene of interest, gets its parent node in the protein tree, and prints the IDs of all the leaf nodes under that parent.
I am seeing discrepancies between the results I get by querying the database through the API and the gene trees shown on the Ensembl website.
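
To make the comparison concrete, here is a minimal sketch of the two ways of collecting leaves. It uses a toy hash-based tree, not the Ensembl API: make_node, leaf_names and root_of are hypothetical helpers and the gene names are placeholders. It only illustrates that the leaves under the immediate parent of a query leaf can be a small subset of the leaves under the root of the same tree.

```perl
use strict;
use warnings;

# Toy tree, NOT the Ensembl API: each node is a hash with a name,
# a parent reference (undef for the root) and a list of children.
sub make_node {
    my ($name, @children) = @_;
    my $node = { name => $name, parent => undef, children => [@children] };
    $_->{parent} = $node for @children;
    return $node;
}

# Collect the names of all leaves below a node, depth first.
sub leaf_names {
    my ($node) = @_;
    return ($node->{name}) unless @{ $node->{children} };
    return map { leaf_names($_) } @{ $node->{children} };
}

# Follow parent links until a node without a parent is reached.
sub root_of {
    my ($node) = @_;
    $node = $node->{parent} while defined $node->{parent};
    return $node;
}

# Placeholder topology: ((geneA, geneB), geneC)
my $query = make_node('geneA');
my $inner = make_node('n1', $query, make_node('geneB'));
my $root  = make_node('n0', $inner, make_node('geneC'));

# Leaves under the immediate parent of the query leaf.
my @from_parent = leaf_names($query->{parent});

# Leaves under the root of the whole tree.
my @from_root = leaf_names(root_of($query));

print "parent: @from_parent\n";
print "root:   @from_root\n";
```

In this toy case the parent-based list omits geneC, which appears only when the walk continues up to the root. Whether that explains the real discrepancy is an assumption, not something the sketch demonstrates.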

For example:
Using gene ID ENSOANG00000010713, I get a parent node with 4 children:
ENSOANG00000008298; ENSOANG00000008299; ENSOANG00000022578; ENSOANG00000010713.
However, when I examine the gene tree on the website, I see that this is not the case at all: orthologs from other species are also present.

(please see http://www.ensembl.org/Ornithorhynchus_anatinus/Gene/Compara_Tree?collapse=1959500%2C1959435%2C1959539%2C1959433%2C1959430%2C1959490;db=core;g=ENSOANG00000010713;r=X5:2856975-2885950)

This is not the case for all sequences: the majority (I think) do correlate with the gene trees presented on the website.

Could this be a database version issue? The discrepancies are fairly large.


Many thanks in advance,
Emily
---
Script below:

use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

# Connect to the public Ensembl database server.
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

my $member_adaptor =
    $registry->get_adaptor("Compara", "compara", "Member");

my $proteintree_adaptor =
    $registry->get_adaptor("Compara", "compara", "ProteinTree");
----
# I then ran the extraction for a list of genes I was interested in
foreach my $id (@data)
{
    chomp($id);
    print "$id\n";
    my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $id);
    next unless (defined $member);
    my $aligned_member =
        $proteintree_adaptor->fetch_AlignedMember_by_member_id_root_id(
            $member->get_longest_peptide_Member->member_id);
    my $node = $aligned_member;

    while ($node->has_parent) {
        # if node is a leaf then add all leaves from parent even if some are not expressed
        my $terminal = 0;
        if ($node->is_leaf) {
            $terminal = 1;
        }
        $node = $node->parent();
        if ($terminal == 1) {
            print "node is leaf\n";
            my $exp = 1;
            my $proteintree =
                $proteintree_adaptor->fetch_node_by_node_id($node->node_id);
            my @leaves = @{$proteintree->get_all_leaves};
            print scalar(@leaves)."\n";
            foreach my $leaf (@leaves)
            {
                my $gene = $leaf->get_Gene->stable_id;
                print $gene;
                my $prot = $leaf->get_longest_peptide_Member->stable_id;
....