[ensembl-dev] Data differences between api query and website
Emily Wong
emily.wong at sydney.edu.au
Thu Jan 6 00:40:50 GMT 2011
Dear All,
I've extracted data using a script which takes a gene of interest, gets its parent node and prints out the IDs of all the leaf nodes.
I am seeing discrepancies between the results I get by querying the database through the API and the gene trees shown on the Ensembl website.
For example:
Using gene id ENSOANG00000010713 - I get a parent node with 4 children:
ENSOANG00000008298; ENSOANG00000008299; ENSOANG00000022578; ENSOANG00000010713.
However, when I examine the gene tree on the website, this is not the case at all: orthologs from other species are present.
(please see http://www.ensembl.org/Ornithorhynchus_anatinus/Gene/Compara_Tree?collapse=1959500%2C1959435%2C1959539%2C1959433%2C1959430%2C1959490;db=core;g=ENSOANG00000010713;r=X5:2856975-2885950)
This does not happen for all sequences; the majority (I think) do agree with the gene trees presented on the website.
Do you think this is a database version issue? The discrepancies are fairly large.
Many thanks in advance,
Emily
---
Script below:
use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl database server
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

# Compara adaptors for gene members and protein trees
my $member_adaptor =
    Bio::EnsEMBL::Registry->get_adaptor("Compara", "compara", "Member");
my $proteintree_adaptor =
    Bio::EnsEMBL::Registry->get_adaptor("Compara", "compara", "ProteinTree");
----
# I then ran the following for each gene in a list I was interested in
foreach my $id (@data)
{
    chomp($id);
    print "$id\n";
    my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $id);
    next unless (defined $member);
    my $aligned_member = $proteintree_adaptor->
        fetch_AlignedMember_by_member_id_root_id(
            $member->get_longest_peptide_Member->member_id);
    my $node = $aligned_member;
    while ($node->has_parent) {
        # if node is a leaf then add all leaves from parent even if some are not expressed
        my $terminal = 0;
        if ($node->is_leaf) {
            $terminal = 1;
        }
        $node = $node->parent();
        if ($terminal == 1) {
            print "node is leaf\n";
            my $exp = 1;
            my $proteintree =
                $proteintree_adaptor->fetch_node_by_node_id($node->node_id);
            my @leaves = @{$proteintree->get_all_leaves};
            print scalar(@leaves)."\n";
            foreach my $leaf (@leaves)
            {
                my $gene = $leaf->get_Gene->stable_id;
                print $gene;
                my $prot = $leaf->get_longest_peptide_Member->stable_id;
                ....
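
To make the traversal concrete, here is a minimal sketch on a toy tree. It does not use the Compara API: the ToyNode package, its methods and the gene names are hypothetical stand-ins that only mimic the has_parent / is_leaf / parent / get_all_leaves calls used in the script. It prints the leaves found under the immediate parent of a query leaf and, for comparison, the leaves found under the root of the same tree.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a tree node; this is NOT the Compara API.
package ToyNode;

sub new {
    my ($class, %args) = @_;
    my $self = bless { name => $args{name}, parent => undef, children => [] }, $class;
    # Attach any children and point them back at this node.
    for my $child (@{ $args{children} || [] }) {
        $child->{parent} = $self;
        push @{ $self->{children} }, $child;
    }
    return $self;
}

sub name       { $_[0]{name} }
sub parent     { $_[0]{parent} }
sub has_parent { defined $_[0]{parent} ? 1 : 0 }
sub is_leaf    { @{ $_[0]{children} } ? 0 : 1 }

# Collect every leaf below this node (the node itself if it is a leaf).
sub get_all_leaves {
    my ($self) = @_;
    return [$self] if $self->is_leaf;
    return [ map { @{ $_->get_all_leaves } } @{ $self->{children} } ];
}

# Climb parent links until there is no parent left.
sub root {
    my ($self) = @_;
    my $node = $self;
    $node = $node->parent while $node->has_parent;
    return $node;
}

package main;

# Toy tree: root -> (n1 -> (geneA, geneB), geneC)
my $query = ToyNode->new(name => 'geneA');
my $clade = ToyNode->new(
    name     => 'n1',
    children => [ $query, ToyNode->new(name => 'geneB') ],
);
my $root = ToyNode->new(
    name     => 'root',
    children => [ $clade, ToyNode->new(name => 'geneC') ],
);

# Leaves under the immediate parent of the query leaf.
my @near = map { $_->name } @{ $query->parent->get_all_leaves };
# Leaves under the root of the whole tree.
my @all = map { $_->name } @{ $query->root->get_all_leaves };

print "immediate parent: @near\n";
print "root: @all\n";
```

On this toy tree the first line lists geneA and geneB and the second lists geneA, geneB and geneC, so the set of leaves depends on which ancestor get_all_leaves is called on.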