[ensembl-dev] Members, GeneMembers and SeqMembers in Ensembl 76

Matthieu Muffato muffato at ebi.ac.uk
Sun Jun 1 22:36:17 BST 2014


Hi Ed,

I'm not sure I'm following you.
There is already a get_canonical_SeqMember() method. It was introduced 
in e71 to replace the three previous methods, 
get_canonical_peptide_Member(), get_canonical_Member() and 
get_canonical_transcript_Member(), which all did the same thing in the 
end. It returns the SeqMember that is used in the gene-trees.
Are you suggesting that we keep get_canonical_Member() longer than 
scheduled so that old code still works?
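
For code that has to run against both older and newer API versions in 
the meantime, a small wrapper on the caller's side is one option. Below 
is a minimal sketch, not part of the Compara API: canonical_seq_member() 
is a hypothetical helper, and the two Mock:: packages are stand-ins 
invented here so that the snippet runs without a database. Only the 
method names get_canonical_SeqMember() and get_canonical_Member() come 
from the API.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for GeneMember objects from two API versions,
# so that this sketch runs without an Ensembl installation.
package Mock::NewGeneMember {
    sub new { return bless {}, shift }
    sub get_canonical_SeqMember { return 'seq_member(new API)' }
}
package Mock::OldGeneMember {
    sub new { return bless {}, shift }
    sub get_canonical_Member { return 'seq_member(old API)' }
}

package main;

# Hypothetical helper: prefer the current method name, and fall back to the
# deprecated one only when the object does not provide the new one.
sub canonical_seq_member {
    my ($gene_member) = @_;
    for my $method (qw(get_canonical_SeqMember get_canonical_Member)) {
        return $gene_member->$method() if $gene_member->can($method);
    }
    die "No canonical-member method found on " . ref($gene_member) . "\n";
}

print canonical_seq_member(Mock::NewGeneMember->new), "\n";
print canonical_seq_member(Mock::OldGeneMember->new), "\n";
```

Checking with can() instead of calling the old method directly also 
avoids triggering its deprecation warning whenever the new method exists.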

Methods such as perc_id(), perc_pos() and perc_cov() are most useful 
for homologies. The homology object will still return SeqMembers when 
you call get_all_Members(), and they will still have those methods 
defined. Those three methods are actually also available for gene-tree 
and family members, but I think they return undef there (and they 
probably always have).
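
Since those methods may return undef outside of homologies, code that 
prints them is safer with an explicit default. Here is a minimal sketch, 
in which Mock::Member, format_member() and all the values are 
hypothetical, made up for illustration. A real script would get its 
members from get_all_Members().

```perl
use strict;
use warnings;

# Hypothetical stand-in for an aligned member; real objects would come
# from $homology->get_all_Members() or from a gene-tree / family.
package Mock::Member {
    sub new {
        my ($class, %args) = @_;
        my $self = { %args };
        return bless $self, $class;
    }
    sub stable_id { return $_[0]{stable_id} }
    sub perc_id   { return $_[0]{perc_id} }
    sub perc_pos  { return $_[0]{perc_pos} }
    sub perc_cov  { return $_[0]{perc_cov} }
}

package main;

# Hypothetical helper: build one output line, printing 'NA' for any
# value that comes back undefined.
sub format_member {
    my ($member) = @_;
    my @fields = map { $member->$_() // 'NA' } qw(stable_id perc_id perc_pos perc_cov);
    return join(' ', @fields);
}

# Made-up identifiers and numbers, for illustration only.
my $in_homology = Mock::Member->new(stable_id => 'MEMBER_A', perc_id => 87, perc_pos => 93, perc_cov => 99);
my $in_tree     = Mock::Member->new(stable_id => 'MEMBER_B');

print format_member($_), "\n" for ($in_homology, $in_tree);
```

The defined-or operator (//) needs Perl 5.10 or later. It keeps a 
legitimate value of 0 intact, which a plain || would replace.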

There is no e76 database available yet, but you can still create your 
own 76-like database by patching the e75 one. All the SQL patches 
should be on git ("master" branch).

Best,
Matthieu

On 23/05/14 18:39, Ed Gray wrote:
> Thanks Matthieu, that was a very helpful response.  No need to feel
> defensive, I understand the reasons behind separating Gene and Seq
> Members and I support the change.
>
> I guess the only suggestion I might have is to implement some amount of
> backwards compatibility by using the GeneMember's
> get_canonical_SeqMember to give values for SeqMember methods that are
> not implemented in GeneMember.  At least that way, old code works,
> albeit using the default canonical SeqMember.  I'd think that would
> suffice in most cases, and it would provide perc_id, perc_pos, perc_cov,
> etc. for GeneMembers.
>
> Frankly, I believe most folks just want the default canonical sequence
> rather than having to leaf through transcripts and try to figure out
> which is the best (e.g. all exons covered and ordered, with the
> 'longest' sequence length yet not 'too long').  That is always a pain to
> leaf through in code and the decisions we make based on the data in one
> species often do not hold in another species.  I think most of us
> comparative genomics types would prefer to accept EMBL's call on what is
> the best representation of the canonical sequence (amino acids for
> protein-coding or mRNA for ncRNA genes) for a specific gene in a
> specific species.
>
> Would that be a good way to provide some backward compatibility while
> also giving the user base a real feature whereby default transcript
> sequences can be obtained?
>
>> Date: Wed, 21 May 2014 18:55:31 +0100
>> From: muffato at ebi.ac.uk
>> To: dev at ensembl.org
>> Subject: Re: [ensembl-dev] Members, GeneMembers and SeqMembers in Ensembl 76
>>
>> Dear Ed,
>>
>> Yes, we are changing the compara API to use two specialized versions of
>> Member (GeneMember and SeqMember) depending on the context (genes or
>> gene products, i.e. ncRNAs and proteins). The Member adaptor is
>> deprecated, as the preferred way will now be GeneMember and SeqMember
>> adaptors. The Member object will still be there as a base class of
>> GeneMember and SeqMember, and some methods that can work with both kinds
>> of members will still be around, and have the unspecialized word Member
>> in their name (like get_all_Members, fetch_all_by_Member, etc)
>>
>> The tutorials should be already using the right adaptors. However, some
>> variables are still named $member, and it's probably clearer to use
>> $seq_member or $gene_member instead. We still have to update the
>> documentation of some methods to clearly state what kind of member
>> they're dealing with.
>>
>> You can write your code using v75 API / documentation. Only the methods
>> that print a deprecation warning may be removed in e76. If they don't
>> complain, it means that they'll still be available in e76.
>>
>> The change is declared in a very concise manner in our declaration of
>> intentions: "Split member into seq_member and gene_member + members
>> depend on dnafrags" http://admin.ensembl.org/Changelog/Summary
>> We want to split the table that holds the members into two tables
>> (seq_member and gene_member). We've first changed the object model in
>> e71 (Apr 2013) and set up all the deprecation warnings for a final
>> removal in e76. We're also using this opportunity to link the two member
>> tables to the dnafrag table, which is used for genomic alignments. The
>> goal is to provide smoother links within the compara resources. We'll
>> also add more fields in the gene_member table / object to give a summary
>> of the number of orthologues, paralogues, etc
>>
>> I've had a look at the differences between the e75 and e76 API, and I
>> cannot see any changes beyond the methods that are already deprecated.
>> Data-wise, the only consequence is that the Family object will not
>> directly hold genes any more. Up to e75, $family->get_all_Members() used
>> to return both genes and proteins. From e76 onwards, it will only return
>> the proteins, and $family->get_all_GeneMembers() will return the genes.
>>
>> Hope this helps,
>> Matthieu
>>
>> On 20/05/14 23:46, Ed Gray wrote:
>> > Hi All,
>> >
>> > I am writing some code right now that uses Members, GeneMembers and
>> > SeqMembers. Is it true that the Member is being deprecated in the
>> > upcoming Ensembl 76 release?
>> >
>> > If so,
>> >
>> > 1) Almost all of the relevant examples use what may soon be deprecated
>> > code. For instance,
>> > http://useast.ensembl.org/info/docs/api/compara/compara_tutorial.html has the stanza
>> > of code below just after a stanza example that has GeneMember objects.
>> > my $homology = $homologies->[0]; # take one of the homologies and look into it
>> > foreach my $member (@{$homology->get_all_Members}) {
>> >     # each AlignedMember contains both the information on the SeqMember and in
>> >     # relation to the homology
>> >     # (the parentheses belong to join, so that "\n" is printed as well)
>> >     print join(" ", map { $member->$_ } qw(stable_id taxon_id)), "\n";
>> >     print join(" ", map { $member->$_ } qw(perc_id perc_pos perc_cov)), "\n";
>> > }
>> >
>> > 2) Will get_all_Members still be available after r75?
>> >
>> > 3) Is there any plan on adjusting the tutorials etc. for GeneMembers and
>> > SeqMembers?
>> >
>> > 4) How different will the Compara API be in r76? Again, I am currently
>> > writing code to try to take advantage of what I understand to be more
>> > Compara data in r76 but I am concerned the API will be different and I
>> > will have to re-write.
>> >
>> > So many questions, I know, but is there some guidance you could give on
>> > the topic of Compara API changes and deprecations coming in ensembl 76?
>> >
>> > Many, many thanks,
>> > Ed
>> >
>> > _______________________________________________
>> > Dev mailing list Dev at ensembl.org
>> > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> > Ensembl Blog: http://www.ensembl.info/
>> >
>>
>> --
>> Matthieu Muffato, Ph.D.
>> Ensembl Developer and Ensembl Compara Manager
>> European Bioinformatics Institute (EMBL-EBI)
>> European Molecular Biology Laboratory
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge, CB10 1SD, United Kingdom
>> Room A3-145
>> Phone + 44 (0) 1223 49 4631
>> Fax + 44 (0) 1223 49 4468
>

-- 
Matthieu Muffato, Ph.D.
Ensembl Developer and Ensembl Compara Manager
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468



