[ensembl-dev] Members, GeneMembers and SeqMembers in Ensembl 76

Matthieu Muffato muffato at ebi.ac.uk
Mon Jun 2 12:09:19 BST 2014


Hi Ed

get_all_Members() is not deprecated and won't be removed. There 
shouldn't be any warnings to change it. It does actually return 
SeqMembers, but changing its name was too invasive, I think. There is an 
additional get_all_GeneMembers() to return their genes.

The perc_*() methods are (were) only available on the proteins / 
transcripts that are aligned and used in our resources. Currently, they 
only return defined values on homology members, but we can probably 
define them on gene-tree leaves and family members as well. I'm not sure 
that can be done in time for e76, but I'll let you know.

As perc_*() were never available on genes, there is no old code that 
calls them on genes, so there is nothing there that we need to keep 
supporting.
Currently, it is not possible to define the perc_*() methods on genes 
in general, because we cannot know which species' homology the user 
would be referring to. But if a user calls get_all_GeneMembers(), it is 
difficult to link the genes back to their sequences. There should indeed 
be a way of getting the perc_* values directly. Thank you for reporting 
that.
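
For illustration, one way to tie perc_* values to genes is to start from
the homology members and walk back to each member's gene. The sketch
below is not Compara code: the Sketch::* packages are made-up stand-ins
so that it runs on its own, perc_values_by_gene() is a hypothetical
helper, and the identifiers and numbers are invented. It assumes the
real homology members offer gene_member() alongside perc_id(),
perc_pos() and perc_cov().

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-in classes (NOT the real Bio::EnsEMBL::Compara classes).
# They only mimic the few accessors this sketch relies on.
package Sketch::GeneMember;
sub new       { my ($class, %args) = @_; return bless {%args}, $class }
sub stable_id { return $_[0]{stable_id} }

package Sketch::AlignedMember;
sub new         { my ($class, %args) = @_; return bless {%args}, $class }
sub gene_member { return $_[0]{gene_member} }   # assumed link back to the gene
sub perc_id     { return $_[0]{perc_id} }
sub perc_pos    { return $_[0]{perc_pos} }
sub perc_cov    { return $_[0]{perc_cov} }

package Sketch::Homology;
sub new             { my ($class, %args) = @_; return bless {%args}, $class }
sub get_all_Members { return $_[0]{members} }   # array ref of aligned members

package main;

# Hypothetical helper: index the perc_* values of one homology by the
# stable_id of the gene that each aligned sequence belongs to.
sub perc_values_by_gene {
    my ($homology) = @_;
    my %by_gene;
    foreach my $seq_member (@{ $homology->get_all_Members }) {
        my $gene_id = $seq_member->gene_member->stable_id;
        $by_gene{$gene_id} = {
            map { ($_ => $seq_member->$_()) } qw(perc_id perc_pos perc_cov)
        };
    }
    return \%by_gene;
}

# Invented data: one homology between two genes.
my $homology = Sketch::Homology->new(members => [
    Sketch::AlignedMember->new(
        gene_member => Sketch::GeneMember->new(stable_id => 'GENE_A'),
        perc_id => 90, perc_pos => 95, perc_cov => 100,
    ),
    Sketch::AlignedMember->new(
        gene_member => Sketch::GeneMember->new(stable_id => 'GENE_B'),
        perc_id => 88, perc_pos => 93, perc_cov => 97,
    ),
]);

my $values = perc_values_by_gene($homology);
foreach my $gene_id (sort keys %$values) {
    print join(" ", $gene_id, map { $values->{$gene_id}{$_} } qw(perc_id perc_pos perc_cov)), "\n";
}
```

Because the values are looked up per homology, the question of which
species they refer to does not arise: each homology object fixes the
pair being compared.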

Also on the list of changes to make in the API: We can leave methods 
that have been renamed (like get_canonical_Member() ) a bit longer to 
facilitate the transition.

The e76 API only works with the e76 database, so I'm afraid testing it 
won't be possible until the public release of e76, then.

Matthieu

On 02/06/14 02:05, Ed Gray wrote:
> I think you understand.
>
> I guess what I am suggesting is that the GeneMembers allow for the
> perc_id() (and similar methods).  Sure they could return undef, but a
> better solution might be for them to return the
> get_canonical_SeqMember->perc_id().  That way all the old code that is
> on the internet will continue to work, and work generally as expected
> since many working with phylogenetics want the canonical cDNA sequence
> for coding and gDNA for non-coding genes.
>
> With the warnings in place, many will replace the get_all_Members with
> get_all_GeneMembers not get_all_SeqMembers.
>
> Regarding testing, we don't clone the database, we use the API, so if
> you are interested in having someone run the api prior to its release,
> we'd be happy to accommodate you.
>
>> Date: Sun, 1 Jun 2014 22:36:17 +0100
>> From: muffato at ebi.ac.uk
>> To: dev at ensembl.org
>> Subject: Re: [ensembl-dev] Members, GeneMembers and SeqMembers in Ensembl 76
>>
>> Hi Ed,
>>
>> I'm not sure I'm following you.
>> There already is a get_canonical_SeqMember() method. It was introduced
>> in e71 to replace the three previous methods
>> get_canonical_peptide_Member() get_canonical_Member() and
>> get_canonical_transcript_Member(), which were all doing the same thing
>> in the end. It returns the SeqMember that is used in the gene-trees.
>> Are you suggesting that we keep get_canonical_Member() longer than
>> scheduled so that old code still works?
>>
>> Methods such as perc_id(), perc_pos(), etc, are most useful for
>> homologies. The homology object will still return SeqMembers when you
>> call get_all_Members(), and they will still have those methods defined.
>> Those 3 methods are actually also available for gene-tree and family
>> members, but I think they return undef (and probably always have).
>>
>> There is no e76 database available yet, but it is still possible to
>> create your own 76-like database by patching the e75 one. All the SQL
>> patches should be on git ("master" branch)
>>
>> Best,
>> Matthieu
>>
>> On 23/05/14 18:39, Ed Gray wrote:
>> > Thanks Matthieu, that was a very helpful response. No need to feel
>> > defensive, I understand the reasons behind separating Gene and Seq
>> > Members and I support the change.
>> >
>> > I guess the only suggestion I might have is to implement some amount of
>> > backwards compatibility by using the GeneMember's
>> > get_canonical_SeqMember to give values for SeqMember methods that are
>> > not implemented in GeneMember. At least that way, old code works,
>> > albeit using the default canonical seqmember. I'd think that would
>> > suffice in most cases, while providing perc_id, perc_pos, perc_cov,
>> > etc. for GeneMembers.
>> >
>> > Frankly, I believe most folks just want the default canonical sequence
>> > rather than having to leaf through transcripts and try to figure out
>> > which is the best (e.g. all exons covered and ordered, with the
>> > 'longest' sequence length yet not 'too long'). That is always a pain to
>> > leaf through in code and the decisions we make based on the data in one
>> > species often do not hold in another species. I think most of us
>> > comparative genomics types would prefer to accept EMBL's call on what is
>> > the best representation of the canonical sequence (amino acids for
>> > protein-coding or mRNA for ncRNA genes) for a specific gene in a
>> > specific species.
>> >
>> > Would that be a good way to provide some backward compatibility while
>> > also giving the user base a real feature whereby default transcript
>> > sequences can be obtained?
>> >
>> >> Date: Wed, 21 May 2014 18:55:31 +0100
>> >> From: muffato at ebi.ac.uk
>> >> To: dev at ensembl.org
>> >> Subject: Re: [ensembl-dev] Members, GeneMembers and SeqMembers in Ensembl 76
>> >>
>> >> Dear Ed,
>> >>
>> >> Yes, we are changing the compara API to use two specialized versions of
>> >> Member (GeneMember and SeqMember) depending on the context (genes or
>> >> gene products, i.e. ncRNAs and proteins). The Member adaptor is
>> >> deprecated, as the preferred way will now be GeneMember and SeqMember
>> >> adaptors. The Member object will still be there as a base class of
>> >> GeneMember and SeqMember, and some methods that can work with both kinds
>> >> of members will still be around, and have the unspecialized word Member
>> >> in their name (like get_all_Members, fetch_all_by_Member, etc)
>> >>
>> >> The tutorials should be already using the right adaptors. However, some
>> >> variables are still named $member, and it's probably clearer to use
>> >> $seq_member or $gene_member instead. We still have to update the
>> >> documentation of some methods to clearly state what kind of member
>> >> they're dealing with.
>> >>
>> >> You can write your code using v75 API / documentation. Only the methods
>> >> that print a deprecation warning may be removed in e76. If they don't
>> >> complain, it means that they'll still be available in e76.
>> >>
>> >> The change is declared in a very concise manner in our declaration of
>> >> intentions: "Split member into seq_member and gene_member + members
>> >> depend on dnafrags" http://admin.ensembl.org/Changelog/Summary
>> >> We want to split the table that holds the members into two tables
>> >> (seq_member and gene_member). We've first changed the object model in
>> >> e71 (Apr 2013) and set up all the deprecation warnings for a final
>> >> removal in e76. We're also using this opportunity to link the two member
>> >> tables to the dnafrag table, which is used for genomic alignments. The
>> >> goal is to provide smoother links within the compara resources. We'll
>> >> also add more fields in the gene_member table / object to give a summary
>> >> of the number of orthologues, paralogues, etc
>> >>
>> >> I've had a look at the differences between the e75 and e76 API, and I
>> >> cannot see any changes beyond the methods that are already deprecated.
>> >> Data-wise, the only consequence is that the Family object will not
>> >> directly hold genes any more. Up to e75, $family->get_all_Members() used
>> >> to return both genes and proteins. From e76 onwards, it will only return
>> >> the proteins, and $family->get_all_GeneMembers() will return the genes.
>> >>
>> >> Hope this helps,
>> >> Matthieu
>> >>
>> >> On 20/05/14 23:46, Ed Gray wrote:
>> >> > Hi All,
>> >> >
>> >> > I am writing some code right now that uses Members, GeneMembers and
>> >> > SeqMembers. Is it true that the Member is being deprecated in the
>> >> > upcoming Ensembl 76 release?
>> >> >
>> >> > If so,
>> >> >
>> >> > 1) Almost all of the relevant examples use what may soon be deprecated
>> >> > code. For instance,
>> >> > http://useast.ensembl.org/info/docs/api/compara/compara_tutorial.html has the stanza
>> >> > of code below just after a stanza example that has GeneMember objects.
>> >> > my $homology = $homologies->[0]; # take one of the homologies and look into it
>> >> > foreach my $member (@{$homology->get_all_Members}) {
>> >> >     # each AlignedMember contains both the information on the SeqMember and
>> >> >     # its relation to the homology
>> >> >     print join(" ", map { $member->$_ } qw(stable_id taxon_id)), "\n";
>> >> >     print join(" ", map { $member->$_ } qw(perc_id perc_pos perc_cov)), "\n";
>> >> > }
>> >> >
>> >> > 2) Will get_all_Members still be available after r75?
>> >> >
>> >> > 3) Is there any plan on adjusting the tutorials etc. for GeneMembers and
>> >> > SeqMembers?
>> >> >
>> >> > 4) How different will the Compara API be in r76? Again, I am currently
>> >> > writing code to try to take advantage of what I understand to be more
>> >> > Compara data in r76 but I am concerned the API will be different and I
>> >> > will have to re-write.
>> >> >
>> >> > So many questions, I know, but is there some guidance you could give on
>> >> > the topic of Compara API changes and deprecations coming in Ensembl 76?
>> >> >
>> >> > Many, many thanks,
>> >> > Ed
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Dev mailing list Dev at ensembl.org
>> >> > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> >> > Ensembl Blog: http://www.ensembl.info/
>> >> >
>> >>
>> >> --
>> >> Matthieu Muffato, Ph.D.
>> >> Ensembl Developer and Ensembl Compara Manager
>> >> European Bioinformatics Institute (EMBL-EBI)
>> >> European Molecular Biology Laboratory
>> >> Wellcome Trust Genome Campus, Hinxton
>> >> Cambridge, CB10 1SD, United Kingdom
>> >> Room A3-145
>> >> Phone + 44 (0) 1223 49 4631
>> >> Fax + 44 (0) 1223 49 4468
>> >>
>>

-- 
Matthieu Muffato, Ph.D.
Ensembl Developer and Ensembl Compara Manager
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468



