[ensembl-dev] Ensembl ID History Converter (IDmapper.pl) API Mapping Score Column
Andreas Kusalananda Kähäri
ak4 at sanger.ac.uk
Mon Jun 11 11:10:55 BST 2012
On Mon, Jun 11, 2012 at 09:53:02AM +0100, Andreas Kusalananda Kähäri wrote:
> Hi people,
> Actually, I'd like to say about the stable ID mapping scores that it's
> pointless to try to assign any significance to them when comparing
> between releases since the various mapping strategies, penalties etc.
> might have been tweaked.
> Also note that we did not store the mapping scores to start with (thus
> some of the zeroes). I seem to remember that once we had our
> stable ID mapping pipeline in place (it wasn't there from the start, but
> I can't recall now when it came into being, probably around 2005 or so)
> we might have tried to retroactively populate the stable_id_event table.
I just looked into the CVS logs and found that the stable_id_event table
was added to the schema in April 2003, but the score field wasn't added
until June 2006. This means that the scores before release 40 wouldn't
have been kept. It also means that mappings for releases earlier than
release 15 are certainly retrofitted somehow.
I'd also like to add a general disclaimer about looking at the stable ID
histories: You should never ever make the assumption that the actual
biological object (exon, transcript, gene, translation) is identical
in structure or function when the stable ID version (or the ID itself)
has changed between releases. When the stable ID version changes on a
gene, for example, something has changed in one of its transcripts (in
the transcript model, or in the underlying reference genome sequence,
for example). If this is important for your work, you need to convince
yourself that the object with the changed stable ID is still the one you
should be working on. The stable ID mapping is an automatic process that
uses a scoring method based on exon overlaps and exon sequence
similarities to map the stable IDs. It gets it right in the vast majority
of cases, but I'm sure there are one or two interesting corner cases
that might very well still confuse it (let us know if you find them).
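
To make the caveats above concrete, here is a minimal Python sketch of how
one might annotate rows of IDmapper.pl output before trusting them. It
assumes the comma-separated layout shown in the example in this thread
(old stable ID, new stable ID, release, mapping score) and that the release
column is a plain integer. The function names, the Mapping type and the
release-40 cutoff constant are illustrative only, taken from the dates in
this message; none of this is part of the Ensembl API.

```python
from typing import List, NamedTuple, Tuple

# Assumption from this thread: the score field was only added in June 2006,
# so scores for releases before 40 would not have been kept.
FIRST_RELEASE_WITH_SCORES = 40


class Mapping(NamedTuple):
    old_id: str
    new_id: str
    release: int
    score: float


def parse_line(line: str) -> Mapping:
    """Parse one 'old, new, release, score' row of mapper output."""
    old_id, new_id, release, score = [field.strip() for field in line.split(",")]
    return Mapping(old_id, new_id, int(release), float(score))


def split_version(stable_id: str) -> Tuple[str, str]:
    """Split 'ENSG00000137361.1' into ('ENSG00000137361', '1')."""
    base, _, version = stable_id.partition(".")
    return base, version


def caveats(mapping: Mapping) -> List[str]:
    """Return human-readable warnings for a single mapping row."""
    notes = []
    if mapping.score == 0:
        if mapping.release < FIRST_RELEASE_WITH_SCORES:
            notes.append("score not recorded for this release")
        else:
            notes.append("zero score: no alignment score available")
    if split_version(mapping.old_id) != split_version(mapping.new_id):
        notes.append("ID or version changed: verify the object yourself")
    return notes


if __name__ == "__main__":
    # First row is the example quoted in this thread; the second is made up.
    for row in (
        "ENSG00000137361.1, ENSG00000137361.1, 3, 0",
        "ENSG00000000001.1, ENSG00000000001.2, 60, 0.95",
    ):
        print(row, "->", caveats(parse_line(row)))
```

The point of the sketch is only that a zero score should be read as "no
score available" rather than "bad mapping", and that any change in the ID
or its version is a prompt to check the object by hand.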
> Some mappings might actually have been done by hand since then
> and the score might have been set to 1 (hopefully) or zero depending on
> who did it.
> On Thu, Jun 07, 2012 at 07:17:05PM +0100, Kieron Taylor wrote:
> > I'll catch this question for Andy.
> > > Thanks Andy, that is great!
> > >
> > > Though I am a little confused about some results where the score is "0".
> > > For example, with the example input included with the api (./IDmapper.pl
> > > -s human -f idmapper.in), one of the results is:
> > >
> > > Old stable ID, New stable ID, Release, Mapping score
> > > ENSG00000137361.1, ENSG00000137361.1, 3, 0
> > The mapping score of zero actually implies that no alignment had to be
> > run, so there is no score. This happens when the mapper discovers
> > something exactly where it expects it to be, and skips the alignment.
> > Scores of 1 are possible, but generally these are mopped up by the
> > pre-alignment checks.
> > Regards,
> > Kieron
> > _______________________________________________
> > Dev mailing list Dev at ensembl.org
> > List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/
> Andreas Kusalananda Kähäri
> Ensembl Gene Annotation Team
Andreas Kusalananda Kähäri
Ensembl Gene Annotation Team