[ensembl-dev] Ensembl ID History Converter (IDmapper.pl) API inconsistent with web tool

Lucas Swanson lswanson at bcgsc.ca
Fri Jun 1 18:48:20 BST 2012


Thank you so much Andy! I will attempt to incorporate this into my 
script (and replace the hard-coded gene ID with reading a list from a file).
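
A minimal sketch of the list-reading part, assuming one stable ID per line in a plain-text file. The helper name read_stable_ids and the demo data are made up for illustration and are not part of the Ensembl API; blank lines, '#' comments and ".N" version suffixes are dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: read one stable ID per line from a filehandle,
# skipping blank lines and '#' comments, dropping any ".N" version suffix,
# and keeping only the first occurrence of each ID.
sub read_stable_ids {
    my ($fh) = @_;
    my (@ids, %seen);
    while (my $line = <$fh>) {
        $line =~ s/#.*//;           # strip comments
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';
        $line =~ s/\.\d+$//;        # ENSG00000164012.5 -> ENSG00000164012
        push @ids, $line unless $seen{$line}++;
    }
    return @ids;
}

# Demo on an in-memory filehandle; a real script would instead open
# the file named in $ARGV[0].
my $demo = "ENSG00000164012\n\n# retired in 61\nENSG00000164012.5\nENSG00000117395\n";
open(my $fh, '<', \$demo) or die "Cannot open in-memory file: $!";
my @ids = read_stable_ids($fh);
close($fh);
print "$_\n" for @ids;
```

Each ID returned could then be used in place of the hard-coded $original_id in the script below.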

~Lucas

Andy Yates wrote:
> Hi Lucas,
>
> This is possible but you do have to jump through a few hoops. I've tried to stub something up but my knowledge of this section of the API is limited. Anyway it seems to return the right answer at the moment so it's probably very close to what you want. The script below reports that the transcripts associated with ENSG00000164012 are now linked to ENSG00000243710. Looking at the archive sites against live, it seems that ENSG00000164012 has been subsumed by ENSG00000243710, but you will have to do your own QC on these mappings to ensure they are believable.
>
> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000243710;r=1:43578925-43778924
>
> http://nov2010.archive.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000164012;r=1:43597791-43797790
>
> All the best,
>
> Andy
>
> #####################################################################################
>
> use strict;
> use warnings;
>
> use Bio::EnsEMBL::Registry;
> Bio::EnsEMBL::Registry->load_registry_from_db(
> -HOST => 'ensembldb.ensembl.org',
> -PORT => 5306,
> -USER => 'anonymous',
> -VERBOSE => 0,
> -DB_VERSION => 67
> );
>
> my $dba = Bio::EnsEMBL::Registry->get_DBAdaptor('human', 'core');
> my $asia = $dba->get_ArchiveStableIdAdaptor();
> my $mc = $dba->get_MetaContainer();
> my $ta = $dba->get_TranscriptAdaptor();
>
> my $release = $mc->get_schema_version();
> my $original_id = 'ENSG00000164012';
> # Get our original ID
> my $archive_id = $asia->fetch_by_stable_id($original_id);
> die "No archive entry found for $original_id\n" unless defined $archive_id;
> # Then get the associated archive info which tells us what Transcripts it had at retirement
> my $archived_info = $archive_id->get_all_associated_archived();
> my %possible_new_genes;
> foreach my $associated (@{$archived_info}) {
>   my ($arch_gene, $arch_tr, $arch_tl, $pep_seq) = @{$associated};
>   my $successor = $arch_tr->get_latest_incarnation();
>   if($successor->release() == $release) {
>     my $live_transcript = $ta->fetch_by_stable_id($successor->stable_id());
>     my $gene = $live_transcript->get_Gene();
>     $possible_new_genes{$gene->stable_id()} = $gene;
>   }
> }
>
> foreach my $id (keys %possible_new_genes) {
>   printf("%d | %s -> %s\n", $release, $original_id, $id);
> }
>
> #####################################################################################
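
For batch use, the lookup above can be wrapped in a subroutine that also guards against undefined results. This is only a sketch: map_retired_gene is a hypothetical name, it uses only the API calls already shown in the script above, and it assumes fetch_by_stable_id may return undef for an ID it cannot find.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of the Ensembl API): map one retired gene ID
# to the stable IDs of current genes that now own its archived transcripts.
# $asia    = ArchiveStableIdAdaptor
# $ta      = TranscriptAdaptor
# $release = schema version of the current database
sub map_retired_gene {
    my ($asia, $ta, $release, $old_id) = @_;
    my %new_genes;

    my $archive_id = $asia->fetch_by_stable_id($old_id);
    return () unless defined $archive_id;    # ID unknown to the archive

    foreach my $associated (@{ $archive_id->get_all_associated_archived() }) {
        # Each entry is [gene, transcript, translation, peptide sequence]
        my ($arch_gene, $arch_tr) = @{$associated};
        next unless defined $arch_tr;

        # Only keep transcripts whose latest incarnation is in the current release
        my $successor = $arch_tr->get_latest_incarnation();
        next unless defined $successor && $successor->release() == $release;

        my $live_transcript = $ta->fetch_by_stable_id($successor->stable_id());
        next unless defined $live_transcript;
        $new_genes{ $live_transcript->get_Gene()->stable_id() } = 1;
    }

    my @ids = sort keys %new_genes;
    return @ids;
}
```

With the adaptors from the script above, calling map_retired_gene($asia, $ta, $release, 'ENSG00000164012') should give the same gene ID that the script prints, and the mappings still need the QC mentioned earlier.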
>
>
> Andrew Yates                   Ensembl Core Software Project Leader
> EMBL-EBI                       Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensembl.org/
>
> On 31 May 2012, at 17:21, Lucas Swanson wrote:
>
>> Thank you Andy, that is very informative!
>>
>> Is there any chance that you could provide any guidance on how I could programmatically use the associated subidentifiers to discover the correct gene ID in the current release, starting from only the old gene id?
>>
>> i.e. Is there an API for going from old gene ID "ENSG00000164012" to transcript ID "ENST00000372492" to new current ID "ENSG00000243710"
>>
>> ~Thanks,
>> Lucas Swanson
>>
>> Andy Yates wrote:
>>> Hi Lucas,
>>>
>>> There is a slight difference in what these two tools are attempting to convey. When using the IDmapper.pl script we report the following output:
>>>
>>> "The history ends with either the current release or at the point when the stable ID was retired."
>>>
>>> The web interface is attempting to report the ID history of any stable ID related to the one you gave it. If you look at the history for your query ID (http://www.ensembl.org/Homo_sapiens/Gene/Idhistory?db=core;g=ENSG00000164012) you can see there was a mapping from ENSG00000117395 to ENSG00000164012 back in releases 7 and 10 of Ensembl. That does not mean your ID has become ENSG00000117395, just that it was involved in its past. The IDmapper.pl script reports the history of just your stable identifier, which could not be mapped to another gene.
>>> Should you need to continue trying to map these retired IDs, I would suggest using the gene archive data to search for the sub-identifiers (transcripts/translations); you may find more success with this approach, e.g.
>>>
>>> Gene ENSG00000164012 had a transcript ID ENST00000372492 which has now been reused in the gene ENSG00000243710 (WDR65).
>>>
>>> All the best,
>>>
>>> Andy
>>>
>>>
>>> On 29 May 2012, at 19:08, Lucas Swanson wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a large number of old Ensembl gene IDs for which I want to determine the corresponding IDs in the latest release. The Ensembl FAQ pointed me to the ID History Converter. However, I am noticing some inconsistencies between the web tool and the downloaded API (which I will need to use since I have more than 30 IDs).
>>>>
>>>> The API version used is 67
>>>> $ echo "ENSG00000164012" | /home/lswanson/ensembl/id_history_converter/IDmapper.pl -s human
>>>> Old stable ID, New stable ID, Release, Mapping score
>>>> ENSG00000164012.1, ENSG00000164012.1, 7, 0
>>>> ENSG00000164012.1, ENSG00000164012.1, 10, 0
>>>> ENSG00000164012.1, ENSG00000164012.1, 14, 0
>>>> ENSG00000164012.1, ENSG00000164012.1, 15, 0
>>>> ENSG00000164012.1, ENSG00000164012.1, 18.2, 0
>>>> ENSG00000164012.1, ENSG00000164012.1, 18.1, 0
>>>> ENSG00000164012.1, ENSG00000164012.1, 21, 0
>>>> ENSG00000164012.1, ENSG00000164012.1, 26, 0
>>>> ENSG00000164012.1, ENSG00000164012.1, 27, 0
>>>> ENSG00000164012.1, ENSG00000164012.2, 38, 0
>>>> ENSG00000164012.2, ENSG00000164012.3, 40, 0.666667
>>>> ENSG00000164012.3, ENSG00000164012.4, 55, 0.586106
>>>> ENSG00000164012.4, ENSG00000164012.5, 56, 0.738542
>>>> ENSG00000164012.5, <retired>, 61, 0
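
When running IDmapper.pl over many IDs, it may help to reduce each history to its final row. A rough sketch, assuming the four comma-separated columns shown above; summarise_history is a hypothetical helper, not part of IDmapper.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise IDmapper.pl output lines for one stable ID.
# Returns the last versioned ID seen and the release in which it was retired
# (undef if the history never reaches "<retired>").
sub summarise_history {
    my (@lines) = @_;
    my ($last_id, $retired_in);
    foreach my $line (@lines) {
        chomp $line;
        my ($old, $new, $release, $score) = split /\s*,\s*/, $line;
        next unless defined $score;          # not a four-column row
        next if $old eq 'Old stable ID';     # header row
        if ($new eq '<retired>') {
            ($last_id, $retired_in) = ($old, $release);
        } else {
            $last_id = $new;
            undef $retired_in;
        }
    }
    return ($last_id, $retired_in);
}

# Demo using the last rows of the output above
my @output = (
    'Old stable ID, New stable ID, Release, Mapping score',
    'ENSG00000164012.3, ENSG00000164012.4, 55, 0.586106',
    'ENSG00000164012.4, ENSG00000164012.5, 56, 0.738542',
    'ENSG00000164012.5, <retired>, 61, 0',
);
my ($last_id, $retired_in) = summarise_history(@output);
printf "%s retired in release %s\n", $last_id, $retired_in // 'n/a';
```

For the rows in the demo this reports ENSG00000164012.5 as retired in release 61, which flags the ID as one that needs the archive-based lookup rather than a direct mapping.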
>>>>
>>>> But when I enter ENSG00000164012 into the web tool, I get the results in the attached screenshot.
>>>>
>>>> Why does the API not return any mention of "ENSG00000117395", which is what I would like since it is actually still active, while "ENSG00000164012" has been retired?
>>>>
>>>> Also, I am uncertain about the "Mapping score" column in the API output. What does it represent, and is a higher number better, or is a lower number better?
>>>>
>>>> ~Thank you,
>>>> Lucas Swanson
>>>>
>>>> <id_history_converter.png>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/