[ensembl-dev] Patching an Ensembl compare database from 67 to 74

Jan Vogel jan.vogel at gmail.com
Fri Jan 24 00:56:15 GMT 2014


Hello Matthieu et al, 

thanks for the comments - I managed to patch the database, but used a slightly different SQL. 

Also thanks to the compara team and ensembl to supply these patch files - we're computing our own gene sets, however we're patching all in-house databases up to the most recent schema so we can take advantage of all the bug fixes and improvements. 

Regards, 

    Jan 



Here's my log, if someone is interested : 

Schema patch fails due to duplicated member_ids in the sequence_exon_bounded table:

    ALTER TABLE sequence_exon_bounded 
    DROP PRIMARY KEY, DROP KEY member_id, 
    DROP COLUMN sequence_exon_bounded_id, 
    ADD PRIMARY KEY (member_id)


There are some member_id duplicates in the table ... let's fix this - details in IGIS

Identify the max. number of duplications in the member_id group:

    SELECT sequence_exon_bounded_id , count(*)
    FROM sequence_exon_bounded GROUP BY member_id HAVING COUNT(*) > 1
    order by count(*) desc limit 3;

Create tmp table to store the sequence_exon_bounded_id identifiers for deletion:

    create table tmp_seb_id (sequence_exon_bounded_id int) ;  

    INSERT into tmp_seb_id 
    SELECT sequence_exon_bounded_id 
    FROM sequence_exon_bounded GROUP BY member_id HAVING COUNT(*) > 1;

    DELETE FROM seb 
    USING tmp_seb_id tmp_seb, sequence_exon_bounded seb 
    WHERE tmp_seb.sequence_exon_bounded_id = seb.sequence_exon_bounded_id;

Check:

    SELECT * from  tmp_seb_id tmp_seb, sequence_exon_bounded seb 
    WHERE tmp_seb.sequence_exon_bounded_id = seb.sequence_exon_bounded_id ; 

    DROP table tmp_seb_id; 



On Jan 17, 2014, at 9:49 AM, Matthieu Muffato <muffato at ebi.ac.uk> wrote:

> Dear Jan,
> 
> There should be a single "exon bounded" sequence for each member_id and the API would anyway select only one of them.
> You can remove the duplicated entries with this query:
> 
> DELETE FROM sequence_exon_bounded WHERE sequence_exon_bounded_id IN (SELECT sequence_exon_bounded_id FROM sequence_exon_bounded GROUP BY member_id HAVING COUNT(*) > 1);
> 
> You may have to run it several times if some member_ids have 3 or more sequences
> 
> 
> Regarding Will's comment: Compara indeed only started distributing the patches as individual patches with the according meta keys a few releases ago.
> 
> Best,
> Matthieu
> 
> On 17/01/14 10:08, Will Chow wrote:
>> Hi Jan
>> 
>> I noticed warnings as well patching from 66->73 for compara for a few
>> tables ( I don't quite remember which ones ).
>> 
>> Of course this doesn't really help you or solve your issues, but since
>> our group doesn't use those tables I just ignored the patch warnings and
>> for our purposes, it seems to work.   I do notice in the meta table the
>> patch keys for the updates are not there below e70 even though the
>> patches are in the sql directory.
>> 
>> again doesn't really help you I guess.
>> 
>> Will
>> 
>> On 17 Jan 2014, at 06:49, Jan Vogel <jan.vogel at gmail.com
>> <mailto:jan.vogel at gmail.com>> wrote:
>> 
>>> 
>>> Hi there,
>>> 
>>> i'm trying to patch our ensembl 67 compara database up to schema 74.
>>> 
>>> It fails :-( when applying the
>>> *ensembl-compara/sql/patch_67_68.sql* patch as in the 67 schema there
>>> are duplicate member ids in the sequence_exon_bounded table. Any idea
>>> how to fix this and patch my e67 up to e68 ?
>>> 
>>> Thanks,
>>> 
>>>   Jan Vogel
>>> 
>>> 
>>> ERROR 1062 (23000) at line 28 in file:
>>> '/gnet/resfiles3/resapps/apps/ensembl/ensembl-74/ensembl-compara/sql/patch_67_68.sql':
>>> Duplicate entry '2012449' for key 'PRIMARY' :
>>> 
>>> --------------
>>> SET session sql_mode='TRADITIONAL'
>>> --------------
>>> ……
>>> …..
>>> 
>>> 
>>> --------------
>>> ALTER TABLE sequence_exon_bounded
>>> DROP PRIMARY KEY, DROP KEY member_id,
>>> DROP COLUMN sequence_exon_bounded_id,
>>> ADD PRIMARY KEY (member_id)
>>> --------------
>>> 
>>> Failed to apply patch 'patch_67_68.sql' to database 'ensembl_compara_67'!
>>> In non-interative mode; aborting current run at schema_patcher.pl line
>>> 643.
>>> 
>>> 
>>> Looking into the sequence_exon_bounded table, it looks that some
>>> member ids are duplicated:
>>> 
>>> mysql> select count(distinct(member_id))  from sequence_exon_bounded
>>> limit 10 ;
>>> +----------------------------+
>>> | count(distinct(member_id)) |
>>> +----------------------------+
>>> |                    1371152 |
>>> +----------------------------+
>>> 1 row in set (10.55 sec)
>>> 
>>> mysql> select count(member_id)  from sequence_exon_bounded limit 10 ;
>>> +------------------+
>>> | count(member_id) |
>>> +------------------+
>>> |          1373719 |
>>> +------------------+
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>> 
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>> 
> 
> 
> -- 
> Matthieu Muffato, Ph.D.
> Ensembl Developer and Ensembl Compara Manager
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus, Hinxton
> Cambridge, CB10 1SD, United Kingdom
> Room  A3-145
> Phone + 44 (0) 1223 49 4631
> Fax   + 44 (0) 1223 49 4468
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20140123/071f8ed9/attachment.html>


More information about the Dev mailing list