[ensembl-dev] Patching an Ensembl compare database from 67 to 74
Jan Vogel
jan.vogel at gmail.com
Fri Jan 24 00:56:15 GMT 2014
Hello Matthieu et al,
thanks for the comments - I managed to patch the database, but used a slightly different SQL.
Also thanks to the compara team and ensembl to supply these patch files - we're computing our own gene sets, however we're patching all in-house databases up to the most recent schema so we can take advantage of all the bug fixes and improvements.
Regards,
Jan
Here's my log, if someone is interested :
Schema patch fails due to duplicated member_ids in the sequence_exon_bounded table:
ALTER TABLE sequence_exon_bounded
DROP PRIMARY KEY, DROP KEY member_id,
DROP COLUMN sequence_exon_bounded_id,
ADD PRIMARY KEY (member_id)
There are some member_id duplicates in the table ... let's fix this - details in IGIS
Identify the max. number of duplications in the member_id group:
SELECT sequence_exon_bounded_id , count(*)
FROM sequence_exon_bounded GROUP BY member_id HAVING COUNT(*) > 1
order by count(*) desc limit 3;
Create tmp table to store the sequence_exon_bounded_id identifiers for deletion:
create table tmp_seb_id (sequence_exon_bounded_id int) ;
INSERT into tmp_seb_id
SELECT sequence_exon_bounded_id
FROM sequence_exon_bounded GROUP BY member_id HAVING COUNT(*) > 1;
DELETE FROM seb
USING tmp_seb_id tmp_seb, sequence_exon_bounded seb
WHERE tmp_seb.sequence_exon_bounded_id = seb.sequence_exon_bounded_id;
Check:
SELECT * from tmp_seb_id tmp_seb, sequence_exon_bounded seb
WHERE tmp_seb.sequence_exon_bounded_id = seb.sequence_exon_bounded_id ;
DROP table tmp_seb_id;
On Jan 17, 2014, at 9:49 AM, Matthieu Muffato <muffato at ebi.ac.uk> wrote:
> Dear Jan,
>
> There should be a single "exon bounded" sequence for each member_id and the API would anyway select only one of them.
> You can remove the duplicated entries with this query:
>
> DELETE FROM sequence_exon_bounded WHERE sequence_exon_bounded_id IN (SELECT sequence_exon_bounded_id FROM sequence_exon_bounded GROUP BY member_id HAVING COUNT(*) > 1);
>
> You may have to run it several times if some member_ids have 3 or more sequences
>
>
> Regarding Will's comment: Compara indeed only started distributing the patches as individual patches with the according meta keys a few releases ago.
>
> Best,
> Matthieu
>
> On 17/01/14 10:08, Will Chow wrote:
>> Hi Jan
>>
>> I noticed warnings as well patching from 66->73 for compara for a few
>> tables ( I don't quite remember which ones ).
>>
>> Of course this doesn't really help you or solve your issues, but since
>> our group doesn't use those tables I just ignored the patch warnings and
>> for our purposes, it seems to work. I do notice in the meta table the
>> patch keys for the updates are not there below e70 even though the
>> patches are in the sql directory.
>>
>> again doesn't really help you I guess.
>>
>> Will
>>
>> On 17 Jan 2014, at 06:49, Jan Vogel <jan.vogel at gmail.com
>> <mailto:jan.vogel at gmail.com>> wrote:
>>
>>>
>>> Hi there,
>>>
>>> i'm trying to patch our ensembl 67 compara database up to schema 74.
>>>
>>> It fails :-( when applying the
>>> *ensembl-compara/sql/patch_67_68.sql* patch as in the 67 schema there
>>> are duplicate member ids in the sequence_exon_bounded table. Any idea
>>> how to fix this and patch my e67 up to e68 ?
>>>
>>> Thanks,
>>>
>>> Jan Vogel
>>>
>>>
>>> ERROR 1062 (23000) at line 28 in file:
>>> '/gnet/resfiles3/resapps/apps/ensembl/ensembl-74/ensembl-compara/sql/patch_67_68.sql':
>>> Duplicate entry '2012449' for key 'PRIMARY' :
>>>
>>> --------------
>>> SET session sql_mode='TRADITIONAL'
>>> --------------
>>> ……
>>> …..
>>>
>>>
>>> --------------
>>> ALTER TABLE sequence_exon_bounded
>>> DROP PRIMARY KEY, DROP KEY member_id,
>>> DROP COLUMN sequence_exon_bounded_id,
>>> ADD PRIMARY KEY (member_id)
>>> --------------
>>>
>>> Failed to apply patch 'patch_67_68.sql' to database 'ensembl_compara_67'!
>>> In non-interative mode; aborting current run at schema_patcher.pl line
>>> 643.
>>>
>>>
>>> Looking into the sequence_exon_bounded table, it looks that some
>>> member ids are duplicated:
>>>
>>> mysql> select count(distinct(member_id)) from sequence_exon_bounded
>>> limit 10 ;
>>> +----------------------------+
>>> | count(distinct(member_id)) |
>>> +----------------------------+
>>> | 1371152 |
>>> +----------------------------+
>>> 1 row in set (10.55 sec)
>>>
>>> mysql> select count(member_id) from sequence_exon_bounded limit 10 ;
>>> +------------------+
>>> | count(member_id) |
>>> +------------------+
>>> | 1373719 |
>>> +------------------+
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>
>
> --
> Matthieu Muffato, Ph.D.
> Ensembl Developer and Ensembl Compara Manager
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus, Hinxton
> Cambridge, CB10 1SD, United Kingdom
> Room A3-145
> Phone + 44 (0) 1223 49 4631
> Fax + 44 (0) 1223 49 4468
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20140123/071f8ed9/attachment.html>
More information about the Dev
mailing list