[ensembl-dev] Duplicates in allele table

Patrick Meidl pmeidl at cemm.oeaw.ac.at
Thu May 12 15:43:48 BST 2011


On Thu, May 12 2011, Will McLaren <wm2 at ebi.ac.uk> wrote:

> It is safe to delete them, yes - if you know of a clever way of doing
> this then please share (I've just spent 2 days dumping, splitting,
> unique sorting and reimporting because our server runs out of tmp
> space if I try and do a GROUP BY statement on a table this large)!

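in MySQL, you can do this (with IGNORE, rows that would violate the
new unique index are silently dropped and only the first one is kept):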

ALTER IGNORE TABLE allele
ADD UNIQUE INDEX (subsnp_id, allele, frequency, count);

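this will only work if none of your columns contains NULLs: a unique
index treats NULLs as distinct from each other, so duplicate rows that
contain a NULL would be kept. also, I haven't tested this on a huge
table, so I can't comment on performance.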
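
a possible variant, sketched here and not tested against a real
variation database: first count the rows that contain a NULL in any of
the four columns, then copy everything into an empty clone that already
carries the unique index. this needs no GROUP BY, and it also works on
MySQL 5.7 and later, where the IGNORE option of ALTER TABLE is no longer
supported. the names allele_dedup, allele_old and uniq_allele are made
up for illustration, and `count` is backquoted only to be on the safe
side.

```sql
-- Rows containing a NULL are never treated as duplicates by a unique
-- index; a result of 0 means the NULL caveat does not apply.
SELECT COUNT(*) AS rows_with_null
FROM allele
WHERE subsnp_id IS NULL OR allele IS NULL
   OR frequency IS NULL OR `count` IS NULL;

-- Empty copy with the same columns and indexes (hypothetical name).
CREATE TABLE allele_dedup LIKE allele;

ALTER TABLE allele_dedup
  ADD UNIQUE INDEX uniq_allele (subsnp_id, allele, frequency, `count`);

-- IGNORE skips every row that collides with one already inserted.
INSERT IGNORE INTO allele_dedup SELECT * FROM allele;

-- Swap the tables; allele_old stays around until the result is checked.
RENAME TABLE allele TO allele_old, allele_dedup TO allele;
```

this temporarily needs disk space for a second copy of the table, and
the extra unique index can be dropped afterwards if it is not wanted.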

HTH

    patrick

-- 
Patrick Meidl, Mag.
Bioinformatician

Ce-M-M-
Research Centre for Molecular Medicine
of the Austrian Academy of Science

Lazarettgasse 14 / AKH BT 25.3
Vienna, Austria

room 02.205
phone +43 1 40160 70016
email pmeidl at cemm.oeaw.ac.at
web http://www.cemm.at/




