[ensembl-dev] Duplicates in allele table

Stuart Meacham sm766 at cam.ac.uk
Thu May 12 15:49:24 BST 2011


On 12/05/11 15:43, Patrick Meidl wrote:
> On Thu, May 12 2011, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
>> It is safe to delete them, yes - if you know of a clever way of doing
>> this then please share (I've just spent 2 days dumping, splitting,
>> unique sorting and reimporting because our server runs out of tmp
>> space if I try and do a GROUP BY statement on a table this large)!
>
> in mysql, you can do this:
>
> ALTER IGNORE TABLE allele
> ADD UNIQUE INDEX (subsnp_id, allele, frequency, count);
>
> this will only work if none of your columns contains NULLs. also, I
> haven't tested this on a huge table, so can't comment on performance.
>

Unless you have a nuclear-powered server cluster, I suspect this will
take days on the allele table (>500 million entries). But unfortunately
I have no better suggestions!
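
For reference, a variant of Patrick's approach is to copy the rows into
a new table that already carries the unique index, then swap the tables.
The following is only a rough sketch: the names allele_dedup, allele_old
and uniq_allele are made up for illustration, the column list is taken
from the statement quoted above, and I have not run it on a table of
this size, so it may well be no faster.

```sql
-- Hypothetical names: allele_dedup, allele_old, uniq_allele.
-- Empty copy of the table structure (columns and existing indexes).
CREATE TABLE allele_dedup LIKE allele;

-- Same unique key as in the quoted ALTER IGNORE statement.
-- Note: a unique index allows multiple NULLs, so rows with a NULL in
-- any of these columns will not be treated as duplicates.
ALTER TABLE allele_dedup
  ADD UNIQUE INDEX uniq_allele (subsnp_id, allele, frequency, count);

-- Rows that collide with an already-inserted key are silently skipped.
INSERT IGNORE INTO allele_dedup
SELECT * FROM allele;

-- Swap the tables; keep the original until the result has been checked.
RENAME TABLE allele TO allele_old, allele_dedup TO allele;
```

Because the original table is untouched until the final RENAME, the copy
can be abandoned at any point without losing data, at the cost of needing
enough disk space for a second copy of the table.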



