[ensembl-dev] Duplicates in allele table
wm2 at ebi.ac.uk
Thu May 12 16:07:35 BST 2011
Unfortunately, any of those columns can be NULL, so that
solution is a no-go.
Thanks for the input anyway.
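
To make the NULL problem concrete, here is a minimal sketch, not something run against the real variation database. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (both treat NULLs in a UNIQUE index as distinct from one another). The helper name count_rows_after_unique_insert and the sample rows are made up for illustration. INSERT OR IGNORE plays the role of ALTER IGNORE here: a fully populated duplicate is dropped, but a duplicate whose frequency and count are NULL survives.

```python
import sqlite3


def count_rows_after_unique_insert(rows):
    """Load rows into a table guarded by a UNIQUE index, silently skipping
    rows the index rejects, and return how many rows remain."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical cut-down version of the allele table: only the four
    # columns named in the proposed index.
    conn.execute(
        'CREATE TABLE allele ('
        'subsnp_id INTEGER, allele TEXT, frequency REAL, "count" INTEGER)'
    )
    conn.execute(
        'CREATE UNIQUE INDEX uniq_allele '
        'ON allele (subsnp_id, allele, frequency, "count")'
    )
    # OR IGNORE skips any row that would violate the unique index.
    conn.executemany("INSERT OR IGNORE INTO allele VALUES (?, ?, ?, ?)", rows)
    remaining = conn.execute("SELECT COUNT(*) FROM allele").fetchone()[0]
    conn.close()
    return remaining


# Two identical rows with no NULLs: the second one is rejected.
print(count_rows_after_unique_insert([(1, "A", 0.5, 10), (1, "A", 0.5, 10)]))  # 1

# Two identical rows with NULLs: the index sees them as different, both stay.
print(count_rows_after_unique_insert([(2, "G", None, None), (2, "G", None, None)]))  # 2
```

So any duplicate rows with a NULL in one of the indexed columns would be left behind by the unique-index approach.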
On 12 May 2011 15:49, Stuart Meacham <sm766 at cam.ac.uk> wrote:
> On 12/05/11 15:43, Patrick Meidl wrote:
>> On Thu, May 12 2011, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>> It is safe to delete them, yes - if you know of a clever way of doing
>>> this then please share (I've just spent 2 days dumping, splitting,
>>> unique sorting and reimporting because our server runs out of tmp
>>> space if I try and do a GROUP BY statement on a table this large)!
>> in mysql, you can do this:
>> ALTER IGNORE TABLE allele
>> ADD UNIQUE INDEX (subsnp_id, allele, frequency, count);
>> this will only work if none of your columns contains NULLs. also, I
>> haven't tested this on a huge table, so can't comment on performance.
> Unless you have a nuclear-powered server cluster I suspect this will take
> days on the allele table (>500 million entries). But unfortunately I have no
> better suggestions!