[ensembl-dev] Duplicates in allele table

Will McLaren wm2 at ebi.ac.uk
Thu May 12 16:07:35 BST 2011


Unfortunately, any of those columns can potentially be NULL, so that
solution is a no-go.

Thanks for the input anyway.

Will
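
To illustrate the NULL problem mentioned above, here is a minimal sketch. It
uses Python's built-in sqlite3 module rather than MySQL, purely so it can be
run standalone. The assumption is that both engines treat NULLs as distinct
inside a UNIQUE index, so exact duplicate rows containing NULLs are not
rejected. The table is a toy stand-in for allele, the rows are made up, and
the helper name count_rows_after_unique_index is hypothetical.

```python
import sqlite3


def count_rows_after_unique_index(rows):
    """Insert rows into a toy allele table guarded by a UNIQUE index,
    skipping rows the index rejects, and return how many rows were kept."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE allele (subsnp_id INTEGER, allele TEXT, "
        "frequency REAL, count INTEGER)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX uniq_allele "
        "ON allele (subsnp_id, allele, frequency, count)"
    )
    # INSERT OR IGNORE silently drops rows that violate the unique index,
    # which is the SQLite analogue (an assumption) of the ALTER IGNORE trick.
    conn.executemany("INSERT OR IGNORE INTO allele VALUES (?, ?, ?, ?)", rows)
    kept = conn.execute("SELECT COUNT(*) FROM allele").fetchone()[0]
    conn.close()
    return kept


# Two identical rows without NULLs: the index collapses them to one.
no_nulls = [(1, "A", 0.5, 10), (1, "A", 0.5, 10)]
# Two identical rows with NULLs: the index treats them as different rows.
with_nulls = [(2, "T", None, None), (2, "T", None, None)]

kept_no_nulls = count_rows_after_unique_index(no_nulls)
kept_with_nulls = count_rows_after_unique_index(with_nulls)
print(kept_no_nulls, kept_with_nulls)
```

The second pair of rows survives intact, which is why a unique index alone
cannot clean up duplicates when frequency or count may be NULL.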

On 12 May 2011 15:49, Stuart Meacham <sm766 at cam.ac.uk> wrote:
> On 12/05/11 15:43, Patrick Meidl wrote:
>>
>> On Thu, May 12 2011, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>>> It is safe to delete them, yes - if you know of a clever way of doing
>>> this then please share (I've just spent 2 days dumping, splitting,
>>> unique sorting and reimporting because our server runs out of tmp
>>> space if I try and do a GROUP BY statement on a table this large)!
>>
>> in mysql, you can do this:
>>
>> ALTER IGNORE TABLE allele
>> ADD UNIQUE INDEX (subsnp_id, allele, frequency, count);
>>
>> this will only work if none of your columns contains NULLs. also, I
>> haven't tested this on a huge table, so can't comment on performance.
>>
>
> Unless you have a nuclear-powered server cluster, I suspect this will take
> days on the allele table (>500 million entries). But unfortunately I have no
> better suggestions!
>
>



