[ensembl-dev] Duplicates in allele table

Will McLaren wm2 at ebi.ac.uk
Thu May 12 15:33:09 BST 2011


Hi Stuart,

This is an issue we're aware of and are fixing for the next release.

Yes, it is safe to delete them. If you know of a clever way of doing
this then please share: I've just spent 2 days dumping, splitting,
unique-sorting and reimporting, because our server runs out of tmp
space if I try to run a GROUP BY statement on a table this large!
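
One possible approach, sketched here and not tested on the real table,
is to delete the duplicates one variation_id range at a time, so that
each GROUP BY only has to sort a small slice. The sketch below uses
Python with an in-memory SQLite table as a stand-in for the MySQL one.
The column list is taken from the query quoted below; the names
dedupe_in_batches, load_example_rows and batch_size are made up for
illustration, and the assumption that rows are duplicates when every
column except allele_id matches comes from the example, not from the
schema. MySQL does not allow a DELETE to select from its own target
table in a subquery like this, so there the subquery would need to be
wrapped in a derived table or rewritten as a join.

```python
import sqlite3


def load_example_rows(conn):
    """Hypothetical helper: create a cut-down allele table holding the four quoted rows."""
    conn.execute(
        'CREATE TABLE allele (allele_id INTEGER PRIMARY KEY, variation_id INTEGER, '
        'sample_id INTEGER, subsnp_id INTEGER, allele TEXT, frequency REAL, "count" INTEGER)'
    )
    rows = [
        (94403722, 25007232, 908, 44080545, "A", 0.808333, 97),
        (476949391, 25007232, 908, 44080545, "A", 0.808333, 97),
        (94403723, 25007232, 908, 44080545, "T", 0.191667, 23),
        (476949392, 25007232, 908, 44080545, "T", 0.191667, 23),
    ]
    conn.executemany("INSERT INTO allele VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def dedupe_in_batches(conn, batch_size=1000):
    """Delete duplicate rows, keeping the lowest allele_id of each group.

    Works through variation_id in ranges of batch_size so that every
    GROUP BY only sees a bounded slice of the table.
    """
    lo, hi = conn.execute(
        "SELECT MIN(variation_id), MAX(variation_id) FROM allele"
    ).fetchone()
    if lo is None:  # empty table, nothing to do
        return 0

    deleted = 0
    for start in range(lo, hi + 1, batch_size):
        end = start + batch_size - 1
        cur = conn.execute(
            """
            DELETE FROM allele
            WHERE variation_id BETWEEN ? AND ?
              AND allele_id NOT IN (
                SELECT MIN(allele_id)
                FROM allele
                WHERE variation_id BETWEEN ? AND ?
                GROUP BY variation_id, sample_id, subsnp_id,
                         allele, frequency, "count"
              )
            """,
            (start, end, start, end),
        )
        deleted += cur.rowcount
        conn.commit()  # commit per batch so a failure loses little work
    return deleted
```

Calling load_example_rows(conn) on sqlite3.connect(":memory:") and then
dedupe_in_batches(conn) should remove the two higher-numbered copies
from the example. On the real table an index on variation_id would
matter a lot, and whether that index exists is another assumption.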

Thanks

Will

On 12 May 2011 15:25, Stuart Meacham <sm766 at cam.ac.uk> wrote:
> Hi,
>
> I wondered why there are duplicates in the allele table of the latest human
> variation database (homo_sapiens_variation_62_37g)?
>
> E.g.
>
> select allele_id, subsnp_id, allele, frequency, count from allele where
> variation_id = 25007232 and sample_id = 908;
>
> +-----------+-----------+--------+-----------+-------+
> | allele_id | subsnp_id | allele | frequency | count |
> +-----------+-----------+--------+-----------+-------+
> |  94403722 |  44080545 | A      |  0.808333 |    97 |
> | 476949391 |  44080545 | A      |  0.808333 |    97 |
> |  94403723 |  44080545 | T      |  0.191667 |    23 |
> | 476949392 |  44080545 | T      |  0.191667 |    23 |
> +-----------+-----------+--------+-----------+-------+
> 4 rows in set (0.00 sec)
>
> As you can see, the only difference between the entries is the arbitrary
> allele_id. Is it 'safe' to delete duplicates where the only difference
> appears to be the allele_id?
>
> Cheers
>
> Stuart
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>



