[ensembl-dev] Duplicates in allele table

Thu May 12 15:25:57 BST 2011

Hi,

I wondered why there are duplicates in the allele table of the latest 
human variation database (homo_sapiens_variation_62_37g)?

E.g.

select allele_id, subsnp_id, allele, frequency, count from allele where 
variation_id = 25007232 and sample_id = 908;

+-----------+-----------+--------+-----------+-------+
| allele_id | subsnp_id | allele | frequency | count |
+-----------+-----------+--------+-----------+-------+
|  94403722 |  44080545 | A      |  0.808333 |    97 |
| 476949391 |  44080545 | A      |  0.808333 |    97 |
|  94403723 |  44080545 | T      |  0.191667 |    23 |
| 476949392 |  44080545 | T      |  0.191667 |    23 |
+-----------+-----------+--------+-----------+-------+
4 rows in set (0.00 sec)

As you can see, the only difference between the paired entries is the 
arbitrary allele_id. Is it 'safe' to delete duplicates that differ only 
in allele_id?
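
To make the question concrete, here is a minimal sketch of what I mean by a "duplicate": rows that agree on every column except allele_id. It uses an in-memory SQLite table as a stand-in for the MySQL database and models only the columns shown in the query above. The real allele table may have further columns, which would need adding to the GROUP BY. It shows how such rows could be found and removed, not whether removing them is safe.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL allele table. Only the columns
# visible in the query above are modelled (an assumption about the schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE allele (
           allele_id    INTEGER PRIMARY KEY,
           variation_id INTEGER,
           sample_id    INTEGER,
           subsnp_id    INTEGER,
           allele       TEXT,
           frequency    REAL,
           count        INTEGER
       )"""
)
rows = [
    (94403722, 25007232, 908, 44080545, "A", 0.808333, 97),
    (476949391, 25007232, 908, 44080545, "A", 0.808333, 97),
    (94403723, 25007232, 908, 44080545, "T", 0.191667, 23),
    (476949392, 25007232, 908, 44080545, "T", 0.191667, 23),
]
conn.executemany("INSERT INTO allele VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Group on every column except allele_id. Any group with more than one row
# is a set of duplicates; MIN(allele_id) is the copy that would be kept.
FIND_DUPLICATES = """
    SELECT MIN(allele_id) AS keep_id, COUNT(*) AS copies, allele
    FROM allele
    GROUP BY variation_id, sample_id, subsnp_id, allele, frequency, count
    HAVING COUNT(*) > 1
    ORDER BY keep_id
"""
duplicate_groups = conn.execute(FIND_DUPLICATES).fetchall()
for keep_id, copies, allele in duplicate_groups:
    print(f"allele {allele}: {copies} copies, would keep allele_id {keep_id}")

# Delete every row that is not the lowest allele_id of its group.
# Note: MySQL rejects a DELETE whose subquery reads the same table directly
# (error 1093), so there the subquery would need wrapping in a derived table.
conn.execute(
    """DELETE FROM allele
       WHERE allele_id NOT IN (
           SELECT MIN(allele_id)
           FROM allele
           GROUP BY variation_id, sample_id, subsnp_id, allele, frequency, count
       )"""
)
remaining_ids = [
    r[0] for r in conn.execute("SELECT allele_id FROM allele ORDER BY allele_id")
]
print(remaining_ids)
```

Keeping the lowest allele_id in each group is an arbitrary choice on my part. If other tables reference allele_id, the copy to keep would have to be chosen with those references in mind.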

Cheers

Stuart