[ensembl-dev] questions about variation schema

Andrea Edwards edwardsa at cs.man.ac.uk
Mon Jan 10 18:26:01 GMT 2011


Hello and Happy New Year

I have some quick questions about the variation schema.

1. Allele table

When considering population frequency data for an allele, how do you 
know which source it comes from?
For example, imagine a SNP with alleles T/C that is described in, say, 
dbSNP and HGMD. The source_id for the variation in the variation table 
might be dbSNP, and the variation would have a variation_synonym entry 
for HGMD. Let's say both dbSNP and HGMD have population frequency data 
for the variation, which might look something like this:

    allele_id  variation_id  allele  frequency  sample_id
    1          1             T       1          14
    2          1             C       0          14
    3          1             T       0.5        15
    4          1             C       0.5        15

In this case the dbSNP data is for population 14 and the HGMD data is 
for population 15, but how would you know that from looking at the table?
A sample isn't linked to the source that 'created' it, so you can't tell 
from the sample either.
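Put another way: grouping the example rows above by sample gives two 
separate sets of frequencies, but nothing in the rows says which source 
produced each set. A minimal Python sketch using only the example rows 
above (it does not use the real Ensembl schema or API; the function name 
is hypothetical):

```python
from collections import defaultdict

# Example allele rows from the table above:
# (allele_id, variation_id, allele, frequency, sample_id)
ALLELE_ROWS = [
    (1, 1, "T", 1.0, 14),
    (2, 1, "C", 0.0, 14),
    (3, 1, "T", 0.5, 15),
    (4, 1, "C", 0.5, 15),
]


def frequencies_by_sample(rows):
    """Group allele frequencies by (variation_id, sample_id).

    Returns {(variation_id, sample_id): {allele: frequency}}.
    No column in these rows records the source (dbSNP, HGMD, ...),
    so the groups cannot be attributed to a source from the rows alone.
    """
    groups = defaultdict(dict)
    for _allele_id, variation_id, allele, frequency, sample_id in rows:
        groups[(variation_id, sample_id)][allele] = frequency
    return dict(groups)


if __name__ == "__main__":
    for key, freqs in sorted(frequencies_by_sample(ALLELE_ROWS).items()):
        print(key, freqs)
```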

Also, what is the subsnp_id column in the allele table?


2. What is the subsnp_handle table?

3. Population genotype
What is the subsnp_id field (this might be answered by the previous question)?
Am I correct in saying this table doesn't provide the source of the data 
(this might also be answered by a previous question)?

4. Variation set
What is the source of a variation set? I believe variation sets are 
defined by Ensembl, so I presume the source is implicitly Ensembl?


I've written quite a detailed document about the variation schema, which I 
think might help other people like me who are learning the schema from scratch. 
I'm more than happy to make it available if there is a mechanism to do so.

Thanks a lot
