[ensembl-dev] questions about variation schema
Andrea Edwards
edwardsa at cs.man.ac.uk
Mon Jan 10 18:26:01 GMT 2011
Hello, and Happy New Year!
I have some quick questions about the variation schema.
1. Allele table
When considering population frequency data for an allele, how do you
know which source the data comes from?
For example, imagine a SNP with alleles T/C that is described in, say,
dbSNP and HGMD. The source ID for the variation in the variation table
might be dbSNP, and the variation would have a variation_synonym entry
for HGMD. Let's say both dbSNP and HGMD have population frequency data
for the variation, which might look something like this:
Allele ID  Variation ID  Allele  Frequency  Sample ID
1          1             T       1          14
2          1             C       0          14
3          1             T       0.5        15
4          1             C       0.5        15
In this case the dbSNP data are for population 14 and the HGMD data are
for population 15, but how would you know that just by looking at the table?
A sample isn't linked to the source that 'created' it, so you can't tell
from the sample either.
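To make the ambiguity concrete, here is a minimal Python sketch that loads the four rows above into an in-memory SQLite database. The tables are a toy simplification using only the columns mentioned in this message, plus assumed source_id columns on variation and variation_synonym; they are not the real Ensembl DDL. Following every path from an allele row to a source leaves both samples with the same candidate set, because the only route runs through variation_id rather than through the sample.

```python
import sqlite3

# Toy, simplified tables: only the columns discussed in this message.
# This is NOT the real Ensembl variation DDL; source_id columns are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (source_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE variation (variation_id INTEGER PRIMARY KEY, source_id INTEGER);
CREATE TABLE variation_synonym (variation_id INTEGER, source_id INTEGER);
CREATE TABLE allele (allele_id INTEGER PRIMARY KEY, variation_id INTEGER,
                     allele TEXT, frequency REAL, sample_id INTEGER);

INSERT INTO source VALUES (1, 'dbSNP'), (2, 'HGMD');
INSERT INTO variation VALUES (1, 1);          -- primary source: dbSNP
INSERT INTO variation_synonym VALUES (1, 2);  -- synonym from HGMD
INSERT INTO allele VALUES
  (1, 1, 'T', 1.0, 14), (2, 1, 'C', 0.0, 14),
  (3, 1, 'T', 0.5, 15), (4, 1, 'C', 0.5, 15);
""")

# Collect every source reachable from each allele row. Both paths go via
# variation_id, so the sample itself contributes nothing to the answer.
rows = conn.execute("""
SELECT a.sample_id, s.name
FROM allele a
JOIN variation v ON v.variation_id = a.variation_id
JOIN source s ON s.source_id = v.source_id
UNION
SELECT a.sample_id, s.name
FROM allele a
JOIN variation_synonym vs ON vs.variation_id = a.variation_id
JOIN source s ON s.source_id = vs.source_id
""").fetchall()

# Map each sample_id to the set of sources it *could* have come from.
candidates = {}
for sample_id, source_name in rows:
    candidates.setdefault(sample_id, set()).add(source_name)

for sample_id in sorted(candidates):
    print(sample_id, sorted(candidates[sample_id]))
```

Running it prints `14 ['HGMD', 'dbSNP']` and `15 ['HGMD', 'dbSNP']`: nothing in these rows tells the two frequency sets apart.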
Also, what is the subsnp_id in the allele table?
2. What is the subsnp_handle table?
3. Population genotype
What is the subsnp_id field (this might be answered by the previous question)?
Am I correct in saying that this table doesn't record the source of the data
(this might also be answered by a previous question)?
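Questions 1 and 3 seem to come down to the same missing link: nothing maps a sample (population) to the source that supplied its data. The minimal Python sketch below shows what such a link would allow, using the allele rows from the table in question 1. `SAMPLE_SOURCE` and `frequencies_by_source` are hypothetical; the schema as described has no such mapping. The same lookup would only help population_genotype if, as assumed here, its rows are also keyed by sample.

```python
from collections import defaultdict

# Allele rows from the example table in question 1:
# (allele_id, variation_id, allele, frequency, sample_id)
ALLELE_ROWS = [
    (1, 1, "T", 1.0, 14),
    (2, 1, "C", 0.0, 14),
    (3, 1, "T", 0.5, 15),
    (4, 1, "C", 0.5, 15),
]

# HYPOTHETICAL sample -> source link. The schema as described has no such
# table or column; this dict only stands in for the missing information.
SAMPLE_SOURCE = {14: "dbSNP", 15: "HGMD"}


def frequencies_by_source(rows, sample_source):
    """Group allele frequencies as {source: {sample_id: {allele: frequency}}}.

    Samples absent from sample_source are reported under "unknown";
    with no link available at all, every sample ends up there.
    """
    grouped = defaultdict(lambda: defaultdict(dict))
    for _allele_id, _variation_id, allele, frequency, sample_id in rows:
        source = sample_source.get(sample_id, "unknown")
        grouped[source][sample_id][allele] = frequency
    # Convert the nested defaultdicts to plain dicts for readable output.
    return {src: {sid: dict(freqs) for sid, freqs in samples.items()}
            for src, samples in grouped.items()}


print(frequencies_by_source(ALLELE_ROWS, SAMPLE_SOURCE))
print(frequencies_by_source(ALLELE_ROWS, {}))
```

With the hypothetical mapping the two frequency sets separate cleanly into dbSNP and HGMD; with an empty mapping both samples collapse into "unknown", which is the situation described in question 1.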
4. Variation set
What is the source of a variation set? I believe variation sets are
defined by Ensembl, so I presume the source is implicitly Ensembl?
I've written quite a detailed document about the variation schema which I
think might help other people who, like me, are learning the schema from
scratch. I'm more than happy to make it available if there is a mechanism
for doing so.
Thanks a lot