[ensembl-dev] variation schema
Andrea Edwards
edwardsa at cs.man.ac.uk
Tue Dec 21 21:06:01 GMT 2010
Hi
I have been reading about the variation database schema here
http://www.ensembl.org/info/docs/api/variation/variation_schema.html
but there is no information in this document about the database tables
that, judging by their names, deal with variation sets, namely
*variation_set
*variation_set_structure
*variation_set_variation
These tables aren't on the PDF schema diagram either.
I was hoping I could get an explanation of these tables.
It looks as though variation_set is simply a variation set with a name
and description.
It looks then as if variation_set_variation is a simple link table to
resolve the many-to-many relationship between a variation and a
variation set. But if that is the case, I don't know how you model the
alleles in a variation set such as the Watson set.
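
To make that reading concrete, here is a minimal sketch in Python using
an in-memory SQLite database. The column names, the set names and the
rows are all assumptions made up for illustration; they are my guess at
the shape of the tables, not the real Ensembl DDL or data.

```python
import sqlite3

# Hypothetical, simplified columns: a guess at the tables, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variation (
    variation_id INTEGER PRIMARY KEY,
    name         TEXT
);
CREATE TABLE variation_set (
    variation_set_id INTEGER PRIMARY KEY,
    name             TEXT,
    description      TEXT
);
-- Link table resolving the many-to-many relationship.
CREATE TABLE variation_set_variation (
    variation_id     INTEGER REFERENCES variation(variation_id),
    variation_set_id INTEGER REFERENCES variation_set(variation_set_id),
    PRIMARY KEY (variation_id, variation_set_id)
);
""")

# Made-up rows: one variation belongs to two sets, the other to one.
conn.executemany("INSERT INTO variation VALUES (?, ?)",
                 [(1, "rs_example_1"), (2, "rs_example_2")])
conn.executemany("INSERT INTO variation_set VALUES (?, ?, ?)",
                 [(10, "watson", "made-up description"),
                  (20, "other_set", "made-up description")])
conn.executemany("INSERT INTO variation_set_variation VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 20)])


def variations_in_set(conn, set_name):
    """Return the names of the variations linked to the named set."""
    rows = conn.execute(
        """
        SELECT v.name
        FROM variation v
        JOIN variation_set_variation vsv ON vsv.variation_id = v.variation_id
        JOIN variation_set vs ON vs.variation_set_id = vsv.variation_set_id
        WHERE vs.name = ?
        ORDER BY v.name
        """,
        (set_name,),
    ).fetchall()
    return [row[0] for row in rows]


watson_members = variations_in_set(conn, "watson")
other_members = variations_in_set(conn, "other_set")
```

Under this model a set only records which variations it contains;
nothing in the link table says which alleles were observed in the set.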
For example, a particular variation might be triallelic overall (e.g. in
every individual looked at), but a variation in the Watson variation set
can have at most two alleles, since a single individual is diploid. The
table that normally describes the alleles of a variation and their
frequencies is allele. The allele table links to a sample id, so you
know which alleles occur for a variation in a population and you know
the frequency of a particular allele in that population. The allele
table doesn't seem to have any link to a variation set.
It looks like there should be a link somewhere between a variation set
and a population/sample so that the allele table can still represent the
alleles/frequencies of a variation set.
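
As a sketch of the kind of link I mean (Python with in-memory SQLite
again): the variation_set_sample table below is purely hypothetical, I
have not found anything like it in the schema, and the allele columns,
ids and frequencies are simplified, made-up values for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified guess at the allele table: alleles per variation per sample.
CREATE TABLE allele (
    allele_id    INTEGER PRIMARY KEY,
    variation_id INTEGER,
    allele       TEXT,
    frequency    REAL,
    sample_id    INTEGER
);
-- HYPOTHETICAL link table: ties a variation set to a population/sample.
CREATE TABLE variation_set_sample (
    variation_set_id INTEGER,
    sample_id        INTEGER
);
""")

# Made-up data: variation 1 is triallelic in population 200, but only
# two alleles are recorded for sample 100 (standing in for one individual).
conn.executemany(
    "INSERT INTO allele (variation_id, allele, frequency, sample_id) "
    "VALUES (?, ?, ?, ?)",
    [(1, "A", 0.5, 100), (1, "G", 0.5, 100),
     (1, "A", 0.6, 200), (1, "G", 0.3, 200), (1, "T", 0.1, 200)],
)
# Set 10 (standing in for a Watson-like set) is linked to sample 100 only.
conn.execute("INSERT INTO variation_set_sample VALUES (10, 100)")


def all_alleles(conn, variation_id):
    """Alleles of a variation across every sample."""
    rows = conn.execute(
        "SELECT DISTINCT allele FROM allele "
        "WHERE variation_id = ? ORDER BY allele",
        (variation_id,),
    ).fetchall()
    return [row[0] for row in rows]


def alleles_in_set(conn, variation_id, variation_set_id):
    """Alleles of a variation restricted to the samples linked to a set."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.allele
        FROM allele a
        JOIN variation_set_sample vss ON vss.sample_id = a.sample_id
        WHERE a.variation_id = ? AND vss.variation_set_id = ?
        ORDER BY a.allele
        """,
        (variation_id, variation_set_id),
    ).fetchall()
    return [row[0] for row in rows]


overall = all_alleles(conn, 1)
in_set = alleles_in_set(conn, 1, 10)
```

With a link like this the same variation could be triallelic overall
while showing only two alleles within the set.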
Or I could be guessing this all wrong. Either way, I would really
benefit from some documentation of the schema that models variation
sets. And I think I need Ensembl's definition of a variation set (the
POD simply says "This is a class representing a set of variations that
are grouped by e.g. study, method, quality measure etc.").
Kind regards