[ensembl-dev] variation schema

Andrea Edwards edwardsa at cs.man.ac.uk
Tue Dec 21 21:06:01 GMT 2010


Hi

I have been reading about the variation database schema here

http://www.ensembl.org/info/docs/api/variation/variation_schema.html

but this document has no information about the database tables that, 
judging by their names, deal with variation sets, namely:

*variation_set
*variation_set_structure
*variation_set_variation

These tables aren't on the PDF schema diagram either.

I was hoping I could get an explanation of these tables.

It looks as though variation_set is simply a variation set with a name 
and description.

It looks then as if variation_set_variation is a simple link table that 
resolves the many-to-many relationship between a variation and a 
variation set. But if that is the case, I don't know how you model the 
alleles in a variation set such as the Watson set.
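
To make that reading concrete, here is a minimal sketch of the link-table 
interpretation using Python's sqlite3 module and an in-memory database. 
The column names, the helper function names and the example rows are my 
own assumptions for illustration; they are not taken from the Ensembl 
documentation.

```python
import sqlite3


def build_link_table_demo():
    """Create an in-memory database with the assumed three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- All column names below are assumptions, not the documented schema.
        CREATE TABLE variation (
            variation_id INTEGER PRIMARY KEY,
            name         TEXT
        );
        CREATE TABLE variation_set (
            variation_set_id INTEGER PRIMARY KEY,
            name             TEXT,
            description      TEXT
        );
        -- Link table: one row per (variation, set) membership.
        CREATE TABLE variation_set_variation (
            variation_id     INTEGER REFERENCES variation(variation_id),
            variation_set_id INTEGER REFERENCES variation_set(variation_set_id),
            PRIMARY KEY (variation_id, variation_set_id)
        );
    """)
    # Hypothetical rows: variation 1 belongs to two sets, variation 2 to one.
    conn.executemany("INSERT INTO variation VALUES (?, ?)",
                     [(1, "example_var_1"), (2, "example_var_2")])
    conn.executemany("INSERT INTO variation_set VALUES (?, ?, ?)",
                     [(10, "set_a", "hypothetical set A"),
                      (20, "set_b", "hypothetical set B")])
    conn.executemany("INSERT INTO variation_set_variation VALUES (?, ?)",
                     [(1, 10), (1, 20), (2, 10)])
    return conn


def sets_for_variation(conn, variation_id):
    """Return the names of all sets that contain the given variation."""
    rows = conn.execute(
        "SELECT vs.name FROM variation_set vs "
        "JOIN variation_set_variation vsv "
        "  ON vsv.variation_set_id = vs.variation_set_id "
        "WHERE vsv.variation_id = ? ORDER BY vs.name",
        (variation_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Under this layout a variation can sit in any number of sets and a set 
can hold any number of variations, but nothing in the link table itself 
says which alleles belong to the set.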

For example, a particular variation might be triallelic overall (e.g. 
across every individual looked at), but a variation in the Watson 
variation set can have at most two alleles, because that set comes from 
a single diploid individual. The table that normally describes the 
alleles of a variation and their frequencies is allele. The allele 
table links to a sample id, so you know which alleles occur for a 
variation in a population and the frequency of each allele in that 
population. The allele table doesn't seem to have any link to a 
variation set.

It looks like there should be a link somewhere between a variation set 
and a population/sample, so that the allele table can still represent 
the alleles/frequencies of a variation set.
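
As a sketch of the kind of link I mean, again in Python with sqlite3: the 
table variation_set_sample below is purely hypothetical (I have not seen 
it in the schema), and the allele columns, helper function names and 
example rows are assumptions for illustration only.

```python
import sqlite3


def build_set_sample_demo():
    """Create an in-memory database with a hypothetical set-to-sample link."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Assumed, simplified columns for the allele table.
        CREATE TABLE allele (
            variation_id INTEGER,
            sample_id    INTEGER,
            allele       TEXT,
            frequency    REAL
        );
        -- HYPOTHETICAL table: ties a variation set to a sample/population.
        CREATE TABLE variation_set_sample (
            variation_set_id INTEGER,
            sample_id        INTEGER
        );
    """)
    # Variation 1 is triallelic in population sample 100,
    # but shows only two alleles in single-individual sample 200.
    conn.executemany("INSERT INTO allele VALUES (?, ?, ?, ?)", [
        (1, 100, "A", 0.6), (1, 100, "C", 0.3), (1, 100, "G", 0.1),
        (1, 200, "A", 0.5), (1, 200, "C", 0.5),
    ])
    # Hypothetical set 10 is built from sample 200 only.
    conn.execute("INSERT INTO variation_set_sample VALUES (?, ?)", (10, 200))
    return conn


def alleles_in_set(conn, variation_set_id, variation_id):
    """Return (allele, frequency) pairs restricted to the set's samples."""
    return conn.execute(
        "SELECT a.allele, a.frequency FROM allele a "
        "JOIN variation_set_sample vss ON vss.sample_id = a.sample_id "
        "WHERE vss.variation_set_id = ? AND a.variation_id = ? "
        "ORDER BY a.allele",
        (variation_set_id, variation_id),
    ).fetchall()
```

With a link like this, the allele table would not need to change: the 
alleles of a variation within a set are just the allele rows whose 
sample belongs to that set.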

Or I could be guessing this all wrong. Either way, I would really 
benefit from some information about the schema that models variation 
sets. I think I also need Ensembl's definition of a variation set (the 
POD simply says "This is a class representing a set of variations that 
are grouped by e.g. study, method, quality measure etc.").

Kind regards



