[ensembl-dev] variation failed description

Graham Ritchie grsr at ebi.ac.uk
Sun Jan 16 15:19:05 GMT 2011


Hi Andrea

> Don't you import most of your variations from dbSNP though?
> Are you saying that when you import a variation from dbSNP you import the alleles and the locus but not the consequences? I've been told previously that you perform checks on the alleles (e.g. making sure at least one of them matches the reference). Once you have imported the alleles, do you then work out the consequences yourselves? So, for example, for a non-synonymous SNP you wouldn't import the amino acid alleles corresponding to the nucleotide alleles but would determine them yourselves?

Yes, that's right. We calculate the consequences of the variations we import from dbSNP (and elsewhere) with respect to the Ensembl gene models, using the same code that the VEP uses.
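
To illustrate the general idea of working out a peptide change from imported nucleotide alleles, here is a minimal Python sketch. It is not the Ensembl or VEP code. The names (CODON_TABLE, peptide_change, classify_snp) are hypothetical, and it assumes a single-base substitution, a coding sequence already given on the transcript strand, and the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def peptide_change(cds, cds_pos, alt_base):
    """Return (reference amino acid, alternative amino acid) for a SNP.

    cds      -- coding sequence on the transcript strand (hypothetical input)
    cds_pos  -- 1-based position of the SNP within the coding sequence
    alt_base -- the alternative nucleotide allele
    """
    index = cds_pos - 1
    offset = index % 3                 # position of the SNP inside its codon
    codon_start = index - offset
    ref_codon = cds[codon_start:codon_start + 3].upper()
    if len(ref_codon) != 3:
        raise ValueError("SNP falls in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


def classify_snp(cds, cds_pos, alt_base):
    """Label the SNP by comparing the two translated amino acids."""
    ref_aa, alt_aa = peptide_change(cds, cds_pos, alt_base)
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


if __name__ == "__main__":
    cds = "ATGGCCAAG"                       # translates to M A K
    print(peptide_change(cds, 5, "T"))      # GCC -> GTC, i.e. ('A', 'V')
    print(classify_snp(cds, 5, "T"))        # non-synonymous
    print(classify_snp(cds, 6, "T"))        # GCC -> GCT, synonymous
```

Because both amino acids are derived from the gene model and the nucleotide alleles, there is no separately imported amino acid allele string that could disagree with them.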

Cheers,

Graham


> That certainly suits me, as it means I can trust the data if Ensembl generated it. I have less confidence in other sources; Ensembl seems to be the gold standard, I think.
> 
> Cheers
> 
> On 16/01/2011 11:14, Graham Ritchie wrote:
>> Hi Andrea,
>> 
>> We call variations as non-synonymous and work out the alternative peptide sequences ourselves, based on the alleles we import and the Ensembl gene set; we don't import them from anywhere else. We only apply our sanity checks to data we import from outside sources.
>> 
>> Cheers,
>> 
>> Graham
>> 
>> 
>> On 15 Jan 2011, at 00:22, Andrea Edwards wrote:
>> 
>>> Hi
>>> 
>>> I believe that when Ensembl imports variations it performs certain sanity checks and classifies each variation as OK or failed. The failed descriptions in the database are:
>>> 
>>> mysql>  select distinct description from failed_description;
>>> +--------------------------------------------------------+
>>> | description                                            |
>>> +--------------------------------------------------------+
>>> | Variation maps to more than 3 different locations      |
>>> | None of the variant alleles match the reference allele |
>>> | Variation has more than 3 different alleles            |
>>> | Loci with no observed variant alleles in dbSNP         |
>>> | Variation does not map to the genome                   |
>>> | Variation has no associated sequence                   |
>>> +--------------------------------------------------------+
>>> 6 rows in set (0.06 sec)
>>> 
>>> None of these descriptions suggests that non-synonymous coding SNPs are checked to see whether:
>>> a) the reference amino acid in the amino acid allele string is the correct amino acid at that position in the protein product
>>> b) the alternative allele would indeed produce the non-synonymous amino acid specified
>>> 
>>> Is this correct? I am trying to establish whether I need to perform these checks myself.
>>> 
>>> Thanks
>>> 
>>> _______________________________________________
>>> Dev mailing list
>>> Dev at ensembl.org
>>> http://lists.ensembl.org/mailman/listinfo/dev




