[ensembl-dev] what are variation synonyms

Andrea Edwards edwardsa at cs.man.ac.uk
Wed Mar 9 23:40:45 GMT 2011


Hello

Could I confirm exactly what variation synonyms are? I thought they
represented entries for the same variation in different sources, but when
I look at the variation_synonym table, most synonyms have a source of
dbSNP (i.e. they have the same source as the original variation).
Variations in dbSNP at the same chromosome location are generally merged
into one rs cluster, so the notion of a synonym in dbSNP doesn't really
exist. The only synonym-like case I have ever come across in dbSNP is
where two submitted variations have flanking sequences of different
lengths that are identical in the overlapping region, but the SNP with
the shorter flanking sequence maps to multiple genomic locations, whereas
the one with the longer sequence maps to only one. In this case the
variations have not been merged into one entry.
However, there are 7.6 million dbSNP variation synonyms in Ensembl, so I
don't think they all represent the scenario I have just described:

mysql> select count(*) from variation_synonym vs inner join variation v
    -> on v.variation_id = vs.variation_id where vs.source_id = 1 and v.source_id = 1;

The count returned is about 7.6 million.
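One way to see what these synonyms actually are is to break the same join down by the synonym's own source instead of filtering it to dbSNP. The sketch below runs that grouped query against a toy in-memory SQLite copy of only the columns used in the query above; the rows are invented for illustration, and source_id 1 meaning dbSNP is taken from the post. The SELECT statement itself is plain SQL and should also run as-is in the mysql client.

```python
import sqlite3

# Toy stand-in for the two tables in the query above. Only the columns that
# query touches are modelled; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variation (variation_id INTEGER PRIMARY KEY, source_id INTEGER);
    CREATE TABLE variation_synonym (variation_id INTEGER, source_id INTEGER);
    INSERT INTO variation VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO variation_synonym VALUES (1, 1), (1, 1), (2, 1), (2, 7), (3, 7);
""")

# Count synonyms of dbSNP-sourced variations (v.source_id = 1), grouped by
# the source of the synonym row rather than restricted to dbSNP.
rows = conn.execute("""
    SELECT vs.source_id, COUNT(*) AS n
    FROM variation_synonym vs
    JOIN variation v ON v.variation_id = vs.variation_id
    WHERE v.source_id = 1
    GROUP BY vs.source_id
    ORDER BY vs.source_id
""").fetchall()
print(rows)  # [(1, 3), (7, 2)]
```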

Also, in the BioMart snp_61 database I notice that some of the fields in the
main SNP table (hsapiens_snp__variation__main) are named like
variation_synonym_OMIM_bool,
which I presume is a boolean field specifying whether the SNP has
an entry in the variation_synonym table whose source is OMIM.
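If that presumption is right, a field like variation_synonym_OMIM_bool amounts to a per-variation EXISTS test against variation_synonym. The sketch below shows that test on a toy SQLite schema with invented rows; it only illustrates the presumed meaning and is not how the BioMart tables are actually built. It uses source_id 15 for OMIM, as stated in the quoted February message. Given the observation there that no synonym rows have source 15, this test would be expected to come out false for every real variation, which is part of why the field is puzzling.

```python
import sqlite3

OMIM_SOURCE_ID = 15  # OMIM's source_id, as given in the quoted February message

# Toy schema with invented rows; only the columns needed for the test exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variation (variation_id INTEGER PRIMARY KEY);
    CREATE TABLE variation_synonym (variation_id INTEGER, source_id INTEGER);
    INSERT INTO variation VALUES (1), (2), (3);
    INSERT INTO variation_synonym VALUES (1, 15), (2, 1), (2, 1);
""")

# One row per variation: 1 if it has at least one synonym from the given
# source, 0 otherwise (SQLite returns EXISTS as an integer).
flags = conn.execute("""
    SELECT v.variation_id,
           EXISTS (SELECT 1 FROM variation_synonym vs
                   WHERE vs.variation_id = v.variation_id
                     AND vs.source_id = ?) AS has_source_synonym
    FROM variation v
    ORDER BY v.variation_id
""", (OMIM_SOURCE_ID,)).fetchall()
print(flags)  # [(1, 1), (2, 0), (3, 0)]
```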

However, many of the fields have names like
variation_synonym_DGV.....bool (e.g. variation_synonym_DGVaestd21_bool),
and each field named this way has a corresponding dimension table.

Could you tell me what these fields represent? I don't know what the
connection is between SNPs and DGV variants, and the term
'variation_synonym' in the field names also seems a bit misleading, as
there are no SNPs in DGV. Interestingly, the BioMart web interface has no
DGV-related filters for the SNP dataset of the human variation database,
so I couldn't work out from BioMart what these fields might be.

thanks very much

On 17/02/2011 23:40, Pontus Larsson wrote:
> Hi Andrea,
>
> The data we import from OMIM are annotations of phenotypes associated
> with dbSNP variations, and as such they are stored in the
> variation_annotation table (they are neither independent variations nor
> synonyms for existing variations, so we don't store them in the variation
> or variation_synonym table).
>
> There is some support in the API for working with these, you may want 
> to take a look at the VariationAnnotation and related modules.
>
> As you have noticed, there is also a variation set for variations with 
> OMIM phenotype annotations (this variation set is a subset of the 
> 'Phenotype-associated variations' set). For the task you want to do, 
> the best approach is probably what you already suggested: to get the 
> variations in this variation set and intersect it with your list of 
> variations.
>
> Best regards
> /Pontus
>
>
> 2011/2/17 Andrea Edwards <edwardsa at cs.man.ac.uk>
>
>     hello
>
>     I am trying to find out whether the 60,000 human SNPs I have are
>     listed in OMIM. I believe most SNPs in Ensembl have a primary
>     source of dbSNP. None of the human variations have a source_id of
>     15 (OMIM). There is a table, variation_synonym, to hold data about
>     multiple sources for a SNP, but I can't find any entries in this
>     table with source_id = 15 either. What am I doing wrong?
>     There is a variation_set called OMIM which has 11509 SNPs. I
>     investigated some of these variations at random, and I don't see
>     how you have linked them to the OMIM variation set.
>
>     I have seen there are methods get_all_synonyms and
>     get_all_synonym_sources in the Perl API for a variation. I presume I
>     could call get_all_synonyms('OMIM'), but I don't see how that can
>     work when no variation synonyms have a source of 15/OMIM.
>
>     Out of general curiosity, will the following two approaches give
>     the same results: (1) getting the OMIM variation set and checking
>     whether each of my 60,000 SNPs is in it, or (2) getting the OMIM
>     variation_synonyms for the 60,000 SNPs and seeing which return an
>     actual result? I presume the second option will be far faster.
>
>     thanks a lot
>
>     _______________________________________________
>     Dev mailing list
>     Dev at ensembl.org
>     http://lists.ensembl.org/mailman/listinfo/dev
>
>
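
The approach Pontus suggests in the quoted reply (fetch the members of the OMIM variation set once and intersect them with a local list of SNPs) reduces to a set intersection. The sketch below assumes both lists have already been exported as rs IDs, for example from BioMart or the API; the function name and the example IDs are hypothetical. Building the member set once and testing membership in memory avoids issuing 60,000 separate per-SNP lookups. Per Pontus's reply, the synonym-based route would return nothing for OMIM, because those data are not stored in variation_synonym, so the two approaches in the quoted question would not give the same result.

```python
def snps_in_variation_set(my_snps, set_members):
    """Return the rs IDs from my_snps that are also members of the variation set.

    Both arguments are plain iterables of rs ID strings; how they were fetched
    (BioMart export, API call, SQL) is outside the scope of this sketch.
    """
    members = set(set_members)  # build once, then O(1) average membership tests
    return sorted(set(my_snps) & members)


# Hypothetical IDs, for illustration only.
mine = ["rs0001", "rs0002", "rs0003", "rs0002"]
omim_set = ["rs0003", "rs0001", "rs0999"]
print(snps_in_variation_set(mine, omim_set))  # ['rs0001', 'rs0003']
```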
