[Dev] [ensembl-dev] Reference SNP

Nathan Johnson njohnson at ebi.ac.uk
Wed Jul 28 09:41:34 BST 2010


Hi Will

I believe the forthcoming release now has an is_reference method on the Slice object and adaptor.
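
A minimal sketch of how such a filter might look once is_reference is available, assuming each feature exposes slice() and the slice answers is_reference() as described above. The MockSlice and MockFeature packages, the reference_features helper, the region names and the positions are all hypothetical stand-ins so the sketch runs without a database connection; with the real API the list would come from fetch_all_by_Variation instead.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real Slice and VariationFeature objects,
# so this sketch can run without connecting to an Ensembl database.
{
    package MockSlice;
    sub new {
        my ($class, %args) = @_;
        my $self = { %args };
        return bless $self, $class;
    }
    sub seq_region_name { return $_[0]->{name} }
    sub is_reference    { return $_[0]->{is_reference} }
}
{
    package MockFeature;
    sub new {
        my ($class, %args) = @_;
        my $self = { %args };
        return bless $self, $class;
    }
    sub slice { return $_[0]->{slice} }
    sub start { return $_[0]->{start} }
}

# Keep only the features whose slice reports itself as reference.
# Assumption: is_reference() returns a true value for reference slices.
sub reference_features {
    my ($features) = @_;
    return [ grep { $_->slice->is_reference } @{$features} ];
}

# Placeholder data: one reference mapping and two alternative haplotypes.
# The names and positions are made up for illustration only.
my @features = (
    MockFeature->new(
        slice => MockSlice->new(name => '6', is_reference => 1),
        start => 100,
    ),
    MockFeature->new(
        slice => MockSlice->new(name => 'alt_hap_1', is_reference => 0),
        start => 200,
    ),
    MockFeature->new(
        slice => MockSlice->new(name => 'alt_hap_2', is_reference => 0),
        start => 300,
    ),
);

# Print the seq_region name and start of each reference mapping.
foreach my $vf (@{ reference_features(\@features) }) {
    printf "%s:%d\n", $vf->slice->seq_region_name, $vf->start;
}
```

Note that this only separates reference from non-reference seq_regions. A SNP that maps to two different reference chromosomes would still return more than one feature.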

Nath


On 27 Jul 2010, at 13:52, Will Spooner wrote:

> 
> On 27 Jul 2010, at 13:25, Neil Walker wrote:
> 
>> Hi Stuart
>> 
>>> Thanks for your reply - very useful indeed! Can I add that the link you provided to dbSNP gives a list of mappings but only one has a Group term = ref_assembly. I have been told (by the biologists in my group!) that this is the mapping we are interested in - does this attribute carry over to ensembl in an API queryable way!? Or are we reading too much into that attribute on dbSNP?
>> 
>> Just on this last point.
>> 
>> While in this case, the confusion seems to stem from alternate MHC
>> assemblies, there is a general danger of circular reasoning with some
>> sources of rs numbers.
>> 
>> For example, rs3819299 is on the Illumina 1M chip. How do we know?
>> Because Illumina tell us so. If you find the SNP now maps more than once
>> (and not just to alternate MHC assemblies) you should also be asking -
>> well, what did Illumina measure then?
>> 
>> That will depend on which probes they used. They might be measuring
>> the new, best version, or, with shorter probes, they might be measuring
>> the genome at several distinct points giving garbage results. This is
>> one of the reasons people want to see intensity plots in GWAS ...
> 
> I would second (third?) a method to flag/filter Features on non-reference-assembly Slices using the API. We have resorted to filtering by seq_region name, which is far from ideal. However, if I'm missing a method, please enlighten me!
> 
> Best,
> Will
> 
> 
> 
> 
>> 
>> Cheers
>> Neil
>> 
>>> On 27/07/10 12:03, Will McLaren wrote:
>>>> Hi Stuart,
>>>> 
>>>> Unfortunately a "reference feature" does not exist. In dbSNP and
>>>> Ensembl, a SNP is defined by its alleles and a pair of flanking
>>>> sequences, not a position. Thus, in order to derive a position for a
>>>> SNP, the flanking sequence is aligned to the reference genome. If the
>>>> flanking sequence aligns equally well to more than one location, then
>>>> the SNP is said to have multiple mappings, and we store and report one
>>>> variation feature for each of those mappings.
>>>> 
>>>> You can see all the mappings stored by dbSNP for your SNP here:
>>>> 
>>>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=3819299
>>>> 
>>>> As you can see, for build 36 (on which Ensembl 54 is based), there are 5
>>>> mappings, three of which are to what we consider "top level" contigs
>>>> (chromosome 6, and two alternative chromosome 6 haplotypes, see
>>>> http://may2009.archive.ensembl.org/Homo_sapiens/Variation/Summary?source=dbSNP;v=rs3819299 
>>>> for the mappings we store).
>>>> 
>>>> In this specific case, you could say that the mapping to chromosome 6
>>>> was the "reference" mapping, but in other cases multiple mappings are
>>>> not so easily distinguished; one might be to chromosome 6, and another
>>>> to chromosome 14, for example.
>>>> 
>>>> Hope this helps,
>>>> 
>>>> Will McLaren
>>>> Ensembl Variation
>>>> 
>>>> On 27 July 2010 11:30, Stuart Meacham <sm766 at cam.ac.uk> wrote:
>>>> 
>>>>   Hello,
>>>> 
>>>>   I am trying to distinguish the reference SNP from other Variation
>>>>   Features given the SNP ID. I am using ensembl build 54. For example:
>>>>   rs3819299 returns three variation feature objects with differing
>>>>   positions and I can't see any easy way of distinguishing the
>>>>   reference feature from the other two.
>>>> 
>>>>   Here is a snippet of code so you know what I am doing:
>>>> 
>>>> 
>>>>   # get registry
>>>>   my $reg = 'Bio::EnsEMBL::Registry';
>>>>   $reg->load_registry_from_db(
>>>>      -host => 'ensembldb.ensembl.org',
>>>>      -user => 'anonymous',
>>>>      -verbose => '1',
>>>>      -version => 54,
>>>>   );
>>>> 
>>>>   # get adaptors
>>>>   my $vfa = $reg->get_adaptor('human', 'variation', 'variationfeature');
>>>>   my $va = $reg->get_adaptor('human', 'variation', 'variation');
>>>> 
>>>>   # get Variation object
>>>>   my $var = $va->fetch_by_name('rs3819299');
>>>> 
>>>>   # get all Variation features associated with that variation
>>>>   my $vfs = $vfa->fetch_all_by_Variation($var);
>>>> 
>>>>   # get position
>>>>   # I was under the mistaken impression that even given more than one
>>>>   # feature they should all be at the same position!!
>>>> 
>>>>   my $snp_pos = 0;
>>>>   foreach my $vf (@{$vfs}){
>>>>           $snp_pos = $vf->start;
>>>>   }
>>>> 
>>>>   Thanks for any help/pointers
>>>> 
>>>>   Stuart
>>>> 
>>>> 
>> 
>> 
>> -- 
>> ---------------------------------------------------------------------
>> Neil Walker                         email: neil.walker at cimr.cam.ac.uk
>> JDRF/WT Diabetes and Inflammation   tel: +44 (0)1223 763210
>> Laboratory                          fax: +44 (0)1223 762102
>> Cambridge, UK                    http://www-gene.cimr.cam.ac.uk/todd/
>> ---------------------------------------------------------------------
> 
> --
> William Spooner
> whs at eaglegenomics.com
> http://www.eaglegenomics.com
> 
> 

Nathan Johnson
Scientific Programmer
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
Email: njohnson at ebi.ac.uk
TelNo: (+44)1223 492629








