[Dev] [ensembl-dev] Reference SNP
Nathan Johnson
njohnson at ebi.ac.uk
Wed Jul 28 09:41:34 BST 2010
Hi Will
I believe the forthcoming release now has an is_reference method on the Slice object and adaptor.
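As a rough illustration of how that could be used to filter variation features, here is a minimal sketch. It assumes the method is called as $slice->is_reference and returns a true value for reference sequence; I have not checked this against the released API. MockSlice and MockFeature are hypothetical stand-ins so the snippet runs without the Ensembl API or a database connection, and the positions are invented.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a Slice and a VariationFeature, so the sketch
# runs without the Ensembl API installed. Only the methods used are mocked.
package MockSlice;
sub new             { my ($class, %args) = @_; return bless {%args}, $class }
sub seq_region_name { return $_[0]{name} }
sub is_reference    { return $_[0]{is_reference} }  # assumed true/false flag

package MockFeature;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub slice { return $_[0]{slice} }
sub start { return $_[0]{start} }

package main;

# Keep only the features whose slice reports itself as reference sequence.
sub reference_features {
    my ($features) = @_;
    return [ grep { $_->slice->is_reference } @{$features} ];
}

# Invented example data: one primary chromosome and two alternative haplotypes.
my $vfs = [
    MockFeature->new(start => 100, slice => MockSlice->new(name => '6',      is_reference => 1)),
    MockFeature->new(start => 200, slice => MockSlice->new(name => 'c6_COX', is_reference => 0)),
    MockFeature->new(start => 300, slice => MockSlice->new(name => 'c6_QBL', is_reference => 0)),
];

my $ref_vfs = reference_features($vfs);
printf "%s:%d\n", $_->slice->seq_region_name, $_->start for @{$ref_vfs};
```

With real objects, the array returned by fetch_all_by_Variation would take the place of the mock list. Note that this only separates reference from non-reference sequence; a SNP that maps to two different reference chromosomes would still come back with two features.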
Nath
On 27 Jul 2010, at 13:52, Will Spooner wrote:
>
> On 27 Jul 2010, at 13:25, Neil Walker wrote:
>
>> Hi Stuart
>>
>>> Thanks for your reply - very useful indeed! Can I add that the link you provided to dbSNP gives a list of mappings but only one has a Group term = ref_assembly. I have been told (by the biologists in my group!) that this is the mapping we are interested in - does this attribute carry over to ensembl in an API queryable way!? Or are we reading too much into that attribute on dbSNP?
>>
>> Just on this last point.
>>
>> While in this case, the confusion seems to stem from alternate MHC
>> assemblies, there is a general danger of circular reasoning with some
>> sources of rs numbers.
>>
>> For example, rs3819299 is on the Illumina 1M chip. How do we know?
>> Because Illumina tell us so. If you find the SNP now maps more than once
>> (and not just to alternate MHC assemblies) you should also be asking -
>> well, what did Illumina measure then?
>>
>> That will depend on which probes they used. They might be measuring
>> the new, best version, or, with shorter probes, they might be measuring
>> the genome at several distinct points giving garbage results. This is
>> one of the reasons people want to see intensity plots in GWAS ...
>
> I would second (third?) a method to flag/filter Features on non-reference-assembly Slices using the API. We have resorted to filtering by seq_region name, which is far from ideal. However, if I'm missing a method, please enlighten me!
>
> Best,
> Will
>
>
>
>
>>
>> Cheers
>> Neil
>>
>>> On 27/07/10 12:03, Will McLaren wrote:
>>>> Hi Stuart,
>>>>
>>>> Unfortunately a "reference feature" does not exist. In dbSNP and
>>>> Ensembl, a SNP is defined by its alleles and a pair of flanking
>>>> sequences, not a position. Thus, in order to derive a position for a
>>>> SNP, the flanking sequence is aligned to the reference genome. If the
>>>> flanking sequence aligns equally well to more than one location, then
>>>> the SNP is said to have multiple mappings, and we store and report one
>>>> variation feature for each of those mappings.
>>>>
>>>> You can see all the mappings stored by dbSNP for your SNP here:
>>>>
>>>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=3819299
>>>>
>>>> As you can see, for build 36 (on which Ensembl 54 is based), there are 5
>>>> mappings, three of which are to what we consider "top level" contigs
>>>> (chromosome 6, and two alternative chromosome 6 haplotypes, see
>>>> http://may2009.archive.ensembl.org/Homo_sapiens/Variation/Summary?source=dbSNP;v=rs3819299
>>>> for the mappings we store).
>>>>
>>>> In this specific case, you could say that the mapping to chromosome 6
>>>> was the "reference" mapping, but in other cases multiple mappings are
>>>> not so easily distinguished; one might be to chromosome 6, and another
>>>> to chromosome 14, for example.
>>>>
>>>> Hope this helps,
>>>>
>>>> Will McLaren
>>>> Ensembl Variation
>>>>
>>>> On 27 July 2010 11:30, Stuart Meacham <sm766 at cam.ac.uk> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am trying to distinguish the reference SNP from other Variation
>>>> Features given the SNP ID. I am using ensembl build 54. For example:
>>>> rs3819299 returns three variation feature objects with differing
>>>> positions and I can't see any easy way of distinguishing the
>>>> reference feature from the other two.
>>>>
>>>> Here is a snippet of code so you know what I am doing:
>>>>
>>>>
>>>> use strict;
>>>> use warnings;
>>>> use Bio::EnsEMBL::Registry;
>>>>
>>>> # get registry
>>>> my $reg = 'Bio::EnsEMBL::Registry';
>>>> $reg->load_registry_from_db(
>>>>     -host => 'ensembldb.ensembl.org',
>>>>     -user => 'anonymous',
>>>>     -verbose => '1',
>>>>     -version => 54,
>>>> );
>>>>
>>>> # get adaptors
>>>> my $vfa = $reg->get_adaptor('human', 'variation', 'variationfeature');
>>>> my $va = $reg->get_adaptor('human', 'variation', 'variation');
>>>>
>>>> # get Variation object
>>>> my $var = $va->fetch_by_name('rs3819299');
>>>>
>>>> # get all Variation features associated with that variation
>>>> my $vfs = $vfa->fetch_all_by_Variation($var);
>>>>
>>>> # get position
>>>> # I was under the mistaken impression that even given more than one
>>>> # feature they should all be at the same position!!
>>>>
>>>> my $snp_pos = 0;
>>>> foreach my $vf (@{$vfs}){
>>>>     $snp_pos = $vf->start;
>>>> }
>>>>
>>>> Thanks for any help/pointers
>>>>
>>>> Stuart
>>>>
>>>>
>>
>>
>> --
>> ---------------------------------------------------------------------
>> Neil Walker email: neil.walker at cimr.cam.ac.uk
>> JDRF/WT Diabetes and Inflammation tel: +44 (0)1223 763210
>> Laboratory fax: +44 (0)1223 762102
>> Cambridge, UK http://www-gene.cimr.cam.ac.uk/todd/
>> ---------------------------------------------------------------------
>
> --
> William Spooner
> whs at eaglegenomics.com
> http://www.eaglegenomics.com
>
>
Nathan Johnson
Scientific Programmer
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
Email: njohnson at ebi.ac.uk
TelNo: (+44)1223 492629