[ensembl-dev] identifying known variants that map to multiple genome locations

Pablo Marin-Garcia pg4 at sanger.ac.uk
Thu Dec 9 12:17:28 GMT 2010


On Thu, 9 Dec 2010, Andrea Edwards wrote:

> Hi
>
> My email was very confusing - I'm sorry about that. I'll try again
>
> I am retrieving a set of variation feature objects like so
>
>
> # get all VariationFeatures in a region
> $slice = $sa->fetch_by_region('chromosome', '7');
> foreach $vf (@{$vfa->fetch_all_by_Slice($slice)}) {
>
>
> how would i check the variation feature object that i have returned maps to a 
> single genome location?
>
> I don't have a variation object, but now i have a vf object (which is what i 
> was getting at in my last email but explained badly) so I could create a 
> variation object from it and do this
>
> $v = $va->fetch_by_name(vf->variation_name);
> $count =  @{$vfa->fetch_all_by_Variation($v)}
> if $count>  1 then it maps to more than one place
>
> But this seemed circuitous and I was wondering if there was a better way.

Hello Andrea,

If you have a variation_feature you can look at $vf->map_weight to see if it has 
multiple mapping but.....

You need to create some filters yourself for the following cases:

- before $vfa->fetch_all_by_Variation($v) check that $v exist
- $vfa->fetch_all_by_Variation($v) could be void: no mapping or more than 3 
mappings
- the problem of using your $count is that you end  filtering out valid snps 
with only one mapping on the refernece but present also in the haplotypic 
regions like chr6 COX and others, or both chrX and Y (at least en previous 
APIs).
   - I deal with this by checking the count of variations with chr
     starting by number or XY:

     $count = [grep{$_->seq_region_name =~ /^[1-9XY]/} @vfeats_for_rs]

     But probably there are better options.

- questions to Will/Pontus:

Is map_weight reporting only 1 if you map only once in the reference chrs but also to the haplotypic regions?
How are snps in chrX and Y reported, you will have two variation features for a 
single variation?
And which map_weight would be assigned to these SNPs?


Finally, the reason why you should go through a variation object instead passing 
a rs to variation_feature is beacause the variarion_feature mysql table only 
have indexes for variarion_id and position, so you need to acces it passing a 
Slice or Variation object.

mysql> show index from variation_feature;
+-------------------+------------+---------------+--------------+---------------------
| Table             | Non_unique | Key_name      | Seq_in_index | Column_name
+-------------------+------------+---------------+--------------+---------------------
| variation_feature |          0 | PRIMARY       |            1 | variation_feature_id
| variation_feature |          1 | variation_idx |            1 | variation_id
| variation_feature |          1 | pos_idx       |            1 | seq_region_id
| variation_feature |          1 | pos_idx       |            2 | seq_region_start
| variation_feature |          1 | pos_idx       |            3 | seq_region_end
+-------------------+------------+---------------+--------------+----------------------

Hope this helps

   -Pablo

>
>
>
>
>
> On 09/12/2010 09:43, Will McLaren wrote:
>> Hi Andrea,
>> 
>> I'm not quite sure why you're creating a variation feature object for
>> a known variant in that way; you should be retrieving it via the
>> variation object using either
>> 
>> $v->get_all_VariationFeatures
>> 
>> or
>> 
>> $variation_feature_adaptor->fetch_all_by_Variation($v)
>> 
>> There is no way to get the other variation features from the variation
>> feature itself; the quickest way (or the least amount of code) would
>> be:
>> 
>> $vf->variation->get_all_VariationFeatures
>> 
>> Will
>> 
>> On 8 December 2010 23:38, Andrea Edwards<edwardsa at cs.man.ac.uk>  wrote:
>>> Esteemed api developers,
>>> 
>>> Is there any easy way with the perl api to tell if a known variant (by 
>>> this
>>> i mean one that has a an rsID and that i have created like this:
>>> 
>>> # Variation feature representing a single nucleotide polymorphism
>>> $vf = Bio::EnsEMBL::Variation::VariationFeature->new
>>> (-start =>  100,
>>> -end =>  100,
>>> -strand =>  1,
>>> -slice =>  $slice,
>>> -allele_string =>  'A/T',
>>> -variation_name =>  'rs635421',
>>> -map_weight =>  1,
>>> -variation =>  $v);
>>> 
>>> is known to map to another region of the genome other than the region have
>>> specified with the slice/start/end properties?
>>> 
>>> dbSNP variants can be assigned to multiple genome locations when it hasn't
>>> been possible to unambiguously assign them to a single location.
>>> 
>>> If i can't tell from the variation feature directly i believe i can do 
>>> this:
>>> 
>>> $vfa = $reg->get_adaptor("human","variation","variationfeature");
>>> $va = $reg->get_adaptor("human","variation","variation");
>>> 
>>> # fetch all genome hits for a particular variation
>>> $v = $va->fetch_by_name('rs56');
>>> $count =  @{$vfa->fetch_all_by_Variation($v)}
>>> if $count>  1 then it maps to more than one place?
>>> 
>>> I was hoping to be able to do it directly from teh variation feature
>>> 
>>> thanks
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list
>>> Dev at ensembl.org
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> 
>>> 
>
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev
>


-----

   Pablo Marin-Garcia





More information about the Dev mailing list