[ensembl-dev] Why do I get duplicated Variation Features

Benoît Ballester benoit at ebi.ac.uk
Wed Jul 4 15:59:12 BST 2012


Thanks Andy for the quick reply. 

I don't see how $vf->feature_Slice()->name() would help get rid of the duplicates here (see the 4th column).

It still prints chromosome:GRCh37:Y:xxx:xxx for both rows.
I would have expected one to be on chromosome X and the other on Y...


22:Y:116772:116785:-1  rs36189917      G/A     chromosome:GRCh37:Y:116775:116775:-1    Y       116775  116775  SNP     dbSNP   UPSTREAM
22:Y:116772:116785:-1  rs36189917      G/A     chromosome:GRCh37:Y:116775:116775:-1    Y       116775  116775  SNP     dbSNP   INTERGENIC

22:Y:116798:116811:-1  rs35600455      G/A     chromosome:GRCh37:Y:116801:116801:-1    Y       116801  116801  SNP     dbSNP   UPSTREAM
22:Y:116798:116811:-1  rs35600455      G/A     chromosome:GRCh37:Y:116801:116801:-1    Y       116801  116801  SNP     dbSNP   INTERGENIC
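
For reference, the hashing Andy suggests could look roughly like the sketch below. It is only an illustration under stated assumptions, not the API's own method: the records are hypothetical stand-ins for VariationFeature objects, `slice_name` holds the string that $vf->feature_Slice()->name() prints in the output above, and `dedupe_by_location` is a made-up helper name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-ins for VariationFeature objects. In real code the
# slice_name value would come from $vf->feature_Slice()->name(); the values
# here are copied from the output above.
my @rows = (
    { name => 'rs36189917', slice_name => 'chromosome:GRCh37:Y:116775:116775:-1', consequence => 'UPSTREAM'   },
    { name => 'rs36189917', slice_name => 'chromosome:GRCh37:Y:116775:116775:-1', consequence => 'INTERGENIC' },
    { name => 'rs35600455', slice_name => 'chromosome:GRCh37:Y:116801:116801:-1', consequence => 'UPSTREAM'   },
    { name => 'rs35600455', slice_name => 'chromosome:GRCh37:Y:116801:116801:-1', consequence => 'INTERGENIC' },
);

# Keep only the first record seen for each (variant name, location) pair.
sub dedupe_by_location {
    my @records = @_;
    my %seen;
    return grep { !$seen{ join('|', $_->{name}, $_->{slice_name}) }++ } @records;
}

my @unique = dedupe_by_location(@rows);
printf "%s\t%s\n", $_->{name}, $_->{slice_name} for @unique;
```

Because both rows for a variant carry the same rs ID and the same feature slice name, only the first is kept. This also discards the second consequence type (INTERGENIC here), so the consequences would need to be collected separately if both matter.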


Ben


On 4 Jul 2012, at 16:40, Andy Yates wrote:
> Hi Ben,
> 
> It seems that the duplication is in the database:
> 
> select sr.name, vf.seq_region_start, vf.seq_region_end, vf.seq_region_strand, vf.allele_string
> from variation v 
> join variation_feature vf using (variation_id)
> join seq_region sr using (seq_region_id)
> where v.name = 'rs71900610'
> 
> 
> name  seq_region_start  seq_region_end  seq_region_strand  allele_string
> Y     347778            347779          1                  GA/-
> X     397778            397779          1                  GA/-
> 
> 
> For the moment, if you are on the Y chromosome and in a PAR region (anything below position 2649521 on Y), I would start hashing the duplicates out. I think the fastest way to do that is to call $vf->feature_Slice()->name(), which gives you the coordinates along with the sequence region information in a single string.
> 
> Andy
> 
> Andrew Yates                   Ensembl Core Software Project Leader
> EMBL-EBI                       Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensembl.org/
> 
> On 4 Jul 2012, at 15:18, Benoît Ballester wrote:
> 
>> It looks like the usual HAP/PAR headache _again_  :(
>> 
>> Any idea how to avoid getting each VariationFeature twice when the slice falls in one of those regions?
>> 
>> PS: my slice comes from a $feature->slice. Do I have to transform/project my slice to top-level? I thought that was done by default.
>> 
>> Ben
>> 
>> 
>> On 4 Jul 2012, at 15:46, Benoît Ballester wrote:
>> 
>>> Hi, 
>>> 
>>> I am trying to fetch some variants for some slices but get duplicated variation features. I don't understand why, as I would expect one variant per slice.
>>> 
>>> eg: 
>>> 7:Y:347772:347784:1  rs71900610      GA/-    Y       347778  347779  deletion        dbSNP   INTERGENIC
>>> 7:Y:347772:347784:1  rs71900610      GA/-    Y       347778  347779  deletion        dbSNP   INTERGENIC
>>> 
>>> or 
>>> 
>>> 48:Y:386863:386872:1     rs10600708      ACAC/-  Y       386864  386867  deletion        dbSNP   INTERGENIC
>>> 48:Y:386863:386872:1     rs10600708      ACAC/-  Y       386864  386867  deletion        dbSNP   INTERGENIC
>>> 
>>> or 
>>> 
>>> 22:Y:116772:116785:-1   rs36189917      G/A     Y       116775  116775  SNP     dbSNP   UPSTREAM
>>> 22:Y:116772:116785:-1   rs36189917      G/A     Y       116775  116775  SNP     dbSNP   INTERGENIC
>>> (here UPSTREAM/INTERGENIC difference)
>>> 
>>> 
>>> I am sure I am missing something obvious somewhere, but so far I couldn't put my finger on it.  
>>> 
>>> 
>>> My code is pretty straightforward:
>>> 
>>> my $vfs = $vfa->fetch_all_by_Slice($slice);
>>> foreach my $vf (@$vfs) {
>>>     my $v = $vf->variation();
>>>     # print info on slice/variant/variation-feature
>>> }
>>> 
>>> 
>>> Any feedback appreciated,
>>> 
>>> Ben
>>> 
>>> --
>>> Benoit Ballester, PhD
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>> 
>> --
>> Benoit Ballester, PhD
>> Vertebrate Genomics - Ensembl
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge CB10 1SD, United Kingdom
>> 

--
Benoit Ballester, PhD


