[ensembl-dev] Why do I get duplicated Variation Features

Andy Yates ayates at ebi.ac.uk
Wed Jul 4 15:40:37 BST 2012


Hi Ben,

It seems that the duplication is in the database:

select sr.name, vf.seq_region_start, vf.seq_region_end, vf.seq_region_strand, vf.allele_string
from variation v 
join variation_feature vf using (variation_id)
join seq_region sr using (seq_region_id)
where v.name = 'rs71900610'


name  seq_region_start  seq_region_end  seq_region_strand  allele_string
Y     347778            347779          1                  GA/-
X     397778            397779          1                  GA/-


For the moment, if you are on the Y chromosome and within a PAR region (anything below position 2649521 on Y), I would filter the duplicates out with a hash. I think the fastest way to do that is to call $vf->feature_Slice()->name(), which gives you the coordinates along with the sequence region information in a single string.
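
A minimal sketch of that hash-based filtering, not a definitive implementation. The helper name unique_by_key, the choice of key fields and the stand-in records are assumptions made for illustration; the commented-out call shows how the key might be built from real VariationFeature objects, using $vf->feature_Slice()->name() as suggested above.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first item seen for each key.
# $key_of is a code ref returning the de-duplication key for one item.
sub unique_by_key {
    my ($items, $key_of) = @_;
    my %seen;
    return [ grep { !$seen{ $key_of->($_) }++ } @$items ];
}

# With the Ensembl API the key could be built like this (assumption:
# variation name, location string and allele string identify a feature):
#   my $unique_vfs = unique_by_key($vfs, sub {
#       my $vf = shift;
#       join '|', $vf->variation_name, $vf->feature_Slice->name, $vf->allele_string;
#   });

# Stand-in records so the sketch runs without a database connection.
# The location strings only imitate the shape of a slice name.
my @rows = (
    { name => 'rs71900610', location => 'chromosome:GRCh37:Y:347778:347779:1', alleles => 'GA/-' },
    { name => 'rs71900610', location => 'chromosome:GRCh37:Y:347778:347779:1', alleles => 'GA/-' },
    { name => 'rs10600708', location => 'chromosome:GRCh37:Y:386864:386867:1', alleles => 'ACAC/-' },
);

my $unique = unique_by_key(
    \@rows,
    sub { join '|', @{ $_[0] }{qw(name location alleles)} },
);

print scalar(@$unique), " unique of ", scalar(@rows), " records\n";
```

Because the key ignores the consequence type, two rows that differ only in that column (UPSTREAM versus INTERGENIC, say) collapse into one; add that field to the key if you need to keep them apart.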

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 4 Jul 2012, at 15:18, Benoît Ballester wrote:

> It looks like the usual HAP/PAR headache _again_  :(
> 
> Any idea how to avoid getting a VariationFeature twice when giving a slice that falls in those regions?
> 
> PS: my slice comes from a $feature->slice. Do I have to transform/project my slice to top-level? I thought that was done by default.
> 
> Ben
> 
> 
> On 4 Jul 2012, at 15:46, Benoît Ballester wrote:
> 
>> Hi, 
>> 
>> I am trying to fetch some variants for some slices but get duplicated variation features. I don't understand why, as I would expect one variant per slice.
>> 
>> eg: 
>> 7:Y:347772:347784:1  rs71900610      GA/-    Y       347778  347779  deletion        dbSNP   INTERGENIC
>> 7:Y:347772:347784:1  rs71900610      GA/-    Y       347778  347779  deletion        dbSNP   INTERGENIC
>> 
>> or 
>> 
>> 48:Y:386863:386872:1     rs10600708      ACAC/-  Y       386864  386867  deletion        dbSNP   INTERGENIC
>> 48:Y:386863:386872:1     rs10600708      ACAC/-  Y       386864  386867  deletion        dbSNP   INTERGENIC
>> 
>> or 
>> 
>> 22:Y:116772:116785:-1   rs36189917      G/A     Y       116775  116775  SNP     dbSNP   UPSTREAM
>> 22:Y:116772:116785:-1   rs36189917      G/A     Y       116775  116775  SNP     dbSNP   INTERGENIC
>> (here UPSTREAM/INTERGENIC difference)
>> 
>> 
>> I am sure I am missing something obvious somewhere, but so far I couldn't put my finger on it.  
>> 
>> 
>> My code is pretty straightforward :
>> 
>> my $vfs = $vfa->fetch_all_by_Slice($slice);
>> foreach my $vf (@$vfs) {
>>     my $v = $vf->variation();
>>     # print info on slice/variant/variation-feature
>> }
>> 
>> 
>> Any feedback appreciated,
>> 
>> Ben
>> 
>> --
>> Benoit Ballester, PhD
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
> 
> --
> Benoit Ballester, PhD
> Vertebrate Genomics - Ensembl
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge CB10 1SD, United Kingdom
> 
> 