[ensembl-dev] Why do I get duplicated Variation Features
Andy Yates
ayates at ebi.ac.uk
Wed Jul 4 15:40:37 BST 2012
Hi Ben,
It seems that the duplication is in the database:
select sr.name, vf.seq_region_start, vf.seq_region_end, vf.seq_region_strand, vf.allele_string
from variation v
join variation_feature vf using (variation_id)
join seq_region sr using (seq_region_id)
where v.name = 'rs71900610';
name  seq_region_start  seq_region_end  seq_region_strand  allele_string
Y     347778            347779          1                  GA/-
X     397778            397779          1                  GA/-
For the moment, if you are on the Y chromosome and inside a PAR region (anything below position 2649521 on Y), I would filter the duplicates out with a hash. I think the fastest way to do that is to call $vf->feature_Slice()->name(), which gives you the coordinates along with the sequence region information in a single string that you can use as the hash key.
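A minimal sketch of that hash-based filtering, under a few assumptions: the subroutine name unique_variation_features is made up for illustration, and the key combines $vf->variation_name() with the feature_Slice name so that two different variants at the same position are not collapsed (the variation name part is an addition to the suggestion above, not something confirmed in this thread).

```perl
use strict;
use warnings;

# Hypothetical helper: return an array ref holding the features from @$vfs
# with duplicates removed, keeping the first occurrence of each.
# Two features count as duplicates when they share both the variation name
# and the feature_Slice name (a single string carrying the sequence region,
# start, end and strand).
sub unique_variation_features {
    my ($vfs) = @_;
    my %seen;
    return [
        grep {
            my $key = join '|', $_->variation_name(), $_->feature_Slice()->name();
            !$seen{$key}++;    # true only the first time this key is met
        } @$vfs
    ];
}
```

It would be used as my $vfs = unique_variation_features($vfa->fetch_all_by_Slice($slice)); before the loop. Note that only the first copy is kept, so if the two copies carry different annotation (as in the UPSTREAM/INTERGENIC case), the second one is silently dropped.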
Andy
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/
On 4 Jul 2012, at 15:18, Benoît Ballester wrote:
> It looks like the usual HAP/PAR headache _again_ :(
>
> Any idea how to avoid getting each VariationFeature twice when the slice falls in one of those regions?
>
> PS: my slice comes from $feature->slice. Do I have to transform/project my slice to top-level? I thought that was done by default.
>
> Ben
>
>
> On 4 Jul 2012, at 15:46, Benoît Ballester wrote:
>
>> Hi,
>>
>> I am trying to fetch some variants for some slices but get duplicated variation features. I don't understand why, as I would expect one variant per slice.
>>
>> eg:
>> 7:Y:347772:347784:1 rs71900610 GA/- Y 347778 347779 deletion dbSNP INTERGENIC
>> 7:Y:347772:347784:1 rs71900610 GA/- Y 347778 347779 deletion dbSNP INTERGENIC
>>
>> or
>>
>> 48:Y:386863:386872:1 rs10600708 ACAC/- Y 386864 386867 deletion dbSNP INTERGENIC
>> 48:Y:386863:386872:1 rs10600708 ACAC/- Y 386864 386867 deletion dbSNP INTERGENIC
>>
>> or
>>
>> 22:Y:116772:116785:-1 rs36189917 G/A Y 116775 116775 SNP dbSNP UPSTREAM
>> 22:Y:116772:116785:-1 rs36189917 G/A Y 116775 116775 SNP dbSNP INTERGENIC
>> (here the last column differs: UPSTREAM vs. INTERGENIC)
>>
>>
>> I am sure I am missing something obvious somewhere, but so far I couldn't put my finger on it.
>>
>>
>> My code is pretty straightforward :
>>
>> my $vfs = $vfa->fetch_all_by_Slice($slice);
>> foreach my $vf (@$vfs) {
>>     my $v = $vf->variation();
>>     # print info on slice/variant/variation-feature
>> }
>>
>>
>> Any feedback appreciated,
>>
>> Ben
>>
>> --
>> Benoit Ballester, PhD
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
> --
> Benoit Ballester, PhD
> Vertebrate Genomics - Ensembl
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge CB10 1SD, United Kingdom
>
>