[ensembl-dev] Why do I get duplicated Variation Features
Benoît Ballester
benoit at ebi.ac.uk
Wed Jul 4 15:59:12 BST 2012
Thanks, Andy, for the quick reply.
I don't see how $vf->feature_Slice()->name() would help get rid of the duplicates here (see the 4th column).
It still prints chromosome:GRCh37:Y:xxx:xxx for both rows.
I would have expected one to be on chromosome X and the other on Y...
22:Y:116772:116785:-1 rs36189917 G/A chromosome:GRCh37:Y:116775:116775:-1 Y 116775 116775 SNP dbSNP UPSTREAM
22:Y:116772:116785:-1 rs36189917 G/A chromosome:GRCh37:Y:116775:116775:-1 Y 116775 116775 SNP dbSNP INTERGENIC
22:Y:116798:116811:-1 rs35600455 G/A chromosome:GRCh37:Y:116801:116801:-1 Y 116801 116801 SNP dbSNP UPSTREAM
22:Y:116798:116811:-1 rs35600455 G/A chromosome:GRCh37:Y:116801:116801:-1 Y 116801 116801 SNP dbSNP INTERGENIC
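For what it's worth, here is a rough sketch of what "hashing the duplicates out" could look like. It is plain Perl over hand-built records that mirror the columns printed above, so it runs without the Ensembl API. The record keys (name, region, alleles, consequence) and the helper dedup_features are made up for illustration. In real code the key could presumably be built from $vf->variation_name() and $vf->feature_Slice()->name(), but that is my assumption about what makes two rows "the same", and it deliberately collapses rows that differ only in consequence type.

```perl
use strict;
use warnings;

# Hypothetical records copied from the output above (not fetched from Ensembl).
my @rows = (
    { name => 'rs36189917', alleles => 'G/A',
      region => 'chromosome:GRCh37:Y:116775:116775:-1', consequence => 'UPSTREAM' },
    { name => 'rs36189917', alleles => 'G/A',
      region => 'chromosome:GRCh37:Y:116775:116775:-1', consequence => 'INTERGENIC' },
    { name => 'rs35600455', alleles => 'G/A',
      region => 'chromosome:GRCh37:Y:116801:116801:-1', consequence => 'UPSTREAM' },
    { name => 'rs35600455', alleles => 'G/A',
      region => 'chromosome:GRCh37:Y:116801:116801:-1', consequence => 'INTERGENIC' },
);

# Keep the first record seen for each (name, region, alleles) key and
# collect the consequence types of any later duplicates onto it.
sub dedup_features {
    my (@records) = @_;
    my (%seen, @unique);
    foreach my $r (@records) {
        my $key = join '|', $r->{name}, $r->{region}, $r->{alleles};
        if (my $kept = $seen{$key}) {
            push @{ $kept->{consequences} }, $r->{consequence};
            next;
        }
        my $copy = {
            name         => $r->{name},
            alleles      => $r->{alleles},
            region       => $r->{region},
            consequences => [ $r->{consequence} ],
        };
        $seen{$key} = $copy;
        push @unique, $copy;
    }
    return @unique;
}

# One tab-separated line per unique variant, consequences comma-joined.
foreach my $u (dedup_features(@rows)) {
    print join("\t", $u->{name}, $u->{alleles}, $u->{region},
               join(',', @{ $u->{consequences} })), "\n";
}
```

This only hides the duplicates on the client side. It does not explain why both rows come back on Y rather than one on X and one on Y.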
Ben
On 4 Jul 2012, at 16:40, Andy Yates wrote:
> Hi Ben,
>
> It seems that the duplication is in the database:
>
> select sr.name, vf.seq_region_start, vf.seq_region_end, vf.seq_region_strand, vf.allele_string
> from variation v
> join variation_feature vf using (variation_id)
> join seq_region sr using (seq_region_id)
> where v.name = 'rs71900610'
>
>
> name  seq_region_start  seq_region_end  seq_region_strand  allele_string
> Y     347778            347779          1                  GA/-
> X     397778            397779          1                  GA/-
>
>
> For the moment, if you are on the Y chromosome and in a PAR region (anything below position 2649521 on Y), I would start hashing the duplicates out. The fastest way to do that, I think, is to call $vf->feature_Slice()->name(), which gives you the coordinates along with the sequence region information in a single string.
>
> Andy
>
> Andrew Yates Ensembl Core Software Project Leader
> EMBL-EBI Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK http://www.ensembl.org/
>
> On 4 Jul 2012, at 15:18, Benoît Ballester wrote:
>
>> It looks like the usual HAP/PAR headache _again_ :(
>>
>> Any idea how to avoid getting each VariationFeature twice when the slice falls in one of those regions?
>>
>> PS: my slice comes from a $feature->slice. Do I have to transform/project my slice to top-level? I thought that was done by default.
>>
>> Ben
>>
>>
>> On 4 Jul 2012, at 15:46, Benoît Ballester wrote:
>>
>>> Hi,
>>>
>>> I am trying to fetch some variants for some slices but get duplicated variation features. I don't understand why, as I would expect one variant per slice.
>>>
>>> eg:
>>> 7:Y:347772:347784:1 rs71900610 GA/- Y 347778 347779 deletion dbSNP INTERGENIC
>>> 7:Y:347772:347784:1 rs71900610 GA/- Y 347778 347779 deletion dbSNP INTERGENIC
>>>
>>> or
>>>
>>> 48:Y:386863:386872:1 rs10600708 ACAC/- Y 386864 386867 deletion dbSNP INTERGENIC
>>> 48:Y:386863:386872:1 rs10600708 ACAC/- Y 386864 386867 deletion dbSNP INTERGENIC
>>>
>>> or
>>>
>>> 22:Y:116772:116785:-1 rs36189917 G/A Y 116775 116775 SNP dbSNP UPSTREAM
>>> 22:Y:116772:116785:-1 rs36189917 G/A Y 116775 116775 SNP dbSNP INTERGENIC
>>> (here UPSTREAM/INTERGENIC difference)
>>>
>>>
>>> I am sure I am missing something obvious somewhere, but so far I couldn't put my finger on it.
>>>
>>>
>>> My code is pretty straightforward:
>>>
>>> my $vfs = $vfa->fetch_all_by_Slice($slice);
>>> foreach my $vf (@$vfs) {
>>>     my $v = $vf->variation();
>>>     # print info on slice/variant/variation-feature
>>> }
>>>
>>>
>>> Any feedback appreciated,
>>>
>>> Ben
>>>
>>> --
>>> Benoit Ballester, PhD
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>
>> --
>> Benoit Ballester, PhD
>> Vertebrate Genomics - Ensembl
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge CB10 1SD, United Kingdom
>>
>>
>
>
--
Benoit Ballester, PhD