[ensembl-dev] Possible VEP bug?
Mark Aquino
aquinom85 at me.com
Tue Jan 3 14:18:14 GMT 2012
Will,
Yes that's what I meant.
To the devs: long story short would you consider having the API throw an error if one tries to call a sequence beyond the range of the slice that is generated? I guess I don't really understand the point of generating a "sub slice" if slice_adaptor->fetch_by_Slice_start_end_strand can query beyond the range of the slice originally created.
Best,
Mark
On Jan 3, 2012, at 9:14 AM, Will McLaren wrote:
> Hi Mark,
>
> Not sure exactly what you mean - do you mean that the sequence adaptor
> should return an error if you ask for sequence beyond the range of the
> slice you have?
>
> Probably best to email dev at ensembl.org with this, the core API team
> (who deal with this sort of thing) should pick it up from there.
>
> Will
>
> On 3 January 2012 14:10, Mark Aquino <aquinom85 at me.com> wrote:
>> Got the code wrong (unsurprisingly): this is actually what I was using:
>>
>>
>> $slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor ( 'homo_sapiens', 'core', 'slice');
>> $slice = $slice_adaptor->fetch_by_region('chromosome', $chr, $start, $end);
>> $sa = Bio::EnsEMBL::Registry->get_adaptor('homo_sapiens', 'core', 'sequence');
>>
>> for(;$start <= $end; $start++){
>> my $dna = ${ $sa->fetch_by_Slice_start_end_strand($slice, $start, $start, $strand) };
>> foreach my $altbase (@bases){
>> if ($altbase ne $dna && $dna ne 'N'){
>> print OUTFILE "$chr\t$true_start\t$transcript_id\t$dna\t$altbase\n";
>> }
>> }
>> }
>>
>>
>>
>> On Jan 3, 2012, at 4:54 AM, Will McLaren wrote:
>>
>>> Hi Mark,
>>>
>>> My guess is you have two lines of input for the same position, one
>>> which as alleles (ref/alt) G/A and one which has alleles C/A.
>>>
>>> The VEP assumes that you have supplied the correct ref allele as the
>>> first allele, but unless you ask it to (using --check_ref) it won't
>>> enforce this, and merely gives you the results of the input you have
>>> given it. Note that using --check_ref references the database and
>>> hence is slow for large datasets.
>>>
>>> So, it may be that at this locus the actual reference sequence is "A",
>>> but your input is telling the VEP that in one case it is "C" and in
>>> one case it is "G"; this is then the output you would see.
>>>
>>> Just to check can you send me any lines of input you have for this
>>> position from you input file (just grep for 36429636)?
>>>
>>> Cheers
>>>
>>> Will
>>>
>>> On 28 December 2011 14:16, Mark Aquino <aquinom85 at me.com> wrote:
>>>> Hey Will,
>>>>
>>>> I am getting back two ref alleles for the same site sometimes from the VEP at the site 7: 36429636, a G and a C. Is this a bug?
>>>>
>>>>
>>>> output:
>>>> chrom start allele gene_id transcript_id amino_acid codon
>>>> 7 36429636 A ENSG00000011426 ENST00000265748 Transcript NON_SYNONYMOUS_CODING 222 1 1 V/M Gtg/Atg - -
>>>> 7 36429636 A ENSG00000011426 ENST00000265748 Transcript NON_SYNONYMOUS_CODING 222 1 1 L/M Ctg/Atg - -
>>>>
>>>> Best,
>>>> Mark
More information about the Dev
mailing list