[ensembl-dev] Possible VEP bug?

Mark Aquino aquinom85 at me.com
Tue Jan 3 14:18:14 GMT 2012


Will,

Yes that's what I meant.  

To the devs: long story short would you consider having the API throw an error if one tries to call a sequence beyond the range of the slice that is generated? I guess I don't really understand the point of generating a "sub slice" if slice_adaptor->fetch_by_Slice_start_end_strand can query beyond the range of the slice originally created.

Best,

Mark
On Jan 3, 2012, at 9:14 AM, Will McLaren wrote:

> Hi Mark,
> 
> Not sure exactly what you mean - do you mean that the sequence adaptor
> should return an error if you ask for sequence beyond the range of the
> slice you have?
> 
> Probably best to email dev at ensembl.org with this, the core API team
> (who deal with this sort of thing) should pick it up from there.
> 
> Will
> 
> On 3 January 2012 14:10, Mark Aquino <aquinom85 at me.com> wrote:
>> Got the code wrong (unsurprisingly): this is actually what I was using:
>> 
>> 
>> $slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor ( 'homo_sapiens', 'core', 'slice');
>> $slice = $slice_adaptor->fetch_by_region('chromosome', $chr, $start, $end);
>> $sa = Bio::EnsEMBL::Registry->get_adaptor('homo_sapiens', 'core', 'sequence');
>> 
>>  for(;$start <= $end; $start++){
>>                my $dna = ${ $sa->fetch_by_Slice_start_end_strand($slice, $start, $start, $strand) };
>>                foreach my $altbase (@bases){
>>                        if ($altbase ne $dna && $dna ne 'N'){
>>                                print OUTFILE "$chr\t$true_start\t$transcript_id\t$dna\t$altbase\n";
>>                        }
>>                }
>>        }
>> 
>> 
>> 
>> On Jan 3, 2012, at 4:54 AM, Will McLaren wrote:
>> 
>>> Hi Mark,
>>> 
>>> My guess is you have two lines of input for the same position, one
>>> which as alleles (ref/alt) G/A and one which has alleles C/A.
>>> 
>>> The VEP assumes that you have supplied the correct ref allele as the
>>> first allele, but unless you ask it to (using --check_ref) it won't
>>> enforce this, and merely gives you the results of the input you have
>>> given it. Note that using --check_ref references the database and
>>> hence is slow for large datasets.
>>> 
>>> So, it may be that at this locus the actual reference sequence is "A",
>>> but your input is telling the VEP that in one case it is "C" and in
>>> one case it is "G"; this is then the output you would see.
>>> 
>>> Just to check can you send me any lines of input you have for this
>>> position from you input file (just grep for 36429636)?
>>> 
>>> Cheers
>>> 
>>> Will
>>> 
>>> On 28 December 2011 14:16, Mark Aquino <aquinom85 at me.com> wrote:
>>>> Hey Will,
>>>> 
>>>> I am getting back two ref alleles for the same site sometimes from the VEP at the site 7: 36429636, a G and a C.  Is this a bug?
>>>> 
>>>> 
>>>> output:
>>>> chrom  start            allele    gene_id                   transcript_id     amino_acid   codon
>>>> 7         36429636      A       ENSG00000011426 ENST00000265748 Transcript      NON_SYNONYMOUS_CODING   222     1       1       V/M           Gtg/Atg -       -
>>>> 7         36429636      A       ENSG00000011426 ENST00000265748 Transcript      NON_SYNONYMOUS_CODING   222     1       1       L/M           Ctg/Atg -       -
>>>> 
>>>> Best,
>>>> Mark





More information about the Dev mailing list