[ensembl-dev] VEP text output sort order

Will McLaren wm2 at ebi.ac.uk
Mon Oct 17 16:56:54 BST 2011


Hi Chris,

This looks like a bug in the sorting order: the VEP sorts the chromosomes
with a string comparison rather than a numeric one, so within a single
buffer "10" sorts before "6". It uses a string comparison because a
numeric comparison makes Perl warn when you sort names such as
1, 2, 6, "X".
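A small stand-alone Perl sketch shows the difference. The chromosome list is made up for illustration and this is not VEP code. A string sort puts "10" ahead of "6", which matches the jump from chr10 back to chr6 in your output, while a numeric sort does not:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Chromosome names as they might appear as keys in one buffer
# (illustrative values only).
my @chroms = ('6', '10', '2', '1');

# String comparison (cmp) compares character by character, so "10" lt "6".
my @by_string = sort { $a cmp $b } @chroms;

# Numeric comparison (<=>) compares values, so 6 < 10.
my @by_number = sort { $a <=> $b } @chroms;

print "string:  @by_string\n";   # string:  1 10 2 6
print "numeric: @by_number\n";   # numeric: 1 2 6 10
```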

I'll see what I can do to fix this, but I don't think it's a severe
enough problem to rush out a bugfix before the next full release.

If you have access to the API installation, you can try changing line
484 of ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm so
that it reads

foreach my $chr(sort {$a <=> $b} keys %vf_hash) {
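With that one-line change, non-numeric names such as "X" are treated as 0 by <=>, so they sort ahead of chromosome 1, and Perl warns that the argument isn't numeric if warnings are enabled. One possible alternative is a comparator that only compares numerically when both names are all digits. The sketch below is hypothetical: by_chromosome is not part of the VEP API, this is not the fix that went into a release, and it assumes numeric chromosomes should come before X, Y, MT and any other named sequences. %vf_hash here is just a stand-in with empty values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the VEP API): purely numeric names come
# first in numeric order, followed by all other names (X, Y, MT, ...) in
# string order. <=> is only used when both names are all digits, so no
# "isn't numeric" warning is raised for names like "X".
sub by_chromosome {
    my ($x, $y) = @_;
    my $x_num = $x =~ /\A\d+\z/;
    my $y_num = $y =~ /\A\d+\z/;

    return $x <=> $y if $x_num && $y_num;   # both numeric: 6 before 10
    return -1        if $x_num;             # numeric before non-numeric
    return 1         if $y_num;
    return $x cmp $y;                       # both non-numeric: string order
}

# Stand-in for the VEP's %vf_hash, keyed by chromosome name.
my %vf_hash = map { $_ => [] } qw(10 X 6 MT 2 1 Y);
my @ordered = sort { by_chromosome($a, $b) } keys %vf_hash;
print "@ordered\n";   # 1 2 6 10 MT X Y
```

In the loop above this would read foreach my $chr (sort { by_chromosome($a, $b) } keys %vf_hash) {, with the helper defined somewhere in the same package.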

Cheers

Will

On 17 October 2011 16:40,  <cj5 at sanger.ac.uk> wrote:
> Further to this, here's an extract from my log output; it looks like a
> sort problem with the buffer:
>
> 2011-10-03 14:15:54 - Read 20000 variants into buffer
> 2011-10-03 14:15:54 - Analyzing chromosome 1
> 2011-10-03 14:15:54 - Reading transcript data from cache and/or database
> 2011-10-03 14:16:12 - Analyzing variants
> 2011-10-03 14:18:21 - Analyzing chromosome 2
> 2011-10-03 14:18:21 - Reading transcript data from cache and/or database
> 2011-10-03 14:18:40 - Analyzing variants
> 2011-10-03 14:20:01 - Read 20000 variants into buffer
> 2011-10-03 14:20:03 - Analyzing chromosome 2
> 2011-10-03 14:20:03 - Reading transcript data from cache and/or database
> 2011-10-03 14:20:06 - Analyzing variants
> 2011-10-03 14:20:23 - Analyzing chromosome 3
> 2011-10-03 14:20:23 - Reading transcript data from cache and/or database
> 2011-10-03 14:20:47 - Analyzing variants
> 2011-10-03 14:22:01 - Analyzing chromosome 4
> 2011-10-03 14:22:01 - Reading transcript data from cache and/or database
> 2011-10-03 14:22:19 - Analyzing variants
> 2011-10-03 14:23:09 - Analyzing chromosome 5
> 2011-10-03 14:23:09 - Reading transcript data from cache and/or database
> 2011-10-03 14:23:26 - Analyzing variants
> 2011-10-03 14:24:19 - Analyzing chromosome 6
> 2011-10-03 14:24:19 - Reading transcript data from cache and/or database
> 2011-10-03 14:24:24 - Analyzing variants
> 2011-10-03 14:25:00 - Read 20000 variants into buffer
> 2011-10-03 14:25:01 - Analyzing chromosome 10
> 2011-10-03 14:25:01 - Reading transcript data from cache and/or database
> 2011-10-03 14:25:02 - Analyzing variants
> 2011-10-03 14:25:04 - Analyzing chromosome 6
> 2011-10-03 14:25:04 - Reading transcript data from cache and/or database
> 2011-10-03 14:25:21 - Analyzing variants
> 2011-10-03 14:26:11 - Analyzing chromosome 7
>
>
>
> Tx
> Chris
>
>
>> Hi,
>> I've noticed that the VEP 2.2 text output is occasionally not in genomic
>> order for a small percentage of records. I'm using the hs cache for this,
>> like so:
>>
>> variant_effect_predictor.pl -v -i <my 120K record vcf> --sift b \
>>     --polyphen b --condel b --gene --hgnc --format vcf \
>>     -o <my output file> --force_overwrite --cache --dir <my vep_cache>
>>
>> Output:
>> ...
>> 6_4998993_C/T   6:4998993   T   ENSG00000124787 ENST00000380051 Transcript   SYNONYMOUS_CODING   561 516 172 K   aaG/aaA -   HGNC=RPP40
>> 10_93502_C/G    10:93502    G   ENSG00000173876 ENST00000482075 Transcript   DOWNSTREAM
>> ...
>> [several hundred chr10 records, then a jump back to chr6]
>>
>> Is this a known issue?
>>
>> Thanks
>> Chris
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>



