[ensembl-dev] VCF order chrX, chrM

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Wed Apr 30 10:30:11 BST 2014


Dear Will,

Your explanation is very helpful. I'll try to fix my problem by changing 
the variant order as you suggested.
That said, I think that for most users the order should be 
chr1-22,X,Y,M (almost all human references are already sorted in 
that order).
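
For anyone following the thread, here is a minimal Python sketch of the
per-buffer sorting described in the quoted explanation below. It is not
VEP's code. The function names, the toy buffer size of 4 (instead of 5000)
and the two leading chr22 records (added only so that a buffer boundary
falls in the middle of chrX) are all assumptions made for illustration, and
the sort key only looks at the chromosome name.

```python
BUFFER_SIZE = 4  # toy value; the buffer described in this thread holds 5000 variants

# Reference-style order: 1-22, X, Y, M
KARYOTYPIC = [str(n) for n in range(1, 23)] + ["X", "Y", "M"]


def chrom_name(variant):
    """Return the chromosome of a (chrom, label) pair without the 'chr' prefix."""
    chrom = variant[0]
    return chrom[3:] if chrom.startswith("chr") else chrom


def alphanumeric_key(variant):
    """Numbered chromosomes first, then letters alphabetically: 1-22, M, X, Y."""
    name = chrom_name(variant)
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def karyotypic_key(variant):
    """Position of the chromosome in the 1-22, X, Y, M order."""
    return KARYOTYPIC.index(chrom_name(variant))


def buffered_sort(variants, key, buffer_size=BUFFER_SIZE):
    """Sort each fixed-size buffer on its own and concatenate the results."""
    out = []
    for start in range(0, len(variants), buffer_size):
        chunk = variants[start:start + buffer_size]
        out.extend(sorted(chunk, key=key))  # sorted() is stable within a chromosome
    return out


# Hypothetical input: the chr22 records only shift the buffer boundary.
vcf = [
    ("chr22", "variant1"), ("chr22", "variant2"),
    ("chrX", "variant1"), ("chrX", "variant2"),
    ("chrX", "variant3"), ("chrX", "variant4"),
    ("chrM", "variant1"), ("chrM", "variant2"),
]

scrambled = buffered_sort(vcf, alphanumeric_key)  # chrM lands between the chrX lines
preserved = buffered_sort(vcf, karyotypic_key)    # input order is kept

# Workaround from the thread: put chrM before chrX in the input file.
m_first = vcf[:2] + vcf[6:] + vcf[2:6]
workaround = buffered_sort(m_first, alphanumeric_key)

for chrom, label in scrambled:
    print(chrom, label)
```

With the alphanumeric key the second buffer (chrX variant3, chrX variant4,
chrM variant1, chrM variant2) is reordered so that chrM comes first, which
gives the interleaved output I reported. With a key that follows the
1-22,X,Y,M order, or with chrM placed before chrX in the input, each buffer
is already sorted and the lines come out unchanged.
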

Thank you very much.

Best regards,
Guillermo.

On 04/30/2014 10:18 AM, Will McLaren wrote:
> Hi Guillermo,
>
> Currently the VEP internally sorts each buffer of 5000 variants that 
> it reads in before writing the output. The sort is done 
> alphanumerically, so it will order e.g. 1-22,M,X,Y.
>
> It looks like the buffer partially overlaps your input groups, such 
> that, in your example, the first buffer read would be
>
> chrX variant1
> chrX variant2
>
> These are parsed, sorted and written out. Then the buffer reads in the 
> next batch:
>
> chrX variant3
> chrX variant4
> chrM variant1
> chrM variant2
>
> which then get sorted to
>
> chrM variant1
> chrM variant2
> chrX variant3
> chrX variant4
>
> since M is before X alphabetically. So, I'm afraid this explains but 
> doesn't fix your problem! You could ensure that your chrM variants 
> appear before your chrX and chrY variants in the file, and this 
> problem shouldn't appear.
>
> For the next VEP release I'll look into retaining the input sorting 
> when using VCF as the output format as I think this would be 
> preferable for most users.
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
>
> On 30 April 2014 07:47, Guillermo Marco Puche 
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
>     Dear developers,
>
>     I'm seeing strange behavior when annotating a fully sorted
>     VCF file.
>     My chr order is the following: chr1 to chr22, chrX, chrY, chrM.
>
>     I've noticed that when I have variants on chrX followed by
>     variants on chrM, the VEP script annotates the full VCF file but
>     changes the order of some of the lines. See the example below:
>
>     Imagine I have the following variants in my VCF:
>
>     chrX variant1
>     chrX variant2
>     chrX variant3
>     chrX variant4
>     chrM variant1
>     chrM variant2
>
>     After annotating the VCF, the order ends up like this:
>
>     chrX variant1
>     chrX variant2
>     chrM variant1
>     chrM variant2
>     chrX variant3
>     chrX variant4
>
>     This is just an illustrative example. I would like to fix this,
>     because an unsorted annotated VCF file is awkward to work with.
>     I've not seen this issue with chromosomes other than chrX and
>     chrM. I have already tried disabling all the plugins, and the
>     issue still reproduces.
>
>     Thank you very much.
>
>     Best regards,
>     Guillermo.
>
>     _______________________________________________
>     Dev mailing list    Dev at ensembl.org
>     Posting guidelines and subscribe/unsubscribe info:
>     http://lists.ensembl.org/mailman/listinfo/dev
>     Ensembl Blog: http://www.ensembl.info/
>

