[ensembl-dev] VCF order chrX, chrM
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Wed Apr 30 10:30:11 BST 2014
Dear Will,
Your explanation is very helpful. I'll try to fix my problem by changing
the variant order as you suggested.
That said, I think that for most users the order should be
chr1-22,X,Y,M (almost all human reference genomes are already sorted in
that order).
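
For anyone who wants to restore that order after annotation, here is a minimal Python sketch of one way to re-sort VCF data lines into chr1-22,X,Y,M. It is not part of the VEP. The names REFERENCE_ORDER, reference_key and sort_vcf_lines are hypothetical, the example positions are made up, and it assumes tab-separated lines with the chromosome in the first column and the position in the second.

```python
# Assumed target order: chr1-22, then X, Y, M (names without the "chr" prefix).
REFERENCE_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y", "M"]
RANK = {name: index for index, name in enumerate(REFERENCE_ORDER)}


def reference_key(vcf_line):
    """Sort key: rank of the chromosome in REFERENCE_ORDER, then position."""
    chrom, pos = vcf_line.split("\t")[:2]
    name = chrom[3:] if chrom.startswith("chr") else chrom
    # Contigs that are not in REFERENCE_ORDER are placed last, ordered by name.
    return (RANK.get(name, len(RANK)), name, int(pos))


def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by the sorted data lines."""
    header = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if not line.startswith("#")]
    return header + sorted(body, key=reference_key)


# Made-up example lines (only the first columns of a VCF record).
lines = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chrM\t150\t.\tA\tG",
    "chrX\t2000\t.\tC\tT",
    "chr10\t100\t.\tG\tA",
    "chr2\t500\t.\tT\tC",
    "chrX\t1000\t.\tA\tC",
]

for line in sort_vcf_lines(lines):
    print(line)
```

This reads the whole file into memory, which is fine for small files; for large VCFs a dedicated sorting tool is likely a better choice.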
Thank you very much.
Best regards,
Guillermo.
On 04/30/2014 10:18 AM, Will McLaren wrote:
> Hi Guillermo,
>
> Currently the VEP internally sorts each buffer of 5000 variants that
> it reads in before writing the output. The sort is done
> alphanumerically, so it will order e.g. 1-22,M,X,Y.
>
> It looks like the buffer partially overlaps your input groups, such
> that, in your example, the first buffer read would be
>
> chrX variant1
> chrX variant2
>
> These are parsed, sorted and written out. Then the buffer reads in the
> next batch:
>
> chrX variant3
> chrX variant4
> chrM variant1
> chrM variant2
>
> which then get sorted to
>
> chrM variant1
> chrM variant2
> chrX variant3
> chrX variant4
>
> since M is before X alphabetically. So, I'm afraid this explains but
> doesn't fix your problem! You could ensure that your chrM variants
> appear before your chrX and chrY variants in the file, and this
> problem shouldn't appear.
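
The buffering behaviour described above can be reproduced with a minimal Python sketch. This is not the VEP's actual code (the VEP is written in Perl). The names chrom_key and sort_in_buffers are hypothetical, the buffer size is reduced from 5000 to 4 so the effect shows on a tiny input, and two chr22 variants are added as hypothetical padding so that the buffer boundary falls in the middle of the chrX group.

```python
from itertools import islice


def chrom_key(chrom):
    """Numeric chromosomes first (in numeric order), then the rest alphabetically."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def sort_in_buffers(variants, buffer_size):
    """Yield variants after sorting each buffer independently (no global sort)."""
    it = iter(variants)
    while True:
        buffer = list(islice(it, buffer_size))
        if not buffer:
            return
        # Stable sort by chromosome only, so input order within a chromosome is kept.
        buffer.sort(key=lambda variant: chrom_key(variant[0]))
        yield from buffer


variants = [
    ("chr22", "variant1"), ("chr22", "variant2"),  # hypothetical padding
    ("chrX", "variant1"), ("chrX", "variant2"),
    ("chrX", "variant3"), ("chrX", "variant4"),
    ("chrM", "variant1"), ("chrM", "variant2"),
]

output = list(sort_in_buffers(variants, buffer_size=4))
for chrom, name in output:
    print(chrom, name)
```

The second buffer holds chrX variant3, chrX variant4 and both chrM variants, so the chrM variants are written before the remaining chrX variants. If the input already lists chrM before chrX, every buffer is already in sorted order and the output order matches the input.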
>
> For the next VEP release I'll look into retaining the input sorting
> when using VCF as the output format as I think this would be
> preferable for most users.
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
>
> On 30 April 2014 07:47, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Dear developers,
>
> I'm seeing strange behaviour when annotating a fully sorted
> VCF file.
> My chromosome order is: chr1 to chr22, chrX, chrY, chrM.
>
> I've noticed that when chrX variants are followed by chrM variants,
> the VEP script annotates the whole VCF file but changes the order of
> some of the lines. See the example below:
>
> Imagine I have the following variants in my VCF:
>
> chrX variant1
> chrX variant2
> chrX variant3
> chrX variant4
> chrM variant1
> chrM variant2
>
> After annotating the VCF, the order becomes:
>
> chrX variant1
> chrX variant2
> chrM variant1
> chrM variant2
> chrX variant3
> chrX variant4
>
> This is just an illustrative example. I would like to fix this,
> because an unsorted annotated VCF file is awkward to work with.
> I've not experienced this issue with chromosomes other than chrX
> and chrM. I have already tried to debug this by disabling all the
> plugins, and the issue still reproduces.
>
> Thank you very much.
>
> Best regards,
> Guillermo.
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/