[ensembl-dev] VCF order chrX, chrM

Will McLaren wm2 at ebi.ac.uk
Wed Apr 30 09:18:12 BST 2014


Hi Guillermo,

Currently the VEP reads its input in buffers of 5000 variants and sorts
each buffer internally before writing the output. The sort is alphanumeric
on chromosome name, so within a buffer chromosomes are ordered
1-22, M, X, Y.
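
To make that ordering concrete, here is a minimal Python sketch of an
alphanumeric chromosome sort. It is not the VEP's own code; the function
name chrom_sort_key and the handling of the "chr" prefix are assumptions
made for illustration only.

```python
import re


def chrom_sort_key(chrom):
    """Sort key: numeric chromosome names first (by value), then the rest alphabetically."""
    # Assumption: an optional "chr" prefix is ignored when comparing names.
    name = re.sub(r"^chr", "", chrom)
    if name.isdigit():
        return (0, int(name), "")
    return (1, 0, name)


chroms = ["chrX", "chr10", "chrM", "chr2", "chrY", "chr1", "chr22"]
ordered = sorted(chroms, key=chrom_sort_key)
print(ordered)
# ['chr1', 'chr2', 'chr10', 'chr22', 'chrM', 'chrX', 'chrY']
```

Note that M lands before X and Y, which is what conflicts with an input
file ordered chrX, chrY, chrM.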

It looks like a buffer boundary falls inside your block of chrX variants,
such that, in your example, the first buffer read would be

chrX variant1
chrX variant2

These are parsed, sorted and written out. Then the buffer reads in the next
batch:

chrX variant3
chrX variant4
chrM variant1
chrM variant2

which then get sorted to

chrM variant1
chrM variant2
chrX variant3
chrX variant4

since M is before X alphabetically. So, I'm afraid this explains but
doesn't fix your problem! As a workaround, you could make sure that your
chrM variants appear before your chrX and chrY variants in the input file;
the problem should then not appear.
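
The effect can be reproduced with a small Python simulation. This is a
sketch under stated assumptions, not the VEP's implementation: the
function names are hypothetical, the buffer size is shrunk from 5000 to 4,
records are sorted by chromosome only, and two chr22 records are added
purely as padding so that a buffer boundary falls inside the chrX block.

```python
def chrom_sort_key(chrom):
    # Assumption: numeric names sort first by value, other names alphabetically.
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def write_in_sorted_buffers(records, buffer_size):
    """Read records in fixed-size buffers, sort each buffer, then emit it."""
    output = []
    for start in range(0, len(records), buffer_size):
        buffer = records[start:start + buffer_size]
        # sorted() is stable, so records on one chromosome keep their order.
        output.extend(sorted(buffer, key=lambda rec: chrom_sort_key(rec[0])))
    return output


# Input ordered chrX before chrM, as in the report (chr22 rows are padding).
original = [
    ("chr22", "variant1"), ("chr22", "variant2"),
    ("chrX", "variant1"), ("chrX", "variant2"),
    ("chrX", "variant3"), ("chrX", "variant4"),
    ("chrM", "variant1"), ("chrM", "variant2"),
]
scrambled = write_in_sorted_buffers(original, buffer_size=4)

# Workaround: chrM placed before chrX, matching the sort order.
reordered_input = [
    ("chr22", "variant1"), ("chr22", "variant2"),
    ("chrM", "variant1"), ("chrM", "variant2"),
    ("chrX", "variant1"), ("chrX", "variant2"),
    ("chrX", "variant3"), ("chrX", "variant4"),
]
stable = write_in_sorted_buffers(reordered_input, buffer_size=4)

for rec in scrambled:
    print(*rec)
```

Here scrambled has chrM variant1 and variant2 sitting between chrX
variant2 and chrX variant3, as in the report, while stable is identical to
reordered_input: when the input already follows the sort order, per-buffer
sorting changes nothing wherever the buffer boundaries fall.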

For the next VEP release I'll look into retaining the input sorting when
using VCF as the output format as I think this would be preferable for most
users.
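
Until then, the input order could also be restored after annotation. The
Python sketch below is one possible approach, not part of the VEP: the
function names are hypothetical, and it assumes that every annotated
record also appears in the input with its first five VCF columns (CHROM,
POS, ID, REF, ALT) unchanged. The CSQ values in the sample data are
placeholders.

```python
def variant_key(vcf_line):
    """Identify a record by its first five VCF columns: CHROM, POS, ID, REF, ALT."""
    return tuple(vcf_line.rstrip("\n").split("\t")[:5])


def restore_input_order(input_lines, annotated_lines):
    """Reorder annotated records so they follow the order of the input file."""
    # Rank of each variant in the input; header lines (starting with '#') are skipped.
    rank = {}
    for line in input_lines:
        if not line.startswith("#"):
            rank.setdefault(variant_key(line), len(rank))
    header = [line for line in annotated_lines if line.startswith("#")]
    body = [line for line in annotated_lines if not line.startswith("#")]
    # Raises KeyError if an annotated record is missing from the input.
    body.sort(key=lambda line: rank[variant_key(line)])
    return header + body


input_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chrX\t100\t.\tA\tG\t.\t.\t.\n",
    "chrX\t200\t.\tC\tT\t.\t.\t.\n",
    "chrM\t50\t.\tG\tA\t.\t.\t.\n",
]
annotated_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chrM\t50\t.\tG\tA\t.\t.\tCSQ=placeholder\n",
    "chrX\t100\t.\tA\tG\t.\t.\tCSQ=placeholder\n",
    "chrX\t200\t.\tC\tT\t.\t.\tCSQ=placeholder\n",
]
restored = restore_input_order(input_lines, annotated_lines)
```

This holds the whole file in memory, so it is only suitable for files of
moderate size.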

Regards

Will McLaren
Ensembl Variation


On 30 April 2014 07:47, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:

>  Dear developers,
>
> I'm experiencing strange behavior when annotating a fully sorted VCF file.
> My chr order is the following: chr1 to chr22, chrX, chrY, chrM.
>
> I've noticed that when I have variants in chrX followed by variants in
> chrM, the VEP script annotates the full VCF file but changes the order of
> some of the lines. See the example below:
>
> Imagine I have the following variants in my VCF:
>
> chrX variant1
> chrX variant2
> chrX variant3
> chrX variant4
> chrM variant1
> chrM variant2
>
> After annotating the VCF, the order becomes:
>
> chrX variant1
> chrX variant2
> chrM variant1
> chrM variant2
> chrX variant3
> chrX variant4
>
> This is just an illustrative example. I would like to fix this, because an
> unsorted annotated VCF file is awkward to work with. I've not experienced
> this issue with other chrX and chrM. I have already tried to debug this by
> disabling all the plugins, and the issue still reproduces.
>
> Thank you very much.
>
> Best regards,
> Guillermo.
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>