[ensembl-dev] VCF order chrX, chrM

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Wed Apr 30 12:29:49 BST 2014


Hello Will,

That's just awesome!

Thank you.

Best regards,
Guillermo.

On 04/30/2014 11:42 AM, Will McLaren wrote:
> This was easier to fix than I thought it would be. I've pushed a fix
> to the ensembl-variation GitHub repo; it's available on the release/75
> branch.
>
> Will
>
>
> On 30 April 2014 10:32, mag <mr6 at ebi.ac.uk> wrote:
>
>     Hi Will,
>
>     Chromosomes in Ensembl have a 'karyotype_rank' attribute that
>     gives the expected chromosome ordering (1-22, X, Y, MT)
>
>     I don't know how applicable it is to VEP, but it might be
>     something to bear in mind.
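A minimal sketch of sorting by karyotype order (1-22, X, Y, MT) rather than alphanumerically. The order is hard-coded here for illustration and is not read from the Ensembl 'karyotype_rank' attribute; the names KARYOTYPE_ORDER, normalise and karyotype_key are hypothetical, and treating "chrM" as "MT" is an assumption about naming.

```python
# Hypothetical karyotype ordering: 1-22, then X, Y, MT.
# Hard-coded for illustration; not fetched from Ensembl.
KARYOTYPE_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]
KARYOTYPE_RANK = {name: rank for rank, name in enumerate(KARYOTYPE_ORDER)}


def normalise(chrom):
    """Strip a leading 'chr' and map 'M' to 'MT' (assumed naming convention)."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return "MT" if name == "M" else name


def karyotype_key(chrom):
    """Sort key: known chromosomes by rank, unknown names afterwards by name."""
    name = normalise(chrom)
    if name in KARYOTYPE_RANK:
        return (0, KARYOTYPE_RANK[name], "")
    return (1, 0, name)


chroms = ["chrM", "chrX", "chr10", "chr2", "chrY", "chr1"]
print(sorted(chroms, key=karyotype_key))
# prints ['chr1', 'chr2', 'chr10', 'chrX', 'chrY', 'chrM']
```

With a key like this, chrM sorts after chrX and chrY, which matches the chromosome order used in the input file discussed in this thread.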
>
>
>     Cheers,
>     mag
>
>
>     On 30/04/2014 09:18, Will McLaren wrote:
>>     Hi Guillermo,
>>
>>     Currently the VEP internally sorts each buffer of 5000 variants
>>     that it reads in before writing the output. The sort is done
>>     alphanumerically, so it will order e.g. 1-22,M,X,Y.
>>
>>     It looks like the buffer partially overlaps your input groups,
>>     such that, in your example, the first buffer read would be
>>
>>     chrX variant1
>>     chrX variant2
>>
>>     These are parsed, sorted and written out. Then the buffer reads
>>     in the next batch:
>>
>>     chrX variant3
>>     chrX variant4
>>     chrM variant1
>>     chrM variant2
>>
>>     which then get sorted to
>>
>>     chrM variant1
>>     chrM variant2
>>     chrX variant3
>>     chrX variant4
>>
>>     since M is before X alphabetically. So, I'm afraid this explains
>>     but doesn't fix your problem! As a workaround, you could make sure
>>     that your chrM variants appear before your chrX and chrY variants
>>     in the file; the problem should then not occur.
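The buffering effect described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the VEP code: the buffer size is 4 instead of 5000, two chr22 lines are added in front so that a fixed-size buffer straddles the chrX and chrM groups, and the function names alnum_key and annotate_in_buffers are hypothetical.

```python
def alnum_key(chrom):
    """Numeric names first (numerically), then other names alphabetically."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def annotate_in_buffers(variants, buffer_size):
    """Read variants in fixed-size buffers, sort each buffer, then emit it."""
    output = []
    for start in range(0, len(variants), buffer_size):
        buffer = variants[start:start + buffer_size]
        # sorted() is stable, so order within one chromosome is preserved.
        output.extend(sorted(buffer, key=lambda v: alnum_key(v[0])))
    return output


variants = [
    ("chr22", "variant1"), ("chr22", "variant2"),  # assumed padding lines
    ("chrX", "variant1"), ("chrX", "variant2"),
    ("chrX", "variant3"), ("chrX", "variant4"),
    ("chrM", "variant1"), ("chrM", "variant2"),
]
for chrom, label in annotate_in_buffers(variants, buffer_size=4):
    print(chrom, label)
```

The second buffer holds chrX variant3, chrX variant4, chrM variant1 and chrM variant2, so after sorting the chrM lines are written between the two halves of chrX, as in the reported output.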
>>
>>     For the next VEP release I'll look into retaining the input
>>     sorting when using VCF as the output format as I think this would
>>     be preferable for most users.
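One possible way to retain the input ordering while still processing in buffers is to rank each chromosome by the order in which it first appears in the input and sort each buffer by that rank. This is only an illustrative sketch and is not taken from the VEP source; the function name annotate_keeping_input_order and the buffer size of 4 are assumptions.

```python
def annotate_keeping_input_order(variants, buffer_size):
    """Sort each buffer by the order in which chromosomes were first seen."""
    first_seen = {}  # chromosome name -> rank of first appearance
    output = []
    for start in range(0, len(variants), buffer_size):
        buffer = variants[start:start + buffer_size]
        for chrom, _ in buffer:
            # Only assigns a rank the first time a chromosome is seen.
            first_seen.setdefault(chrom, len(first_seen))
        # Stable sort: lines on the same chromosome keep their input order.
        output.extend(sorted(buffer, key=lambda v: first_seen[v[0]]))
    return output


variants = [
    ("chr22", "variant1"), ("chr22", "variant2"),  # assumed padding lines
    ("chrX", "variant1"), ("chrX", "variant2"),
    ("chrX", "variant3"), ("chrX", "variant4"),
    ("chrM", "variant1"), ("chrM", "variant2"),
]
print(annotate_keeping_input_order(variants, buffer_size=4) == variants)
# prints True
```

For an input that is already grouped by chromosome, the output order is identical to the input order regardless of where the buffer boundaries fall.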
>>
>>     Regards
>>
>>     Will McLaren
>>     Ensembl Variation
>>
>>
>>     On 30 April 2014 07:47, Guillermo Marco Puche
>>     <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>         Dear developers,
>>
>>         I'm experiencing strange behavior when annotating a fully
>>         sorted VCF file.
>>         My chr order is the following: chr1 to chr22, chrX, chrY, chrM.
>>
>>         I've noticed that when I have variants on chrX followed by
>>         variants on chrM, the VEP script annotates the full VCF file
>>         but changes the order of some of the lines. See the example
>>         below:
>>
>>         Imagine I have the following variants in my VCF:
>>
>>         chrX variant1
>>         chrX variant2
>>         chrX variant3
>>         chrX variant4
>>         chrM variant1
>>         chrM variant2
>>
>>         After annotating the VCF, the order ends up like this:
>>
>>         chrX variant1
>>         chrX variant2
>>         chrM variant1
>>         chrM variant2
>>         chrX variant3
>>         chrX variant4
>>
>>         This is just an illustrative example. I would like to fix this,
>>         because an unsorted annotated VCF file is tricky to work with.
>>         I have not seen this issue with chromosomes other than chrX
>>         and chrM. I have already tried to debug this by disabling all
>>         the plugins, and the issue still reproduces.
>>
>>         Thank you very much.
>>
>>         Best regards,
>>         Guillermo.
>>
>>         _______________________________________________
>>         Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>         Posting guidelines and subscribe/unsubscribe info:
>>         http://lists.ensembl.org/mailman/listinfo/dev
>>         Ensembl Blog: http://www.ensembl.info/
>>
>

-- 
Guillermo Marco Puche
Bioinformatician, Computer Science Engineer.
Sistemas Genómicos S.L.
Phone: +34 902 364 669 (Ext.777)
Fax: +34 902 364 670
www.sistemasgenomicos.com

	
