[ensembl-dev] VCF order chrX, chrM
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Wed Apr 30 12:29:49 BST 2014
Hello Will,
That's just awesome!
Thank you.
Best regards,
Guillermo.
On 04/30/2014 11:42 AM, Will McLaren wrote:
> This was easier to fix than I thought it would be. I've pushed a fix
> to the ensembl-variation GitHub repo; it's available on the release/75
> branch.
>
> Will
>
>
> On 30 April 2014 10:32, mag <mr6 at ebi.ac.uk> wrote:
>
> Hi Will,
>
> Chromosomes in Ensembl have a 'karyotype_rank' attribute that
> gives the expected chromosome ordering (1-22, X, Y, MT).
>
> I don't know how applicable it is to VEP, but it might be
> something to bear in mind.
>
>
> Cheers,
> mag
>
>
> On 30/04/2014 09:18, Will McLaren wrote:
>> Hi Guillermo,
>>
>> Currently the VEP internally sorts each buffer of 5000 variants
>> that it reads in before writing the output. The sort is done
>> alphanumerically, so it will order e.g. 1-22,M,X,Y.
>>
>> It looks like the buffer partially overlaps your input groups,
>> such that, in your example, the first buffer read would be
>>
>> chrX variant1
>> chrX variant2
>>
>> These are parsed, sorted and written out. Then the buffer reads
>> in the next batch:
>>
>> chrX variant3
>> chrX variant4
>> chrM variant1
>> chrM variant2
>>
>> which then get sorted to
>>
>> chrM variant1
>> chrM variant2
>> chrX variant3
>> chrX variant4
>>
>> since M is before X alphabetically. So, I'm afraid this explains
>> but doesn't fix your problem! You could ensure that your chrM
>> variants appear before your chrX and chrY variants in the file,
>> and this problem shouldn't appear.
>>
>> For the next VEP release I'll look into retaining the input
>> sorting when using VCF as the output format as I think this would
>> be preferable for most users.
>>
>> Regards
>>
>> Will McLaren
>> Ensembl Variation
>>
>>
>> On 30 April 2014 07:47, Guillermo Marco Puche
>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>> Dear developers,
>>
>> I'm seeing strange behavior when annotating a fully
>> sorted VCF file.
>> My chromosome order is the following: chr1 to chr22, chrX, chrY, chrM.
>>
>> I've noticed that when chrX variants are followed by chrM variants,
>> the VEP script annotates the full VCF file but changes the order
>> of some of the lines. See the example below:
>>
>> Imagine I have the following variants in my VCF:
>>
>> chrX variant1
>> chrX variant2
>> chrX variant3
>> chrX variant4
>> chrM variant1
>> chrM variant2
>>
>> After annotating the VCF, the order ends up like this:
>>
>> chrX variant1
>> chrX variant2
>> chrM variant1
>> chrM variant2
>> chrX variant3
>> chrX variant4
>>
>> This is just an illustrative example. I would like to fix this,
>> because an unsorted annotated VCF file is awkward to work
>> with. I've not experienced this issue with chromosomes other
>> than chrX and chrM. I already tried to debug this by disabling
>> all the plugins, and the issue still reproduces.
>>
>> Thank you very much.
>>
>> Best regards,
>> Guillermo.
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
------------------------------------------------------------------------
Guillermo Marco Puche
Bioinformatician, Computer Science Engineer.
Sistemas Genómicos S.L.
Phone: +34 902 364 669 (Ext.777)
Fax: +34 902 364 670
www.sistemasgenomicos.com
<https://www.sistemasgenomicos.com/web_sg/web/areas-bioinformatica.php>
------------------------------------------------------------------------
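
Editorial note: the buffer effect Will describes in the thread above can be reproduced with a small sketch. This is not VEP code. The names buffered_sort, alnum_key, rank_key and RANK are hypothetical, the buffer holds 4 records rather than 5000, and two chr22 records are added in front so that the buffer boundary falls in the middle of the chrX block. The RANK table is only a stand-in for the karyotype ordering mag mentions (1-22, X, Y, MT); it does not call the Ensembl API, and it is not a claim about what the fix on release/75 actually does.

```python
from itertools import islice


def buffered_sort(records, buffer_size, key):
    """Read records in fixed-size buffers and sort each buffer on its own."""
    it = iter(records)
    out = []
    while True:
        buf = list(islice(it, buffer_size))
        if not buf:
            break
        # Sorting is local to the buffer, so order across buffers is untouched.
        out.extend(sorted(buf, key=key))
    return out


def strip_chr(chrom):
    """Drop a leading 'chr' prefix if present."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def alnum_key(record):
    """Alphanumeric order: numeric names first (1-22), then letters (M, X, Y)."""
    chrom, pos = record
    label = strip_chr(chrom)
    if label.isdigit():
        return (0, int(label), "", pos)
    return (1, 0, label, pos)


# Hypothetical rank table in the spirit of karyotype_rank: 1-22, X, Y, MT.
# "M" is treated as an alias of "MT" (an assumption for UCSC-style names).
RANK = {str(n): n for n in range(1, 23)}
RANK.update({"X": 23, "Y": 24, "MT": 25, "M": 25})


def rank_key(record):
    """Karyotype-style order; unknown names sort after all ranked ones."""
    chrom, pos = record
    return (RANK.get(strip_chr(chrom), len(RANK) + 1), pos)


# Input sorted as in the original report: ..., chrX, chrY, chrM.
variants = [
    ("chr22", 100), ("chr22", 200), ("chrX", 100), ("chrX", 200),
    ("chrX", 300), ("chrX", 400), ("chrM", 100), ("chrM", 200),
]

alnum_out = buffered_sort(variants, buffer_size=4, key=alnum_key)
rank_out = buffered_sort(variants, buffer_size=4, key=rank_key)

for chrom, pos in alnum_out:
    print(chrom, pos)
```

With the alphanumeric key, the second buffer (chrX 300, chrX 400, chrM 100, chrM 200) is reordered so that chrM comes first, which splits the chrX block in the output exactly as in the report. With the rank-based key, each buffer already agrees with the input order, so rank_out equals the input.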