[ensembl-dev] Release-97 FASTA header

Black, Andrew N andrew at cgrb.oregonstate.edu
Thu Aug 8 23:46:53 BST 2019

Looking at the following file:

The FASTA header lists the number of nucleotides as:

>X dna:chromosome chromosome:GRCh38:X:1:156040895:1 REF
That is, the header states that there are 156,040,895 nucleotides in this sequence.
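
For reference, the length can be read from the header itself. This is a minimal sketch that assumes the third whitespace-separated field follows the layout coord_system:assembly:name:start:end:strand, which is what the header above appears to use; the function name header_length is made up for illustration.

```shell
# Hypothetical helper: derive the sequence length from a FASTA header line.
# Assumes the third field looks like chromosome:GRCh38:X:1:156040895:1
# (coord_system:assembly:name:start:end:strand).
header_length() {
    printf '%s\n' "$1" |
        awk '{ split($3, loc, ":"); print loc[5] - loc[4] + 1 }'
}

header_length '>X dna:chromosome chromosome:GRCh38:X:1:156040895:1 REF'
# prints 156040895
```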

However, the number of nucleotides doesn’t match the number of characters:

grep -v ">" GRCh38X.fa | grep '[ATCGN]' | wc -c

The output is 158,641,577, suggesting that there are 158,641,577 nucleotides in this sequence.
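
One thing worth ruling out before blaming the header: wc -c counts every byte, including the newline at the end of each sequence line. The sketch below assumes the file is wrapped at 60 bases per line (an assumption, not something checked against the file itself); the function names count_bases and expected_newlines are made up for illustration.

```shell
# Hypothetical helper: count sequence characters only, dropping header
# lines and all newline / carriage-return bytes before counting.
count_bases() {
    grep -v '^>' "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}

# Hypothetical helper: number of sequence lines (and so newline bytes)
# for a sequence of the given length, assuming a fixed line width
# (60 by default) with a possibly shorter last line.
expected_newlines() {
    length=$1
    width=${2:-60}
    echo $(( (length + width - 1) / width ))
}

expected_newlines 156040895 60
# prints 2600682

echo $(( 156040895 + $(expected_newlines 156040895 60) ))
# prints 158641577

# Usage on the real file (not run here):
#   count_bases GRCh38X.fa
```

Under the 60-column assumption, the header length plus one newline per line comes to 158,641,577, which is the figure wc -c reported. If count_bases on the real file returns 156,040,895, the header and the sequence agree.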

It appears that the headers might be recycled from previous releases?

If I am correct in my conclusion, I just wanted to make sure that people at Ensembl were aware of this for future / past releases…
