[ensembl-dev] Release-97 FASTA header
Black, Andrew N
andrew at cgrb.oregonstate.edu
Thu Aug 8 23:46:53 BST 2019
Looking at the following file:
ftp.ensembl.org/pub/release-97/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.X.fa.gz
The header gives the sequence length as:
>X dna:chromosome chromosome:GRCh38:X:1:156040895:1 REF
This states that there are 156,040,895 nucleotides in the sequence.
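The length can be read off the location token in that header. Here is a minimal Python sketch. It assumes the token has the layout coord_system:assembly:seq_region:start:end:strand with inclusive 1-based coordinates, so the length is end - start + 1. `header_length` is a hypothetical helper, not part of any Ensembl tool.

```python
def header_length(header: str) -> int:
    """Return the sequence length implied by an Ensembl-style FASTA header.

    Assumes (hypothetically) that the location token looks like
    coord_system:assembly:seq_region:start:end:strand,
    e.g. "chromosome:GRCh38:X:1:156040895:1", with inclusive coordinates.
    """
    tokens = header.lstrip(">").split()
    # Use the first whitespace-separated token that has six colon fields.
    for token in tokens:
        fields = token.split(":")
        if len(fields) == 6:
            start, end = int(fields[3]), int(fields[4])
            return end - start + 1
    raise ValueError("no location token found in header")


print(header_length(">X dna:chromosome chromosome:GRCh38:X:1:156040895:1 REF"))
```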
However, that doesn’t match the number of characters I count in the sequence lines:
grep -v '>' GRCh38X.fa | grep '[ATCGN]' | wc -c
158,641,577
This suggests there are 158,641,577 nucleotides in this sequence.
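Before concluding that the header is stale, note that `wc -c` counts every byte on the sequence lines, including the newline that ends each one. This minimal Python sketch counts bases and line terminators separately. It assumes a single-record FASTA file, either uncompressed or gzip-compressed. `wrapped_byte_count` additionally assumes the sequence is wrapped at 60 characters per line with one `\n` per line, so check the line width of the downloaded file before relying on it. Both function names are hypothetical.

```python
import gzip
from typing import Tuple


def count_fasta(path: str) -> Tuple[int, int]:
    """Count sequence characters and line terminators in a FASTA file.

    Header lines (starting with ">") are skipped. Input is opened with
    gzip if the path ends in ".gz", otherwise as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    bases = 0
    newlines = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                continue
            stripped = line.rstrip("\r\n")
            bases += len(stripped)                  # residues only
            newlines += len(line) - len(stripped)   # what wc -c also counts
    return bases, newlines


def wrapped_byte_count(length: int, width: int = 60) -> int:
    """Bytes wc -c would report for `length` bases wrapped at `width`
    characters per line, assuming each line ends in a single newline."""
    lines = -(-length // width)  # ceiling division: number of sequence lines
    return length + lines
```

Under the 60-column assumption, `wrapped_byte_count(156_040_895)` evaluates to 158,641,577. That is 156,040,895 bases plus 2,600,682 newline characters, the same figure the `wc -c` pipeline reports.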
It appears that the headers might be recycled from previous releases?
If my conclusion is correct, I just wanted to make sure the Ensembl team is aware of this for past and future releases.
Andrew