[ensembl-dev] Sorted chromosomes in genome FASTA + chr prefix in GRCh38 + dbSNP updates

Joel Fillon, Mr joel.fillon at mcgill.ca
Tue Oct 14 19:58:10 BST 2014


Hi Ensembl admins,

Three unrelated questions (which should perhaps have been posted as separate messages):

1. Would it be possible to sort chromosomes in genome FASTA files in "biological" order
rather than in lexicographic or some other order (sequence length?), e.g. in:
ftp://ftp.ensembl.org/pub/release-77/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Y
M


instead of:
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
3
4
5
6
7
8
9
MT
X
Y

or in ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.23.dna.genome.fa.gz

1
2
3
4
5
Mt
Pt

instead of:
Pt
Mt
4
2
3
5
1

since an arbitrary order can cause problems with tools like GATK, which check that contig order is consistent between the reference and the other input files.
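
To make the requested order concrete, here is a minimal Python sketch of a sort key that gives the "biological" order listed above. The function names (chrom_sort_key, sort_chromosomes) and the ranking of the non-numeric names (X, Y, then mitochondrial, then plastid) are my own assumptions for illustration, not anything Ensembl provides:

```python
def chrom_sort_key(name):
    """Sort key for chromosome names: numbered chromosomes first (in numeric
    order), then X, Y, mitochondrial and plastid, then any other name
    alphabetically. The ranking of the special names is an assumption."""
    special = {"X": 0, "Y": 1, "M": 2, "MT": 2, "Mt": 2, "Pt": 3}
    # Tolerate an optional "chr" prefix so both naming styles sort alike.
    bare = name[3:] if name.startswith("chr") else name
    if bare.isdigit():
        return (0, int(bare), "")
    if bare in special:
        return (1, special[bare], "")
    return (2, 0, bare)


def sort_chromosomes(names):
    """Return the names in "biological" order."""
    return sorted(names, key=chrom_sort_key)


if __name__ == "__main__":
    human = ["1", "10", "11", "2", "20", "3", "MT", "X", "Y"]
    print(sort_chromosomes(human))
    arabidopsis = ["Pt", "Mt", "4", "2", "3", "5", "1"]
    print(sort_chromosomes(arabidopsis))
```

The FASTA records themselves would then have to be written out in that order; the sketch only covers the ordering of the names.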

2. Would it be possible, in the Homo sapiens GRCh38 genome, to prefix chromosome names with "chr", as in the NCBI and UCSC versions,
to match them with Ensembl GTF chromosome IDs?
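
For illustration, this is roughly the renaming I currently have to do by hand, as a minimal Python sketch. The function name add_chr_prefix and the default MT -> M mapping (to end up with the UCSC-style chrM) are my own assumptions; names of unplaced scaffolds would need a real mapping table rather than a plain prefix:

```python
def add_chr_prefix(lines, rename=None):
    """Yield FASTA lines with "chr" prepended to each sequence name.

    `rename` maps a name to a replacement before prefixing; the default
    MT -> M mapping is an assumption aimed at UCSC-style "chrM".
    """
    rename = {"MT": "M"} if rename is None else rename
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # The sequence name is everything up to the first space;
            # the rest of the header is kept unchanged.
            name, sep, rest = header.partition(" ")
            name = rename.get(name, name)
            if not name.startswith("chr"):
                name = "chr" + name
            yield ">" + name + sep + rest + "\n"
        else:
            # Sequence lines pass through untouched.
            yield line


if __name__ == "__main__":
    demo = [">1 dna:chromosome\n", "ACGT\n", ">MT\n", "GGCC\n"]
    for out in add_chr_prefix(demo):
        print(out, end="")
```

It works on any iterable of lines, so it can be fed an open file handle (for example one from gzip.open in text mode) without loading the genome into memory.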

3. Regarding the dbSNP updates included in Ensembl releases, I understand from this page http://useast.ensembl.org/Help/Faq?id=432
that it takes several months to curate dbSNP entries. Do you have a rough idea of when dbSNP build 141 will be available for Homo sapiens GRCh38?
By the end of this year, or not before 2015?

Thanks a lot for your help and for the hard work!
Joël

_____________________________________________________
Joël Fillon
McGill University and Génome Québec Innovation Centre
740, Dr. Penfield Avenue, Room 4200
Montréal (QC) H3A 0G1
CANADA

Phone: 514-398-3311 ext. 00721
E-mail: joel.fillon at mcgill.ca



