[ensembl-dev] DNA sequence filenames in Ensembl Metazoa FTP without release number

Sebastien Moretti smoretti at unil.ch
Fri Dec 11 11:57:55 GMT 2020


Hi James

Thanks for the explanation.
It is clearer to me now.

Best
Sébastien

> Hello,
> The file naming conventions differ between types of files; release 49 
> follows the same scheme as previous releases (although the scheme isn't 
> entirely consistent, which doesn't help when trying to explain it...)
> 
> Assembly-related files, such as the DNA in FASTA format, do not have the 
> release number in the filename, but annotation-related files, such as 
> the GTF, do [*]. This is the same for Ensembl and Ensembl Genomes, but 
> the latter uses the EG release number (e.g. 49) rather than the Ensembl 
> release number (e.g. 102).
> 
> The rationale behind this is that filenames include the assembly name, 
> so assembly-related updates generate filenames that are easily 
> distinguished from past assemblies/releases. The annotation on an 
> assembly (where "annotation" includes things like cross-references, as 
> well as genes), can change from one release to the next, so to avoid 
> files having different content but the same filename in different 
> releases, the release number is included in the name.
> 
> Cheers,
> James
> Ensembl Production
> 
> [*] The exceptions to this rule are the annotation-related FASTA files, 
> containing cDNA or peptide sequences - the content will change when a 
> geneset is updated, but they do not have a release number in the filename.
> 
> 
> On 09/12/2020 13:52, Sebastien Moretti wrote:
>> Hi
>>
>> I wonder why the release number disappeared from the DNA sequence 
>> filenames in the Ensembl Metazoa FTP.
>> e.g. for D. simulans:
>> ftp://ftp.ensemblgenomes.org/pub/metazoa/release-49/fasta/drosophila_simulans/dna/Drosophila_simulans.ASM75419v3.dna.toplevel.fa.gz 
>>
>> -> no .49
>>
>>
>> The GTF filenames still contain the release number.
>> e.g. 
>> ftp://ftp.ensemblgenomes.org/pub/metazoa/release-49/gtf/drosophila_simulans/Drosophila_simulans.ASM75419v3.49.gtf.gz 
>>
>> -> .49 is present
>>
>> And this is not the case in the main Ensembl FTP.
>>
>> Best
>>
>> -- 
>> Sébastien Moretti
>> Staff Scientist
>> Department of Ecology and Evolution,
>> Biophore, University of Lausanne,
>> CH-1015 Lausanne, Switzerland
>> Tel.: +41 (21) 692 4221/4079
>> http://bioinfo.unil.ch/ http://bgee.org/ http://selectome.unil.ch/
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: 
>> https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/
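
For anyone scripting downloads, the convention James describes above can be sketched roughly as follows. This is only an illustration built from the two example filenames in this thread; the function names and the file-type labels ("dna", "cdna", "pep", "gtf") are hypothetical, and other file types or species may not follow exactly the same pattern.

```python
# Minimal sketch of the naming convention described in this thread.
# All names here are illustrative assumptions, not part of any Ensembl tool.

# FASTA files (assembly-related DNA, plus the cDNA/peptide exceptions
# mentioned in James's footnote) carry no release number in the filename.
TYPES_WITHOUT_RELEASE = {"dna", "cdna", "pep"}

# Annotation-related files such as the GTF carry the release number.
TYPES_WITH_RELEASE = {"gtf"}


def release_in_filename(file_type: str) -> bool:
    """Return True if the filename is expected to include the release number."""
    if file_type in TYPES_WITH_RELEASE:
        return True
    if file_type in TYPES_WITHOUT_RELEASE:
        return False
    raise ValueError(f"file type not covered by this sketch: {file_type}")


def dna_toplevel_filename(species: str, assembly: str) -> str:
    """Build a DNA toplevel FASTA filename: species and assembly only."""
    return f"{species}.{assembly}.dna.toplevel.fa.gz"


def gtf_filename(species: str, assembly: str, release: int) -> str:
    """Build a GTF filename: species, assembly, then the release number.

    For Ensembl Genomes divisions such as Metazoa this is the EG release
    (e.g. 49), not the Ensembl release (e.g. 102).
    """
    return f"{species}.{assembly}.{release}.gtf.gz"
```

Calling dna_toplevel_filename("Drosophila_simulans", "ASM75419v3") and gtf_filename("Drosophila_simulans", "ASM75419v3", 49) reproduces the two filenames quoted in the original question.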
