[ensembl-dev] How transcriptome fasta files are created

Thu Sep 27 14:53:45 BST 2018

Hello,

I am trying to understand how the Ensembl transcriptome FASTA files are created.
I ran some tests using these two files from release 84:

- ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz

- ftp://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.gtf.gz

I understand that you filter out some biotypes from the GTF in order to
create the transcriptome, so it is expected that some transcripts
annotated in the GTF file are not present in the transcriptome FASTA
file.
What I do not understand is why some transcripts (15,091 distinct
transcript IDs) are present in the transcriptome FASTA file but not in
the GTF file.
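For reference, a comparison like the one described above can be sketched as follows. This is a minimal Python sketch, not necessarily how the count was obtained: it assumes the transcript ID is the first whitespace-delimited token of each FASTA header and the value of the `transcript_id` attribute in the GTF, and it strips any trailing `.N` version suffix (an assumption, in case one file carries versioned IDs and the other does not). All function names are hypothetical.

```python
import gzip
import re

# Matches the transcript_id attribute in the GTF 9th column.
_TX_RE = re.compile(r'transcript_id "([^"]+)"')


def _open(path):
    # Read plain or gzip-compressed text files transparently.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def strip_version(tx_id):
    # Assumption: drop a trailing ".N" version so that versioned and
    # unversioned IDs compare equal.
    return tx_id.split(".", 1)[0]


def fasta_transcript_ids(lines):
    # Collect IDs from FASTA header lines (first token after '>').
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(strip_version(tokens[0]))
    return ids


def gtf_transcript_ids(lines):
    # Collect transcript_id values from non-comment GTF records;
    # gene-level records have no transcript_id and are skipped.
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        match = _TX_RE.search(line)
        if match:
            ids.add(strip_version(match.group(1)))
    return ids


def compare(fasta_path, gtf_path):
    # Hypothetical helper: count IDs unique to each file and shared by both.
    with _open(fasta_path) as fh:
        fasta_ids = fasta_transcript_ids(fh)
    with _open(gtf_path) as fh:
        gtf_ids = gtf_transcript_ids(fh)
    return {
        "fasta_only": len(fasta_ids - gtf_ids),
        "gtf_only": len(gtf_ids - fasta_ids),
        "shared": len(fasta_ids & gtf_ids),
    }
```

Called on local copies of the two release 84 files, e.g. `compare("Homo_sapiens.GRCh38.cdna.all.fa.gz", "Homo_sapiens.GRCh38.84.gtf.gz")`, it returns the number of transcript IDs found only in the FASTA file, only in the GTF file, and in both.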

Could you please give me some information on how this transcriptome
FASTA file is created?

Best regards,

Julien Wollbrett