[ensembl-dev] Homo_sapiens.GRCh38.92.chr.gtf contents compared to fasta files (cdna + ncrna)

Vivek Iyer vvi at sanger.ac.uk
Wed Sep 19 10:50:44 BST 2018


Hi all,

From the downloadable data on ftp://ftp.ensembl.org/pub/release-92/gtf/ I can see one gtf file for download (I’m using v92 at the moment): Homo_sapiens.GRCh38.92.chr.gtf

Are the transcripts in here a superset of, a subset of, or identical to the combined set of transcripts in these two fasta files under ftp://ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/:
Homo_sapiens.GRCh38.cdna.all.fa
Homo_sapiens.GRCh38.ncrna.fa

Of course, I could resolve the IDs and do a simple comparison :-) I was hoping someone could point me at docs (along with a nudge to RTFM) or supply some motivation for the split. Both types of files are needed at different points of an RNAseq pipeline.
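
For reference, the simple ID comparison mentioned above could look roughly like the sketch below. It rests on assumptions that should be checked against the actual files: that each FASTA header starts with a versioned transcript ID (e.g. ENST00000456328.2), that the GTF stores the unversioned ID in the transcript_id attribute of "transcript" rows, and that the files have been decompressed locally. The function names are made up for illustration.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(lines):
    """Collect transcript IDs from GTF rows whose feature type is 'transcript'."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def fasta_transcript_ids(lines):
    """Collect transcript IDs from FASTA headers, dropping any '.N' version suffix.

    Assumption: the first whitespace-delimited token of each header is the ID.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0].split(".")[0])
    return ids


def compare(gtf_ids, fasta_ids):
    """Return the IDs unique to each source and the IDs they share."""
    return {
        "only_in_gtf": gtf_ids - fasta_ids,
        "only_in_fasta": fasta_ids - gtf_ids,
        "shared": gtf_ids & fasta_ids,
    }


# Possible usage (paths are local copies of the files named above):
#
#   with open("Homo_sapiens.GRCh38.92.chr.gtf") as gtf:
#       gtf_ids = gtf_transcript_ids(gtf)
#   fasta_ids = set()
#   for name in ("Homo_sapiens.GRCh38.cdna.all.fa", "Homo_sapiens.GRCh38.ncrna.fa"):
#       with open(name) as fasta:
#           fasta_ids |= fasta_transcript_ids(fasta)
#   result = compare(gtf_ids, fasta_ids)
#   print({key: len(value) for key, value in result.items()})
```

If only_in_gtf and only_in_fasta both come back empty, the two sets are identical; if just one of them is empty, one source is a strict subset of the other.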

Thanks,

Vivek