[ensembl-dev] Custom GTF

Thomas Danhorn danhornt at njhealth.org
Mon Nov 12 15:46:29 GMT 2018


Hi Pankaj,

From the error message it looks like you are using a GFF3 file (*.gff),
not the GTF (*.gtf; a.k.a. GFF version 2) format that cellranger expects.
The difference is in the format of the 9th tab-delimited column: GFF3 is
hierarchical, with "Parent=...;" fields that point to a feature on another
line, whereas in GTF the parent information is encoded on the same line in
the form of "gene_id" and "transcript_id" fields.

Check whether the Ensembl page where you downloaded your file also offers
a "GTF" version.  If not, search for a utility that can convert GFF3 to GTF.
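
To see how many lines of the combined file are affected before you replace
or convert anything, a quick check of the 9th column may help.  The
following is only a minimal Python sketch, not part of cellranger or any
Ensembl tool: the function names and the file name in the usage note are
made up for illustration, and it assumes real tab-delimited lines whose
first attribute looks like either key=value (GFF3) or key "value" (GTF).

```python
import re

# First attribute in GFF3 style, e.g. Name=ENSE...;Parent=ENST...
GFF3_ATTR = re.compile(r'^\s*\w+=')
# First attribute in GTF style, e.g. gene_id "..."; transcript_id "...";
GTF_ATTR = re.compile(r'^\s*\w+ "[^"]*"')


def attribute_style(line):
    """Classify one annotation line as 'gff3', 'gtf', 'comment' or 'unknown'."""
    if line.startswith("#") or not line.strip():
        return "comment"
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return "unknown"  # not a 9-column annotation line
    attrs = fields[8]
    if GFF3_ATTR.match(attrs):
        return "gff3"
    if GTF_ATTR.match(attrs):
        return "gtf"
    return "unknown"


def find_gff3_lines(path, limit=5):
    """Return up to `limit` (line_number, text) pairs that look like GFF3."""
    hits = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if attribute_style(line) == "gff3":
                hits.append((number, line.rstrip("\n")))
                if len(hits) >= limit:
                    break
    return hits
```

For example, find_gff3_lines("combined.gtf") (a placeholder file name)
would list the first few lines that still carry GFF3-style attributes,
such as the exon line quoted in the error message.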

Thomas



On Mon, 12 Nov 2018, Pankaj Agarwal wrote:

> Hi,
> I needed the human ERBB2 gene fasta formatted sequence and the corresponding GTF, which I downloaded from the Ensembl site.
> Then I added these to the mouse ref fasta and gtf respectively and tried to build the index.
> This is needed because I have mouse tumor sequencing data that I am analyzing for a whole-transcriptome RNA-seq study, and the human ERBB2 gene has been knocked in.
> I am getting the following error in the GTF format:
>
> cellranger mkgtf \
>> Mus_musculus.GRCm38.84.gtf \
>> Mus_musculus.GRCm38.84.filtered.gtf \
>> --attribute=gene_biotype:protein_coding \
>> --attribute=gene_biotype:lincRNA \
>> --attribute=gene_biotype:antisense
> /GenomicPrimaryData/installs/10X/cellranger-2.2.0/cellranger-cs/2.2.0/bin
> cellranger mkgtf (2.2.0)
> Copyright (c) 2018 10x Genomics, Inc.  All rights reserved.
> -------------------------------------------------------------------------------
>
> Writing new genes GTF file (may take 10 minutes for a 1GB input GTF file)...
> Property 'transcript_id' not found in GTF line 1591734: 17      Ensembl exon    39687834        39688057        .       -       .       Name=ENSE00003599710;Parent=ENST00000579146
>
> Please fix your GTF and start again.
>
> The line that is causing the problem is:
> 17      Ensembl exon    39687834        39688057        .       -       .       Name=ENSE00003599710;Parent=ENST00000579146
>
> Can you please help with troubleshooting this problem.
>
> Thanks,
>
> - Pankaj
>
> -----------------------------
> Pankaj Agarwal, M.S
> Bioinformatician
> Data Analyst
> Applied Therapeutics
> Div. of Surgical Sciences
> Dept. of Surgery
> Duke University
> M: 919-244-6389
> O: 919-681-2251
> p.agarwal at duke.edu
>
>


