[ensembl-dev] Custom GTF
Thomas Danhorn
danhornt at njhealth.org
Mon Nov 12 15:46:29 GMT 2018
Hi Pankaj,
From the error message, it looks like you are using a GFF3 file (*.gff),
not the GTF (*.gtf; a.k.a. GFF version 2) format that cellranger expects.
The difference is in the format of the 9th tab-delimited column: GFF3
is hierarchical, linking features through "Parent=...;" fields, whereas
GTF encodes the parent information on the same line, in the
"gene_id" and "transcript_id" fields.
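To make the difference concrete, here is a minimal Python sketch that guesses which attribute style a line uses by looking only at the 9th column. The function name attribute_style and the gene ID "GENE_ID_PLACEHOLDER" are made up for illustration, and the example lines assume real tabs between the columns (the archive shows them as spaces).

```python
import re

def attribute_style(line):
    """Guess whether an annotation line uses GTF or GFF3 attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return "unknown"  # comment, header, or malformed line
    attrs = fields[8]
    # GTF style: key "value"; pairs such as gene_id "..."; transcript_id "...";
    if re.search(r'\b(gene_id|transcript_id) "[^"]*"', attrs):
        return "gtf"
    # GFF3 style: key=value pairs separated by semicolons, e.g. ID=...;Parent=...
    if re.search(r'(^|;)\s*\w+=', attrs):
        return "gff3"
    return "unknown"

# The exon line from the error message, with tabs restored between columns.
gff3_line = ("17\tEnsembl\texon\t39687834\t39688057\t.\t-\t.\t"
             "Name=ENSE00003599710;Parent=ENST00000579146")
# The same exon as it might look in GTF; the gene_id value is a placeholder.
gtf_line = ("17\tEnsembl\texon\t39687834\t39688057\t.\t-\t.\t"
            'gene_id "GENE_ID_PLACEHOLDER"; transcript_id "ENST00000579146";')

print(attribute_style(gff3_line))
print(attribute_style(gtf_line))
```

Running this over the combined file and reporting the first line classified as "gff3" should point at the place where the human ERBB2 records were appended.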
Check the Ensembl page where you downloaded your file to see whether there
is a "GTF" version. If not, search for a utility that can convert GFF3 to GTF.
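For illustration only, the core of such a conversion for exon lines could look like the Python sketch below. It is not a replacement for a proper converter: it assumes each exon has a single Parent, that the transcript's own line (with ID= and Parent= pointing at its gene) is present in the input, and that any "type:" prefix on an ID can simply be dropped. The function names and the IDs in the example are hypothetical.

```python
def gff3_attrs(column9):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    items = column9.strip().strip(";").split(";")
    pairs = (item.split("=", 1) for item in items if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}

def strip_prefix(identifier):
    # Assumption: IDs written as "transcript:XYZ" should become "XYZ".
    return identifier.split(":", 1)[-1]

def exons_to_gtf(gff3_lines):
    """Return GTF-style exon lines built from GFF3 lines (exons only)."""
    rows = [line.rstrip("\n").split("\t") for line in gff3_lines
            if line.strip() and not line.startswith("#")]
    rows = [row for row in rows if len(row) == 9]

    # Pass 1: record each feature's parent so a transcript leads to its gene.
    parent_of = {}
    for row in rows:
        attrs = gff3_attrs(row[8])
        if "ID" in attrs and "Parent" in attrs:
            parent_of[strip_prefix(attrs["ID"])] = strip_prefix(attrs["Parent"])

    # Pass 2: rewrite column 9 of every exon with gene_id and transcript_id.
    gtf_lines = []
    for row in rows:
        if row[2] != "exon":
            continue
        attrs = gff3_attrs(row[8])
        if "Parent" not in attrs:
            raise ValueError("exon without a Parent attribute: " + row[8])
        transcript_id = strip_prefix(attrs["Parent"])
        if transcript_id not in parent_of:
            raise ValueError("no gene found for transcript " + transcript_id)
        gene_id = parent_of[transcript_id]
        column9 = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
        gtf_lines.append("\t".join(row[:8] + [column9]))
    return gtf_lines

# Tiny made-up input: one transcript line and one of its exons.
example = [
    "17\tEnsembl\ttranscript\t1\t100\t.\t-\t.\tID=TX1;Parent=GENE1",
    "17\tEnsembl\texon\t10\t20\t.\t-\t.\tName=EX1;Parent=TX1",
]
for line in exons_to_gtf(example):
    print(line)
```

The two passes are there because GFF3 only links an exon to its transcript; the gene has to be looked up through the transcript's own Parent field before both IDs can be written on one line, as GTF requires.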
Thomas
On Mon, 12 Nov 2018, Pankaj Agarwal wrote:
> Hi,
> I needed the human ERBB2 gene fasta formatted sequence and the corresponding GTF, which I downloaded from the Ensembl site.
> Then I added these to the mouse ref fasta and gtf respectively and tried to build the index.
> This is needed because I am analyzing mouse tumor sequencing data for a whole-transcriptome RNA-seq study, and the human ERBB2 has been knocked in.
> I am getting the following error in the GTF format:
>
> cellranger mkgtf \
>> Mus_musculus.GRCm38.84.gtf \
>> Mus_musculus.GRCm38.84.filtered.gtf \
>> --attribute=gene_biotype:protein_coding \
>> --attribute=gene_biotype:lincRNA \
>> --attribute=gene_biotype:antisense
> /GenomicPrimaryData/installs/10X/cellranger-2.2.0/cellranger-cs/2.2.0/bin
> cellranger mkgtf (2.2.0)
> Copyright (c) 2018 10x Genomics, Inc. All rights reserved.
> -------------------------------------------------------------------------------
>
> Writing new genes GTF file (may take 10 minutes for a 1GB input GTF file)...
> Property 'transcript_id' not found in GTF line 1591734: 17 Ensembl exon 39687834 39688057 . - . Name=ENSE00003599710;Parent=ENST00000579146
>
> Please fix your GTF and start again.
>
> The line that is causing the problem is:
> 17 Ensembl exon 39687834 39688057 . - . Name=ENSE00003599710;Parent=ENST00000579146
>
> Can you please help with troubleshooting this problem?
>
> Thanks,
>
> - Pankaj
>
> -----------------------------
> Pankaj Agarwal, M.S
> Bioinformatician
> Data Analyst
> Applied Therapeutics
> Div. of Surgical Sciences
> Dept. of Surgery
> Duke University
> M: 919-244-6389
> O: 919-681-2251
> p.agarwal at duke.edu
>
>