[ensembl-dev] problems with transcript ENSCJAT00000102765 from Callithrix jacchus

Julien Wollbrett julien.wollbrett at unil.ch
Fri Jan 8 16:46:29 GMT 2021


Hello,

I have a problem using the genome annotation GTF file of Callithrix 
jacchus 
(ftp://ftp.ensembl.org/pub/current_gtf/callithrix_jacchus/Callithrix_jacchus.ASM275486v1.102.gtf.gz).
I used this GTF annotation together with the corresponding genome 
(ftp://ftp.ensembl.org/pub/current_fasta/callithrix_jacchus/dna/Callithrix_jacchus.ASM275486v1.dna.toplevel.fa.gz) 
to generate a transcriptome (using the gtf_to_fasta software from bedtools).
The problem I have is that one transcript (ENSCJAT00000102765) is 
present twice in my transcriptome.
I tried to understand where the problem comes from. It looks like this 
transcript has 6 exons, and the first one (ENSCJAE00000540609) is more 
than 10 million bp away from the others.
gtf_to_fasta cannot build a single transcript from exons that far apart.
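
The exon spacing described above can be checked directly from the GTF. The following is only a minimal sketch, assuming the standard 9-column tab-separated GTF layout with transcript_id and exon_id attributes in the last column. The function names and the MAX_GAP threshold are hypothetical, chosen for illustration, and this is not how gtf_to_fasta itself decides what to do.

```python
import gzip
import re
from typing import Iterable, List, Tuple

# (seqname, start, end, exon_id)
Exon = Tuple[str, int, int, str]

# Hypothetical threshold: gaps above this size are reported.
MAX_GAP = 10_000_000


def exons_for_transcript(lines: Iterable[str], transcript_id: str) -> List[Exon]:
    """Collect the exon records of one transcript, sorted by position."""
    exons: List[Exon] = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        tid = re.search(r'transcript_id "([^"]+)"', fields[8])
        if tid is None or tid.group(1) != transcript_id:
            continue
        eid = re.search(r'exon_id "([^"]+)"', fields[8])
        exons.append(
            (fields[0], int(fields[3]), int(fields[4]), eid.group(1) if eid else "")
        )
    return sorted(exons, key=lambda e: (e[0], e[1]))


def large_gaps(exons: List[Exon], max_gap: int = MAX_GAP) -> List[Tuple[str, str, int]]:
    """Return (exon_id, next_exon_id, gap_in_bp) for consecutive exons on the
    same sequence that are separated by more than max_gap bases."""
    gaps = []
    for prev, nxt in zip(exons, exons[1:]):
        if prev[0] != nxt[0]:
            continue  # exons on different sequences: no distance to compute
        gap = nxt[1] - prev[2] - 1
        if gap > max_gap:
            gaps.append((prev[3], nxt[3], gap))
    return gaps


def read_gtf_lines(path: str) -> List[str]:
    """Read a plain or gzip-compressed GTF file into a list of lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(handle)
```

With a local copy of the annotation, large_gaps(exons_for_transcript(read_gtf_lines("Callithrix_jacchus.ASM275486v1.102.gtf.gz"), "ENSCJAT00000102765")) would list any exon pairs of this transcript that are further apart than the threshold.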
My questions are:
- Do you agree this is an annotation error?
- What should the actual sequence of this transcript be? Should I remove 
the first exon from my annotation file, or should I keep only that exon?

Best Regards,

Julien Wollbrett




