[ensembl-dev] Transcripts missing in Mus_musculus.GRCm38.102.gtf.gz and Mus_musculus.GRCm38.102.gff3.gz

Hervé Pagès hpages.on.github at gmail.com
Wed Oct 27 00:54:34 BST 2021


Hmm.. not quite. GFF3 file 
Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gff3.gz still seems to be 
missing some transcripts.

The mus_musculus_core_102_38 db and 
Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gtf.gz both contain 144778 
transcripts but Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gff3.gz 
contains only 144726. So 52 transcripts are missing. For example 
ENSMUST00000206994 is missing. This transcript belongs to gene 
ENSMUSG00000108408. So this gene has 5 transcripts in the GTF file but 
only 4 in the GFF3 file.
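
A minimal sketch of how such a GTF vs. GFF3 comparison could be done, not necessarily how the counts above were obtained. It assumes that transcripts in the Ensembl GTF are rows with "transcript" in column 3 and a transcript_id "..." attribute, and that transcripts in the Ensembl GFF3 are rows whose attributes contain ID=transcript:<id>. The function names and the read_lines helper are hypothetical.

```python
import gzip
import re

# Assumed attribute layouts (check them against the actual files):
#   GTF : ... transcript_id "ENSMUST..."; ...
#   GFF3: ID=transcript:ENSMUST...;Parent=gene:ENSMUSG...;...
GTF_TX = re.compile(r'transcript_id "([^"]+)"')
GFF3_TX = re.compile(r'(?:^|;)ID=transcript:([^;]+)')


def read_lines(path):
    """Yield text lines from a gzip-compressed annotation file."""
    with gzip.open(path, "rt") as fh:
        yield from fh


def gtf_transcript_ids(lines):
    """Collect transcript IDs from GTF rows whose feature type is 'transcript'."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = GTF_TX.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def gff3_transcript_ids(lines):
    """Collect transcript IDs from GFF3 rows carrying an ID=transcript:<id> attribute."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = GFF3_TX.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def missing_from_gff3(gtf_lines, gff3_lines):
    """Return transcript IDs present in the GTF but absent from the GFF3."""
    return gtf_transcript_ids(gtf_lines) - gff3_transcript_ids(gff3_lines)
```

With the two files downloaded locally, missing_from_gff3(read_lines("Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gtf.gz"), read_lines("Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gff3.gz")) would return the set of transcript IDs found only in the GTF, provided the attribute layouts assumed above hold.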

What could be the reason why some transcripts are excluded from the GFF3 
file?

Thanks,
H.


On 26/10/2021 16:37, Hervé Pagès wrote:
> That's it. These *.chr_patch_hapl_scaff.* files seem indeed to contain 
> the full db dump. Thanks!
> 
> Cheers,
> H.
> 
> 
> On 26/10/2021 16:22, Thomas Danhorn wrote:
>> As far as I know, these GTFs/GFFs only contain genes and transcripts 
>> from the primary assembly, i.e. not from patches.  I suspect 
>> http://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gtf.gz 
>>
>> might contain such transcripts.
>>
>> Best wishes,
>>
>> Thomas
>>
>>
>> On Tue, 26 Oct 2021, Hervé Pagès wrote:
>>
>>> Hi,
>>>
>>> Does anybody know why transcript ENSMUST00000230762 is missing from 
>>> the GTF and GFF3 files for Mus musculus in Ensembl release 102?
>>>
>>> ENSMUST00000230762 is a transcript present in the 
>>> mus_musculus_core_102_38 db. It's located on novel-patch sequence 
>>> CHR_WSB_EIJ_MMCHR11_CTG3 from GRCm38.p6. But for some reason it's not 
>>> in the Mus_musculus.GRCm38.102.gtf.gz or 
>>> Mus_musculus.GRCm38.102.gff3.gz files found here 
>>> http://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/ and here 
>>> http://ftp.ensembl.org/pub/release-102/gff3/mus_musculus/
>>>
>>> Furthermore, it seems that the GTF and GFF3 files are missing 2079 
>>> transcripts compared to the mus_musculus_core_102_38 db. Does anybody 
>>> know what's going on?
>>>
>>> Thanks,
>>> H.
>>>
>>> -- 
>>> Hervé Pagès
>>>
>>> Bioconductor Core Team
>>> hpages.on.github at gmail.com
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: 
>>> https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>
> 

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.github at gmail.com



