[ensembl-dev] Transcripts missing in Mus_musculus.GRCm38.102.gtf.gz and Mus_musculus.GRCm38.102.gff3.gz
Hervé Pagès
hpages.on.github at gmail.com
Wed Oct 27 00:54:34 BST 2021
Hmm... not quite. The GFF3 file
Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gff3.gz still seems to be
missing some transcripts.
The mus_musculus_core_102_38 db and
Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gtf.gz both contain 144778
transcripts but Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gff3.gz
contains only 144726. So 52 transcripts are missing. For example
ENSMUST00000206994 is missing. This transcript belongs to gene
ENSMUSG00000108408. So this gene has 5 transcripts in the GTF file but
only 4 in the GFF3 file.
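A minimal Python sketch of this kind of comparison follows. It assumes the usual Ensembl attribute layouts: `transcript_id "..."` on GTF rows of type `transcript`, and `ID=transcript:...;Parent=gene:...` on transcript-level GFF3 rows. Check these against the actual files. The helper names (`gtf_transcripts`, `gff3_transcripts`, `compare`, and so on) are hypothetical and not part of any Ensembl tool. The sketch collects transcript IDs from each file, lists the ones present in the GTF but absent from the GFF3 together with their sequence name and gene, and counts transcripts per gene.

```python
import gzip
import re
from collections import Counter

# Assumed Ensembl GTF attribute syntax: key "value"; pairs in column 9.
GTF_TX = re.compile(r'transcript_id "([^"]+)"')
GTF_GENE = re.compile(r'gene_id "([^"]+)"')


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def gtf_transcripts(lines):
    """Map transcript ID -> (seqname, gene ID) from GTF rows of type 'transcript'."""
    tx = {}
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        tid = GTF_TX.search(cols[8])
        gid = GTF_GENE.search(cols[8])
        if tid:
            tx[tid.group(1)] = (cols[0], gid.group(1) if gid else None)
    return tx


def gff3_transcripts(lines):
    """Map transcript ID -> (seqname, gene ID) from GFF3 rows whose ID starts
    with 'transcript:' (assumed Ensembl convention; the feature type itself
    varies, e.g. mRNA, lnc_RNA)."""
    tx = {}
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        rid = attrs.get("ID", "")
        if rid.startswith("transcript:"):
            parent = attrs.get("Parent", "")
            gid = parent[len("gene:"):] if parent.startswith("gene:") else None
            tx[rid[len("transcript:"):]] = (cols[0], gid)
    return tx


def missing_from_gff3(gtf, gff3):
    """Transcripts present in the GTF but absent from the GFF3, with location."""
    return {tid: gtf[tid] for tid in sorted(set(gtf) - set(gff3))}


def transcripts_per_gene(tx):
    """Count transcripts per gene ID."""
    return Counter(gene for _, gene in tx.values())


def compare(gtf_path, gff3_path):
    """Parse both files and return (gtf, gff3, missing) dictionaries."""
    with open_text(gtf_path) as fh:
        gtf = gtf_transcripts(fh)
    with open_text(gff3_path) as fh:
        gff3 = gff3_transcripts(fh)
    return gtf, gff3, missing_from_gff3(gtf, gff3)
```

For example, `gtf, gff3, missing = compare("Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gtf.gz", "Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gff3.gz")` would return the transcripts in each file and the ones missing from the GFF3. Comparing `transcripts_per_gene(gtf)["ENSMUSG00000108408"]` with the same lookup on `gff3` would then give the per-gene counts, provided the assumed attribute layouts hold. Tallying the sequence names in `missing` shows whether the missing transcripts cluster on particular sequences.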
Why would some transcripts be excluded from the GFF3 file?
Thanks,
H.
On 26/10/2021 16:37, Hervé Pagès wrote:
> That's it. These *.chr_patch_hapl_scaff.* files do indeed seem to contain
> the full db dump. Thanks!
>
> Cheers,
> H.
>
>
> On 26/10/2021 16:22, Thomas Danhorn wrote:
>> As far as I know, these GTFs/GFFs only contain genes and transcripts
>> from the primary assembly, i.e. not from patches. I suspect
>> http://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/Mus_musculus.GRCm38.102.chr_patch_hapl_scaff.gtf.gz
>> might contain such transcripts.
>>
>> Best wishes,
>>
>> Thomas
>>
>>
>> On Tue, 26 Oct 2021, Hervé Pagès wrote:
>>
>>> Hi,
>>>
>>> Does anybody know why transcript ENSMUST00000230762 is missing from
>>> the GTF and GFF3 files for Mus musculus in Ensembl release 102?
>>>
>>> ENSMUST00000230762 is a transcript present in the
>>> mus_musculus_core_102_38 db. It's located on novel-patch sequence
>>> CHR_WSB_EIJ_MMCHR11_CTG3 from GRCm38.p6. But for some reason it's not
>>> in the Mus_musculus.GRCm38.102.gtf.gz or
>>> Mus_musculus.GRCm38.102.gff3.gz files found here
>>> http://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/ and here
>>> http://ftp.ensembl.org/pub/release-102/gff3/mus_musculus/
>>>
>>> Furthermore, it seems that the GTF and GFF3 files are missing 2079
>>> transcripts compared to the mus_musculus_core_102_38 db. Does anybody
>>> know what's going on?
>>>
>>> Thanks,
>>> H.
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Bioconductor Core Team
>>> hpages.on.github at gmail.com
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>
>
--
Hervé Pagès
Bioconductor Core Team
hpages.on.github at gmail.com