[ensembl-dev] Obtaining the genomic sequences for all the 5'UTR and CDS for mouse genome
Allan Kamau
kamauallan at gmail.com
Tue Sep 10 14:03:35 BST 2024
I would like to obtain the sequences for the 5' UTR and CDS for the mouse
genome.
I began by filtering all the records having "five_prime_UTR" from the
chromosome.<chromosome_name>.gff3.gz files at
"https://ftp.ensembl.org/pub/release-112/gff3/mus_musculus/". I obtained
95358 records, which seems too high given that the mouse genome has
approximately 25,000 genes.
I did similar filtering for records having the value "CDS" as their
third field and obtained 522159 entries, which is a large number
considering there are only about 25,000 genes in the GRCm39 genome.
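For reference, the filtering described above can be sketched roughly as
follows. This is only a minimal illustration, not the exact commands I ran:
the function names are made up for this example, and it assumes the usual
nine tab-separated GFF3 columns with a "Parent=" entry in the ninth
(attributes) column. Besides the raw record counts, it also counts the
distinct Parent values per feature type, since a GFF3 file has one line per
feature segment and several lines can share the same parent.

```python
import gzip
from collections import Counter


def count_features(lines, wanted=("five_prime_UTR", "CDS")):
    """Count GFF3 records per feature type, plus distinct Parent values.

    `lines` is any iterable of GFF3 text lines. Returns a pair:
    (records per type, number of distinct Parent values per type).
    """
    records = Counter()
    parents = {feature_type: set() for feature_type in wanted}
    for line in lines:
        # Skip blank lines and "#" comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: a feature line has exactly nine columns.
        if len(fields) != 9 or fields[2] not in parents:
            continue
        feature_type = fields[2]  # the third field holds the feature type
        records[feature_type] += 1
        # Attributes are ";"-separated key=value pairs.
        for attribute in fields[8].split(";"):
            key, _, value = attribute.partition("=")
            if key == "Parent":
                parents[feature_type].add(value)
    return records, {t: len(p) for t, p in parents.items()}


def count_features_in_file(path):
    """Apply count_features to one gzip-compressed GFF3 file."""
    with gzip.open(path, "rt") as handle:
        return count_features(handle)
```

Calling count_features_in_file() on each downloaded
chromosome.<chromosome_name>.gff3.gz file and summing the results should
reproduce the record totals; comparing them with the distinct-parent totals
shows how many lines belong to the same parent feature.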
What would be the preferred way to obtain the 5' UTR and CDS sequences for
the entire mouse genome?
Regards,
-Allan.