[ensembl-dev] Obtaining the genomic sequences for all the 5'UTR and CDS for mouse genome

Allan Kamau kamauallan at gmail.com
Wed Sep 11 12:17:20 BST 2024


In short, is there a way to download the 5' UTR and the CDS sequences of
the mouse genome?

Any update will be appreciated.

-Allan.

On Tue, Sep 10, 2024 at 4:03 PM Allan Kamau <kamauallan at gmail.com> wrote:

> I would like to obtain the sequences for the 5' UTR and CDS for the mouse
> genome.
> I began by filtering all the records having "five_prime_UTR" from the
> chromosome.<chromosome_name>.gff3.gz files from
> "https://ftp.ensembl.org/pub/release-112/gff3/mus_musculus/". I obtained
> some 95358 records. This number seems too high, as the mouse genome has
> approximately 25,000 genes.
>
> I did similar filtering for records having the value "CDS" as their
> third field and obtained some 522159 entries, which is a large number
> considering there are only 25,000 genes for the GRCm39 genome.
>
> What would be the preferred way to obtain the 5' UTR and CDS sequences
> for the entire mouse genome?
>
> Regards,
> -Allan.
>
>
>
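
A note on the counts quoted above: in GFF3, each "five_prime_UTR" or "CDS"
row describes one contiguous segment of one transcript, so a gene with
several transcripts and several coding exons contributes many rows. Row
counts far above the gene count are therefore not surprising in themselves.
The following is a minimal Python sketch that compares the number of rows
with the number of distinct parent transcripts. It assumes the ninth column
carries a "Parent=..." attribute naming the transcript, which should be
checked against the actual files. The function names and the example file
name are hypothetical.

```python
import gzip
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def count_features(lines, wanted=("five_prime_UTR", "CDS")):
    """Return {feature_type: (row_count, distinct_parent_count)}."""
    rows = defaultdict(int)
    parents = defaultdict(set)
    for line in lines:
        # Skip header/directive lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 feature line has exactly nine tab-separated columns;
        # the third column holds the feature type.
        if len(cols) != 9 or cols[2] not in wanted:
            continue
        rows[cols[2]] += 1
        # Assumption: the Parent attribute names the owning transcript.
        parent = parse_attributes(cols[8]).get("Parent")
        if parent:
            parents[cols[2]].add(parent)
    return {t: (rows[t], len(parents[t])) for t in wanted}


def count_in_file(path):
    """Run count_features over one gzip-compressed GFF3 file."""
    with gzip.open(path, "rt") as handle:
        return count_features(handle)
```

For example, count_in_file("chromosome.1.gff3.gz") (a hypothetical local
file name) would return a pair of numbers per feature type. If the second
number is much smaller than the first, most of the excess comes from
multi-segment and multi-transcript genes rather than from extra genes.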