[ensembl-dev] [External] Re: Obtaining the genomic sequences for all the 5'UTR and CDS for mouse genome

Eric Engelhard eric.engelhard at regeneron.com
Wed Sep 11 13:58:16 BST 2024


Hello Allan,

The recommended methods for accessing genome-wide feature annotations and sequences are either the Ensembl BioMart or a local MySQL database queried through the Perl API.

BioMart (https://useast.ensembl.org/biomart/martview) provides the easier query method and does not require scripting. You will want to select the “Ensembl Genes 112” database and the “Mouse Genes (GRCm39)” dataset and then select the attributes from “Features” and “Sequences” that meet your requirements.
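For anyone who prefers to script the same selection, here is a minimal Python sketch that only builds the BioMart XML query and the request URL; it does not send anything. Several names are assumptions rather than something stated above: the "martservice" endpoint path, the dataset name "mmusculus_gene_ensembl", and the attribute names "5utr" and "coding". Please check them against the attribute list shown in martview before relying on them. The web form offers one sequence type at a time, so the sketch builds one query per type.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumption: the scripted endpoint sits next to martview on the same host.
BIOMART_URL = "https://useast.ensembl.org/biomart/martservice"


def build_query(dataset, attributes):
    """Return a BioMart XML query asking for FASTA output."""
    attr_xml = "".join(
        f"<Attribute name={quoteattr(name)} />" for name in attributes
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="FASTA" '
        'header="0" uniqueRows="1">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f"{attr_xml}</Dataset></Query>"
    )


def build_url(query_xml):
    """Encode the XML query as the 'query' parameter of a GET request."""
    return BIOMART_URL + "?" + urlencode({"query": query_xml})


# Assumed dataset and attribute names; one query per sequence type.
DATASET = "mmusculus_gene_ensembl"
ID_ATTRIBUTES = ["ensembl_gene_id", "ensembl_transcript_id"]

utr5_query = build_query(DATASET, ID_ATTRIBUTES + ["5utr"])
cds_query = build_query(DATASET, ID_ATTRIBUTES + ["coding"])

print(build_url(utr5_query))
print(build_url(cds_query))
```

The printed URLs can be fetched with any HTTP client. A genome-wide sequence request may be slow, so splitting the work into several smaller queries may be more practical.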

Cheers!
Eric



From: Dev <dev-bounces at ensembl.org> On Behalf Of Allan Kamau
Sent: Wednesday, September 11, 2024 7:17 AM
To: dev <dev at ensembl.org>
Subject: [External] Re: [ensembl-dev] Obtaining the genomic sequences for all the 5'UTR and CDS for mouse genome

In short, is there a way to download the 5' UTR and the CDS sequences of the mouse genome?

Any update will be appreciated.

-Allan.

On Tue, Sep 10, 2024 at 4:03 PM Allan Kamau <kamauallan at gmail.com> wrote:
I would like to obtain the sequences for the 5' UTR and CDS for the mouse genome.
I began by filtering all the records having "five_prime_UTR" from the chromosome.<chromosome_name>.gff3.gz files from "https://ftp.ensembl.org/pub/release-112/gff3/mus_musculus/". I obtained some 95358 records; this number seems too high, as the mouse genome has approximately 25,000 genes.

I did similar filtering for records having the value "CDS" as their third field and obtained some 522159 entries, which is a large number considering there are only 25,000 genes for the GRCm39 genome.
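
A note on these counts: in the Ensembl GFF3 files each "CDS" or "five_prime_UTR" line describes one exon-level segment of one transcript, and a gene usually has several transcripts, so the number of records is expected to be well above the number of genes. The Python sketch below groups records by their Parent attribute to show the difference between records and transcripts. The sample lines and their IDs (T1, T2, P1, P2) are made up for illustration and are not real Ensembl records.

```python
from collections import defaultdict


def segments_per_transcript(lines, feature_type):
    """Count GFF3 records of one feature type per parent transcript."""
    counts = defaultdict(int)
    for line in lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 feature line has 9 tab-separated columns; the type is column 3.
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent:
            counts[parent] += 1
    return dict(counts)


# Toy input with hypothetical IDs: T1 has two CDS segments, T2 has one.
example = [
    "##gff-version 3",
    "1\tensembl\tCDS\t100\t200\t.\t+\t0\tID=CDS:P1;Parent=transcript:T1",
    "1\tensembl\tCDS\t300\t400\t.\t+\t1\tID=CDS:P1;Parent=transcript:T1",
    "1\tensembl\tCDS\t100\t200\t.\t+\t0\tID=CDS:P2;Parent=transcript:T2",
    "1\tensembl\tfive_prime_UTR\t50\t99\t.\t+\t.\tParent=transcript:T1",
]

cds_counts = segments_per_transcript(example, "CDS")
print(sum(cds_counts.values()), "CDS records in", len(cds_counts), "transcripts")
```

To run it on a real file, the lines can be read with gzip.open(path, "rt") from one of the chromosome.<chromosome_name>.gff3.gz files.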

What would be the preferred way to obtain the 5' UTR and CDS sequences for the entire mouse genome?

Regards,
-Allan.

