[ensembl-dev] Obtaining 5'UTR sequences for a list of ensembl ids programmatically?

Benjamin Moore bmoore at ebi.ac.uk
Wed Apr 24 10:13:43 BST 2024


Hi Allan,

No problem at all, very happy to help. With the GET sequence/region
endpoint, the region must begin with the chromosome name rather than
the assembly name. In the GET lookup response, this is the value of
the "seq_region_name" key. In the example you provided, this is "1",
so the URL should look like this instead:

https://rest.ensembl.org/sequence/region/mus_musculus/1:59521583..59522118:1?content-type=text/x-fasta
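
As a minimal sketch (not an official client), the same URL can be
assembled in Python from one UTR record of the lookup response. The
function name region_url and its default species are illustrative
assumptions, not part of the Ensembl API:

```python
def region_url(utr, species="mus_musculus", server="https://rest.ensembl.org"):
    """Build a GET sequence/region URL from one UTR record of a lookup response.

    Hypothetical helper: only the keys seq_region_name, start, end and
    strand are read; "species" is used when the record provides it.
    """
    species = utr.get("species", species)
    # Region format is <seq_region_name>:<start>..<end>:<strand>
    region = "{seq_region_name}:{start}..{end}:{strand}".format(**utr)
    return f"{server}/sequence/region/{species}/{region}?content-type=text/x-fasta"


# Coordinates of the five_prime_utr entry from the example in this thread
utr = {"seq_region_name": "1", "start": 59521583, "end": 59522118, "strand": 1}
print(region_url(utr))
```

The key point is that the first field of the region comes from
"seq_region_name", not from "assembly_name".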

Best wishes

Ben

On 23/04/2024 21:48, Allan Kamau wrote:
>
>
> On Tue, Apr 23, 2024 at 10:43 PM Allan Kamau <kamauallan at gmail.com> wrote:
>
>
>
>     On Tue, Apr 23, 2024 at 6:37 PM Benjamin Moore <bmoore at ebi.ac.uk>
>     wrote:
>
>         Hi Allan,
>
>         I think the most straightforward way to retrieve the 5'UTR
>         sequences for a list of Ensembl features (I assume you have a
>         list of gene IDs, ENSG...) using the REST API is to use the
>         Lookup endpoints with the optional expand and utr parameters
>         to retrieve the genomic coordinates of the 5'UTRs of each
>         transcript for your list of genes:
>
>         https://rest.ensembl.org/documentation/info/lookup
>
>         Then, you can use the coordinates from the first step as the
>         input for the Sequence/region endpoints to retrieve the
>         genomic sequence of the 5'UTRs:
>
>         https://rest.ensembl.org/documentation/info/sequence_region
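
A rough sketch of the first step, assuming one GET lookup/id request
per gene ID with expand=1 and utr=1 and only the Python standard
library. The names lookup_url and fetch_lookup are illustrative, and
the ";" parameter separator simply mirrors the URLs used elsewhere in
this thread:

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def lookup_url(ensembl_id):
    """Return the GET lookup/id URL with the expand and utr options switched on."""
    return (f"{SERVER}/lookup/id/{ensembl_id}"
            "?content-type=application/json;expand=1;utr=1")


def fetch_lookup(ensembl_id, timeout=30):
    """Fetch and decode the lookup JSON for one ID (performs a network request)."""
    with urllib.request.urlopen(lookup_url(ensembl_id), timeout=timeout) as resp:
        return json.load(resp)


print(lookup_url("ENSMUSG00000041075"))

# Example use over a list of IDs (left commented out because it needs network access):
# for gene_id in ["ENSMUSG00000041075"]:
#     data = fetch_lookup(gene_id)
```

For a long list of IDs it is probably sensible to pause between
requests and to handle HTTP errors; that is left out of this sketch.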
>
>         I hope this helps.
>
>         Best wishes
>
>         Ben
>
>         On 23/04/2024 15:30, Allan Kamau wrote:
>         > Is there a way to obtain 5'UTR sequences given a list of
>         > Ensembl IDs programmatically?
>         >
>         > I have a list of Ensembl IDs for which I would like to
>         > obtain the 5'UTR region for each one programmatically,
>         > hopefully via Ensembl REST using wget or Python
>         > (ensembl-rest).
>         >
>         > Kindly assist.
>         >
>         > Thanks.
>         >
>         > -Allan.
>         >
>         >
>         >
>         >
>         > _______________________________________________
>         > Dev mailing list Dev at ensembl.org
>         > Posting guidelines and subscribe/unsubscribe info:
>         https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>         > Ensembl Blog: http://www.ensembl.info/
>
>         -- 
>         Dr. Ben Moore (he/him)
>         Ensembl Outreach Manager
>
>         European Bioinformatics Institute (EMBL-EBI)
>         European Molecular Biology Laboratory
>         Wellcome Trust Genome Campus
>         Hinxton
>         Cambridge
>         CB10 1SD
>         UK
>
>         bmoore at ebi.ac.uk
>         +44 (0)1223 494265
>
>
>     Thank you Ben for your response. I am now stuck in defining the
>     url for the https://rest.ensembl.org/sequence/region resource.
>
>     I am using the ensembl id "ENSMUSG00000041075" in this example.
>     The URL below returns the features of the gene
>     "ENSMUSG00000041075".
>
>     https://rest.ensembl.org/lookup/id/ENSMUSG00000041075?content-type=application/json;expand=1;utr=1
>
>     This returns the response below
>
>     {
>         "ENSMUSG00000041075": {
>             "seq_region_name": "1",
>             "logic_name": "ensembl_havana_gene_mus_musculus",
>             "end": 59526114,
>             "biotype": "protein_coding",
>             "version": 9,
>             "db_type": "core",
>             "object_type": "Gene",
>             "strand": 1,
>             "start": 59521583,
>             "canonical_transcript": "ENSMUST00000114246.4",
>             "Transcript": [
>                 {
>                     "biotype": "protein_coding",
>                     "version": 4,
>                     "db_type": "core",
>                     "object_type": "Transcript",
>                     "seq_region_name": "1",
>                     "logic_name": "ensembl_havana_transcript_mus_musculus",
>                     "end": 59526114,
>                     "Translation": {
>                         "id": "ENSMUSP00000109884",
>                         "length": 572,
>                         "start": 59522119,
>                         "end": 59523837,
>                         "version": 3,
>                         "object_type": "Translation",
>                         "db_type": "core",
>                         "Parent": "ENSMUST00000114246",
>                         "species": "mus_musculus"
>                     },
>                     "assembly_name": "GRCm39",
>                     "Parent": "ENSMUSG00000041075",
>                     "is_canonical": 1,
>                     "display_name": "Fzd7-201",
>                     "Exon": [
>                         {
>                             "species": "mus_musculus",
>                             "version": 4,
>                             "db_type": "core",
>                             "object_type": "Exon",
>                             "assembly_name": "GRCm39",
>                             "id": "ENSMUSE00000698652",
>                             "start": 59521583,
>                             "end": 59526114,
>                             "strand": 1,
>                             "seq_region_name": "1"
>                         }
>                     ],
>                     "UTR": [
>                         {
>                             "object_type": "five_prime_UTR",
>                             "db_type": "core",
>                             "assembly_name": "GRCm39",
>                             "Parent": "ENSMUST00000114246",
>                             "type": "five_prime_utr",
>                             "species": "mus_musculus",
>                             "seq_region_name": "1",
>                             "strand": 1,
>                             "id": "ENSMUST00000114246",
>                             "source": "ensembl_havana",
>                             "start": 59521583,
>                             "end": 59522118
>                         },
>                         {
>                             "type": "three_prime_utr",
>                             "species": "mus_musculus",
>                             "db_type": "core",
>                             "object_type": "three_prime_UTR",
>                             "assembly_name": "GRCm39",
>                             "Parent": "ENSMUST00000114246",
>                             "id": "ENSMUST00000114246",
>                             "source": "ensembl_havana",
>                             "end": 59526114,
>                             "start": 59523838,
>                             "seq_region_name": "1",
>                             "strand": 1
>                         }
>                     ],
>                     "species": "mus_musculus",
>                     "strand": 1,
>                     "start": 59521583,
>                     "id": "ENSMUST00000114246",
>                     "source": "ensembl_havana",
>                     "length": 4532
>                 }
>             ],
>             "id": "ENSMUSG00000041075",
>             "description": "frizzled class receptor 7 [Source:MGI Symbol;Acc:MGI:108570]",
>             "source": "ensembl_havana",
>             "assembly_name": "GRCm39",
>             "display_name": "Fzd7",
>             "species": "mus_musculus"
>         }
>     }
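
One possible way to pull the 5'UTR coordinates out of a response
shaped like the one above, as a sketch only. The function name
five_prime_utrs is hypothetical, and it assumes the "Transcript" and
"UTR" lists and the "type" value "five_prime_utr" look as they do in
this example:

```python
def five_prime_utrs(lookup):
    """Yield (transcript_id, utr_record) for every 5' UTR entry in a lookup response."""
    # The response quoted in this thread is keyed by gene ID; a bare gene
    # object (with "Transcript" at the top level) is accepted as well.
    genes = [lookup] if "Transcript" in lookup else list(lookup.values())
    for gene in genes:
        for transcript in gene.get("Transcript", []):
            # Transcripts without a "UTR" list are simply skipped.
            for utr in transcript.get("UTR", []):
                if utr.get("type") == "five_prime_utr":
                    yield transcript["id"], utr


# Trimmed copy of the response quoted above
response = {
    "ENSMUSG00000041075": {
        "Transcript": [
            {
                "id": "ENSMUST00000114246",
                "UTR": [
                    {"type": "five_prime_utr", "seq_region_name": "1",
                     "strand": 1, "start": 59521583, "end": 59522118},
                    {"type": "three_prime_utr", "seq_region_name": "1",
                     "strand": 1, "start": 59523838, "end": 59526114},
                ],
            }
        ]
    }
}

for transcript_id, utr in five_prime_utrs(response):
    print(transcript_id, utr["seq_region_name"], utr["start"], utr["end"], utr["strand"])
```

If a transcript lists more than one five_prime_utr entry, each entry
would need its own sequence/region request and the pieces would have to
be joined in transcript order; the sketch does not attempt that.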
>
>     What would be the correct form of the
>     "https://rest.ensembl.org/sequence/region/" URL for the 5'UTR
>     region of the gene given above?
>
>     Below is the step where I am stuck.
>     https://rest.ensembl.org/sequence/region/mus_musculus/<what_goes_here>:59522119..59523837:1?coord_system=seqlevel;content-type=text/x-fasta
>
>     -Allan.
>
>
> What would be the url to obtain the five_prime_UTR region?
>
> I have tried the URL below but it finds no slice.
> https://rest.ensembl.org/sequence/region/mus_musculus/GRCm39:59521583..59522118:1?coord_system=seqlevel;content-type=text/x-fasta 
>
>
> -Allan.
>

-- 
Dr. Ben Moore (he/him)
Ensembl Outreach Manager

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK

bmoore at ebi.ac.uk
+44 (0)1223 494265