[ensembl-dev] Obtaining 5'UTR sequences for a list of ensembl ids programmatically?

Allan Kamau kamauallan at gmail.com
Tue Apr 23 21:48:59 BST 2024


On Tue, Apr 23, 2024 at 10:43 PM Allan Kamau <kamauallan at gmail.com> wrote:

>
>
> On Tue, Apr 23, 2024 at 6:37 PM Benjamin Moore <bmoore at ebi.ac.uk> wrote:
>
>> Hi Allan,
>>
>> I think the most straightforward way to retrieve the 5'UTR sequences
>> for a list of Ensembl features (I assume you have a list of gene IDs,
>> ENSG...) using the REST API is to use the Lookup endpoints with the
>> expand and utr optional parameters to retrieve the genomic coordinates
>> of the 5'UTRs of each transcript for your list of genes:
>>
>> https://rest.ensembl.org/documentation/info/lookup
>>
>> Then, you can use the coordinates from the first step as the input for
>> the Sequence/region endpoints to retrieve the genomic sequence of the
>> 5' UTRs:
>>
>> https://rest.ensembl.org/documentation/info/sequence_region
>>
>> I hope this helps.
>>
>> Best wishes
>>
>> Ben
>>
>> On 23/04/2024 15:30, Allan Kamau wrote:
>> > Is there a way to obtain 5'UTR sequences programmatically, given a
>> > list of Ensembl IDs?
>> >
>> > I have a list of Ensembl IDs and would like to obtain the 5'UTR
>> > region for each of them programmatically, ideally via the Ensembl
>> > REST API using wget or Python (ensembl-rest).
>> >
>> > Kindly assist.
>> >
>> > Thanks.
>> >
>> > -Allan.
>> >
>> > _______________________________________________
>> > Dev mailing list    Dev at ensembl.org
>> > Posting guidelines and subscribe/unsubscribe info:
>> > https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> > Ensembl Blog: http://www.ensembl.info/
>>
>> --
>> Dr. Ben Moore (he/him)
>> Ensembl Outreach Manager
>>
>> European Bioinformatics Institute (EMBL-EBI)
>> European Molecular Biology Laboratory
>> Wellcome Trust Genome Campus
>> Hinxton
>> Cambridge
>> CB10 1SD
>> UK
>>
>> bmoore at ebi.ac.uk
>> +44 (0)1223 494265
>>
>>
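Ben's first step can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `fetch_lookup` and `five_prime_utrs` are hypothetical, and the parsing assumes the response layout shown later in this thread (a `Transcript` list whose entries carry a `UTR` list with `type`, `seq_region_name`, `start`, `end` and `strand`). A 5' UTR that spans more than one exon may come back as several `five_prime_utr` entries for the same transcript, which would then need to be joined in transcript order.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def fetch_lookup(gene_id):
    """GET /lookup/id/:id with expand=1 and utr=1 (makes a network call)."""
    url = f"{SERVER}/lookup/id/{gene_id}?content-type=application/json;expand=1;utr=1"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    # If the response is keyed by the queried ID (as in the output quoted in
    # this thread), unwrap it; otherwise return the gene object unchanged.
    return data.get(gene_id, data)


def five_prime_utrs(gene):
    """Collect every five_prime_utr segment from an expanded lookup response."""
    segments = []
    for transcript in gene.get("Transcript", []):
        # Non-coding transcripts have no UTR list, hence the .get default.
        for utr in transcript.get("UTR", []):
            if utr.get("type") == "five_prime_utr":
                segments.append({
                    "transcript": transcript["id"],
                    "seq_region_name": utr["seq_region_name"],
                    "start": utr["start"],
                    "end": utr["end"],
                    "strand": utr["strand"],
                })
    return segments
```

For a list of gene IDs, calling `five_prime_utrs(fetch_lookup(gene_id))` in a loop yields the coordinates needed as input for the Sequence/region step.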
> Thank you, Ben, for your response. I am now stuck on building the URL for
> the https://rest.ensembl.org/sequence/region resource.
>
> I am using the Ensembl ID "ENSMUSG00000041075" in this example.
> The URL below returns the sequence features of the Ensembl ID object
> "ENSMUSG00000041075".
>
>
> https://rest.ensembl.org/lookup/id/ENSMUSG00000041075?content-type=application/json;expand=1;utr=1
>
> This returns the JSON below:
>
> {
>     "ENSMUSG00000041075": {
>         "seq_region_name": "1",
>         "logic_name": "ensembl_havana_gene_mus_musculus",
>         "end": 59526114,
>         "biotype": "protein_coding",
>         "version": 9,
>         "db_type": "core",
>         "object_type": "Gene",
>         "strand": 1,
>         "start": 59521583,
>         "canonical_transcript": "ENSMUST00000114246.4",
>         "Transcript": [
>             {
>                 "biotype": "protein_coding",
>                 "version": 4,
>                 "db_type": "core",
>                 "object_type": "Transcript",
>                 "seq_region_name": "1",
>                 "logic_name": "ensembl_havana_transcript_mus_musculus",
>                 "end": 59526114,
>                 "Translation": {
>                     "id": "ENSMUSP00000109884",
>                     "length": 572,
>                     "start": 59522119,
>                     "end": 59523837,
>                     "version": 3,
>                     "object_type": "Translation",
>                     "db_type": "core",
>                     "Parent": "ENSMUST00000114246",
>                     "species": "mus_musculus"
>                 },
>                 "assembly_name": "GRCm39",
>                 "Parent": "ENSMUSG00000041075",
>                 "is_canonical": 1,
>                 "display_name": "Fzd7-201",
>                 "Exon": [
>                     {
>                         "species": "mus_musculus",
>                         "version": 4,
>                         "db_type": "core",
>                         "object_type": "Exon",
>                         "assembly_name": "GRCm39",
>                         "id": "ENSMUSE00000698652",
>                         "start": 59521583,
>                         "end": 59526114,
>                         "strand": 1,
>                         "seq_region_name": "1"
>                     }
>                 ],
>                 "UTR": [
>                     {
>                         "object_type": "five_prime_UTR",
>                         "db_type": "core",
>                         "assembly_name": "GRCm39",
>                         "Parent": "ENSMUST00000114246",
>                         "type": "five_prime_utr",
>                         "species": "mus_musculus",
>                         "seq_region_name": "1",
>                         "strand": 1,
>                         "id": "ENSMUST00000114246",
>                         "source": "ensembl_havana",
>                         "start": 59521583,
>                         "end": 59522118
>                     },
>                     {
>                         "type": "three_prime_utr",
>                         "species": "mus_musculus",
>                         "db_type": "core",
>                         "object_type": "three_prime_UTR",
>                         "assembly_name": "GRCm39",
>                         "Parent": "ENSMUST00000114246",
>                         "id": "ENSMUST00000114246",
>                         "source": "ensembl_havana",
>                         "end": 59526114,
>                         "start": 59523838,
>                         "seq_region_name": "1",
>                         "strand": 1
>                     }
>                 ],
>                 "species": "mus_musculus",
>                 "strand": 1,
>                 "start": 59521583,
>                 "id": "ENSMUST00000114246",
>                 "source": "ensembl_havana",
>                 "length": 4532
>             }
>         ],
>         "id": "ENSMUSG00000041075",
>         "description": "frizzled class receptor 7 [Source:MGI Symbol;Acc:MGI:108570]",
>         "source": "ensembl_havana",
>         "assembly_name": "GRCm39",
>         "display_name": "Fzd7",
>         "species": "mus_musculus"
>     }
> }
>
> What would be the formulation of the
> https://rest.ensembl.org/sequence/region/ URL for the 5'UTR region
> given above?
>
> Below is the step where I am stuck:
> https://rest.ensembl.org/sequence/region/mus_musculus/<what_goes_here>:59522119..59523837:1?coord_system=seqlevel;content-type=text/x-fasta
>
> -Allan.
>
>
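In the lookup output above, the part before the colon is the `seq_region_name` (here `1`, the chromosome), and the coordinates for the 5' UTR are those of the `five_prime_UTR` entry (59521583..59522118). The coordinates 59522119..59523837 in the URL being asked about belong to the `Translation`, i.e. the coding region. A minimal sketch of building the URL, with hypothetical helper names, is below; it leaves `coord_system` unset so the server's default applies, which is an assumption about what is wanted here. For transcripts on the reverse strand the `strand` field is -1, and the endpoint should then return the reverse-complemented sequence.

```python
import urllib.request

SERVER = "https://rest.ensembl.org"


def region_string(seq_region_name, start, end, strand):
    """Region in the <seq_region_name>:<start>..<end>:<strand> form."""
    return f"{seq_region_name}:{start}..{end}:{strand}"


def sequence_region_url(species, region, content_type="text/x-fasta"):
    """Build a GET /sequence/region/:species/:region URL.

    coord_system is deliberately not set (an assumption: chromosome names
    such as "1" are resolved without it).
    """
    return f"{SERVER}/sequence/region/{species}/{region}?content-type={content_type}"


def fetch_text(url):
    """Download the response body as text (makes a network call)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


# The five_prime_UTR entry of Fzd7-201 from the lookup output above.
utr_region = region_string("1", 59521583, 59522118, 1)
utr_url = sequence_region_url("mus_musculus", utr_region)
# fetch_text(utr_url) would then return the 5' UTR as FASTA.
```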

What would be the URL to obtain the five_prime_UTR region?

I have tried the URL below, but it finds no slice:
https://rest.ensembl.org/sequence/region/mus_musculus/GRCm39:59521583..59522118:1?coord_system=seqlevel;content-type=text/x-fasta


-Allan.
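The "no slice" result is consistent with `GRCm39` being used as the region name: in the lookup output, GRCm39 is the `assembly_name`, while the region part of the URL expects the `seq_region_name`, giving `1:59521583..59522118:1`. It may also be worth dropping `coord_system=seqlevel`, since chromosome names such as `1` are top-level regions rather than sequence-level ones. For a whole list of IDs, the Sequence/region endpoint also accepts a POST to `/sequence/region/:species` with a JSON body of the form `{"regions": [...]}`. The sketch below is a minimal, hedged example of that: the helper names are hypothetical, and the batch size of 50 is an assumed per-request limit, so check the endpoint documentation before relying on it.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def chunked(items, size=50):
    """Yield successive slices of at most `size` items (50 is an assumed limit)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def post_regions(species, regions):
    """POST /sequence/region/:species with {"regions": [...]} (makes a network call)."""
    body = json.dumps({"regions": regions}).encode("utf-8")
    req = urllib.request.Request(
        f"{SERVER}/sequence/region/{species}",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def to_fasta(records, width=60):
    """Format response records (each with "id" and "seq") as wrapped FASTA."""
    lines = []
    for rec in records:
        lines.append(f">{rec['id']}")
        seq = rec["seq"]
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Usage would look like `to_fasta(post_regions("mus_musculus", batch))` for each `batch` in `chunked(regions)`, where `regions` holds strings such as `"1:59521583..59522118:1"` built from the lookup step.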