[ensembl-dev] Obtaining 5'UTR sequences for a list of ensembl ids programmatically?

Allan Kamau kamauallan at gmail.com
Wed Apr 24 11:05:17 BST 2024


On Wed, Apr 24, 2024 at 12:15 PM Benjamin Moore <bmoore at ebi.ac.uk> wrote:

> Hi Allan,
>
> No problem at all, very happy to help. With the GET sequence/region
> endpoint, the region must start with the chromosome (sequence region)
> name rather than the assembly name. In the GET lookup response, this is
> the value of the "seq_region_name" key. In the example you provided, this
> is "1", so the URL should look like this instead:
>
>
> https://rest.ensembl.org/sequence/region/mus_musculus/1:59521583..59522118:1?content-type=text/x-fasta
>
> Best wishes
>
> Ben
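A minimal Python sketch of the URL format described above, for anyone scripting this step. The helper names region_url and fetch_text are hypothetical, not part of any Ensembl client library, and the coord_system parameter from the earlier attempt is left out to match the URL Ben gives.

```python
from urllib.request import urlopen

REST = "https://rest.ensembl.org"


def region_url(species, seq_region_name, start, end, strand=1):
    """Build a GET sequence/region URL that requests FASTA output.

    The region is written as <seq_region_name>:<start>..<end>:<strand>,
    where seq_region_name is the chromosome name (e.g. "1"), not the
    assembly name.
    """
    region = f"{seq_region_name}:{start}..{end}:{strand}"
    return f"{REST}/sequence/region/{species}/{region}?content-type=text/x-fasta"


def fetch_text(url):
    """Download a URL and return the body as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Coordinates of the five_prime_utr entry quoted later in this thread.
url = region_url("mus_musculus", "1", 59521583, 59522118, 1)
print(url)
```

Calling fetch_text(url) should then return the FASTA record, assuming the REST server is reachable from where the script runs.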
> On 23/04/2024 21:48, Allan Kamau wrote:
>
>
>
> On Tue, Apr 23, 2024 at 10:43 PM Allan Kamau <kamauallan at gmail.com> wrote:
>
>>
>>
>> On Tue, Apr 23, 2024 at 6:37 PM Benjamin Moore <bmoore at ebi.ac.uk> wrote:
>>
>>> Hi Allan,
>>>
>>> I think the most straightforward way to retrieve the 5'UTR sequences
>>> for a list of Ensembl features (I assume you have a list of gene IDs,
>>> ENSG...) using the REST API is to use the Lookup endpoints with the
>>> expand and utr optional parameters to retrieve the genomic coordinates
>>> of the 5'UTRs of each transcript for your list of genes:
>>>
>>> https://rest.ensembl.org/documentation/info/lookup
>>>
>>> Then, you can use the coordinates from the first step as the input for
>>> the Sequence/region endpoints to retrieve the genomic sequence of the
>>> 5' UTRs:
>>>
>>> https://rest.ensembl.org/documentation/info/sequence_region
>>>
>>> I hope this helps.
>>>
>>> Best wishes
>>>
>>> Ben
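The first step described above (lookup with expand and utr, then picking out the 5'UTR coordinates) could be sketched in Python roughly as follows. This is only an illustration: lookup_url and five_prime_utrs are hypothetical helper names, and the sample record is a trimmed copy of the response quoted elsewhere in this thread rather than a live API result.

```python
REST = "https://rest.ensembl.org"


def lookup_url(stable_id):
    """Build the GET lookup/id URL with transcripts (expand) and UTRs (utr)."""
    return f"{REST}/lookup/id/{stable_id}?content-type=application/json;expand=1;utr=1"


def five_prime_utrs(gene):
    """Return (transcript_id, seq_region_name, start, end, strand) tuples,
    one per five_prime_utr entry in an expanded gene record."""
    found = []
    for transcript in gene.get("Transcript", []):
        for utr in transcript.get("UTR", []):
            if utr.get("type") == "five_prime_utr":
                found.append((transcript["id"], utr["seq_region_name"],
                              utr["start"], utr["end"], utr["strand"]))
    return found


# Trimmed copy of the gene record quoted in this thread (illustrative only).
sample_gene = {
    "id": "ENSMUSG00000041075",
    "Transcript": [
        {
            "id": "ENSMUST00000114246",
            "UTR": [
                {"type": "five_prime_utr", "seq_region_name": "1",
                 "strand": 1, "start": 59521583, "end": 59522118},
                {"type": "three_prime_utr", "seq_region_name": "1",
                 "strand": 1, "start": 59523838, "end": 59526114},
            ],
        }
    ],
}

print(lookup_url("ENSMUSG00000041075"))
print(five_prime_utrs(sample_gene))
```

The helper returns a list because a gene can have several transcripts, and a transcript may contribute more than one five_prime_utr entry. Each tuple carries the values needed to fill in the region part of a Sequence/region request.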
>>>
>>> On 23/04/2024 15:30, Allan Kamau wrote:
>>> > Is there a way to obtain 5'UTR sequences given a list of ensembl ids
>>> > programmatically?
>>> >
>>> > I have a list of ensembl ids for which I would like to obtain the
>>> > 5'UTR region for each one of them programmatically hopefully via
>>> > ensembl rest using wget, or python (ensembl-rest).
>>> >
>>> > Kindly assist.
>>> >
>>> > Thanks.
>>> >
>>> > -Allan.
>>> >
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > Dev mailing list    Dev at ensembl.org
>>> > Posting guidelines and subscribe/unsubscribe info:
>>> https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>> > Ensembl Blog: http://www.ensembl.info/
>>>
>>> --
>>> Dr. Ben Moore (he/him)
>>> Ensembl Outreach Manager
>>>
>>> European Bioinformatics Institute (EMBL-EBI)
>>> European Molecular Biology Laboratory
>>> Wellcome Trust Genome Campus
>>> Hinxton
>>> Cambridge
>>> CB10 1SD
>>> UK
>>>
>>> bmoore at ebi.ac.uk
>>> +44 (0)1223 494265
>>>
>>>
>> Thank you Ben for your response. I am now stuck on constructing the URL
>> for the https://rest.ensembl.org/sequence/region resource.
>>
>> I am using the Ensembl ID "ENSMUSG00000041075" in this example.
>> The URL below returns the features of the Ensembl ID
>> "ENSMUSG00000041075".
>>
>>
>> https://rest.ensembl.org/lookup/id/ENSMUSG00000041075?content-type=application/json;expand=1;utr=1
>>
>> This returns the result below
>>
>> {
>>     "ENSMUSG00000041075": {
>>         "seq_region_name": "1",
>>         "logic_name": "ensembl_havana_gene_mus_musculus",
>>         "end": 59526114,
>>         "biotype": "protein_coding",
>>         "version": 9,
>>         "db_type": "core",
>>         "object_type": "Gene",
>>         "strand": 1,
>>         "start": 59521583,
>>         "canonical_transcript": "ENSMUST00000114246.4",
>>         "Transcript": [
>>             {
>>                 "biotype": "protein_coding",
>>                 "version": 4,
>>                 "db_type": "core",
>>                 "object_type": "Transcript",
>>                 "seq_region_name": "1",
>>                 "logic_name": "ensembl_havana_transcript_mus_musculus",
>>                 "end": 59526114,
>>                 "Translation": {
>>                     "id": "ENSMUSP00000109884",
>>                     "length": 572,
>>                     "start": 59522119,
>>                     "end": 59523837,
>>                     "version": 3,
>>                     "object_type": "Translation",
>>                     "db_type": "core",
>>                     "Parent": "ENSMUST00000114246",
>>                     "species": "mus_musculus"
>>                 },
>>                 "assembly_name": "GRCm39",
>>                 "Parent": "ENSMUSG00000041075",
>>                 "is_canonical": 1,
>>                 "display_name": "Fzd7-201",
>>                 "Exon": [
>>                     {
>>                         "species": "mus_musculus",
>>                         "version": 4,
>>                         "db_type": "core",
>>                         "object_type": "Exon",
>>                         "assembly_name": "GRCm39",
>>                         "id": "ENSMUSE00000698652",
>>                         "start": 59521583,
>>                         "end": 59526114,
>>                         "strand": 1,
>>                         "seq_region_name": "1"
>>                     }
>>                 ],
>>                 "UTR": [
>>                     {
>>                         "object_type": "five_prime_UTR",
>>                         "db_type": "core",
>>                         "assembly_name": "GRCm39",
>>                         "Parent": "ENSMUST00000114246",
>>                         "type": "five_prime_utr",
>>                         "species": "mus_musculus",
>>                         "seq_region_name": "1",
>>                         "strand": 1,
>>                         "id": "ENSMUST00000114246",
>>                         "source": "ensembl_havana",
>>                         "start": 59521583,
>>                         "end": 59522118
>>                     },
>>                     {
>>                         "type": "three_prime_utr",
>>                         "species": "mus_musculus",
>>                         "db_type": "core",
>>                         "object_type": "three_prime_UTR",
>>                         "assembly_name": "GRCm39",
>>                         "Parent": "ENSMUST00000114246",
>>                         "id": "ENSMUST00000114246",
>>                         "source": "ensembl_havana",
>>                         "end": 59526114,
>>                         "start": 59523838,
>>                         "seq_region_name": "1",
>>                         "strand": 1
>>                     }
>>                 ],
>>                 "species": "mus_musculus",
>>                 "strand": 1,
>>                 "start": 59521583,
>>                 "id": "ENSMUST00000114246",
>>                 "source": "ensembl_havana",
>>                 "length": 4532
>>             }
>>         ],
>>         "id": "ENSMUSG00000041075",
>>         "description": "frizzled class receptor 7 [Source:MGI Symbol;Acc:MGI:108570]",
>>         "source": "ensembl_havana",
>>         "assembly_name": "GRCm39",
>>         "display_name": "Fzd7",
>>         "species": "mus_musculus"
>>     }
>> }
>>
>> What would be the form of the
>> https://rest.ensembl.org/sequence/region/ URL for the 5'UTR region of
>> the gene given above?
>>
>> Below is the step where I am stuck.
>> https://rest.ensembl.org/sequence/region/mus_musculus/<what_goes_here>:59522119..59523837:1?coord_system=seqlevel;content-type=text/x-fasta
>>
>> -Allan.
>>
>>
>
> What would be the url to obtain the five_prime_UTR region?
>
> I have tried the URL below but it finds no slice.
>
> https://rest.ensembl.org/sequence/region/mus_musculus/GRCm39:59521583..59522118:1?coord_system=seqlevel;content-type=text/x-fasta
>
>
> -Allan.
>



Thank you, Ben, for the solution and advice.

- Allan.