[ensembl-dev] Ensembl REST API | map/:species/:asm_one/:region/:asm_two

Kieron Taylor ktaylor at ebi.ac.uk
Mon Oct 28 11:28:02 GMT 2019


Dear Ramiro,

I'm sorry, but it looks like this question got lost in the flurry of your other enquiries.

The core API reports the following projection:

chromosome:GRCh37:X:10001:60001:1
chromosome:GRCh38:X:10001:10001:1

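As far as I can tell, the GRCh37->GRCh38 mapping for this region only starts at position 60001, so only one base pair of your request can be mapped to the other assembly. The result is therefore GRCh38 X:10001-10001, reported together with the coordinates of the original GRCh37 base that mapped (X:60001-60001). In other words, "original" describes the part of your query that was actually mapped, not the query as submitted. A request covering only the unmapped part, such as
https://rest.ensembl.org/map/homo_sapiens/GRCh37/X:10001..60000:1/GRCh38/?coord_system=chromosome;target_coord_system=chromosome returns no results.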
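For anyone scripting this, a minimal Python sketch (standard library only) of building the request and pairing each "original" block with its "mapped" block might look like the following. The helper names (map_url, fetch_mappings, segment_pairs) are hypothetical; the URL layout and the mappings/original/mapped fields are taken from the request and response in this thread, and sending a JSON Content-Type header is an assumption based on how the Ensembl REST examples ask for JSON.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"

def map_url(species, asm_one, region, asm_two,
            coord_system="chromosome", target_coord_system="chromosome"):
    """Build a URL for the map/:species/:asm_one/:region/:asm_two endpoint."""
    return (f"{SERVER}/map/{species}/{asm_one}/{region}/{asm_two}/"
            f"?coord_system={coord_system};target_coord_system={target_coord_system}")

def fetch_mappings(url):
    """GET the endpoint and return the decoded 'mappings' list (needs network access).

    Assumption: a JSON Content-Type header makes the server reply with JSON.
    """
    req = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["mappings"]

def segment_pairs(mappings):
    """Pair each 'original' interval with its 'mapped' interval.

    Each interval is a (seq_region_name, start, end, strand) tuple. The
    'original' side covers only the part of the query that mapped, so it
    can be shorter than (or different from) the requested region.
    """
    def interval(block):
        return (block["seq_region_name"], block["start"], block["end"], block["strand"])
    return [(interval(m["original"]), interval(m["mapped"])) for m in mappings]
```

Applied to the X:10001..60001 response quoted below, segment_pairs yields a single pair, GRCh37 X:60001-60001 to GRCh38 X:10001-10001, which is why the "original" values need not match the query.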

The assembly mapping provided via this endpoint consists only of exact sequence matches, i.e. the contig sequences from GRCh37 that were retained in GRCh38. Any region that would require fuzzy matching is not available via this route. The degree of similarity required varies too much by use case for us to provide every possible alignment.
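Because only exact matches are returned, any part of a query that has no "original" block simply has no mapping. A hypothetical client-side helper like the one below (not part of the Ensembl API) can list those unmapped stretches; it is a sketch that assumes 1-based, inclusive coordinates, as in the REST output.

```python
def unmapped_gaps(query_start, query_end, original_intervals):
    """Return the sub-ranges of [query_start, query_end] not covered by any
    'original' (start, end) interval. Coordinates are 1-based and inclusive."""
    gaps = []
    cursor = query_start  # first position not yet known to be covered
    for start, end in sorted(original_intervals):
        # Clip each interval to the query; skip it if nothing is left.
        start, end = max(start, query_start), min(end, query_end)
        if start > end:
            continue
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= query_end:
        gaps.append((cursor, query_end))
    return gaps
```

For the X:10001..60001 query, the single original block 60001-60001 leaves 10001-60000 unmapped, consistent with the empty result for X:10001..60000.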

Feel free to ask further questions. 

Regards,

Kieron


Kieron Taylor PhD.
Ensembl Developer

EMBL, European Bioinformatics Institute






> On 24 Sep 2019, at 10:17, Ramiro Magno <ramiro.magno at gmail.com> wrote:
> 
> Hi Devs,
> 
> When can I expect an update on this question?
> 
> Thank you very much indeed.
> 
> Best regards,
> 
> Ramiro Magno
> 
> On Fri, 13 Sep 2019 at 16:25, Ramiro Magno <ramiro.magno at gmail.com> wrote:
> Hi Devs,
> 
> I am trying to understand the liftover functionality provided by the "map/:species/:asm_one/:region/:asm_two" endpoint.
> 
> I am getting unexpected results with this endpoint. It may simply be that my expectations are wrong, but I'd appreciate some clarification.
> 
> Attached you will find a CSV file with the results organised as a table. I've asked the API to map some coordinates from GRCh37 to GRCh38. Columns ending in "_0" hold the query input data, columns ending in "_1" hold data from the JSON field named "original", and columns ending in "_2" hold data from the JSON field named "mapped".
> 
> In summary:
> 
> - Columns "*_0" are my query values, i.e. GRCh37.
> - Columns "*_1" are data from the "original" JSON object; I expected these to be the same values as in the query, i.e. coordinates in the GRCh37 assembly.
> - Columns "*_2" are data from the "mapped" JSON object, i.e. the new coordinates as mapped onto the GRCh38 assembly.
> 
> I expected the values in columns "start_0" and "start_1" to be the same, and likewise those in "end_0" and "end_1". However, this is not always the case... At times it seems that the values returned in "original" and "mapped" have been swapped... I am really confused...
> 
> Here's an example: https://rest.ensembl.org/map/homo_sapiens/GRCh37/X:10001..60001:1/GRCh38/?coord_system=chromosome;target_coord_system=chromosome
> 
> --- 
> mappings: 
>   - 
>     mapped: 
>       assembly: GRCh38
>       coord_system: chromosome
>       end: 10001
>       seq_region_name: X
>       start: 10001
>       strand: 1
>     original: 
>       assembly: GRCh37
>       coord_system: chromosome
>       end: 60001
>       seq_region_name: X
>       start: 60001
>       strand: 1
> 
> 
> Many thanks in advance.
> 
> All the best, RM
> 