[ensembl-dev] Converting cDNA position to genomic position!

Andy Yates ayates at ebi.ac.uk
Fri Nov 23 15:07:32 GMT 2012


Hi Diran,

The service is returning exactly what you want, but the complexities of the reverse strand and the way the website represents it can make the numbers difficult to understand and pin down. You queried for the following:

http://beta.rest.ensembl.org/map/cdna/ENST00000218516/1..2

And this returns:

--- 
mappings: 
  - 
    coord_system: chromosome
    end: 100662913
    gap: 0
    rank: 0
    seq_region_name: X
    start: 100662912
    strand: -1


These coordinates are always reported with start less than or equal to end, together with a strand. Compare this to the website (http://www.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;t=ENST00000218516) and you will see a correspondence between the REST API's end and the transcript's reported end position (Chromosome X: 100,652,791-100,662,913 reverse strand).

Now when you look at the Exons listing, the numbers have been reversed to report the exons in the order in which they are transcribed. So here the exon start is the same as the transcript end:

No.  ID               Start        End          Start Phase  End Phase  Length
1    ENSE00000674005  100,662,913  100,662,698  -            2          216

You can confirm this with the REST API by requesting this exon on the cDNA:

http://beta.rest.ensembl.org/map/cdna/ENST00000218516/1..216

--- 
mappings: 
  - 
    coord_system: chromosome
    end: 100662913
    gap: 0
    rank: 0
    seq_region_name: X
    start: 100662698
    strand: -1

Once again these values are reported with genomic start always being less than genomic end.

Going back to your question:

> Is there a way to make it get all the genomic position/coordinates in a uniform direction irrespective of the strand and direction of transcription?

Yes, the code is already reporting the coordinates in a uniform direction: start is always less than end, as if the feature were on the forward strand. It is up to the developer to take note of the strand and to realise that, for this reverse-strand transcript, this means:

cDNA		Genomic
1 	== 	100662913
216 	== 	100662698
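
The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name cdna_to_genomic is made up for illustration and is not part of the REST API, and it assumes a single gap-free mapping that covers the whole queried cDNA range.

```python
def cdna_to_genomic(cdna_pos, cdna_start, mapping):
    """Map one cDNA position to a genomic position.

    cdna_start is the first cDNA position of the queried range and
    mapping is one entry from the "mappings" list returned by /map/cdna.
    Hypothetical helper: assumes the entry covers the whole queried range.
    """
    offset = cdna_pos - cdna_start
    length = mapping["end"] - mapping["start"] + 1
    if not 0 <= offset < length:
        raise ValueError("cDNA position lies outside this mapping")
    if mapping["strand"] == -1:
        # Reverse strand: the first cDNA base sits at the genomic end.
        return mapping["end"] - offset
    # Forward strand: the first cDNA base sits at the genomic start.
    return mapping["start"] + offset


# The exon 1 mapping returned for /map/cdna/ENST00000218516/1..216
exon1 = {"start": 100662698, "end": 100662913, "strand": -1}
print(cdna_to_genomic(1, 1, exon1))    # 100662913
print(cdna_to_genomic(216, 1, exon1))  # 100662698
```

The same function returns 100662891 for cDNA position 23, which is the coding start you listed.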

Hope this helps

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 23 Nov 2012, at 13:25, Oyediran Akinrinade wrote:

> Hello,
> 
> Thanks for the help, it's now working perfectly! However, it seems to be reporting the coordinates for genes transcribed in the reverse direction differently from what is shown on the ensembl.org web interface.
> For instance,
> 
> curl 'http://beta.rest.ensembl.org/map/cdna/ENST00000218516/1..2?' -H 'Content-type:application/json'
> 
> yields:
> 
> "('ENST00000218516', 1)","{u'mappings': [{u'end': 100662913, u'start': 100662912, u'coord_system': u'chromosome', u'rank': 0, u'gap': 0, u'seq_region_name': u'X', u'strand': -1}]}"
> 
> cDNAcodingStart   cDNAcodingEnd   GenomiccodingStart   GenomiccodingEnd
> 23                216             100662698            100662891
> 
> The "end" in the output, however, corresponds to the Exon Chr End (bp) from the ensembl.org webpage. It thus reports the genomic coordinate of cDNA pos 216 (i.e. the end of the 1st coding cDNA) as the Exon Chr Start (bp).
> 
> 
> It does, however, work well for transcripts on the + strand, which are transcribed in the forward direction.
> 
> This makes the results unreliable in a way. Is there a way to make it get all the genomic position/coordinates in a uniform direction irrespective of the strand and direction of transcription?
> 
> Thanks for your anticipated support.
> 
> _Diran
> 
> 
> 
> 
> Quoting "Andy Yates" <ayates at ebi.ac.uk>:
> 
>> Hi,
>> 
>> We limit the REST service to a maximum of 3 requests per second over an hour and allow you to burst that limit by 6. If you sleep every 3 requests for the remainder of the second you are on, your issue will go away, e.g.
>> 
>> import time
>> 
>> then = time.time()
>> time.sleep(0.1)  # simulates the time spent making requests
>> now = time.time()
>> diff = now - then
>> if diff < 1.0:
>>     # sleep only for what is left of the current second
>>     sleep_time = 1.0 - diff
>>     time.sleep(sleep_time)
>>     print('I went to sleep for ' + str(sleep_time) + ' seconds')
>> print(diff)
>> 
>> 
>> This way you only sleep for as long as you need to. You'll also have to combine this with a request counter to ensure you only sleep every 3 requests.
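
A rough sketch of how the request counter and the sleep might fit together. The Throttle class and its parameter names are hypothetical (they are not part of any Ensembl client), it counts in simple fixed windows, and the clock and sleep functions are injectable only so the logic can be exercised without real waiting.

```python
import time


class Throttle(object):
    """Allow at most max_requests calls per period seconds (hypothetical helper)."""

    def __init__(self, max_requests=3, period=1.0,
                 clock=time.time, sleep=time.sleep):
        self.max_requests = max_requests
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.count = 0
        self.window_start = None

    def wait(self):
        """Call once before each request."""
        now = self.clock()
        if self.window_start is None:
            self.window_start = now
        if self.count >= self.max_requests:
            # Budget for this window is spent: sleep only for what is left of it.
            remaining = self.period - (now - self.window_start)
            if remaining > 0:
                self.sleep(remaining)
            # Start a fresh window and reset the request counter.
            self.window_start = self.clock()
            self.count = 0
        self.count += 1
```

In a loop over transcripts you would create one Throttle() up front and call its wait() method immediately before each HTTP request.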
>> 
>> HTH
>> 
>> Andy
>> 
>> 
>> On 22 Nov 2012, at 14:07, Oyediran Akinrinade wrote:
>> 
>>> Hello,
>>> 
>>> Thanks for the help/info!
>>> 
>>> I tried using the REST API but it keeps stopping without completing. Here is my script for the query:
>>> 
>>> #!/usr/bin/python
>>> 
>>> import httplib2, sys
>>> http = httplib2.Http(".cache")
>>> 
>>> map=[]
>>> f=open("EXTRACTED_FILE.txt","a") ## file open in appending mode i.e 'a'
>>> 
>>> file=open('card_complete.txt', 'rU')
>>> lines=file.readlines()
>>> for line in lines:
>>>   transcript, start = line.split('\t')
>>>   start=int(start)
>>>   end=start+1
>>>   m=(transcript,start)
>>>   map.append(m)
>>>   server = "http://beta.rest.ensembl.org"
>>> #ext = "/map/cdna/ENST00000379802/5513..5514?"
>>> #ext = "/map/cdna/transcript/start..end?"
>>>   ext="/map/cdna/"+transcript+"/"+str(start)+".."+str(end)+"?"
>>>   resp, content = http.request(server+ext, method="GET", headers={"Content-Type":"application/json"})
>>>   if not resp.status == 200:
>>>       print "Invalid response: ", resp.status
>>>       sys.exit()
>>>   import json
>>>   decoded = json.loads(content)
>>>   print "fetching", m, "now"
>>>   #f=open("EXTRACTED_FILE.txt","a") ## file open in appending mode i.e 'a'
>>>   f.write(repr(decoded)) ## writing the contain decoded  to file
>>>   #print repr(decoded)
>>>   #print decoded
>>> f.close()
>>> print "completed"
>>> 
>>> And it only works for few:
>>> 
>>> fetching ('ENST00000070846', 76) now
>>> fetching ('ENST00000070846', 84) now
>>> fetching ('ENST00000070846', 156) now
>>> fetching ('ENST00000070846', 176) now
>>> fetching ('ENST00000070846', 184) now
>>> fetching ('ENST00000070846', 193) now
>>> fetching ('ENST00000070846', 227) now
>>> fetching ('ENST00000070846', 235) now
>>> fetching ('ENST00000070846', 258) now
>>> fetching ('ENST00000070846', 259) now
>>> fetching ('ENST00000070846', 275) now
>>> fetching ('ENST00000070846', 302) now
>>> Invalid response:  429
>>> 
>>> # if not resp.status == 200:
>>>       print "Invalid response: ", resp.status
>>>       sys.exit()
>>> 
>>> Do you have an idea of what could be wrong here?
>>> 
>>> 
>>> 
>>> Quoting "Andy Yates" <ayates at ebi.ac.uk>:
>>> 
>>>> Hi,
>>>> 
>>>> You can also use the REST API, which can map from protein, CDS and cDNA to genomic locations:
>>>> 
>>>> http://beta.rest.ensembl.org/map/cdna/ENST00000373968/123..554?content-type=application/json
>>>> 
>>>> Brings back
>>>> 
>>>> {"mappings":[{"seq_region_name":"10","gap":0,"coord_system":"chromosome","strand":-1,"rank":0,"end":54531338,"start":54531209},{"seq_region_name":"10","gap":0,"coord_system":"chromosome","strand":-1,"rank":0,"end":54530546,"start":54530430},{"seq_region_name":"10","gap":0,"coord_system":"chromosome","strand":-1,"rank":0,"end":54529075,"start":54529007},{"seq_region_name":"10","gap":0,"coord_system":"chromosome","strand":-1,"rank":0,"end":54528270,"start":54528155}]}
>>>> 
>>>> The REST API is rate limited to 3 mappings per second, so using the Perl API will give you a higher throughput.
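
When a cDNA range spans several exons, as in the response above, the offsets can be accumulated segment by segment. A small sketch, assuming the mappings are listed in transcript order (as they are in that response) and contain no gaps; the function name is illustrative only.

```python
def cdna_range_to_positions(cdna_start, mappings):
    """Return {cDNA position: genomic position} for a /map/cdna result.

    Hypothetical helper: assumes the mappings are in transcript order
    and that there are no gaps between them.
    """
    positions = {}
    cdna_pos = cdna_start
    for seg in mappings:
        length = seg["end"] - seg["start"] + 1
        for offset in range(length):
            if seg["strand"] == -1:
                # Reverse strand: walk down from the genomic end.
                positions[cdna_pos] = seg["end"] - offset
            else:
                # Forward strand: walk up from the genomic start.
                positions[cdna_pos] = seg["start"] + offset
            cdna_pos += 1
    return positions


# Trimmed copy of the mappings for ENST00000373968, cDNA 123..554
mappings = [
    {"start": 54531209, "end": 54531338, "strand": -1},
    {"start": 54530430, "end": 54530546, "strand": -1},
    {"start": 54529007, "end": 54529075, "strand": -1},
    {"start": 54528155, "end": 54528270, "strand": -1},
]
positions = cdna_range_to_positions(123, mappings)
print(len(positions))   # 432, i.e. 554 - 123 + 1
print(positions[123])   # 54531338
print(positions[554])   # 54528155
```

The segment lengths (130 + 117 + 69 + 116) add up to the 432 bases requested, which is a quick sanity check that nothing was dropped.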
>>>> 
>>>> Andy
>>>> 
>>>> 
>>>> On 15 Nov 2012, at 10:59, Will McLaren wrote:
>>>> 
>>>>> Hello Oyediran,
>>>>> 
>>>>> You will need to use a TranscriptMapper object; http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1TranscriptMapper.html
>>>>> 
>>>>> The method you should use is cdna2genomic. Note that this returns an array of coordinate objects; this is useful to know if, for example, your coordinates only partially overlap the transcript.
>>>>> 
>>>>> Here's a bit of code to get you started:
>>>>> 
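>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::EnsEMBL::Registry;
>>>>> use Bio::EnsEMBL::TranscriptMapper;
>>>>> 
>>>>> my $reg = 'Bio::EnsEMBL::Registry';
>>>>> $reg->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous');
>>>>> 
>>>>> my $ta = $reg->get_adaptor('human', 'core', 'transcript');
>>>>> 
>>>>> my $t = $ta->fetch_by_stable_id('ENST00000373968');
>>>>> 
>>>>> my $mapper = Bio::EnsEMBL::TranscriptMapper->new($t);
>>>>> 
>>>>> # cdna2genomic takes a cDNA start and end and returns a list of coordinate (and gap) objects
>>>>> my @coords = $mapper->cdna2genomic(123, 554);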
>>>>> 
>>>>> Regards
>>>>> 
>>>>> Will McLaren
>>>>> Ensembl Variation
>>>>> 
>>>>> 
>>>>> On 15 November 2012 10:45, Oyediran Akinrinade <oyediran.akinrinade at helsinki.fi> wrote:
>>>>> 
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I have a list of Ensembl transcript IDs with their corresponding cDNA
>>>>> positions and I would like to get their genomic positions using the
>>>>> Ensembl API. I have no experience with Perl, although I have
>>>>> installed ensembl-api on my Mac. There are about 6000 IDs
>>>>> for which I would like to get the genomic coordinates/positions, and
>>>>> web-based queries will not be the best solution. To this end, I
>>>>> am writing to request your assistance.
>>>>> 
>>>>> Looking forward to hearing from you soonest.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> -Oyediran
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/




