[ensembl-dev] Conversion of ENA xref ids into working URLs

Dmitry Kuznetsov Dmitry.Kuznetsov at isb-sib.ch
Fri Apr 22 12:19:59 BST 2016


Dan and Guy, thanks for your answers and suggestions.

 

>You can resolve to ENA subsequences in HTML using http://www.ebi.ac.uk/ena/data/view/GG666297&range=18707-20086
Is there any way to end up at this [equivalent] URL instead: http://www.ebi.ac.uk/ena/data/view/ACGC01000047 ?

 

Thanks,

Dmitry.

 

 

From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Guy Cochrane EBI
Sent: Friday, April 22, 2016 12:37
To: Ensembl developers list
Cc: Nicole Silvester
Subject: Re: [ensembl-dev] Conversion of ENA xref ids into working URLs

 

 

On 22 Apr 2016, at 10:27, Dan Staines <dstaines at ebi.ac.uk> wrote:

 

Hi Dmitry,




KE159678.1:CDS:7103..8458

GG666297.1:CDS:complement(18707..20086)


These are provided as tracking IDs to indicate the piece of INSDC annotation from which the Ensembl object was loaded. INSDC do not have feature level identifiers, so on the advice of ENA these strings are constructed to provide at least some way to find the original piece of data from which the Ensembl object was constructed.

These identifiers are of the form accession:feature_type:location e.g.
GG666297.1:CDS:complement(18707..20086)
which refers to this feature:
FT   CDS             complement(18707..20086)
FT                   /codon_start=1
FT                   /transl_table=11
FT                   /locus_tag="HMPREF0077_0851"
FT                   /product="ATPase/histidine kinase/DNA gyrase B/HSP90 domain
...
FT                   /protein_id="EEI83065.1"
...
from this expanded CON (i.e. an entry composed of multiple sub-entries):
http://www.ebi.ac.uk/ena/data/view/GG666297&display=text&expanded=true
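For illustration, the accession:feature_type:location form described above can be split into its parts with a few lines of Python. This is only a sketch based on the two example IDs in this thread: the names `TrackingId` and `parse_tracking_id` are hypothetical, and it assumes a single start..end span, optionally wrapped in complement(...). Compound INSDC locations such as join(...) are deliberately rejected rather than guessed at.

```python
import re
from typing import NamedTuple


class TrackingId(NamedTuple):
    """Parts of an Ensembl tracking ID (hypothetical helper type)."""
    accession: str      # e.g. "GG666297"
    version: str        # e.g. "1"
    feature_type: str   # e.g. "CDS"
    start: int
    end: int
    complement: bool    # True if the location was complement(start..end)


# accession.version : feature_type : location
# The location is either "start..end" or "complement(start..end)".
_TRACKING_ID = re.compile(
    r"^(?P<acc>[A-Za-z0-9_]+)\.(?P<ver>\d+)"
    r":(?P<ftype>[^:]+)"
    r":(?:complement\((?P<cstart>\d+)\.\.(?P<cend>\d+)\)"
    r"|(?P<start>\d+)\.\.(?P<end>\d+))$"
)


def parse_tracking_id(text: str) -> TrackingId:
    """Split a tracking ID into its parts; raise ValueError if unsupported."""
    match = _TRACKING_ID.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported tracking ID: {text!r}")
    on_complement = match.group("cstart") is not None
    start = match.group("cstart") if on_complement else match.group("start")
    end = match.group("cend") if on_complement else match.group("end")
    return TrackingId(
        match.group("acc"),
        match.group("ver"),
        match.group("ftype"),
        int(start),
        int(end),
        on_complement,
    )
```

Calling `parse_tracking_id("GG666297.1:CDS:complement(18707..20086)")` yields the accession GG666297, version 1, feature type CDS, the span 18707 to 20086, and a flag recording that the feature lies on the complement strand.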

I'm not aware of any way to resolve these automatically in ENA - there is a REST service which can return entire entries in text or XML but it would return the whole record, not just the feature.

 

You can resolve to ENA subsequences in HTML using http://www.ebi.ac.uk/ena/data/view/GG666297&range=18707-20086 (see http://www.ebi.ac.uk/ena/browse/data-retrieval-rest#sequence_region_html). Other formats are also available. This doesn’t consider complement; rather, it simply finds the sub-sequence, and spanned features will be shown in complement if this is where they lie.
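As a rough sketch of the transformation involved, the following Python turns one of the tracking IDs into a range URL of the form shown above. The function name `range_url` is hypothetical, and the URL pattern is taken only from the example in this thread (unversioned accession, `&range=start-end`). The code assumes a simple start..end location, drops the complement(...) wrapper because the range view ignores strand, and does not check whether the resulting URL resolves.

```python
ENA_VIEW = "http://www.ebi.ac.uk/ena/data/view/"


def range_url(tracking_id: str) -> str:
    """Build an ENA range URL from accession.version:feature_type:location."""
    accession_version, _feature_type, location = tracking_id.split(":", 2)
    # The example URL in this thread uses the accession without its version.
    accession = accession_version.split(".", 1)[0]
    # The range view does not consider strand, so strip complement(...).
    if location.startswith("complement(") and location.endswith(")"):
        location = location[len("complement("):-1]
    start, end = location.split("..")
    # int() rejects anything that is not a plain coordinate (e.g. join(...)).
    return f"{ENA_VIEW}{accession}&range={int(start)}-{int(end)}"


print(range_url("GG666297.1:CDS:complement(18707..20086)"))
```

For the first example ID this gives http://www.ebi.ac.uk/ena/data/view/KE159678&range=7103-8458 when called as `range_url("KE159678.1:CDS:7103..8458")`.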

 


If you're only interested in CDS features, you can also use the protein_id identifier (in this case EEI83065.1) which is an xref on the Ensembl transcript to link to ENA e.g.
http://www.ebi.ac.uk/ena/data/view/EEI83065.1

Indeed - protein IDs can be used for this. For non-coding features, an addressing system is used that combines accession version, feature type and locations. E.g. http://www.ebi.ac.uk/ena/data/view/Non-coding:KT289404.1:11505..11592:tRNA. Note that the ‘domain’ of data is also used here (‘Non-coding’) and can be considered part of the ID for the feature. Thinking of it this way allows you to consider the generic pattern to be http://www.ebi.ac.uk/ena/data/view/<ENA_identifier>.
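The generic pattern can be sketched in Python as below. The helper names `view_url` and `noncoding_identifier` are hypothetical, and the field order of the non-coding identifier (domain, accession.version, location, feature type) is inferred from the single tRNA example above. Note that this order differs from the Ensembl tracking IDs, which put the feature type before the location. How a complement location is written in such an identifier is not shown in this thread, so the sketch covers forward-strand coordinates only.

```python
ENA_VIEW = "http://www.ebi.ac.uk/ena/data/view/"


def view_url(ena_identifier: str) -> str:
    """Generic pattern: the view URL followed by any ENA identifier."""
    return ENA_VIEW + ena_identifier


def noncoding_identifier(accession_version: str, start: int, end: int,
                         feature_type: str) -> str:
    """Compose a non-coding feature ID as in the tRNA example above."""
    return f"Non-coding:{accession_version}:{start}..{end}:{feature_type}"


# A CDS is addressed by its protein_id; a non-coding feature by the composed ID.
print(view_url("EEI83065.1"))
print(view_url(noncoding_identifier("KT289404.1", 11505, 11592, "tRNA")))
```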

 

You could also see these non-coding features by coordinates only (http://www.ebi.ac.uk/ena/data/view/KT289404&range=11505-11592) as in the CDS example above.

 

 

So in all cases a little syntactic transformation is needed to build these links; at least from your examples, it seems that the information is there.





Regarding linking to ENA, my best advice is for you to contact ENA directly at datasubs at ebi.ac.uk.

 

Yes - broader documentation at http://www.ebi.ac.uk/ena/browse/data-retrieval-rest and please do contact us at datasubs at ebi.ac.uk.

 

Thanks,

 

Guy.

 

 

 





Sorry I can't be of more help.

Dan.

-- 
Dan Staines, PhD
Genomics Technology Infrastructure Coordinator
EMBL-EBI, Wellcome Trust Genome Campus
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-492507

_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/

 

 

------------------------------------------------------------------------------------

Guy Cochrane, PhD

European Nucleotide Archive Team Leader.

 

European Bioinformatics Institute (EMBL-EBI)

European Molecular Biology Laboratory

Wellcome Trust Genome Campus

Hinxton

Cambridge CB10 1SD

United Kingdom

 

Tel: +44 1223 494444. Fax: +44 1223 494472

 


datasubs at ebi.ac.uk (data submissions and general enquiries), 

http://www.ebi.ac.uk/ena (submissions, updates, services, info)
------------------------------------------------------------------------------------

 
