[ensembl-dev] Conversion of ENA xref ids into working URLs

Dan Staines dstaines at ebi.ac.uk
Fri Apr 22 10:27:04 BST 2016


Hi Dimitry,

> KE159678.1:CDS:7103..8458
>
> GG666297.1:CDS:complement(18707..20086)

These are provided as tracking IDs to indicate the piece of INSDC 
annotation from which the Ensembl object was loaded. INSDC do not have 
feature-level identifiers, so on the advice of ENA these strings are 
constructed to provide at least some way to find the original piece of 
data from which the Ensembl object was constructed.

These identifiers are of the form accession:feature_type:location, e.g.
GG666297.1:CDS:complement(18707..20086)
which refers to this feature:
FT   CDS             complement(18707..20086)
FT                   /codon_start=1
FT                   /transl_table=11
FT                   /locus_tag="HMPREF0077_0851"
FT                   /product="ATPase/histidine kinase/DNA gyrase B/HSP90 domain
...
FT                   /protein_id="EEI83065.1"
...
from this expanded CON (i.e. an entry composed of multiple sub-entries):
http://www.ebi.ac.uk/ena/data/view/GG666297&display=text&expanded=true
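
As a rough sketch of how one of these tracking IDs could be taken apart 
and turned into a link to the expanded entry: the function and class 
names below are made up for illustration, and the URL pattern is simply 
copied from the example above (including dropping the ".1" version 
suffix), so treat it as an assumption rather than a documented ENA API.

```python
from typing import NamedTuple


class TrackingId(NamedTuple):
    accession: str     # e.g. "GG666297.1"
    feature_type: str  # e.g. "CDS"
    location: str      # e.g. "complement(18707..20086)"


def parse_tracking_id(tracking_id: str) -> TrackingId:
    """Split accession:feature_type:location into its three parts."""
    # Split on the first two colons only, so that any colon inside the
    # location string itself is left untouched.
    parts = tracking_id.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a tracking ID: {tracking_id!r}")
    return TrackingId(*parts)


def expanded_entry_url(accession: str) -> str:
    """Build a link to the expanded text view of the entry.

    Assumption: the URL pattern and the unversioned accession are taken
    from the single example in this message.
    """
    unversioned = accession.split(".", 1)[0]
    return (
        "http://www.ebi.ac.uk/ena/data/view/"
        f"{unversioned}&display=text&expanded=true"
    )


tid = parse_tracking_id("GG666297.1:CDS:complement(18707..20086)")
print(tid.feature_type, tid.location)
print(expanded_entry_url(tid.accession))
```

This only gets you to the whole entry; the location part still has to be 
matched against the feature table by hand or by your own code.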

I'm not aware of any way to resolve these automatically in ENA - there 
is a REST service which can return entire entries in text or XML but it 
would return the whole record, not just the feature.
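
If you do download the whole record as text, picking out the feature 
locally is not hard. The following is a minimal sketch, assuming the 
record is in EMBL flat-file format with "FT" feature-table lines like 
the excerpt above, and that the location fits on the feature's first 
line (long join(...) locations that wrap over several lines are not 
handled). find_feature is a hypothetical helper, not part of any ENA or 
Ensembl library.

```python
from typing import List


def find_feature(record_text: str, feature_type: str, location: str) -> List[str]:
    """Return the FT lines of the first feature matching type and location."""
    block: List[str] = []
    for line in record_text.splitlines():
        if not line.startswith("FT"):
            if block:
                break  # left the feature table, the feature is complete
            continue
        # A feature's first line carries the feature key starting in
        # column 6; qualifier lines are blank there.
        is_header = len(line) > 5 and line[5] != " "
        if block:
            if is_header:
                break  # the next feature starts here
            block.append(line)
        elif is_header:
            fields = line[5:].split(None, 1)
            if len(fields) == 2 and fields[0] == feature_type \
                    and fields[1].strip() == location:
                block.append(line)
    return block


record = "\n".join([
    "FT   CDS             complement(18707..20086)",
    "FT                   /codon_start=1",
    'FT                   /protein_id="EEI83065.1"',
    "FT   CDS             20100..20500",
])
for ft_line in find_feature(record, "CDS", "complement(18707..20086)"):
    print(ft_line)
```

An empty list means no feature with that exact type and location string 
was found in the text supplied.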

If you're only interested in CDS features, you can also link to ENA 
using the protein_id (in this case EEI83065.1), which is stored as an 
xref on the Ensembl transcript, e.g.
http://www.ebi.ac.uk/ena/data/view/EEI83065.1
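
For illustration, here is a small sketch that pulls the protein_id out 
of a piece of feature-table text and builds that kind of link. The 
helper name is hypothetical and the URL prefix is just the one from the 
example above; in practice you would more likely read the protein_id 
from the xref on the Ensembl transcript than from flat-file text.

```python
import re
from typing import Optional

# Matches a qualifier such as /protein_id="EEI83065.1"
PROTEIN_ID_RE = re.compile(r'/protein_id="([^"]+)"')

# Assumption: prefix copied from the example link in this message.
ENA_VIEW_PREFIX = "http://www.ebi.ac.uk/ena/data/view/"


def protein_id_url(feature_text: str) -> Optional[str]:
    """Return an ENA link for the first protein_id found, or None."""
    match = PROTEIN_ID_RE.search(feature_text)
    if match is None:
        return None
    return ENA_VIEW_PREFIX + match.group(1)


print(protein_id_url('FT                   /protein_id="EEI83065.1"'))
```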

Regarding linking to ENA, my best advice is for you to contact ENA 
directly at datasubs at ebi.ac.uk.

Sorry I can't be of more help.

Dan.

-- 
Dan Staines, PhD
Genomics Technology Infrastructure Coordinator
EMBL-EBI, Wellcome Trust Genome Campus
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-492507



