[ensembl-dev] Transcript Biotype
Kieron Taylor
ktaylor at ebi.ac.uk
Tue Apr 23 10:19:27 BST 2019
>
> On 11 Apr 2019, at 11:36, Olson, Andrew <olson at cshl.edu> wrote:
>
> Is there a mapping between biotypes and Sequence Ontology terms?
>
> Andrew
>
Hi Andrew,
For several years now, the Havana annotation team (the originators of the "biotype" vocabulary) have collaborated with the Sequence Ontology maintainers to keep the two nomenclatures approximately in step. Where possible, the biotype should exactly match the SO term name (the name string, not the SO: accession). Sadly, the two vocabularies are not in complete agreement.
Ensembl now maintains a mapping between the two which is accessible in several ways:
1) REST API https://rest.ensembl.org/documentation/info/biotypes_name
2) The master_biotype table in the ensembl_production database, as found on our public MySQL servers
3) Perl API access via the BiotypeAdaptor: https://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BiotypeAdaptor.html
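
As a rough illustration of option 1, the following Python sketch builds a request URL for the biotype endpoint and pulls the SO accession out of a decoded response. It rests on some assumptions: that the documented endpoint path is info/biotypes/name/:name with an optional trailing object type, and that each returned record carries "object_type" and "so_acc" keys. Please check both against the REST documentation linked above. The helper names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def biotype_url(name, object_type=None, server=REST_SERVER):
    """Build the lookup URL for one biotype name.

    The path layout (info/biotypes/name/<name>[/<object_type>]) is an
    assumption based on the REST documentation page.
    """
    path = "/info/biotypes/name/" + urllib.parse.quote(name, safe="")
    if object_type is not None:
        path += "/" + urllib.parse.quote(object_type, safe="")
    return server + path


def so_accessions(records):
    """Map object type -> SO accession from a decoded JSON response.

    The key names "object_type" and "so_acc" are assumed, not confirmed.
    """
    return {rec.get("object_type"): rec.get("so_acc") for rec in records}


def fetch_biotype(name, object_type=None):
    """Fetch and decode the records for one biotype (needs network access)."""
    request = urllib.request.Request(
        biotype_url(name, object_type),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    print(so_accessions(fetch_biotype("protein_coding")))
```

Keeping the URL construction and the response parsing separate from the network call means both can be checked offline.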
I have included the contents of master_biotype from release 96 below to illustrate. The functionality was only added in the last few releases, so it is not available in most of the archives. The mappings are fairly stable, so you can perhaps use contemporary biotype/SO-term mappings for inference on older data. In release 97 the SO term name will be present in the REST API responses along with the accession.
Hopefully that is useful to you and other users.
Regards,
Kieron Taylor
Ensembl Developer
EMBL-EBI
---------------------------------------------------------
SELECT name, so_acc FROM master_biotype;
IG_C_gene SO:0001217
IG_C_gene SO:0000478
IG_D_gene SO:0001217
IG_D_gene SO:0000458
IG_J_gene SO:0001217
IG_J_gene SO:0000470
IG_J_pseudogene SO:0000336
IG_J_pseudogene SO:0000516
IG_V_gene SO:0001217
IG_V_gene SO:0000466
IG_V_pseudogene SO:0000336
IG_V_pseudogene SO:0000516
IG_gene SO:0001217
IG_gene SO:3000000
IG_pseudogene SO:0000336
IG_pseudogene SO:0000516
LRG_gene NULL
LRG_gene NULL
Mt_rRNA SO:0001263
Mt_rRNA SO:0000252
Mt_tRNA SO:0001263
Mt_tRNA SO:0000253
Mt_tRNA_pseudogene SO:0000336
Mt_tRNA_pseudogene SO:0000516
RNA-Seq_gene NULL
RNA-Seq_gene NULL
TEC NULL
TEC SO:0002139
TR_gene SO:0001217
TR_gene SO:3000000
TR_pseudogene SO:0000336
TR_pseudogene SO:0000516
ambiguous_orf SO:0001877
cdna_update NULL
cdna_update NULL
cdna SO:0000756
cdna SO:0000756
disrupted_domain SO:0000516
est NULL
est SO:0000345
lincRNA SO:0001263
lincRNA SO:0001877
miRNA SO:0001263
miRNA SO:0000276
miRNA_pseudogene SO:0000336
miRNA_pseudogene SO:0000516
misc_RNA SO:0001263
misc_RNA SO:0000655
misc_RNA_pseudogene SO:0000336
misc_RNA_pseudogene SO:0000516
ncRNA SO:0001263
ncRNA SO:0000655
non_coding SO:0001263
non_coding SO:0001877
nonsense_mediated_decay SO:0000234
polymorphic SO:0001217
polymorphic_pseudogene SO:0001217
polymorphic_pseudogene SO:0000234
processed_pseudogene SO:0000336
processed_pseudogene SO:0000516
processed_transcript SO:0001263
processed_transcript SO:0001877
protein_coding SO:0001217
protein_coding SO:0000234
pseudogene SO:0000336
pseudogene SO:0000516
rRNA SO:0001263
rRNA SO:0000252
rRNA_pseudogene SO:0000336
rRNA_pseudogene SO:0000516
retained_intron SO:0001877
retrotransposed SO:0000569
retrotransposed SO:0000569
scRNA_pseudogene SO:0000336
scRNA_pseudogene SO:0000516
snRNA SO:0001263
snRNA SO:0000274
snRNA_pseudogene SO:0000336
snRNA_pseudogene SO:0000516
snlRNA SO:0001263
snlRNA SO:0000274
snoRNA SO:0001263
snoRNA SO:0000275
snoRNA_pseudogene SO:0000336
snoRNA_pseudogene SO:0000516
tRNA SO:0001263
tRNA SO:0000253
tRNA_pseudogene SO:0000336
tRNA_pseudogene SO:0000516
transcribed_processed_pseudogene SO:0000336
transcribed_processed_pseudogene SO:0000516
transcribed_unitary_pseudogene SO:0000336
transcribed_unprocessed_pseudogene SO:0000336
transcribed_unprocessed_pseudogene SO:0000516
unitary_pseudogene SO:0000336
unitary_pseudogene SO:0000516
unprocessed_pseudogene SO:0000336
unprocessed_pseudogene SO:0000516
ccds_gene NULL
protein_coding_in_progress NULL
IG_Z_gene SO:0001217
IG_M_gene SO:0001217
ncRNA_host NULL
TR_V_pseudogene SO:0000336
TR_V_gene SO:0001217
IG_C_pseudogene SO:0000336
TR_C_gene SO:0001217
TR_J_gene SO:0001217
TR_V_pseudogene SO:0000516
TR_V_gene SO:0000466
IG_C_pseudogene SO:0000516
TR_C_gene SO:0000478
TR_J_gene SO:0000470
protein_coding_in_progress NULL
IG_M_gene SO:3000000
IG_Z_gene SO:3000000
3prime_overlapping_ncRNA SO:0002120
antisense_RNA SO:0001263
antisense_RNA SO:0001877
scRNA SO:0001263
scRNA SO:0000013
RNase_MRP_RNA SO:0001263
RNase_MRP_RNA SO:0000385
RNase_P_RNA SO:0001263
RNase_P_RNA SO:0000386
telomerase_RNA SO:0001263
telomerase_RNA SO:0000390
sense_intronic SO:0001877
sense_overlapping SO:0001877
sense_intronic SO:0001263
ambiguous_orf SO:0001263
retained_intron SO:0001263
3prime_overlapping_ncRNA NULL
ncRNA_host NULL
sense_overlapping SO:0001263
TR_D_gene SO:0000458
TR_J_pseudogene SO:0000516
TR_D_gene SO:0001217
TR_J_pseudogene SO:0000336
ncbi_pseudogene SO:0000336
ncbi_pseudogene SO:0000516
ncbigene NULL
non_stop_decay SO:0000234
pre_miRNA SO:0001244
tmRNA SO:0001263
tmRNA SO:0000584
SRP_RNA SO:0001263
SRP_RNA SO:0000590
ribozyme SO:0001877
ncRNA_pseudogene SO:0000336
ncRNA_pseudogene SO:0000516
IG_LV_gene SO:0001217
IG_LV_gene SO:3000000
translated_processed_pseudogene SO:0000336
translated_processed_pseudogene SO:0000516
nontranslating_CDS SO:0001217
nontranslating_CDS SO:0000234
translated_unprocessed_pseudogene SO:0000336
translated_unprocessed_pseudogene SO:0000516
mRNA SO:0001217
mRNA SO:0000234
pre_miRNA SO:0001263
artifact NULL
artifact NULL
lncRNA SO:0001263
class_I_RNA SO:0001263
class_I_RNA SO:0000990
class_II_RNA SO:0001263
class_II_RNA SO:0000989
known_ncRNA NULL
known_ncRNA SO:0000655
transcribed_unitary_pseudogene SO:0000516
piRNA SO:0001263
piRNA SO:0001035
IG_D_pseudogene SO:0000336
macro_lncRNA SO:0001263
vaultRNA SO:0001263
scaRNA SO:0001263
scaRNA SO:0000013
sRNA SO:0000274
sRNA SO:0001263
CRISPR SO:0001263
CRISPR SO:0001459
antitoxin SO:0001877
antitoxin SO:0001263
ribozyme SO:0001263
vaultRNA SO:0002040
macro_lncRNA SO:0001877
IG_D_pseudogene SO:0000516
guide_RNA SO:0001263
guide_RNA SO:0000602
Y_RNA SO:0001263
Y_RNA SO:0000405
transposable_element SO:0000101
transposable_element SO:0000111
bidirectional_promoter_lncRNA NULL
bidirectional_promoter_lncRNA SO:0002185
unknown_likely_coding NULL
unknown_likely_coding NULL
other NULL
lncRNA SO:0001877
aligned_transcript NULL
aligned_transcript NULL
antisense SO:0001263
antisense SO:0001877
vault_RNA SO:0001263
vault_RNA SO:0001877
rnaseq_putative_cds NULL
rnaseq_putative_cds NULL
transcribed_pseudogene NULL
transcribed_pseudogene NULL
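
For anyone wanting to reuse the listing above on older data, here is a minimal Python sketch that turns the pasted "name accession" lines into a lookup table. It assumes whitespace-separated columns exactly as shown, treats NULL as "no SO term assigned", and collects every accession seen for a name. Many biotypes appear on two rows, presumably one per object type (gene and transcript), but the query above does not select that column, so that reading is an assumption. The function names are invented for this example.

```python
from collections import defaultdict


def parse_listing(text):
    """Parse 'name so_acc' lines into {biotype name: set of SO accessions}.

    Lines that are not a name followed by either NULL or an SO: accession
    (the SQL statement, separators, footers) are skipped.
    """
    mapping = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, so_acc = fields
        if so_acc == "NULL":
            mapping[name]  # record the biotype even without an SO term
        elif so_acc.startswith("SO:"):
            mapping[name].add(so_acc)
    return dict(mapping)


def lookup(mapping, biotype):
    """Return the sorted SO accessions for a biotype, or [] if none are known."""
    return sorted(mapping.get(biotype, ()))
```

Calling lookup(parse_listing(listing_text), "TEC") on the text above gives the single accession listed for TEC, while a biotype with only NULL rows, such as LRG_gene, gives an empty list.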