[ensembl-dev] RFAM annotations inconsistent

Ben Moore bmoore at ebi.ac.uk
Wed Jun 17 14:22:10 BST 2020


Hi Sabine,

This is actually a long-standing issue that we have encountered before and, unfortunately, have not been able to correct. Thank you for bringing it to our attention and for providing examples to help us fix it.

We will remove the Rfam annotations from our databases to prevent incorrect data from being displayed, so this data will no longer be available from Ensembl release 102 onwards (autumn 2020). We suspect a systematic problem in the code used to load this data, and we intend to investigate it thoroughly in the longer term so that we can eventually present this data correctly.
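To make the suspected problem concrete: a strand error of the kind Sabine describes can be detected by comparing the sequence at the annotated coordinates with the Rfam sequence on both strands. The Python sketch below is an illustration only, not the Ensembl loading code. The function names are hypothetical, and it assumes 1-based, inclusive coordinates and Ensembl's 1/-1 strand convention.

```python
# Illustrative sketch only (hypothetical helpers, not Ensembl's loader).
# Assumes 1-based, inclusive coordinates and strands encoded as 1 / -1.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def matching_strand(genome: str, start: int, end: int, reference: str):
    """Return 1 if the forward-strand slice equals `reference`,
    -1 if its reverse complement does, or None if neither matches."""
    region = genome[start - 1:end].upper()
    # Accept a reference given in the RNA alphabet by mapping U to T.
    ref = reference.upper().replace("U", "T")
    if region == ref:
        return 1
    if reverse_complement(region) == ref:
        return -1
    return None


def strand_is_consistent(genome, start, end, annotated_strand, reference):
    """True if the annotated strand (1 or -1) agrees with the observed one."""
    return matching_strand(genome, start, end, reference) == annotated_strand
```

For example, with `genome = "TTGACGTAGGCATT"`, the slice at positions 3..8 is `GACGTA`, so a reference of `TACGTC` matches only on the reverse strand; annotating that region as strand 1 would make `strand_is_consistent` return `False`, which is exactly the kind of flipped sign reported for EBT00001618347.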

Best wishes

Ben

> On 8 Jun 2020, at 16:04, Sabine Reißer <sabine.reisser at mdc-berlin.de> wrote:
> 
> Dear Ensembl developers,
> 
> I've come across some inconsistencies in the Rfam families of bacterial sRNAs in bacteria.ensembl.
> 
> If I search for the sRNA ArcZ in Ensembl, one of the results is, for example:
> 
> EBT00001618347 (citrobacter_koseri_atcc_baa_895). The annotation method for this transcript is: "Non-coding RNA gene models based on alignment by of RFAM families to genomic sequences (alignments provided by RFAM)"
> 
> I also find this transcript in the ArcZ family on Rfam: https://rfam.xfam.org/family/RF00081#tabview=tab1
> 
> However, the sequence at the Ensembl coordinates is the reverse complement of the (correct) sequence in Rfam. In this case, the correct sequence is on the forward strand, while the Ensembl coordinates give the reverse strand.
> 
> I encountered several such cases, and the correct sequence can be on either strand. EBT00001534313 (citrobacter_rodentium_icc168) is an example where the correct strand is the reverse strand, but Ensembl says forward.
> 
> Is it possible that there's simply a sign error in the annotation import from Rfam? Or am I missing something?
> 
> It would be great if you could check this.
> 
> 
> With best regards
> 
> Sabine
> 
> 
> 
> -- 
> Dr. Sabine Reißer
> Postdoctoral researcher
> 
> Bioinformatics of RNA Structure and Transcriptome Regulation
> Berlin Institute for Medical Systems Biology
> Max Delbrück Center for Molecular Medicine in the Helmholtz Association
> Hannoversche Str. 28, 10115 Berlin, Germany
> 
> Tel.: +49 30 9406-3294
> 
> sabine.reisser at mdc-berlin.de
> www.mdc-berlin.de
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/

Ben Moore
Ensembl Outreach Officer

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK

bmoore at ebi.ac.uk
+44 (0)1223 494265
