[ensembl-dev] pseudogenes in ensembl bacteria

Adam Witney awitney at sgul.ac.uk
Wed Aug 21 16:40:19 BST 2013



On 21. 8. 2013 15:57, Dan Staines wrote:
> On 08/21/2013 03:42 PM, Adam Witney wrote:
>>>
>>
>> Actually it looks like they are generally within the
>> $simple_feature->display_label() field, but in some cases this seems to
>> have been truncated? e.g. gene_feature:1633
>
> If you look at the fasta header generated by this script, it's of this form:
>  ><id> <location string> <description>
> id is the gene stable ID for those genes loaded as "proper" Ensembl genes
> - this is the same as the locus_tag from the INSDC entry. In the case
> of the subset of pseudogenes which are not loaded as full genes, but as
> simple_features, we don't have the locus_tags stored anywhere in the
> database. In this case, id is gene_feature:<dbid> where <dbid> is the
> internal surrogate key (simple_feature.simple_feature_id) used by
> Ensembl. As I said, you can modify this format easily enough.

Yes, sorry, I meant that if you look at the end of the description field 
for rd_kw20, then these pseudogenes have the locus_tag within that 
description field, although it is truncated in some cases.
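
For anyone post-processing the FASTA output, the header layout described
above (id, location string, description) could be split along these lines.
This is only a minimal sketch: it assumes the three fields are separated by
single spaces and that the location string itself contains no spaces, neither
of which is confirmed in this thread. The names parse_header and FastaHeader
are made up for illustration and are not part of the Ensembl API.

```python
import re
from typing import NamedTuple, Optional


class FastaHeader(NamedTuple):
    seq_id: str                       # gene stable ID or gene_feature:<dbid>
    location: str                     # location string, kept as-is
    description: str                  # free text, may be empty
    simple_feature_id: Optional[int]  # set only for gene_feature:<dbid> ids


# Matches the surrogate-key form used for pseudogenes stored as simple_features.
_GENE_FEATURE = re.compile(r"^gene_feature:(\d+)$")


def parse_header(line: str) -> FastaHeader:
    """Split a '><id> <location string> <description>' header line.

    Assumption: fields are space-separated and only the description
    may contain further spaces.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Split at most twice so the description keeps its internal spaces.
    parts = line[1:].rstrip("\n").split(" ", 2)
    parts += [""] * (3 - len(parts))  # pad missing location/description
    seq_id, location, description = parts
    match = _GENE_FEATURE.match(seq_id)
    dbid = int(match.group(1)) if match else None
    return FastaHeader(seq_id, location, description, dbid)
```

Records where simple_feature_id is not None are the ones without a stored
locus_tag, so those are the headers whose description field would need
checking for a truncated tag.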

Adam



