[ensembl-dev] pseudogenes in ensembl bacteria

Adam Witney awitney at sgul.ac.uk
Wed Aug 28 15:14:03 BST 2013



On 21. 8. 2013 15:57, Dan Staines wrote:
> On 08/21/2013 03:42 PM, Adam Witney wrote:
>>>
>>
>> Actually it looks like they are generally within the
>> $simple_feature->display_label() field, but in some cases this seems to
>> have been truncated? e.g. gene_feature:1633
>
> If you look at the FASTA header generated by this script, it's of this form:
>  ><id> <location string> <description>
> id is the gene stable ID for those genes loaded as "proper" Ensembl genes
> - this is the same as the locus_tag from the INSDC entry. In the case
> of the subset of pseudogenes which are not loaded as full genes, but as
> simple_features, we don't have the locus_tags stored anywhere in the
> database. In this case, id is gene_feature:<dbid> where <dbid> is the
> internal surrogate key (simple_feature.simple_feature_id) used by
> Ensembl. As I said, you can modify this format easily enough.
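
For reference, the header layout described above could be split apart along these lines. This is only a rough sketch, not part of Dan's script: it assumes the three fields are whitespace-separated and that the location string itself contains no spaces, and the names parse_header and FastaHeader are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class FastaHeader(NamedTuple):
    seq_id: str                        # gene stable ID or gene_feature:<dbid>
    location: str                      # location string, kept as-is
    description: str                   # free text, may be empty
    simple_feature_id: Optional[int]   # set only for gene_feature:<dbid> IDs


# IDs of the form gene_feature:<dbid> mark pseudogenes stored as simple_features.
_GENE_FEATURE = re.compile(r"^gene_feature:(\d+)$")


def parse_header(line: str) -> FastaHeader:
    """Split a '><id> <location string> <description>' header into fields."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Assumption: location has no embedded spaces, so split at most twice.
    parts = line[1:].strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("header needs at least an id and a location")
    seq_id, location = parts[0], parts[1]
    description = parts[2] if len(parts) > 2 else ""
    match = _GENE_FEATURE.match(seq_id)
    dbid = int(match.group(1)) if match else None
    return FastaHeader(seq_id, location, description, dbid)
```

A header whose ID matches gene_feature:<dbid> gets simple_feature_id set to that integer; anything else is treated as a gene stable ID and simple_feature_id is None.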

Hi Dan,

Sorry to reopen this, but is there any reason for the mismatches
between the genomes on the FTP site and those in the database? For
example, using your script

https://gist.github.com/danstaines/6293539

this genome is not extracted:

ftp.ensemblgenomes.org/pub/current/bacteria/fasta/bacteria_25_collection/haemophilus_parainfluenzae_hk2019

Thanks

Adam



