[ensembl-dev] pseudogenes in ensembl bacteria
    Adam Witney 
    awitney at sgul.ac.uk
       
    Wed Aug 28 15:14:03 BST 2013
    
    
  
On 21. 8. 2013 15:57, Dan Staines wrote:
> On 08/21/2013 03:42 PM, Adam Witney wrote:
>>>
>>
>> Actually it looks like they are generally within the
>> $simple_feature->display_label() field, but in some cases this seems to
>> have been truncated? e.g. gene_feature:1633
>
> If you look at the FASTA header generated by this script, it's of this form:
>  ><id> <location string> <description>
> id is the gene stable ID for those genes loaded as "proper" Ensembl genes;
> this is the same as the locus_tag from the INSDC entry. In the case
> of the subset of pseudogenes which are not loaded as full genes, but as
> simple_features, we don't have the locus_tags stored anywhere in the
> database. In this case, id is gene_feature:<dbid> where <dbid> is the
> internal surrogate key (simple_feature.simple_feature_id) used by
> Ensembl. As I said, you can modify this format easily enough.
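
For reference, a header in the form quoted above can be split into its three parts roughly as follows. This is only a minimal Python sketch, not part of Dan's script: the function name parse_header and the sample header are hypothetical, and it assumes that neither the id nor the location string contains whitespace.

```python
def parse_header(line):
    """Split a '><id> <location string> <description>' header into parts."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header: %r" % line)
    # Assumption: id and location are single whitespace-free tokens;
    # everything after them is treated as the description.
    parts = line[1:].strip().split(None, 2)
    parts += [""] * (3 - len(parts))
    seq_id, location, description = parts
    # Pseudogenes stored as simple_features get ids like gene_feature:<dbid>
    # rather than a locus_tag-based gene stable ID.
    dbid = None
    if seq_id.startswith("gene_feature:"):
        suffix = seq_id.split(":", 1)[1]
        if suffix.isdigit():
            dbid = int(suffix)
    return {
        "id": seq_id,
        "location": location,
        "description": description,
        "simple_feature_dbid": dbid,
    }


# Hypothetical header, for illustration only.
record = parse_header(">gene_feature:1633 Chromosome:100-200 example pseudogene")
print(record["simple_feature_dbid"])
```

Headers whose id does not start with gene_feature: come back with simple_feature_dbid set to None, which gives a simple way to separate the two kinds of entry.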
Hi Dan,
Sorry to reopen this, but is there a reason for the mismatches
between the genomes on the FTP site and those in the database? For
example, using your script
https://gist.github.com/danstaines/6293539
the following genome is not extracted, although it is on the FTP site:
ftp.ensemblgenomes.org/pub/current/bacteria/fasta/bacteria_25_collection/haemophilus_parainfluenzae_hk2019
Thanks
Adam
    
    