[ensembl-dev] pseudogenes in ensembl bacteria
Adam Witney
awitney at sgul.ac.uk
Wed Aug 21 15:37:33 BST 2013
On 21. 8. 2013 13:06, Dan Staines wrote:
>
>>> Hi Dan,
>>>
>>> I am not too familiar with the core database schema, but what I would
>>> like to be able to do is extract all gene sequences (including
>>> pseudogenes) into a multi fasta file. What would be the easiest way for
>>> me to do this?
>>
>> I'll devise a script for you...
>>
>
> OK, here's an example script using the ensemblgenomes and ensembl Perl
> APIs to dump FASTA files containing all gene and "gene" simple_feature
> sequences for each genome matching haemophilus_.*:
> https://gist.github.com/danstaines/6293539
>
> You should be able to customise this to fit your exact needs (e.g.,
> contents of the header, or the range of genomes involved) but please let
> me know if you're not sure how to get a specific bit of information from
> Ensembl. One thing to note is that the "gene" simple_features stored
> don't include the locus_tag as a very generic approach is taken to
> storing these.
>
> Hope this helps - let me know if I can do anything else to help.

Thanks for that, Dan. Are the pseudogene labels stored anywhere in the
database? For example, the label HI_0006 does not appear in
haemophilus_influenzae_rd_kw20.fa, but I guess it is one of the
gene_feature entries at the bottom.
Thanks
Adam
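
For reference, the multi-FASTA output and the customisable header that Dan
mentions can be sketched as below. This is a minimal illustration, not
Dan's gist and not the Ensembl API: the function names (format_header,
write_fasta), the "biotype=" header field, and the example records are all
hypothetical, and the sequences are placeholders rather than real
Haemophilus data. In practice the records would come from whatever the
Ensembl script returns for genes and "gene" simple_features.

```python
import io


def format_header(identifier, biotype, description=""):
    """Build a FASTA header line body (without the leading '>').

    The 'biotype=' field is an assumed convention for telling ordinary
    genes apart from pseudogene-like features in the output file.
    """
    parts = [identifier, f"biotype={biotype}"]
    if description:
        parts.append(description)
    return " ".join(parts)


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs to an open text handle as multi-FASTA."""
    for header, sequence in records:
        handle.write(f">{header}\n")
        # Wrap the sequence into fixed-width lines.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Hypothetical records: identifiers and sequences are placeholders.
records = [
    (format_header("example_gene_1", "protein_coding"), "ATGGCTAAAGGTGAACTGTAA"),
    (format_header("example_feature_1", "pseudogene", "no locus_tag stored"),
     "ATGACCTAGGATTAA"),
]

buffer = io.StringIO()
write_fasta(records, buffer, width=10)
print(buffer.getvalue(), end="")
```

Keeping the header construction in its own function means a label such as
a locus tag could be added to the header later, if it turns out to be
available, without touching the code that writes the sequences.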