[ensembl-dev] pseudogenes in ensembl bacteria
Dan Staines
dstaines at ebi.ac.uk
Wed Aug 21 13:06:27 BST 2013
>> Hi Dan,
>>
>> I am not too familiar with the core database schema, but what I would
>> like to be able to do is extract all gene sequences (including
>> pseudogenes) into a multi fasta file. What would be the easiest way for
>> me to do this?
>
> I'll devise a script for you...
>
OK, here's an example script using the ensemblgenomes and ensembl Perl
APIs to dump FASTA files containing all gene and "gene" simple_feature
sequences for each genome matching haemophilus_.*:
https://gist.github.com/danstaines/6293539
You should be able to customise this to fit your exact needs (e.g. the
contents of the header, or the range of genomes involved), but please let
me know if you're not sure how to get a specific bit of information from
Ensembl. One thing to note is that the stored "gene" simple_features
don't include the locus_tag, because a very generic approach is taken to
storing these.
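The gist above (Perl) is the actual script. As a language-neutral sketch of only the output side, the Python below shows one way to filter genomes by the haemophilus_.* pattern, build a header that tolerates a missing locus_tag, and write a wrapped multi-FASTA file. It assumes the records have already been fetched from Ensembl. The record tuple layout, the header fields and the "no_locus_tag" placeholder are hypothetical choices for illustration. They are not what the gist produces.

```python
import io
import re
from typing import Iterable, Optional, TextIO, Tuple

# Hypothetical record shape: (genome, feature_id, kind, locus_tag, sequence).
# In the real workflow these values would come from the Ensembl Perl APIs.
Record = Tuple[str, str, str, Optional[str], str]

# Genome filter: names starting with "haemophilus_".
GENOME_PATTERN = re.compile(r"haemophilus_.*")


def fasta_header(genome: str, feature_id: str, kind: str,
                 locus_tag: Optional[str]) -> str:
    """Build a header line; "gene" simple_features carry no locus_tag."""
    tag = locus_tag if locus_tag else "no_locus_tag"  # hypothetical placeholder
    return f">{feature_id} genome={genome} type={kind} locus_tag={tag}"


def write_multi_fasta(records: Iterable[Record], out: TextIO,
                      width: int = 60) -> int:
    """Write matching records as multi-FASTA; return how many were written."""
    written = 0
    for genome, feature_id, kind, locus_tag, seq in records:
        if not GENOME_PATTERN.match(genome):
            continue  # skip genomes outside the haemophilus_.* range
        out.write(fasta_header(genome, feature_id, kind, locus_tag) + "\n")
        # Wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
        written += 1
    return written


if __name__ == "__main__":
    # Made-up demo records, not real Ensembl data.
    demo = [
        ("haemophilus_influenzae", "GENE1", "gene", "HI0001", "ATGC" * 20),
        ("haemophilus_influenzae", "SF1", "simple_feature", None, "ACGT"),
        ("escherichia_coli", "GENE2", "gene", "b0001", "AAAA"),
    ]
    buf = io.StringIO()
    count = write_multi_fasta(demo, buf)
    print(buf.getvalue(), end="")
    print(f"{count} records written")
```

Running the demo writes the two haemophilus_influenzae records and skips the escherichia_coli one. The simple_feature record gets the placeholder in its header.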
Hope this helps. Let me know if I can do anything else to help.
Dan.
--
Dan Staines, PhD
Technical Coordinator, Ensembl Genomes
European Bioinformatics Institute (EMBL-EBI)
http://www.ebi.ac.uk/
http://www.ensemblgenomes.org/