[ensembl-dev] VEP offline script: Checking/creating FASTA index fails to find existing index

mag mr6 at ebi.ac.uk
Mon Oct 20 16:30:08 BST 2014


Hi Cyriac,

As Will said, this is a BioPerl issue.

The Bio::DB::Fasta module is responsible for the indexing.
According to the documentation
(http://search.cpan.org/dist/BioPerl-1.6.901/Bio/DB/Fasta.pm), it uses
the AnyDBM_File module to decide how to index the file.

The type of index created seems to depend on the environment you're
running in.
We have seen .pag and .dir indexes created on minimal Linux
installations (for example VMs), which might be missing the required
executables.
Bio::DB::Fasta is then unable to recognise these files as a valid index
and keeps re-indexing the file even though nothing has changed.
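
To see which of the two index layouts discussed in this thread is present
next to a FASTA file, a small check like the one below may help. It is only
a sketch in Python, not part of VEP or BioPerl: the function name
describe_index is made up for illustration, and it assumes the index base
name is the FASTA path plus ".index", as described in the original report.

```python
from pathlib import Path


def describe_index(fasta_path):
    """Report which index files sit next to a FASTA file.

    Assumption: the index base name is "<fasta>.index", and the
    alternative layout is the pair "<fasta>.index.dir" / "<fasta>.index.pag".
    """
    base = str(fasta_path) + ".index"
    found = {
        "single": Path(base).exists(),
        "dir": Path(base + ".dir").exists(),
        "pag": Path(base + ".pag").exists(),
    }
    if found["single"]:
        layout = "single-file"
    elif found["dir"] and found["pag"]:
        layout = "dir/pag pair"
    elif found["dir"] or found["pag"]:
        layout = "incomplete dir/pag pair"
    else:
        layout = "none"
    return layout, found


if __name__ == "__main__":
    import sys

    # Usage: python describe_index.py genome.fa [more.fa ...]
    for path in sys.argv[1:]:
        layout, found = describe_index(path)
        print(f"{path}: {layout} {found}")
```

If this reports "dir/pag pair" and VEP re-indexes on every run, you are
probably seeing the behaviour described above.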

One workaround is to manually edit your Bio::DB::Fasta module file
(Fasta.pm) and disable the reindex check:
-  my $reindex = $force_reindex || $indextime < $modtime;
+  my $reindex = 0; # $force_reindex || $indextime < $modtime;
It does mean, though, that it will no longer notice when your file has
changed, so you would need to edit this again every time you get a new
FASTA file.
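
The check being disabled above compares two modification times. The sketch
below restates that comparison in Python to show why a missing ".fa.index"
file would trigger re-indexing on every run. It is an illustration, not the
BioPerl code: the names needs_reindex and mtime_or_zero are hypothetical,
and treating a missing file as having modification time 0 is an assumption
that has not been checked against the Bio::DB::Fasta source.

```python
import os


def mtime_or_zero(path):
    """Return the file's modification time, or 0 if it does not exist.

    Assumption: a missing index is treated as infinitely old.
    """
    try:
        return os.stat(path).st_mtime
    except OSError:
        return 0


def needs_reindex(fasta_path, index_path, force_reindex=False):
    """Mirror of: $force_reindex || $indextime < $modtime."""
    index_time = mtime_or_zero(index_path)
    mod_time = mtime_or_zero(fasta_path)
    return bool(force_reindex) or index_time < mod_time
```

Under that assumption, if only ".fa.index.dir" and ".fa.index.pag" exist,
the ".fa.index" lookup yields 0, which is always older than the FASTA file,
so the result is always true. That would also be consistent with Cyriac's
observation that copying ".fa.index.pag" to ".fa.index" stops the
re-indexing.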

If you can find a working solution, I would be interested to hear about it.


Regards,
Magali

On 20/10/2014 09:29, Will McLaren wrote:
>
> Hi Cyriac,
>
> This is not something I've come across before; the FASTA indexing is 
> performed by code that we do not maintain (the Bio::DB::Fasta module 
> is part of the BioPerl package).
>
> Which version of BioPerl are you using? (There are known issues with
> 1.2.3, though not this issue AFAIK; the VEP installs 1.6.0.) And are
> you using a single FASTA file or a directory containing multiple FASTA
> files?
>
> For VEP it is normal that it just generates the .fa.index file; I have 
> never seen the other two you mention (perhaps they appear with a 
> directory of files rather than a single .fa).
>
> I'd try removing the indexes and reindexing, or removing the .fa file 
> and re-downloading/re-generating it.
>
> HTH
>
> Will
>
> On 18 Oct 2014 02:33, "Cyriac Kandoth" <kandoth at cbio.mskcc.org> wrote:
>
>     Hi Devs,
>
>     The code to check whether a FASTA index needs to be created looks
>     for a file with extension ".fa.index". However (and this may be
>     recent), the indexes created are files named ".fa.index.dir" and
>     ".fa.index.pag". I haven't checked the code to confirm this. I'm
>     assuming this is the case, since VEP appears to index the FASTA
>     every time it runs, unless I create a copy of ".fa.index.pag" with
>     extension ".fa.index".
>
>     Cheers!
>
>     ~Cyriac
>
>     _______________________________________________
>     Dev mailing list Dev at ensembl.org
>     Posting guidelines and subscribe/unsubscribe info:
>     http://lists.ensembl.org/mailman/listinfo/dev
>     Ensembl Blog: http://www.ensembl.info/
>
