[ensembl-dev] Perl BLOBs in the ensembl compara database

Andy Yates ayates at ebi.ac.uk
Fri Jan 21 15:29:38 GMT 2011


Hi Trevor,

There are quite a few instances of BLOB & pack/unpack usage. A quick search of the main APIs turns up hits in Compara, functional genomics and variation. The thing to note is what the Perl documentation says about the f option in pack:

A single-precision float in native format.

This could mean native as in a Perl-native format (unlikely, as the F option is the one that explicitly mentions Perl), which leaves native format meaning the byte order (endianness) of the machine that did the packing. You would have to guess that little-endian is going to be the answer there, since x86 machines are little-endian.
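
To make the byte-order point concrete, here is a small Java sketch showing how the same single-precision value is laid out under each byte order. The class and method names are made up for illustration; only the java.nio calls are real.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of any Ensembl API.
class EndianDemo {
    // Serialise one float to its 4 IEEE 754 bytes in the requested byte order.
    static byte[] toBytes(float value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putFloat(value).array();
    }

    // Render bytes as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 1.5f has the bit pattern 0x3FC00000.
        System.out.println(hex(toBytes(1.5f, ByteOrder.LITTLE_ENDIAN))); // 0000c03f
        System.out.println(hex(toBytes(1.5f, ByteOrder.BIG_ENDIAN)));    // 3fc00000
    }
}
```

Reading the little-endian bytes back as big-endian gives a completely different number rather than an error, so a wrong guess shows up as nonsense scores, not as an exception.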

Have you tried to decode these BLOBs to Java floats yet? Also, what machine type are you on? Java reads multi-byte values as big-endian by default whatever the platform (Windows box or otherwise), so you will most likely have to force the byte order to little-endian (which in Java means using an nio ByteBuffer IIRC).
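
For what it's worth, a minimal sketch of the decoding in Java. It assumes the BLOB was written with something like pack("f*", ...) on a little-endian machine, so that it is nothing more than a run of 4-byte IEEE 754 floats; I have not checked that against the Compara code. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of any Ensembl API.
class ConservationScoreDecoder {
    // Decode a BLOB of packed single-precision floats.
    // Assumption: little-endian byte order, no header or padding.
    static float[] unpackFloats(byte[] blob) {
        if (blob.length % 4 != 0) {
            throw new IllegalArgumentException(
                "BLOB length " + blob.length + " is not a multiple of 4");
        }
        // ByteBuffer defaults to big-endian, so the order must be set explicitly.
        ByteBuffer buf = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN);
        float[] scores = new float[blob.length / 4];
        buf.asFloatBuffer().get(scores);
        return scores;
    }

    public static void main(String[] args) {
        // Hand-built example bytes: 1.0f and -2.0f in little-endian order.
        byte[] blob = {0, 0, (byte) 0x80, 0x3f, 0, 0, 0, (byte) 0xc0};
        for (float score : unpackFloats(blob)) {
            System.out.println(score);
        }
    }
}
```

With JDBC the input would come from something like ResultSet.getBytes("diff_score"). If the decoded values look like garbage, swapping to ByteOrder.BIG_ENDIAN is the first thing to try.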

Andy

On 21 Jan 2011, at 14:28, PATERSON Trevor wrote:

> I have been trying to get to grips with the Compara schema in order to think about writing Java libraries to access the data...
> 
> However it appears that some of the data in Compara is more intimately wedded to Perl than I had hoped!
> 
> Looking at Genomic Alignment data, the Conservation Score values (which I think can be a variable-length array of floats) are stored as BLOBs: packed internal representations of Perl floats, which therefore require Perl to unpack them.
> 
> Quickly scanning through the schema, I don't see any other fields of type BLOB.
> 
> My understanding is that these values are probably dumped here using 'pack' as a quick 'hack' to avoid having to deal with variable-length arrays.
> 
> Unfortunately it does, however, rather tie the data to Perl.
> 
> Is this a design decision, or just a historical accident?
> Are there (or will there be) any other examples of Perl BLOBs in Ensembl?
> 
> 
> cheers
> 
> Trevor
> 
> 
> 
> mysql> describe conservation_score;
> +------------------------+----------------------+------+-----+---------+-------+
> | Field                  | Type                 | Null | Key | Default | Extra |
> +------------------------+----------------------+------+-----+---------+-------+
> | genomic_align_block_id | bigint(20) unsigned  | NO   | MUL | NULL    |       |
> | window_size            | smallint(5) unsigned | NO   |     | NULL    |       |
> | position               | int(10) unsigned     | NO   |     | NULL    |       |
> | expected_score         | blob                 | YES  |     | NULL    |       |
> | diff_score             | blob                 | YES  |     | NULL    |       |
> +------------------------+----------------------+------+-----+---------+-------+
> 
> 
> 
> Trevor Paterson PhD
> email trevor.paterson at roslin.ed.ac.uk
> 
> Bioinformatics 
> The Roslin Institute
> The Royal (Dick) School of Veterinary Studies
> University of Edinburgh
> Scotland EH25 9PS
> phone +44 (0)131 5274197
> http://bioinformatics.roslin.ed.ac.uk/
> 
> The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336
> 
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/