[ensembl-dev] Perl BLOBs in the ensembl compara database

Andy Jenkinson andy.jenkinson at ebi.ac.uk
Fri Jan 21 17:04:22 GMT 2011


Hi Trevor,

The implementation as it stands essentially requires that store/read operations on the database be done on machines with the same architecture. I guess this could be made more explicit, but bear in mind it will affect use of an API written in any language (including the Perl API itself). Unfortunately I think you are always going to have to keep abreast of changes to the Perl API if you create a Java translation, whether or not those changes affect the database. The extent to which you will have to modify the Java code accordingly will depend on how complete your implementation is; I expect there's plenty of pure logic that could harbour bugs or be subject to update!
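A minimal Java sketch of why the byte layout matters, assuming (as suggested later in this thread) that each score was written by Perl's `pack` with the `f` template on a little-endian machine, i.e. as a 4-byte IEEE 754 single-precision float. The class `ByteOrderDemo` and its methods are hypothetical names for illustration only. Reading the same bytes back with the other byte order does not fail; it silently produces different numbers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    // Hypothetical stand-in for pack("f*", @scores) on a little-endian host:
    // each float becomes 4 bytes, least significant byte first.
    static byte[] packLittleEndian(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Float.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    // Reads the bytes back as consecutive 4-byte floats in the given byte order.
    static float[] unpack(byte[] blob, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(blob).order(order);
        float[] out = new float[blob.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {1.5f, -0.25f};
        byte[] blob = packLittleEndian(scores);
        float[] right = unpack(blob, ByteOrder.LITTLE_ENDIAN); // matches the writer
        float[] wrong = unpack(blob, ByteOrder.BIG_ENDIAN);    // does not
        System.out.println(right[0] + " " + right[1]);
        System.out.println(wrong[0] + " " + wrong[1]);
    }
}
```

Running `main` prints the original scores on the first line and unrelated values on the second, which is why any reader, on any platform and in any language, has to agree with the writer on byte order.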

Cheers,
Andy

On 21 Jan 2011, at 16:42, PATERSON Trevor wrote:

> Thanks Andy(s)
> 
> I may play about with trying to translate the BLOBS next week.
> 
> As you point out, the actual type/format of the BLOB is determined when the data is stored by the Perl API itself.
> 
> If this does turn out to be OS-specific, it could be unpleasant to write robust Java code that manages to retrieve floats from the BLOBs on any platform.
> 
> And as the datatype is determined not by the database schema but by the Perl API, maintaining Java access code then relies on keeping abreast of changes within the internals of the Perl API as well as following any schema evolution.
> 
> (I'm not liking it :)
> 
> trevor
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> 
> -----Original Message-----
> From: Andy Jenkinson [mailto:andy.jenkinson at ebi.ac.uk] 
> Sent: 21 January 2011 15:46
> To: PATERSON Trevor
> Subject: Re: [ensembl-dev] Perl BLOBs in the ensembl compara database
> 
> Hi Trevor,
> 
> I'm pretty sure the blob will just be a binary representation of a series of floating-point numbers, in whatever representation the machine used to pack them uses. That is, it's unlikely to be some sort of binary representation of a special Perl data structure (Perl floats aren't objects, remember). In other words, every 32 bits will be a little-endian float (probably). Just unpack them in Java and see what happens.
> 
> Cheers,
> Andy
> 
> On 21 Jan 2011, at 15:29, Andy Yates wrote:
> 
>> Hi Trevor,
>> 
>> There are quite a few instances of BLOB & pack/unpack usage. A quick search of the main APIs shows up hits in Compara, functional genomics and variation. The thing to note here is that the Perl documentation says about the f option in pack:
>> 
>> A single-precision float in native format.
>> 
>> This could mean a Perl-native format, but that is unlikely because it is the F option that explicitly mentions Perl. That leaves "native format" meaning the machine's own representation, i.e. its byte order (endianness). You would have to guess that little-endian is going to be the answer there.
>> 
>> Have you tried to decode these BLOBs to Java floats yet? Also, what machine type are you on? Note that Java reads multi-byte values as big-endian by default whatever the platform (Windows included), so you will have to force the byte order to little-endian (which in Java means using a java.nio ByteBuffer, IIRC).
>> 
>> Andy
>> 
>> On 21 Jan 2011, at 14:28, PATERSON Trevor wrote:
>> 
>>> I have been trying to get to grips with the Compara schema in order to think about writing Java libraries to access the data...
>>> 
>>> However it appears that some of the data in Compara is more intimately wedded to Perl than I had hoped!
>>> 
>>> Looking at Genomic Alignment data, the Conservation Score values (which I think can be a variable-length array of floats) are stored as BLOBs, packed internal representations of Perl floats... and therefore require Perl to unpack them.
>>> 
>>> Quickly scanning through the schema, I don't see any other fields of type BLOB.
>>> 
>>> My understanding is that these values are probably dumped here using 'pack' as a quick 'hack' to avoid having to deal with variable length arrays.
>>> 
>>> Unfortunately it does, however, rather tie the data to Perl.
>>> 
>>> Is this a design decision, or just a historical accident?
>>> Are there (or will there be...) any other examples of Perl BLOBs in Ensembl?
>>> 
>>> 
>>> cheers
>>> 
>>> Trevor
>>> 
>>> 
>>> 
>>> mysql> describe conservation_score;
>>> +------------------------+----------------------+------+-----+---------+-------+
>>> | Field                  | Type                 | Null | Key | Default | Extra |
>>> +------------------------+----------------------+------+-----+---------+-------+
>>> | genomic_align_block_id | bigint(20) unsigned  | NO   | MUL | NULL    |       |
>>> | window_size            | smallint(5) unsigned | NO   |     | NULL    |       |
>>> | position               | int(10) unsigned     | NO   |     | NULL    |       |
>>> | expected_score         | blob                 | YES  |     | NULL    |       |
>>> | diff_score             | blob                 | YES  |     | NULL    |       |
>>> +------------------------+----------------------+------+-----+---------+-------+
>>> 
>>> 
>>> 
>>> Trevor Paterson PhD
>>> email trevor.paterson at roslin.ed.ac.uk 
>>> 
>>> Bioinformatics
>>> The Roslin Institute
>>> The Royal (Dick) School of Veterinary Studies University of Edinburgh 
>>> Scotland EH25 9PS phone +44 (0)131 5274197 
>>> http://bioinformatics.roslin.ed.ac.uk/ 
>>> 
>>> _______________________________________________
>>> Dev mailing list
>>> Dev at ensembl.org
>>> http://lists.ensembl.org/mailman/listinfo/dev
>> 
>> -- 
>> Andrew Yates                   Ensembl Genomes Engineer
>> EMBL-EBI                       Tel: +44-(0)1223-492538
>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>> 
> 
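Following the suggestions in this thread, here is a minimal Java sketch for decoding one `expected_score` or `diff_score` value. It assumes, as both Andys suspect but nobody has confirmed above, that the column holds a plain run of 4-byte single-precision floats written by `pack` with the `f` template on a little-endian machine. `ConservationScoreBlob` and `decodeScores` are hypothetical names, not part of any Ensembl API; the `byte[]` would typically come from JDBC's `ResultSet.getBytes("diff_score")`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class ConservationScoreBlob {
    private ConservationScoreBlob() {}

    /**
     * Decodes a BLOB assumed to contain consecutive 4-byte IEEE 754 floats
     * in little-endian order (the presumed output of Perl's pack("f*", ...)
     * on x86). Returns an empty array for a NULL column.
     */
    public static float[] decodeScores(byte[] blob) {
        if (blob == null) {
            return new float[0]; // both score columns are nullable in the schema
        }
        if (blob.length % Float.BYTES != 0) {
            // A length that is not a multiple of 4 suggests the layout assumption is wrong
            throw new IllegalArgumentException(
                "BLOB length " + blob.length + " is not a multiple of " + Float.BYTES);
        }
        // ByteBuffer defaults to big-endian on every platform, so set the order
        // explicitly; the FloatBuffer view inherits it when created.
        ByteBuffer buf = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN);
        float[] scores = new float[blob.length / Float.BYTES];
        buf.asFloatBuffer().get(scores);
        return scores;
    }
}
```

Before relying on it, it would be worth comparing the decoded values with what the Perl API returns for the same `genomic_align_block_id`; implausible results (huge magnitudes or tiny denormals) would point to the wrong byte order or element size.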
