[ensembl-dev] database model and API versions

Will Chow wc2 at sanger.ac.uk
Tue Apr 28 10:44:27 BST 2015


Just out of curiosity: what is being updated on port 3337? Is it just schema changes? I guess that, with the static organism databases used for compara, compara itself doesn’t require any update, unless updates to the human gene build affect, say, the gene trees?

thanks.

Will


On Apr 28, 2015, at 10:25 AM, mag <mr6 at ebi.ac.uk> wrote:

Hi Duarte,

The VEP --assembly flag uses the solution I suggested initially, which is to have the two databases on two separate servers.
By specifying --assembly GRCh37, the default 3306 port is replaced by the 3337 port, which is where the GRCh37 databases are hosted.
It is also worth noting that only the human databases on port 3337 are updated; all the other databases there are identical to the ones from release 75.

The current implementation of the registry does not support two core databases for a single species on the same server.

The solutions are:
- use two separate servers
In the case of our live servers, we have ensembldb.ensembl.org:3306 for GRCh38 and ensembldb.ensembl.org:3337 for GRCh37
- bypass the registry and specify each required database individually
This will only work if connecting to one database at a time
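
A minimal sketch of the first option, assuming the Ensembl Perl API is installed and that the public server accepts the read-only 'anonymous' user. The ports are the ones quoted in this thread; registry_args_for and human_gene_adaptor_for are hypothetical helper names, not part of the API:

```perl
use strict;
use warnings;

# Ports quoted in this thread for the public Ensembl MySQL server.
my %PORT_FOR_ASSEMBLY = (
    GRCh38 => 3306,
    GRCh37 => 3337,
);

# Hypothetical helper: build load_registry_from_db arguments for one assembly.
sub registry_args_for {
    my ($assembly) = @_;
    my $port = $PORT_FOR_ASSEMBLY{$assembly}
        or die "Unknown assembly '$assembly'\n";
    return (
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',    # assumption: public read-only account
        -port => $port,
    );
}

# Hypothetical helper: point the registry at the server for one assembly
# and return the human core gene adaptor. The API is only loaded when called.
sub human_gene_adaptor_for {
    my ($assembly) = @_;
    require Bio::EnsEMBL::Registry;
    Bio::EnsEMBL::Registry->load_registry_from_db( registry_args_for($assembly) );
    return Bio::EnsEMBL::Registry->get_adaptor( 'homo_sapiens', 'core', 'gene' );
}
```

A script would then call human_gene_adaptor_for('GRCh37') once at start-up. Because the registry holds one core database per species, this sketch is meant for one assembly per process.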

Our system is currently in transition between two models.
Historically, each species had a single assembly at any given time.
With the migration from GRCh37 to GRCh38 and the future of genomics, we see the need to support multiple assemblies for a single species.
We are currently working on better solutions for this.


Regards,
Magali

On 28/04/2015 09:51, Duarte Molha wrote:
Ok ... thanks Magali.

I believe the latest VEP now supports an --assembly flag that lets it annotate against a specific assembly.
Could the registry have the same flag?
How does VEP do it?
This would be incredibly useful because I would not have to create new scripts to support a different assembly.

I could just download all the tables and just tell the registry which one to use.

Please correct me if I am wrong, but your proposed solution would mean bypassing the registry completely and creating each adaptor from scratch, so I would need to alter a lot of my scripts to support both assemblies.

"--assembly GRCh37" would be a much preferable route.

Best regards,

    Duarte




=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On 27 April 2015 at 17:48, mag <mr6 at ebi.ac.uk> wrote:
Hi Duarte,

The mysql dumps for GRCh37 are available on the ftp site as well
ftp://ftp.ensembl.org/pub/grch37/release-79/mysql/

I would recommend having only one copy of human for release 79.
So if you are interested in the GRCh37 data, you can download the database from ftp://ftp.ensembl.org/pub/grch37/release-79/mysql/ rather than ftp://ftp.ensembl.org/pub/release-79/mysql/

If you need both databases on the same server, you can access a given database directly rather than using the registry.
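use Bio::EnsEMBL::DBSQL::DBAdaptor;

# Connect directly to the GRCh37 core database, bypassing the registry
my $human_dba = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -HOST => 'localhost',
    -PORT => 3306,
    -USER => 'user',
    -DBNAME => 'homo_sapiens_core_79_37',
    -SPECIES => 'homo_sapiens',
    -GROUP => 'core'
);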
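
To switch assemblies without rewriting each script, the database name could be derived from the release and assembly. This is only a sketch: it assumes the local databases keep the names used in this thread (79_37 and 79_38) and the same placeholder host, port and user as above; human_core_dbname and human_core_dba are hypothetical helpers, not part of the API:

```perl
use strict;
use warnings;

# Assembly name -> numeric suffix used in the core database names in this thread.
my %ASSEMBLY_SUFFIX = ( GRCh37 => 37, GRCh38 => 38 );

# Hypothetical helper: (79, 'GRCh37') gives 'homo_sapiens_core_79_37'.
sub human_core_dbname {
    my ( $release, $assembly ) = @_;
    my $suffix = $ASSEMBLY_SUFFIX{$assembly}
        or die "Unknown assembly '$assembly'\n";
    return "homo_sapiens_core_${release}_${suffix}";
}

# Hypothetical helper: open the chosen core database directly.
# Host, port and user are placeholders for a local MySQL server.
sub human_core_dba {
    my ( $release, $assembly ) = @_;
    require Bio::EnsEMBL::DBSQL::DBAdaptor;
    return Bio::EnsEMBL::DBSQL::DBAdaptor->new(
        -HOST    => 'localhost',
        -PORT    => 3306,
        -USER    => 'user',
        -DBNAME  => human_core_dbname( $release, $assembly ),
        -SPECIES => 'homo_sapiens',
        -GROUP   => 'core',
    );
}
```

For example, human_core_dba(79, 'GRCh37')->get_GeneAdaptor would return a gene adaptor for the GRCh37 database. Only one of the two human databases should be opened per script this way.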


Hope that helps,
Magali

On 27/04/2015 17:11, Duarte Molha wrote:
Thanks Magali

But I think you have not understood my question.

Assume I want to download the databases to my local computer and use the Perl API 79 to query the latest 79_37 database instead of the default 79_38.
Previously, I just had to download the MySQL tables corresponding to the API I was using to fetch the data correctly. Now, however, that link between API version and underlying assembly is broken.

So how do I tell my scripts what database to query?

My local MySQL databases will all be on the same port.

Best regards

Duarte



=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On 27 April 2015 at 17:01, mag <mr6 at ebi.ac.uk> wrote:
Hi Duarte,

The archive 75 website is still based on the release 75 API.

For the dedicated GRCh37 website though, we have used a data freeze from release 75 and have since been updating the website and underlying databases along with the main release.
The GRCh37 databases are available on our main MySQL server on port 3337 (instead of the default 3306, which gives access to the GRCh38 databases).


Hope that helps,
Magali


On 27/04/2015 16:56, Duarte Molha wrote:
Dear developers

On your GRCh37 archive site you say this:

===========================

About this archive

This archive is based on Ensembl Release 75 data, and gives continuing access to human assembly GRCh37, as well as all our other release 75 species (data freeze March 2014) for comparative purposes. Human variation and regulation data has since been updated in March 2015.

The API and website will be updated in tandem with the release of the main Ensembl website (currently version 79), and we will also periodically update this site with new human data, which will be announced in this panel.

MySQL dumps of human databases on the most recent schema version are available on our FTP site (ftp://ftp.ensembl.org/pub/grch37/).

=========================

It was my understanding that an API version was directly linked to a specific assembly. So I thought that if I wanted to query the latest GRCh37 assembly I would need to use API v75, and if I wanted to use a local database, I would download the corresponding MySQL tables for that version.

However, according to this announcement, I can now use the v79 API and query the old assembly. How is this accomplished?
What do I have to do in my scripts to make sure I am querying the GRCh37 version even though I am using the latest API?

Sorry, I hope this is not a stupid question but I am a bit confused.

Best regards

Duarte



=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================



_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/



