[ensembl-dev] Cannot find 1000 Genomes genotype data in database

Zheng Jin zheng at selfdecode.com
Wed Aug 28 02:57:33 BST 2019


Hi Devs,

I'm trying to programmatically fetch allele and genotype frequency data
provided by 1000 Genomes for a (large) list of SNPs, which is available on
the website (e.g.
http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:41610728-41611728;v=rs7627367;vdb=variation;vf=323376947
)

After digging into the documentation I have found three ways to retrieve
large amounts of data from the Ensembl servers:
1. REST API, which does contain all the information I need; however, it also
includes a lot of other data in the same endpoint and is thus relatively
slow;
2. Perl API, but I'm not familiar with Perl, and it seems that it basically
just accesses the MySQL database;
3. Public MySQL server
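
As a rough sketch of option 1, the REST route could be batched so that each call covers many SNPs. The server name (`grch37.rest.ensembl.org`), the `pops` and `population_genotypes` switches, and the `populations` / `population` field names in the response are assumptions taken from memory of the REST documentation and should be checked there; the helper names and the chunk size are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: GRCh37 REST server; verify against the REST documentation.
REST_SERVER = "https://grch37.rest.ensembl.org"


def build_variation_request(rsids, species="homo_sapiens"):
    """Build (but do not send) one batched POST request for a list of rsIDs."""
    # Assumption: these two switches ask for population allele and
    # genotype frequencies in the response.
    query = urllib.parse.urlencode({"pops": 1, "population_genotypes": 1})
    url = f"{REST_SERVER}/variation/{species}?{query}"
    body = json.dumps({"ids": list(rsids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
    )


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def keep_1000g_phase3(record, prefix="1000GENOMES:phase_3:"):
    """Keep only the 1000 Genomes phase 3 entries of one variant record.

    Assumption: each record carries a "populations" list whose entries
    have a "population" name field.
    """
    return [p for p in record.get("populations", [])
            if p.get("population", "").startswith(prefix)]
```

A request built this way, for example `build_variation_request(["rs7627367"])`, would be sent with `urllib.request.urlopen`, and the decoded JSON passed through `keep_1000g_phase3` to drop the populations that are not needed.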

So I decided to connect to the public MySQL servers.
However, while I can find the table (`population_genotype`) that should
contain the information, as well as the 1000 Genomes references
(`population`), I cannot find the data that I need.

> SELECT
>     `p`.`population_id`,
>     `p`.`name`,
>     `p`.`size`,
>     `p`.`description`
> FROM `population` AS `p`
> WHERE
>     `p`.`name` LIKE '1000GENOMES:phase_3:%'
> ORDER BY `p`.`population_id`;

gives me the 32 populations I need. However,

> SELECT
>     `pg`.*,
>     `p`.`name` AS `population_name`
> FROM `population_genotype` AS `pg`
> JOIN `population` AS `p`
>     ON `p`.`population_id` = `pg`.`population_id`
> WHERE
>     `p`.`name` LIKE '1000GENOMES:phase_3:%';

returns an empty set.

A similar situation exists for the `sample` and `sample_genotype_multiple_bp`
tables:

> SELECT
>     `s`.`sample_id`,
>     `s`.`name` AS `sample_name`,
>     `ind`.`individual_id`,
>     `ind`.`name` AS `individual_name`,
>     `ind`.`gender`
> FROM `sample` AS `s`
> JOIN `individual` AS `ind`
>     ON `ind`.`individual_id` = `s`.`individual_id`
> WHERE
>     `s`.`name` LIKE '1000GENOMES:phase_3:%'
> ORDER BY `s`.`sample_id`;

returns 2504 records, which is the correct number, but

> SELECT
>     `sg`.*,
>     `s`.`name`
> FROM `sample_genotype_multiple_bp` AS `sg`
> JOIN `sample` AS `s`
>     ON `s`.`sample_id` = `sg`.`sample_id`
> WHERE
>     `s`.`name` LIKE '1000GENOMES:phase_3:%';

returns an empty set.

I tried connecting to both
ensembldb.ensembl.org:3337/homo_sapiens_variation_97_37 and
ensembldb.ensembl.org:3306/homo_sapiens_variation_97_38, and the situation
is the same for both.
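
To separate "the table has no rows at all" from "no rows match these populations", a small check along these lines might help. It is only a sketch: the hosts, ports and database names are the ones quoted above, the `anonymous` user with an empty password is the public login as far as I know, and the helper names are hypothetical. It works with any Python DB-API connection.

```python
# Servers as quoted in this message.
SERVERS = {
    "GRCh37": {"host": "ensembldb.ensembl.org", "port": 3337,
               "database": "homo_sapiens_variation_97_37"},
    "GRCh38": {"host": "ensembldb.ensembl.org", "port": 3306,
               "database": "homo_sapiens_variation_97_38"},
}


def connection_params(assembly):
    """Return keyword arguments for a MySQL client's connect() call."""
    params = dict(SERVERS[assembly])
    # Assumption: public read-only login, no password.
    params.update(user="anonymous", password="")
    return params


def empty_tables(conn, tables):
    """Map each table name to True if it holds no rows at all.

    Fetching a single row with LIMIT 1 avoids scanning large tables.
    """
    result = {}
    cur = conn.cursor()
    for table in tables:
        cur.execute(f"SELECT 1 FROM `{table}` LIMIT 1")
        result[table] = cur.fetchone() is None
    cur.close()
    return result
```

With a client library such as PyMySQL installed (an assumption), the connection would be `pymysql.connect(**connection_params("GRCh37"))`, and `empty_tables(conn, ["population_genotype", "sample_genotype_multiple_bp"])` would then show whether those tables are populated at all in that release.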

I'm genuinely at a loss here, because I can clearly see the data on the
website. Is it retrieved from somewhere else other than the database I'm
looking at?

I'm relatively new to Ensembl so I might be looking in the wrong place.
Thanks in advance.

Yours,
Zheng Jin
