[ensembl-dev] Cannot find 1000 Genomes genotype data in database
Zheng Jin
zheng at selfdecode.com
Wed Aug 28 02:57:33 BST 2019
Hi Devs,
I'm trying to programmatically fetch the allele and genotype frequency data
provided by 1000 Genomes for a (large) list of SNPs. This data is shown on
the website, e.g.
http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:41610728-41611728;v=rs7627367;vdb=variation;vf=323376947
After digging into the documentation I have found that there are three
ways to retrieve large amounts of data from the Ensembl servers:
1. REST API, which does contain all the information I need. However, the
same endpoint also returns a lot of other data and is thus relatively
slow;
2. Perl API, but I'm not familiar with Perl, and it seems to basically
access the MySQL database anyway;
3. Public MySQL server
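For illustration, option 1 could look roughly like the Python sketch below.
It is only a sketch under assumptions: it uses the `/variation/human/:id`
endpoint with `pops=1` on the GRCh37 REST server, it assumes the response
carries a `populations` list whose entries have `population`, `allele` and
`frequency` keys, and the helper names are made up for this example. It
covers allele frequencies only, not genotype frequencies.

```python
import json
import urllib.request

# Assumption: GRCh37 REST server; https://rest.ensembl.org would be GRCh38.
REST_SERVER = "https://grch37.rest.ensembl.org"
PREFIX = "1000GENOMES:phase_3:"


def fetch_variant(rsid, server=REST_SERVER):
    """Fetch one variant record, including population data, as a dict."""
    url = f"{server}/variation/human/{rsid}?pops=1"
    req = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def thousand_genomes_frequencies(record, prefix=PREFIX):
    """Keep only the 1000 Genomes phase 3 entries of a variant record.

    Assumes each entry of record["populations"] has the keys
    "population", "allele" and "frequency".
    """
    return [
        (p["population"], p["allele"], p["frequency"])
        for p in record.get("populations", [])
        if p["population"].startswith(prefix)
    ]
```

Calling `thousand_genomes_frequencies(fetch_variant("rs7627367"))` would make
one request per variant and download the full variant record each time,
which is the overhead described in option 1.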
So I decided to connect to the public MySQL servers.
However, while I can find the table (`population_genotype`) that should
contain the information, as well as the 1000 Genomes references
(`population`), I cannot find the data that I need.
    SELECT
        `p`.`population_id`,
        `p`.`name`,
        `p`.`size`,
        `p`.`description`
    FROM `population` AS `p`
    WHERE
        `p`.`name` LIKE '1000GENOMES:phase_3:%'
    ORDER BY `p`.`population_id`;
gives me the 32 populations I need. However,
    SELECT
        `pg`.*,
        `p`.`name` AS `population_name`
    FROM `population_genotype` AS `pg`
    JOIN `population` AS `p`
        ON `p`.`population_id` = `pg`.`population_id`
    WHERE
        `p`.`name` LIKE '1000GENOMES:phase_3:%';
returns an empty set.
A similar situation exists for the `sample` and `sample_genotype_multiple_bp`
tables:
    SELECT
        `s`.`sample_id`,
        `s`.`name` AS `sample_name`,
        `ind`.`individual_id`,
        `ind`.`name` AS `individual_name`,
        `ind`.`gender`
    FROM `sample` AS `s`
    JOIN `individual` AS `ind`
        ON `ind`.`individual_id` = `s`.`individual_id`
    WHERE
        `s`.`name` LIKE '1000GENOMES:phase_3:%'
    ORDER BY `s`.`sample_id`;
returns 2504 records, which is the correct number, but
    SELECT
        `sg`.*,
        `s`.`name`
    FROM `sample_genotype_multiple_bp` AS `sg`
    JOIN `sample` AS `s`
        ON `s`.`sample_id` = `sg`.`sample_id`
    WHERE
        `s`.`name` LIKE '1000GENOMES:phase_3:%';
returns an empty set.
I tried connecting to both
ensembldb.ensembl.org:3337/homo_sapiens_variation_97_37 and
ensembldb.ensembl.org:3306/homo_sapiens_variation_97_38, and the situation
is the same for both.
I'm genuinely at a loss here, because I can clearly see the data on the
website. Is it retrieved from somewhere other than the database I'm
looking at?
I'm relatively new to Ensembl, so I might be looking in the wrong place.
Thanks in advance.
Yours,
Zheng Jin