[ensembl-dev] question variation APi

Will McLaren wm2 at ebi.ac.uk
Tue Oct 7 16:47:25 BST 2014


On 7 October 2014 16:34, Nathalie Conte <nconte at ebi.ac.uk> wrote:

>
> On 7 Oct 2014, at 16:00, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
> Hi Nathalie,
>
> Yes you could use MySQL, though again you may have memory issues (either
> on the server or client side), and we prefer not to have long running or
> very large queries running on our public database server.
>
> mysql -hensembldb.ensembl.org -uanonymous -e"SELECT name, somatic FROM
> variation LIMIT 10;" homo_sapiens_variation_77_38
>
> should do it (obviously without the LIMIT 10), though without wanting to
> test this myself I think it would be best to avoid using this.
>
> You could wrap that statement in a Perl DBI call with {'MYSQL_USE_RESULT'
> => 1} to avoid killing the server memory, though I imagine it will still
> take a long time.
>
>
> thanks, would this work to connect to the database in a perl DBI call ?
> my $geno_dbh = DBI->connect(    'dbi:mysql:dbname=ensembl;host=
> ensembldb.ensembl.org;port=5306',    'anonymous',    {InactiveDestroy =>
> 1, RaiseError => 1, PrintError => 1, MYSQL_USE_RESULT => 1}    ) || die
> "Database connection not made: $DBI::errstr";
> this would need a password for anonymous user?
>

There's no password for the anonymous user.


>
> my $sql = "SELECT name, somatic FROM variation LIMIT 10 ";
> my $geno_sth = $geno_dbh->prepare($sql);
> $geno_sth->execute()  or die "SQL Error: $DBI::errstr\n"; …..
>

The {mysql_use_result => 1}  should go in your prepare statement, so

my $geno_sth = $geno_dbh->prepare($sql, {mysql_use_result => 1});

Will

>
>
> You shouldn't have issues with memory using the FTP (you could just wget
> the whole file first before parsing), and you can expect the latest VCF
> files to appear in the same address structure every Ensembl release. You
> can find the latest Ensembl release programmatically through our REST API,
> see http://rest.ensembl.org/documentation/info/software
>
> Will
>
>
>
> On 7 October 2014 15:45, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>
>> HI Will,
>> Thanks for your answer.
>> I would like to have programmatic access to this data as this will be
>> saved in a data warehouse containing other data pulled from other databases
>> which would be updated automatically in the future.To manually retrieve the
>> data using ftp may not be sustainable  for these reasons - I have tried
>> using FTP but as you predicted , I have issues with memory and this is
>> clearly not optimal. Would a direct sql query to ensembl database work?
>> Thanks for any advice
>> Nathalie
>>
>>
>> On 7 Oct 2014, at 13:52, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>> Hi Nathalie,
>>
>> We wouldn't recommend using the API to retrieve all of the rsIDs; there
>> are >60million and the API is not optimised for retrieving the whole
>> dataset in this way.
>>
>> Instead I'd recommend you extract the IDs from one of our dump files;
>> probably VCF or GVF would be the easiest to work with:
>>
>> curl
>> ftp://ftp.ensembl.org/pub/release-77/variation/vcf/homo_sapiens/Homo_sapiens.vcf.gz
>> | zcat | grep -v # | cut -f 3 | head
>>
>> (remove the head and redirect to a file to get all of them).
>>
>> The somatic mutations are in a separate file,
>> ftp://ftp.ensembl.org/pub/release-77/variation/vcf/homo_sapiens/Homo_sapiens_somatic.vcf.gz
>>
>> To answer your question, to fetch somatic mutations use
>> fetch_all_somatic() (see
>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1DBSQL_1_1VariationAdaptor.html#a22e69dacdd77542463320a1ef16b151f
>> )
>>
>> Regards
>>
>> Will McLaren
>> Ensembl Variation
>>
>> On 7 October 2014 10:38, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>>
>>> Hi,
>>> I would like to get all variation ID  (ie rs1822893 )from ensembl, I am
>>> using the variation API to do so.
>>> Is it the best way? the fetch_all method seems to get all germline
>>> variation, is there another method for somatic ones?
>>>
>>> my $vf_adaptor = Bio::EnsEMBL::Registry->get_adaptor('human',
>>> 'variation', 'variationfeature');
>>> my @vfs = @{$vf_adaptor->fetch_all()};
>>> foreach my $vf(@vfs){
>>>                  if ($vf){
>>>                         my $varID=defined($vf->variation_name) ?
>>> $vf->variation_name :'No_variation';
>>>                         if ($varID) {
>>> print  "$varID\n";
>>>                         }
>>>                 }
>>> }
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20141007/e0a307d3/attachment.html>


More information about the Dev mailing list