[ensembl-dev] question variation APi

Nathalie Conte nconte at ebi.ac.uk
Wed Oct 8 11:07:02 BST 2014


got it now 
homo_sapiens_variation_77_38
On 8 Oct 2014, at 10:54, Nathalie Conte <nconte at ebi.ac.uk> wrote:

> hi WIll, 
> Could you please check the DBI->connect statement
> I have an error message, I suspect the dbname is not correct, I used variation as dbname.
> 
> error message
> ******
> DBI connect('dbname=variation;host=ensembldb.ensembl.org;port=5306','anonymous',...) failed: Unknown database 'variation' at ./start_test2.pl line 11.
> ******
> 
> my statement
> *****
> my $geno_dbh = DBI->connect(    'dbi:mysql:dbname=variation;host=ensembldb.ensembl.org;port=5306',    'anonymous', '',  {InactiveDestroy => 1, RaiseError => 1, PrintError => 1}    ) || die "Database connection not made: $DBI::errstr";
> *****
> 
> thanks
> On 7 Oct 2014, at 16:47, Will McLaren <wm2 at ebi.ac.uk> wrote:
> 
>> 
>> 
>> On 7 October 2014 16:34, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>> 
>> On 7 Oct 2014, at 16:00, Will McLaren <wm2 at ebi.ac.uk> wrote:
>> 
>>> Hi Nathalie,
>>> 
>>> Yes you could use MySQL, though again you may have memory issues (either on the server or client side), and we prefer not to have long running or very large queries running on our public database server.
>>> 
>>> mysql -hensembldb.ensembl.org -uanonymous -e"SELECT name, somatic FROM variation LIMIT 10;" homo_sapiens_variation_77_38
>>> 
>>> should do it (obviously without the LIMIT 10), though without wanting to test this myself I think it would be best to avoid using this.
>>> 
>>> You could wrap that statement in a Perl DBI call with {'MYSQL_USE_RESULT' => 1} to avoid killing the server memory, though I imagine it will still take a long time.
>> 
>> thanks, would this work to connect to the database in a perl DBI call ?
>> my $geno_dbh = DBI->connect(    'dbi:mysql:dbname=ensembl;host=ensembldb.ensembl.org;port=5306',    'anonymous',    {InactiveDestroy => 1, RaiseError => 1, PrintError => 1, MYSQL_USE_RESULT => 1}    ) || die "Database connection not made: $DBI::errstr";
>> this would need a password for anonymous user?
>> 
>> There's no password for the anonymous user.
>>  
>> 
>> my $sql = "SELECT name, somatic FROM variation LIMIT 10 ";  
>> my $geno_sth = $geno_dbh->prepare($sql);
>> $geno_sth->execute()  or die "SQL Error: $DBI::errstr\n"; …..
>> 
>> The {mysql_use_result => 1}  should go in your prepare statement, so
>> 
>> my $geno_sth = $geno_dbh->prepare($sql, {mysql_use_result => 1});
>> 
>> Will
>> 
>>> 
>>> You shouldn't have issues with memory using the FTP (you could just wget the whole file first before parsing), and you can expect the latest VCF files to appear in the same address structure every Ensembl release. You can find the latest Ensembl release programmatically through our REST API, see http://rest.ensembl.org/documentation/info/software
>>> 
>>> Will
>>> 
>>> 
>>> 
>>> On 7 October 2014 15:45, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>>> HI Will, 
>>> Thanks for your answer.
>>> I would like to have programmatic access to this data as this will be saved in a data warehouse containing other data pulled from other databases which would be updated automatically in the future.To manually retrieve the data using ftp may not be sustainable  for these reasons - I have tried using FTP but as you predicted , I have issues with memory and this is clearly not optimal. Would a direct sql query to ensembl database work?
>>> Thanks for any advice 
>>> Nathalie
>>> 
>>> 
>>> On 7 Oct 2014, at 13:52, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>> 
>>>> Hi Nathalie,
>>>> 
>>>> We wouldn't recommend using the API to retrieve all of the rsIDs; there are >60million and the API is not optimised for retrieving the whole dataset in this way.
>>>> 
>>>> Instead I'd recommend you extract the IDs from one of our dump files; probably VCF or GVF would be the easiest to work with:
>>>> 
>>>> curl ftp://ftp.ensembl.org/pub/release-77/variation/vcf/homo_sapiens/Homo_sapiens.vcf.gz | zcat | grep -v # | cut -f 3 | head
>>>> 
>>>> (remove the head and redirect to a file to get all of them).
>>>> 
>>>> The somatic mutations are in a separate file, ftp://ftp.ensembl.org/pub/release-77/variation/vcf/homo_sapiens/Homo_sapiens_somatic.vcf.gz
>>>> 
>>>> To answer your question, to fetch somatic mutations use fetch_all_somatic() (see http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1DBSQL_1_1VariationAdaptor.html#a22e69dacdd77542463320a1ef16b151f)
>>>> 
>>>> Regards
>>>> 
>>>> Will McLaren
>>>> Ensembl Variation
>>>> 
>>>> On 7 October 2014 10:38, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>>>> Hi,
>>>> I would like to get all variation ID  (ie rs1822893 )from ensembl, I am using the variation API to do so.
>>>> Is it the best way? the fetch_all method seems to get all germline variation, is there another method for somatic ones?
>>>> 
>>>> my $vf_adaptor = Bio::EnsEMBL::Registry->get_adaptor('human', 'variation', 'variationfeature');
>>>> my @vfs = @{$vf_adaptor->fetch_all()};
>>>> foreach my $vf(@vfs){
>>>>                  if ($vf){
>>>>                         my $varID=defined($vf->variation_name) ? $vf->variation_name :'No_variation';
>>>>                         if ($varID) {
>>>> print  "$varID\n";
>>>>                         }
>>>>                 }
>>>> }
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20141008/4615616d/attachment.html>


More information about the Dev mailing list