[ensembl-dev] question variation APi

Will McLaren wm2 at ebi.ac.uk
Tue Oct 7 16:00:52 BST 2014


Hi Nathalie,

Yes you could use MySQL, though again you may have memory issues (either on
the server or client side), and we prefer not to have long running or very
large queries running on our public database server.

mysql -hensembldb.ensembl.org -uanonymous -e"SELECT name, somatic FROM
variation LIMIT 10;" homo_sapiens_variation_77_38

should do it (obviously without the LIMIT 10), though without wanting to
test this myself I think it would be best to avoid using this.

You could wrap that statement in a Perl DBI call with {'MYSQL_USE_RESULT'
=> 1} to avoid killing the server memory, though I imagine it will still
take a long time.

You shouldn't have issues with memory using the FTP (you could just wget
the whole file first before parsing), and you can expect the latest VCF
files to appear in the same address structure every Ensembl release. You
can find the latest Ensembl release programmatically through our REST API,
see http://rest.ensembl.org/documentation/info/software

Will



On 7 October 2014 15:45, Nathalie Conte <nconte at ebi.ac.uk> wrote:

> HI Will,
> Thanks for your answer.
> I would like to have programmatic access to this data as this will be
> saved in a data warehouse containing other data pulled from other databases
> which would be updated automatically in the future.To manually retrieve the
> data using ftp may not be sustainable  for these reasons - I have tried
> using FTP but as you predicted , I have issues with memory and this is
> clearly not optimal. Would a direct sql query to ensembl database work?
> Thanks for any advice
> Nathalie
>
>
> On 7 Oct 2014, at 13:52, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
> Hi Nathalie,
>
> We wouldn't recommend using the API to retrieve all of the rsIDs; there
> are >60million and the API is not optimised for retrieving the whole
> dataset in this way.
>
> Instead I'd recommend you extract the IDs from one of our dump files;
> probably VCF or GVF would be the easiest to work with:
>
> curl
> ftp://ftp.ensembl.org/pub/release-77/variation/vcf/homo_sapiens/Homo_sapiens.vcf.gz
> | zcat | grep -v # | cut -f 3 | head
>
> (remove the head and redirect to a file to get all of them).
>
> The somatic mutations are in a separate file,
> ftp://ftp.ensembl.org/pub/release-77/variation/vcf/homo_sapiens/Homo_sapiens_somatic.vcf.gz
>
> To answer your question, to fetch somatic mutations use
> fetch_all_somatic() (see
> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1DBSQL_1_1VariationAdaptor.html#a22e69dacdd77542463320a1ef16b151f
> )
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
> On 7 October 2014 10:38, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>
>> Hi,
>> I would like to get all variation ID  (ie rs1822893 )from ensembl, I am
>> using the variation API to do so.
>> Is it the best way? the fetch_all method seems to get all germline
>> variation, is there another method for somatic ones?
>>
>> my $vf_adaptor = Bio::EnsEMBL::Registry->get_adaptor('human',
>> 'variation', 'variationfeature');
>> my @vfs = @{$vf_adaptor->fetch_all()};
>> foreach my $vf(@vfs){
>>                  if ($vf){
>>                         my $varID=defined($vf->variation_name) ?
>> $vf->variation_name :'No_variation';
>>                         if ($varID) {
>> print  "$varID\n";
>>                         }
>>                 }
>> }
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20141007/b5c2ccec/attachment.html>


More information about the Dev mailing list