[ensembl-dev] question variation APi

Nathalie Conte nconte at ebi.ac.uk
Tue Oct 7 15:55:24 BST 2014


typo
>  I have tried using FTP but as you predicted 
>  I have tried using API but as you predicted 
On 7 Oct 2014, at 15:45, Nathalie Conte <nconte at ebi.ac.uk> wrote:

> Hi Will,
> Thanks for your answer.
> I would like programmatic access to this data, because it will be stored in a data warehouse alongside data pulled from other databases, all of which will be updated automatically in the future. Manually retrieving the data over FTP may not be sustainable for these reasons. I have tried using FTP but as you predicted, I have issues with memory, and this is clearly not optimal. Would a direct SQL query against the Ensembl database work?
> Thanks for any advice 
> Nathalie
> 
> 
> On 7 Oct 2014, at 13:52, Will McLaren <wm2 at ebi.ac.uk> wrote:
> 
>> Hi Nathalie,
>> 
>> We wouldn't recommend using the API to retrieve all of the rsIDs; there are more than 60 million of them, and the API is not optimised for retrieving the whole dataset in this way.
>> 
>> Instead I'd recommend you extract the IDs from one of our dump files; probably VCF or GVF would be the easiest to work with:
>> 
>> curl ftp://ftp.ensembl.org/pub/release-77/variation/vcf/homo_sapiens/Homo_sapiens.vcf.gz | zcat | grep -v '^#' | cut -f 3 | head
>> 
>> (remove the head and redirect to a file to get all of them).
>> 
>> The somatic mutations are in a separate file, ftp://ftp.ensembl.org/pub/release-77/variation/vcf/homo_sapiens/Homo_sapiens_somatic.vcf.gz
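The dump-file approach described above can be sketched as a small shell function that works for both the germline and the somatic file. This is only an illustration under stated assumptions: `extract_ids` and `sample_vcf` are hypothetical names, the sample records are made up so the sketch runs offline, and skipping records whose ID is `.` (the VCF convention for a missing ID) is an addition that the original pipeline does not make.

```shell
#!/bin/sh
# extract_ids: hypothetical helper (not part of Ensembl) that reads
# uncompressed VCF text on stdin and prints the ID column (column 3).
# Header lines start with '#'; records without an ID carry '.' there.
extract_ids() {
    awk -F '\t' '!/^#/ && $3 != "." { print $3 }'
}

# Made-up sample records, so the sketch runs without network access.
sample_vcf() {
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\n'
    printf '1\t10177\trs1822893\tA\tC\n'
    printf '1\t10352\t.\tT\tA\n'
    printf '2\t20500\trs42\tG\tT\n'
}

sample_vcf | extract_ids

# Against the real dump (URL from the message above), assuming curl and
# zcat are installed:
#   curl -s ftp://ftp.ensembl.org/pub/release-77/variation/vcf/homo_sapiens/Homo_sapiens.vcf.gz \
#     | zcat | extract_ids > rsids.txt
```

Because the function streams its input line by line, memory use should stay flat regardless of file size, which is the main reason to prefer it over loading every variation feature through the API.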
>> 
>> To answer your question, to fetch somatic mutations use fetch_all_somatic() (see http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1DBSQL_1_1VariationAdaptor.html#a22e69dacdd77542463320a1ef16b151f)
>> 
>> Regards
>> 
>> Will McLaren
>> Ensembl Variation
>> 
>> On 7 October 2014 10:38, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>> Hi,
>> I would like to get all variation IDs (e.g. rs1822893) from Ensembl, and I am using the variation API to do so.
>> Is this the best way? The fetch_all method seems to get all germline variations; is there another method for somatic ones?
>> 
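>> use strict;
>> use warnings;
>> use Bio::EnsEMBL::Registry;
>>
>> # Connect to the public Ensembl database server (assumed; not shown in the original snippet).
>> Bio::EnsEMBL::Registry->load_registry_from_db(
>>     -host => 'ensembldb.ensembl.org',
>>     -user => 'anonymous',
>> );
>>
>> my $vf_adaptor = Bio::EnsEMBL::Registry->get_adaptor('human', 'variation', 'variationfeature');
>>
>> # Fetch every variation feature and print its name (e.g. an rsID).
>> foreach my $vf (@{ $vf_adaptor->fetch_all() }) {
>>     next unless $vf;
>>     my $varID = defined($vf->variation_name) ? $vf->variation_name : 'No_variation';
>>     print "$varID\n" if $varID;
>> }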
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
