[ensembl-dev] wget/curl access to ftp data among other frustrations

Mark Aquino aquinom85 at me.com
Mon Oct 31 21:25:49 GMT 2011

You can use the command-line ftp client to download all the files in a directory easily.
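For anyone who would rather script this than type it into an interactive ftp session, here is a minimal sketch of the same idea using Python's standard ftplib module. It is an illustration, not anything Ensembl provides: the function names download_directory and main are made up for this example, the host and path are simply the release-10 location quoted later in this thread (which may have moved since), and it assumes the server accepts anonymous logins. It fetches only the plain files in one directory and does not recurse into subdirectories.

```python
import os
from ftplib import FTP, error_perm


def download_directory(ftp, remote_dir, local_dir):
    """Download every plain file in remote_dir into local_dir (non-recursive).

    `ftp` is an already connected and logged-in ftplib.FTP object
    (or anything offering the same cwd/nlst/retrbinary methods).
    Returns the list of local paths that were written.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    saved = []
    for name in ftp.nlst():
        local_path = os.path.join(local_dir, os.path.basename(name))
        try:
            with open(local_path, "wb") as handle:
                ftp.retrbinary("RETR " + name, handle.write)
        except error_perm:
            # The server refused RETR, typically because the entry is a
            # subdirectory. Drop the empty local file and move on.
            os.remove(local_path)
            continue
        saved.append(local_path)
    return saved


def main():
    # Assumption: host and path are taken from the URLs quoted in this
    # thread and the server allows anonymous access.
    ftp = FTP("ftp.ensemblgenomes.org")
    ftp.login()  # anonymous login
    try:
        files = download_directory(
            ftp,
            "/pub/metazoa/release-10/gtf/anopheles_gambiae",
            os.path.join("release-10", "gtf", "anopheles_gambiae"),
        )
    finally:
        ftp.quit()
    for path in files:
        print(path)
```

Calling main() would copy that one species directory into a matching local folder, so the remote layout is preserved; looping over several species or releases is a matter of calling download_directory once per directory.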

Sent from my iPhone

On Oct 31, 2011, at 4:54 PM, "W. Augustine Dunn III" <wadunn83 at gmail.com> wrote:

> In short:
> why is this not enabled?!
> But with a few more words and hopefully a bit less fragrant frustration bleeding through (I'll TRY anyway):
> I have spent a very frustrating few hours trying to mirror a few species from one of the metazoa.ensembl releases (Aedes, Culex, Anopheles, Dmel).  You may prefer that people go to the html ftp interface (http://metazoa.ensembl.org/info/data/ftp/index.html) and repetitively sit at their desk and right-click>download for every file needed for every species needed, but this is very annoying and results in an unorganized set of files that then need to be put BACK in the nice little order that they were ALREADY in before I had the misfortune of realizing that I needed to try to get data from you guys.
> I appreciate that for the lay-person it is useful to have point-and-click a-la-carte downloads but why prevent those that can actually make the MOST use of your data (people who know their way around a cmd line and understand the need to keep data in a predetermined/organized structure) from using standard methods to access your data in a way that is able to be automated and scripted for reliable updating in the future?
> Actually, on that note can you PLEASE standardize the names of the files included in different "release" folders? Example: in release 9 the gtf for anopheles is ftp://ftp.ensemblgenomes.org/pub/metazoa/release-9/gtf/anopheles_gambiae/anopheles_gambiae.AgamP3.62.gtf.gz which is GREAT because it gives me some idea of which DB version the gtf is based on.  BUT in release 10 it is now named: ftp://ftp.ensemblgenomes.org/pub/metazoa/release-10/gtf/anopheles_gambiae/Anopheles_gambiae.AgamP3.gtf.gz .  This gives people no idea which gene-build this gtf represents, because AgamP3 refers to a genome assembly version, NOT a gene-build.
> Basically, PLEASE PLEASE PLEASE decide on a standard way to name files so that they reflect what version of data they contain. PREFERABLY just use Ensembl's conventions!  If you guys are gonna split these orphan genomes out into their own little "kids' table at Christmas dinner" the LEAST you could do is try to make that as unobstructive to the science that relies on this data as possible.  It's not like people work only in metazoa.ensembl OR the "blessed" normal ensembl.  Why put easily avoidable stumbling blocks in the way?  All that we should have to do is use a different url. All other internals should be made to behave identically.  Anything else is directly contributing to important science NOT GETTING DONE.  Not because the data is unavailable or tainted but because the scientists are spending all their time dealing with cryptically broken scripts and trying to learn another schema, if there even IS a coherent new one to begin with.
> PLEASE make this better!  At the VERY least please let us use standard tools like wget/curl/rsync to pull whole directory structures down intact.  PLEASE PLEASE PLEASE.
> I am sorry if I have offended.  It is NOT my intention.  I am simply incredibly frustrated because I have to work in both these domains, and simple things like easy access and standardized naming conventions do NOT seem like that hard a thing to get right.
> Gus Dunn 
>  -- 
> W. Augustine Dunn, III
> Ph.D. Candidate
> Laboratory of Dr. Anthony James
> Department of Molecular Biology and Biochemistry
> University of California, Irvine
> (949) 824-3210 - Lab
> (949) 824-8551 - Fax   
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/