[ensembl-dev] searching on Ensembl with Perl API

Carnë Draug carandraug+dev at gmail.com
Wed Jul 20 14:39:33 BST 2011


Hi

Thank you all for your responses. It's nice to get so much help.

On 20 July 2011 09:17, Giulietta Spudich <gspudich at ebi.ac.uk> wrote:
> We don't usually recommend relying on Lucene searches that way, as the
> search engine looks for hits to the text you're entering, and they might not
> be related to your gene of interest.  Meaning, if you search for RHO you get
> rhodopsin-like proteins, and other genes with RHO in the name or description
> that are not related to Rhodopsin at all.  Also, you could miss genes that
> are homologous but have not been named in exactly the same way.  I see with
> your search that every hit is relevant to the histone h2a cluster, but are
> you missing genes in all species this way?

My plan is to make an application for my colleagues that takes the same
query they would enter in the site's search field and gives them a CSV
file of all the gene hits, with the transcript accessions, gene
descriptions, etc., along with the sequences of them all. You're
right, it's not the most efficient way to find everything, but they
would first check on the site that the query returned what they
wanted.
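
The output step could be sketched roughly as below, in Python for
brevity. This is only an illustration and does not touch the Ensembl
API: it assumes the hits have already been fetched somehow and are held
as plain dictionaries. The record layout, the column names, the
function names hits_to_csv and hits_to_fasta, and the placeholder
identifiers are all hypothetical.

```python
import csv
import sys

# Hypothetical column names; the real ones depend on what the lookup returns.
FIELDS = ["gene_id", "transcript_id", "description"]


def hits_to_csv(hits, out):
    """Write a header plus one CSV row per transcript of each gene hit."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for hit in hits:
        for transcript_id in hit["transcripts"]:
            writer.writerow({
                "gene_id": hit["gene_id"],
                "transcript_id": transcript_id,
                "description": hit["description"],
            })


def hits_to_fasta(hits, out, width=60):
    """Write every transcript sequence as a FASTA record, wrapped at `width`."""
    for hit in hits:
        for transcript_id, sequence in hit["transcripts"].items():
            out.write(f">{transcript_id} {hit['gene_id']}\n")
            for start in range(0, len(sequence), width):
                out.write(sequence[start:start + width] + "\n")


if __name__ == "__main__":
    # Placeholder record, not real accessions or sequences.
    example_hits = [
        {
            "gene_id": "GENE1",
            "description": "example gene, placeholder",
            "transcripts": {"TX1": "ACGTACGT", "TX2": "GGCC"},
        }
    ]
    hits_to_csv(example_hits, sys.stdout)
    hits_to_fasta(example_hits, sys.stdout)
```

Keeping the CSV and FASTA writers separate from the lookup means the
same output code could sit behind both the Entrez and the Ensembl
searches.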

I finished writing such a program for the Entrez database
(http://pastebin.com/nJH9GwaT) and am now extending it to Ensembl. I had
to delay the Ensembl part while taking care of all the paperwork to have
the needed ports opened, and even so, they are only allowing connections
on that port between my computer and your server.

I guess I'll have to write something different for this case, which
might not be so bad. It will force people (and me too) to learn a bit
more about the way genes are organized.

Carnë



