[ensembl-dev] Getting variation consequence predictions without perl

Andreas Kahari ak at ebi.ac.uk
Tue Sep 14 14:54:30 BST 2010


I'm just quietly wondering why you necessarily want to do this in
Java...  Are the tools (language) part of the problem specification
you've been given?  Wouldn't you be able to install the SNP effect
predictor on one machine and then make it available over HTTP to the
Macs, Windows machines, and Linux boxen that need access to it?  This
is, after all, what we do on our live site.
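
A client for such a setup could be quite small.  What follows is only
a sketch: the service URL, the parameter names (chr, pos, allele) and
the one-consequence-per-line response format are all hypothetical,
since nothing in this thread defines an HTTP interface for the
predictor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client for a predictor that has been wrapped behind HTTP. */
final class ConsequenceClient {

    /** Builds the request URL; the parameter names are assumptions. */
    static String buildUrl(String baseUrl, String chromosome, int position, String allele) {
        return baseUrl
                + "?chr=" + URLEncoder.encode(chromosome, StandardCharsets.UTF_8)
                + "&pos=" + position
                + "&allele=" + URLEncoder.encode(allele, StandardCharsets.UTF_8);
    }

    /** Fetches the response body as a list of lines, with timeouts set. */
    static List<String> fetchLines(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(30_000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Only builds the URL; no request is sent to the placeholder host.
        System.out.println(buildUrl("http://example.org/predict", "X", 12345, "A/G"));
    }
}
```

The point of the design is that only the one server needs Perl and the
Ensembl API installed; the Web Start clients need nothing beyond the
Java standard library.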

It seems to me that if the 3rd-party Java API (which I personally
know nothing about) is complete enough to handle coordinate mappings
(including HAP/PAR regions and SeqEdits etc.), and if it understands the
variation schema and mimics the Perl API somewhat in what objects it
can create from it, then that (i.e. porting the SNP effect predictor to
Java) is your best alternative bet.  The people developing the Java API
would probably be very happy to see this happen, I imagine.
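
To give a feel for the simplest part of such a port, here is a rough
sketch of mapping a genomic position onto cDNA and CDS coordinates.
It assumes a forward-strand transcript with 1-based inclusive
coordinates and ignores strand, HAP/PAR regions and SeqEdits entirely.
The class and method names are made up for illustration and are not
taken from the Perl API or from any Java API.

```java
/** Toy forward-strand transcript; coordinates are 1-based and inclusive. */
final class ToyTranscript {
    private final int[][] exons;   // {start, end} in genomic coordinates, sorted ascending
    private final int codingStart; // genomic position of the first coding base (must be exonic)
    private final int codingEnd;   // genomic position of the last coding base (must be exonic)

    ToyTranscript(int[][] exons, int codingStart, int codingEnd) {
        this.exons = exons;
        this.codingStart = codingStart;
        this.codingEnd = codingEnd;
    }

    /** Position in the spliced transcript (cDNA), or -1 if the position is not exonic. */
    int cdnaPosition(int genomicPos) {
        int offset = 0; // total length of the exons already passed
        for (int[] exon : exons) {
            if (genomicPos >= exon[0] && genomicPos <= exon[1]) {
                return offset + (genomicPos - exon[0]) + 1;
            }
            offset += exon[1] - exon[0] + 1;
        }
        return -1;
    }

    /** Position within the coding sequence (CDS), or -1 if the position is not coding. */
    int cdsPosition(int genomicPos) {
        if (genomicPos < codingStart || genomicPos > codingEnd) {
            return -1;
        }
        int cdna = cdnaPosition(genomicPos);
        return cdna < 0 ? -1 : cdna - cdnaPosition(codingStart) + 1;
    }

    /** Very coarse classification of a genomic position relative to this transcript. */
    String classify(int genomicPos) {
        if (genomicPos < exons[0][0] || genomicPos > exons[exons.length - 1][1]) {
            return "OUTSIDE_TRANSCRIPT";
        }
        if (cdnaPosition(genomicPos) < 0) {
            return "INTRONIC";
        }
        if (genomicPos < codingStart) {
            return "5PRIME_UTR";
        }
        if (genomicPos > codingEnd) {
            return "3PRIME_UTR";
        }
        return "CODING";
    }

    public static void main(String[] args) {
        ToyTranscript t = new ToyTranscript(
                new int[][] {{100, 199}, {300, 399}, {500, 599}}, 150, 549);
        for (int pos : new int[] {120, 250, 300}) {
            System.out.println(pos + " " + t.classify(pos) + " cds=" + t.cdsPosition(pos));
        }
    }
}
```

With exons 100-199, 300-399 and 500-599 and a coding region running
from 150 to 549, position 120 falls in the 5' UTR, 250 is intronic,
and 300 is CDS position 51.  A real port would have to handle
everything this sketch leaves out, which is where most of the work is.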

Regarding mart: Remember that the Ensembl marts are *summaries* of what
we have in the Core, Variation, FuncGen, Compara (etc.) databases,
available for bulk query, and that the structure of the mart databases
themselves might change without much warning.  I would think carefully
before writing an application on top of mart, especially if the
application relies on the mart database schema instead of the existing
XML-based API.  The real data is in the other databases and we have
(Perl) APIs for them that take into account changing schemas.  Please
use these primarily.
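
For those who do go through mart, a query over the XML interface looks
roughly like the sketch below.  The dataset, filter and attribute
names in the example are placeholders and should be checked against
the mart's own configuration; the service URL is passed in rather than
assumed, and the class name is invented for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Builds a BioMart-style XML query instead of touching the mart schema directly. */
final class MartQuery {

    /** Assembles the query document for one dataset, one filter and some attributes. */
    static String buildXml(String dataset, String filterName, String filterValue,
                           List<String> attributes) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        xml.append("<!DOCTYPE Query>");
        xml.append("<Query virtualSchemaName=\"default\" formatter=\"TSV\"")
           .append(" header=\"0\" uniqueRows=\"1\" datasetConfigVersion=\"0.6\">");
        xml.append("<Dataset name=\"").append(escape(dataset)).append("\" interface=\"default\">");
        xml.append("<Filter name=\"").append(escape(filterName))
           .append("\" value=\"").append(escape(filterValue)).append("\"/>");
        for (String attribute : attributes) {
            xml.append("<Attribute name=\"").append(escape(attribute)).append("\"/>");
        }
        xml.append("</Dataset></Query>");
        return xml.toString();
    }

    /** Appends the URL-encoded query to the service URL as the "query" parameter. */
    static String toRequestUrl(String serviceUrl, String xml) {
        return serviceUrl + "?query=" + URLEncoder.encode(xml, StandardCharsets.UTF_8);
    }

    /** Escapes the characters that are not allowed inside an XML attribute value. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // Placeholder names; look up the real ones in the mart configuration.
        System.out.println(buildXml("hsapiens_snp", "snp_filter", "rs123",
                List.of("refsnp_id")));
    }
}
```

An application written this way depends only on the names exposed by
the mart configuration, not on the underlying table layout.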


Andreas


On Tue, Sep 14, 2010 at 02:54:57PM +0200, Asraniel wrote:
> Thank you.
> 
> I guess I have to understand the way transcripts are stored in BioMart, and
> then calculate the consequence myself. For basic stuff it looks easy enough.
> I just have to dig a little more into genetics. From what I have read, a
> transcript can have one "5' UTR", but it seems like every exon can have one.
> 
> What CDS End/Start is I can guess, but why a Genomic coding end/start is also
> there, no idea yet. What exactly a cDNA coding area is remains a mystery
> for now; I thought that the CDS was the coding part...
> 
> Is there a place where the structure of the transcripts in the DB is
> documented? I tried Wikipedia to understand it. Hope I'll get there.
> 
> Beat Wolf
> 
> On Tuesday 14 September 2010, at 14.34:47, Pontus Larsson wrote:
> >   Ok, those consequences have to be calculated on-the-fly by the API and
> > are not stored in the database. Hence, BioMart will not be useful to you
> > in this respect. Unless you can wrap the perl code like Stuart
> > suggested, I believe your best bet would be the kind of third-party API
> > that Andrew mentioned.
> > 
> > Cheers
> > /Pontus
> > 
> > On 14/09/2010 13:14, Asraniel wrote:
> > > I got that part.
> > > But this only tells me the variants that are already known.
> > > 
> > > > I want to know the consequence of a random variant at a random position
> > > > that is not yet known.
> > > > The Perl API can tell me what the consequence is for a specific
> > > > transcript. BioMart does not seem to allow me to do that...
> > > 
> > > Beat Wolf
> > > 
> > > On Tuesday 14 September 2010, at 13.58:55, you wrote:
> > > >>    Yes, Ensembl Variation ->  Homo Sapiens Variation will get you human
> > > >> variation data. To get the transcript consequences, you'll find them
> > > >> under Attributes ->  Gene associated information ->  Consequence to
> > > >> transcript.
> > >> 
> > >> There is also the option to filter results by consequence, you'll find
> > >> that under Filters ->  Gene associated variation filters ->  Consequence
> > >> type.
> > >> 
> > >> Hope this helps!
> > >> /Pontus
> > >> 
> > >> On 14/09/2010 12:04, Asraniel wrote:
> > >>> Sounds great.
> > >>> 
> > > >>> I'm no biologist, so I'm not sure which BioMart to choose.
> > > >>> 
> > > >>> For the known variations I use:
> > > >>> Ensembl Variation 59
> > > >>> Homo sapiens Variation (dbSNP 131; ENSEMBL).
> > > >>> 
> > > >>> What would I have to choose for the predicted variants?
> > > >>> 
> > > >>> Thank you
> > >>> 
> > >>> Beat Wolf
> > >>> 
> > > >>> On Tuesday 14 September 2010, at 12.58:39, you wrote:
> > >>>>     Hi,
> > >>>> 
> > >>>> There is a dedicated BioMart for Ensembl Variation data which contains
> > >>>> the predicted transcript consequences
> > >>>> (http://www.ensembl.org/biomart/martview). Could you extract them from
> > >>>> there?
> > >>>> 
> > >>>> Cheers
> > >>>> /Pontus
> > >>>> 
> > >>>> On 14/09/2010 11:42, Asraniel wrote:
> > >>>>> Thanks for your answer.
> > >>>>> 
> > > >>>>> Sadly this is not an option, because my app has to work on
> > > >>>>> Linux/Windows/Mac and is started through Web Start, so I can't
> > > >>>>> expect that everybody has a Perl interpreter installed.
> > >>>>> 
> > >>>>> Beat Wolf
> > >>>>> 
> > > >>>>> On Tuesday 14 September 2010, at 12.39:54, Stuart Meacham wrote:
> > >>>>>> Hi there,
> > >>>>>> 
> > >>>>>> This is probably not going to answer your question! However I also
> > >>>>>> developed a Java app and wanted to use the SNP consequence
> > >>>>>> prediction script (or variations thereof). My first attempt was to
> > >>>>>> just execute the script from within the app with a call to:
> > >>>>>> 
> > >>>>>> ///////////////
> > >>>>>> 
> > >>>>>> Process p = Runtime.getRuntime().exec("/path/to/script/script.pl");
> > >>>>>> 
> > >>>>>> //////////////
> > >>>>>> 
> > >>>>>> and then reading the output of the script with:
> > >>>>>> 
> > >>>>>> //////////////
> > >>>>>> 
> > > >>>>>> BufferedReader stdInput = new BufferedReader(
> > > >>>>>>     new InputStreamReader(p.getInputStream()));
> > > >>>>>> 
> > > >>>>>> String s;
> > > >>>>>> while ((s = stdInput.readLine()) != null) {
> > > >>>>>>     // do stuff with s (one line of the script's standard output)
> > > >>>>>> }
> > >>>>>> 
> > >>>>>> //////////////
> > >>>>>> 
> > > >>>>>> This works fine, although it can be slow and the speed is
> > > >>>>>> erratic, especially if your app is going to support many
> > > >>>>>> concurrent users. To get around this problem I implemented a
> > > >>>>>> second version which simply ran the script independently
> > > >>>>>> (outside the app), saved the output to a database, and read
> > > >>>>>> the database from the app.
> > >>>>>> 
> > >>>>>> I do remember having a conversation once with an Ensembl Dev who
> > >>>>>> said that there was a Java API a few years ago but it now lacks
> > >>>>>> support and is obviously out of date.
> > >>>>>> 
> > >>>>>> Good luck!
> > >>>>>> 
> > >>>>>> Stuart
> > >>>>>> 
> > >>>>>> On 14/09/10 11:22, Asraniel wrote:
> > >>>>>>> Hi,
> > >>>>>>> 
> > > >>>>>>> I'm developing a Java app and I access the Ensembl data through
> > > >>>>>>> BioMart. Works great, thanks for that API.
> > > >>>>>>> 
> > > >>>>>>> Now, is there a way to get the variation consequence prediction
> > > >>>>>>> without using Perl? I didn't find a way through BioMart.
> > >>>>>>> 
> > >>>>>>> Thank you
> > >>>>>>> 
> > >>>>>>> Beat Wolf
> > >>>>>>> 
> > >>>>>>> _______________________________________________
> > >>>>>>> Dev mailing list
> > >>>>>>> Dev at ensembl.org
> > >>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
> > >>>>>> 
> > >>>>>> _______________________________________________
> > >>>>>> Dev mailing list
> > >>>>>> Dev at ensembl.org
> > >>>>>> http://lists.ensembl.org/mailman/listinfo/dev
> > >>>>> 
> > >>>>> _______________________________________________
> > >>>>> Dev mailing list
> > >>>>> Dev at ensembl.org
> > >>>>> http://lists.ensembl.org/mailman/listinfo/dev
> > >>> 
> > >>> _______________________________________________
> > >>> Dev mailing list
> > >>> Dev at ensembl.org
> > >>> http://lists.ensembl.org/mailman/listinfo/dev
> > > 
> > > _______________________________________________
> > > Dev mailing list
> > > Dev at ensembl.org
> > > http://lists.ensembl.org/mailman/listinfo/dev
> 

> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev


-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom



