[ensembl-dev] versioning of ensembl & biomart

rhoda at ebi.ac.uk
Wed Dec 14 12:47:37 GMT 2011


Hi Vincent
I am not planning to reinstate the canonical transcript stable ID in the
Ensembl Genes mart in the near future, as doing so is not trivial. However,
if there are enough requests for canonical transcript stable IDs over the
coming months, we will look into adding this data again.
Regards
Rhoda



> Hi,
>
> Thanks for your answer. Do you think the canonical transcript stable ID
> will be available through BioMart in upcoming Ensembl releases? If so, I
> will continue to work with release 64 and wait for release 66. Or is this
> removal permanent, in which case I have to find an alternative solution,
> such as the Perl API you suggest (although I would rather not mix BioMart
> and Perl API scripts)?
>
> Vincent
>
> On 14 Dec 2011, at 13:22, rhoda at ebi.ac.uk wrote:
>
>> Hi Vincent
>> The canonical transcript stable ID is no longer available in BioMart
>> because the changes made to the core schema (merging the stable_id
>> tables into their parent tables) made it impossible to add this data
>> using the MartBuilder tool. Apologies: this information was accidentally
>> omitted from the mart news. If you still need it, you can obtain this
>> information using the Perl API.
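The Perl API route mentioned above can look roughly like the following sketch. It assumes the Ensembl core Perl API for the release you want is installed (the API version has to match the database release) and that the public server ensembldb.ensembl.org accepts anonymous connections; the subroutine name is hypothetical. The API module is loaded at call time so the file still compiles where the API is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: print "gene_stable_id<TAB>canonical_transcript_stable_id"
# for every gene of a species, using the Ensembl core Perl API.
# Assumes Bio::EnsEMBL::Registry is on @INC; it is required at call time.
sub print_canonical_transcripts {
    my ($species, $out_fh) = @_;
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';

    # Anonymous, read-only access to the public Ensembl MySQL server
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );

    my $gene_adaptor = $registry->get_adaptor($species, 'Core', 'Gene');
    for my $gene (@{ $gene_adaptor->fetch_all() }) {
        # Skip genes without a canonical transcript
        my $canonical = $gene->canonical_transcript() or next;
        printf {$out_fh} "%s\t%s\n", $gene->stable_id(), $canonical->stable_id();
    }
    return;
}

# Example call (network access and the Ensembl API required):
# print_canonical_transcripts('human', \*STDOUT);
```

Calling fetch_all() walks every gene in the genome, so a full run is slow; the sketch favours clarity over speed.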
>> Regards
>> Rhoda
>>
>>
>>> Hi,
>>>
>>> I have adapted my code to handle the special case of the devil, so my
>>> first script now works. However, I run into a problem when I try to
>>> collect the canonical transcript ID. This attribute was available in
>>> release 64:
>>> <Dataset name = "hsapiens_gene_ensembl" interface = "default" >
>>> 		<Attribute name = "ensembl_gene_id" />
>>> 		<Attribute name = "canonical_transcript_stable_id" />
>>> 	</Dataset>
>>>
>>> but it seems to have disappeared in release 65 (my script returns an
>>> error). I checked the Ensembl BioMart web interface: this attribute was
>>> present in the v64 interface but has disappeared from the release 65
>>> interface. Is this a bug, or is the same information available
>>> somewhere else under a new name?
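A script can detect a vanished attribute before it submits a query. This is a minimal sketch, assuming the martservice answers `type=attributes&dataset=...` with one tab-separated line per attribute, internal name in the first column; the subroutine names are hypothetical, and it uses the core HTTP::Tiny module rather than LWP only to avoid an extra dependency.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical: collect the internal attribute names (first column) from a
# type=attributes response into a lookup hash.
sub attribute_names {
    my ($attributes_tsv) = @_;
    my %names;
    for my $line (split /\n/, $attributes_tsv) {
        $line =~ s/\r\z//;    # tolerate CRLF line endings
        my ($name) = split /\t/, $line;
        $names{$name} = 1 if defined $name && length $name;
    }
    return \%names;
}

# Return the requested attribute names that the dataset does not offer.
sub missing_attributes {
    my ($attributes_tsv, @wanted) = @_;
    my $names = attribute_names($attributes_tsv);
    return grep { !$names->{$_} } @wanted;
}

# Fetch the attribute list for one dataset from a martservice endpoint.
sub fetch_attributes_tsv {
    my ($martservice, $dataset) = @_;
    my $res = HTTP::Tiny->new->get("${martservice}type=attributes&dataset=$dataset");
    die "$res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Usage (network access required):
# my $tsv  = fetch_attributes_tsv('http://www.ensembl.org/biomart/martservice?',
#                                 'hsapiens_gene_ensembl');
# my @gone = missing_attributes($tsv, 'ensembl_gene_id',
#                               'canonical_transcript_stable_id');
# print "missing: @gone\n" if @gone;
```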
>>>
>>> Thank you for your help.
>>>
>>> Vincent
>>>
>>> On 14 Dec 2011, at 11:24, rhoda at ebi.ac.uk wrote:
>>>
>>>> Hi Vincent
>>>> I am glad that your script now works. Regarding the attribute name for
>>>> the Tasmanian devil, this is just the internal name given to the
>>>> attribute in the website configuration; it does not make the data
>>>> retrieved from this column of the mart database any less reliable. We
>>>> recently discussed this with the Ensembl Compara team and plan to tidy
>>>> up the configuration for release 66, and I will make sure that the
>>>> internal naming is more consistent from release 66 onward. Thank you
>>>> for your feedback.
>>>> Regards
>>>> Rhoda
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> I ran my script on your server this morning and it works fine except
>>>>> for the Tasmanian devil: the orthology attribute for this species does
>>>>> not follow the same naming convention as for the other species:
>>>>> <Dataset name = "hsapiens_gene_ensembl" interface = "default" >
>>>>> 		<Attribute name = "ensembl_gene_id" />
>>>>> 		<Attribute name = "ensembl_transcript_id" />
>>>>> 		<Attribute name = "devil_ensembl_gene" />
>>>>> 		<Attribute name = "homolog_shar__dm_description_4014" />
>>>>> 		<Attribute name = "cat_ensembl_gene" />
>>>>> 		<Attribute name = "cat_orthology_type" />
>>>>> 		<Attribute name = "chimp_ensembl_gene" />
>>>>> 		<Attribute name = "chimp_orthology_type" />
>>>>> 	</Dataset>
>>>>> So I have to handle this species as a special case. This is a minor
>>>>> problem, but I was wondering whether it has a special meaning (e.g. a
>>>>> less reliable homology prediction)?
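The special case can be kept in one lookup table instead of being scattered through the script. A minimal sketch, assuming most species follow the `<prefix>_ensembl_gene` / `<prefix>_orthology_type` pattern seen in the query above; only the devil override is taken from this thread, and the subroutine and table names are hypothetical.

```perl
use strict;
use warnings;

# Species whose orthology-type attribute breaks the usual naming pattern
# (the devil entry is the release 65 name quoted in this thread).
my %orthology_type_override = (
    devil => 'homolog_shar__dm_description_4014',
);

# Hypothetical helper: the two orthologue attributes for one species prefix.
sub orthology_attributes {
    my ($prefix) = @_;
    my $type = $orthology_type_override{$prefix} // "${prefix}_orthology_type";
    return ("${prefix}_ensembl_gene", $type);
}

# Build the <Attribute> lines for several species
my @lines = map { qq{<Attribute name = "$_" />} }
            map { orthology_attributes($_) } qw(devil cat chimp);
print "$_\n" for @lines;
```

If the naming is made consistent in a later release, as promised in the reply above, only the override table needs to change.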
>>>>>
>>>>> Thanks for your help,
>>>>>
>>>>> Vincent
>>>>>
>>>>> On 14 Dec 2011, at 09:05, rhoda at ebi.ac.uk wrote:
>>>>>
>>>>>> Hi Vincent
>>>>>> Unfortunately, someone has been hitting our mart servers with a lot
>>>>>> of queries over the past few days, and we are trying to resolve the
>>>>>> connectivity issue. Can you try your query again today and let me
>>>>>> know whether you can retrieve your results? You will need to keep an
>>>>>> eye on www.biomart.org to see when it is updated with the new release
>>>>>> 65 databases, or you could subscribe to the BioMart users mailing list
>>>>>> (users at biomart.org). The BioMart team are generally quite quick to
>>>>>> update these databases once we have added them to the public MySQL
>>>>>> server, and they usually let me know when the databases have been
>>>>>> added to the BioMart central portal. I then email the BioMart users
>>>>>> mailing list and the Bioconductor mailing list to let everyone know
>>>>>> about the fixes and additions in the new marts. I hope that helps.
>>>>>> Regards
>>>>>> Rhoda
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thank you very much for this answer. I tried setting the path to
>>>>>>> "http://www.ensembl.org/biomart/martservice?" but got the error
>>>>>>> message "too many connections". It is therefore probably wiser to
>>>>>>> wait a couple of days, until "http://www.biomart.org/biomart/martservice?"
>>>>>>> has been updated and the Ensembl server is less loaded. Is there a
>>>>>>> way to know when this server has been updated (apart from running a
>>>>>>> query that gives different results on Ensembl v64 and v65)?
>>>>>>>
>>>>>>> Thank you again. I really appreciate how responsive the Ensembl team
>>>>>>> is on this forum; I think this is part of Ensembl's success.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>>
>>>>>>> Vincent
>>>>>>>
>>>>>>>
>>>>>>> On 13 Dec 2011, at 13:53, Rhoda Kinsella wrote:
>>>>>>>
>>>>>>>> Hi Vincent
>>>>>>>> In your webExample.pl script you are pointing to
>>>>>>>> "http://www.biomart.org/biomart/martservice?", which should always
>>>>>>>> point to the most recent Ensembl release. As it has only been a few
>>>>>>>> days since Ensembl release 65, the www.biomart.org central portal has
>>>>>>>> not yet been updated to include the new databases. I expect them to
>>>>>>>> be updated some time this week. If you would like to use the Ensembl
>>>>>>>> release 65 mart databases, set your path to
>>>>>>>> "http://www.ensembl.org/biomart/martservice?" and then run your
>>>>>>>> query. To query an archive release, first determine the URL of the
>>>>>>>> archive you wish to access using the following link:
>>>>>>>>
>>>>>>>> http://www.ensembl.org/info/website/archives/index.html
>>>>>>>>
>>>>>>>> If you select Ensembl release 63 from the list on the right-hand side
>>>>>>>> of the screen and click the BioMart link at the top of the page, you
>>>>>>>> will be taken to this URL:
>>>>>>>>
>>>>>>>> http://jun2011.archive.ensembl.org/biomart/martview/
>>>>>>>>
>>>>>>>> You can use this URL in the webExample.pl script if you replace the
>>>>>>>> trailing "/martview/" with "/martservice?", like this:
>>>>>>>>
>>>>>>>> http://jun2011.archive.ensembl.org/biomart/martservice?
>>>>>>>>
>>>>>>>> This will allow you to query the Ensembl release 63 marts. I hope
>>>>>>>> this helps, but please don't hesitate to contact me if you have
>>>>>>>> further questions.
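The martview-to-martservice rewrite described above is mechanical, so a script can derive the endpoint from whatever archive address is copied out of the browser. A minimal sketch; the subroutine name is hypothetical, and it assumes the copied URL contains `/biomart/martview`, possibly followed by a session suffix.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a BioMart web-interface URL
# (".../biomart/martview/...") into the matching web-service URL
# (".../biomart/martservice?").
sub martservice_url {
    my ($martview_url) = @_;
    (my $base = $martview_url) =~ s{/biomart/martview\b.*\z}{}s
        or die "Not a BioMart martview URL: $martview_url\n";
    return "$base/biomart/martservice?";
}

print martservice_url('http://jun2011.archive.ensembl.org/biomart/martview/'), "\n";
# prints http://jun2011.archive.ensembl.org/biomart/martservice?
```

The returned string can be assigned to `$path` in webExample.pl to pin a query to that archive release.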
>>>>>>>> Regards
>>>>>>>> Rhoda
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13 Dec 2011, at 12:37, vincent ranwez wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We are using BioMart XML queries (and a small Perl script to submit
>>>>>>>>> them) to collect some Ensembl information. I understand that Ensembl
>>>>>>>>> and BioMart are versioned separately, but I would like to know which
>>>>>>>>> Ensembl release is queried when using an XML query, and how to query
>>>>>>>>> a particular release of Ensembl. My XML queries seem to run against
>>>>>>>>> Ensembl v64. Is there a way to query v65 by modifying either the XML
>>>>>>>>> file (with the virtualSchemaName attribute?) or the Perl script
>>>>>>>>> (provided at the end of this mail)?
>>>>>>>>>
>>>>>>>>> The BioMart web site provides an example for checking the default
>>>>>>>>> configuration:
>>>>>>>>> http://www.biomart.org/biomart/martservice?type=configuration&dataset=hsapiens_gene_ensembl
>>>>>>>>> but this only gives the genome assembly, not the Ensembl release. For
>>>>>>>>> instance, this page indicates that Homo sapiens genes (GRCh37.p5) is
>>>>>>>>> used, but that assembly is common to Ensembl 64 and 65, which give
>>>>>>>>> different results for a simple query such as the list of human gene
>>>>>>>>> IDs and transcript IDs (v64: 178,538 results; v65: 181,745).
>>>>>>>>> Moreover, it does not explain how to use a specific configuration
>>>>>>>>> pointing to a given Ensembl release.
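One way to see which marts an endpoint serves, and under what display names, is its registry rather than a dataset configuration. This is a sketch under assumptions: that the martservice answers `type=registry` with XML containing one `<MartURLLocation ... name="..." displayName="...">` element per mart, and that an Ensembl mart's display name includes the release number; check the real registry output before relying on either. Subroutine names are hypothetical, and the core HTTP::Tiny module is used instead of LWP to avoid a dependency.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical: map each mart's internal name to its display name.
sub marts_from_registry {
    my ($registry_xml) = @_;
    my %marts;
    while ($registry_xml =~ /<MartURLLocation\b([^>]*)>/g) {
        my $attrs = $1;
        my ($name)    = $attrs =~ /\bname\s*=\s*"([^"]*)"/;
        my ($display) = $attrs =~ /\bdisplayName\s*=\s*"([^"]*)"/;
        $marts{$name} = $display if defined $name;
    }
    return \%marts;
}

# Fetch the registry XML from a martservice endpoint.
sub fetch_registry {
    my ($martservice) = @_;
    my $res = HTTP::Tiny->new->get("${martservice}type=registry");
    die "$res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Usage (network access required): compare what two endpoints serve.
# for my $url ('http://www.biomart.org/biomart/martservice?',
#              'http://www.ensembl.org/biomart/martservice?') {
#     my $marts = marts_from_registry(fetch_registry($url));
#     print "$url\t", ($marts->{ensembl} // 'no "ensembl" mart'), "\n";
# }
```

Comparing the two endpoints this way would also show when the central portal has caught up with a new Ensembl release.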
>>>>>>>>>
>>>>>>>>> I hope you can help me to solve this problem.
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>>
>>>>>>>>> Vincent Ranwez
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ###################################
>>>>>>>>> Perl script used to run XML query files generated via the Ensembl
>>>>>>>>> BioMart web interface
>>>>>>>>> ###################################
>>>>>>>>>
>>>>>>>>> use strict;
>>>>>>>>> use warnings;
>>>>>>>>> use LWP::UserAgent;
>>>>>>>>>
>>>>>>>>> my $usage = "\nUsage: perl webExample.pl Query.xml outputFile\n\n";
>>>>>>>>> die $usage unless @ARGV == 2;
>>>>>>>>> my ($queryFile, $fileRes) = @ARGV;
>>>>>>>>>
>>>>>>>>> # Read the whole XML query file into one string
>>>>>>>>> open(my $fh, '<', $queryFile) or die "Cannot read $queryFile: $!$usage";
>>>>>>>>> my $xml = do { local $/; <$fh> };
>>>>>>>>> close($fh);
>>>>>>>>>
>>>>>>>>> # BioMart web service endpoint (central portal)
>>>>>>>>> my $path = "http://www.biomart.org/biomart/martservice?";
>>>>>>>>> my $ua = LWP::UserAgent->new;
>>>>>>>>>
>>>>>>>>> # POST the XML as the form-encoded "query" parameter
>>>>>>>>> my $response = $ua->post($path, { query => $xml });
>>>>>>>>> die "Request failed: ", $response->status_line, "\n"
>>>>>>>>>     unless $response->is_success;
>>>>>>>>>
>>>>>>>>> # Append the results to the output file
>>>>>>>>> open(my $out, '>>', $fileRes) or die "Cannot write $fileRes: $!$usage";
>>>>>>>>> print {$out} $response->content;
>>>>>>>>> close($out) or die "Cannot close $fileRes: $!\n";
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>>>>> List admin (including subscribe/unsubscribe):
>>>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>>>
>>>>>>>> Rhoda Kinsella Ph.D.
>>>>>>>> Ensembl Production Project Leader,
>>>>>>>> European Bioinformatics Institute (EMBL-EBI),
>>>>>>>> Wellcome Trust Genome Campus,
>>>>>>>> Hinxton
>>>>>>>> Cambridge CB10 1SD,
>>>>>>>> UK.
>>>>>>>>