[ensembl-dev] versioning of ensembl & biomart

vincent ranwez vincent.ranwez at univ-montp2.fr
Wed Dec 14 12:10:49 GMT 2011


Hi,

I adapted my code to handle the special case of the devil, so my first script now works. However, I encounter a problem when I try to collect the canonical transcript ID. This attribute was available in release 64:
<Dataset name = "hsapiens_gene_ensembl" interface = "default" >
		<Attribute name = "ensembl_gene_id" />
		<Attribute name = "canonical_transcript_stable_id" />
	</Dataset>

but it seems to have disappeared from release 65 (my script returns an error). I checked the Ensembl web interface of BioMart: this attribute was present in the graphical interface of v64 but has disappeared from the interface of release 65. Is this a bug, or is the same information available somewhere else under a new name?

thank you for your help

Vincent
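
One way to see which attributes went missing between two releases is to compare the attribute listings of the two martservice endpoints. The following is a minimal Python sketch, not part of the original scripts. It assumes that a martservice endpoint answers a `type=attributes&dataset=...` request (by analogy with the `type=configuration` request quoted further down) with one tab-separated record per line, the internal attribute name coming first. The function names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def martservice_url(martview_url):
    """Turn a BioMart 'martview' page URL into its 'martservice' endpoint."""
    parts = urlsplit(martview_url)
    path = parts.path.rstrip("/")
    if path.endswith("/martview"):
        path = path[: -len("/martview")] + "/martservice"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def attributes_url(service_url, dataset):
    """Build the URL asking an endpoint to list the attributes of a dataset.

    Assumption: the endpoint supports 'type=attributes'.
    """
    query = urlencode({"type": "attributes", "dataset": dataset})
    return service_url.rstrip("?") + "?" + query


def fetch_listing(url):
    """Download a listing as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def attribute_names(listing):
    """Collect attribute names from a listing.

    Assumption: one record per line, tab-separated, internal name first.
    """
    return {
        line.split("\t", 1)[0].strip()
        for line in listing.splitlines()
        if line.strip()
    }


def missing_attributes(old_listing, new_listing):
    """Names present in the old listing but absent from the new one."""
    return sorted(attribute_names(old_listing) - attribute_names(new_listing))
```

To use it, one would fetch one listing from an archive endpoint (for example `martservice_url("http://jun2011.archive.ensembl.org/biomart/martview/")`) and one from the current endpoint, both through `attributes_url(..., "hsapiens_gene_ensembl")`, and pass the two texts to `missing_attributes`. If `canonical_transcript_stable_id` shows up in the result, the attribute was indeed dropped or renamed rather than the query being malformed.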
 
On 14 Dec 2011, at 11:24, rhoda at ebi.ac.uk wrote:

> Hi Vincent
> I am glad that your script now works. With regard to the attribute name
> for the Tasmanian devil, this is just an internal naming of this attribute
> in the configuration for the website and does not make the data retrieved
> from this column in the mart database any less reliable. We have recently
> had a discussion with the Ensembl Compara team and are planning to tidy up
> the configuration for release 66 and I will make sure that the internal
> naming is more consistent from release 66 onward. Thank you for your
> feedback.
> Regards
> Rhoda
> 
> 
>> Hi,
>> 
>> I launched my script this morning on your server and it works fine
>> except for the Tasmanian devil: the orthology field for this species
>> does not use the same convention as for the other species:
>> <Dataset name = "hsapiens_gene_ensembl" interface = "default" >
>> 		<Attribute name = "ensembl_gene_id" />
>> 		<Attribute name = "ensembl_transcript_id" />
>> 		<Attribute name = "devil_ensembl_gene" />
>> 		<Attribute name = "homolog_shar__dm_description_4014" />
>> 		<Attribute name = "cat_ensembl_gene" />
>> 		<Attribute name = "cat_orthology_type" />
>> 		<Attribute name = "chimp_ensembl_gene" />
>> 		<Attribute name = "chimp_orthology_type" />
>> 	</Dataset>
>> So I have to make a special case for this species. This is a minor
>> problem, but I was wondering whether it has a special meaning (i.e. a
>> less reliable homology prediction?)
>> 
>> thanks for your help,
>> 
>> Vincent
>> 
>> On 14 Dec 2011, at 09:05, rhoda at ebi.ac.uk wrote:
>> 
>>> Hi Vincent
>>> Unfortunately, someone has been hitting our mart servers with a lot of
>>> queries over the past few days and we are trying to resolve the
>>> connectivity issue. Can you try your query again today and let me know
>>> if you can retrieve your results? You will need to keep an eye on
>>> www.biomart.org to determine when this is updated with the new release
>>> 65 databases or perhaps you could subscribe to the biomart users
>>> mailing list (users at biomart.org). The BioMart team are generally
>>> quite quick to update these databases once we have added them to the
>>> public mysql server and they usually let me know when they have been
>>> added to the biomart central portal. I then email the biomart users
>>> mailing list and bioconductor mailing list to let everyone know about
>>> the fixes and additions in the new marts. I hope that helps.
>>> Regards
>>> Rhoda
>>> 
>>> 
>>>> Hi,
>>>> 
>>>> Thank you very much for this answer. I tried to set the path to
>>>> "http://www.ensembl.org/biomart/martservice?" but I got the error
>>>> message "too many connections". It is thus probably wiser to wait a
>>>> couple of days until "http://www.biomart.org/biomart/martservice?" is
>>>> updated and the Ensembl server is less loaded. Is there a way to know
>>>> when this server is updated (apart from launching a request that gives
>>>> different results on Ensembl v64 and v65)?
>>>> 
>>>> Thank you again. I really appreciate the responsiveness of the Ensembl
>>>> team on this forum; I think it is part of Ensembl's success.
>>>> 
>>>> sincerely,
>>>> 
>>>> Vincent
>>>> 
>>>> 
>>>> On 13 Dec 2011, at 13:53, Rhoda Kinsella wrote:
>>>> 
>>>>> Hi Vincent
>>>>> In your webExample.pl script you are pointing to
>>>>> "http://www.biomart.org/biomart/martservice?" which should always
>>>>> point to the most recent Ensembl release. As it has only been a few
>>>>> days since the Ensembl release 65, the www.biomart.org central portal
>>>>> has not yet been updated to include the new databases. I expect that
>>>>> these will be updated some time this week. If you would like to use
>>>>> the Ensembl release 65 mart databases, you should set your path to
>>>>> "http://www.ensembl.org/biomart/martservice?" and then run your query.
>>>>> To obtain various archive releases, first determine the URL for the
>>>>> archive you wish to access using the following link:
>>>>> 
>>>>> http://www.ensembl.org/info/website/archives/index.html
>>>>> 
>>>>> If you select Ensembl release 63 from the list on the right hand side
>>>>> of the screen, and click on the BioMart link at the top of the page,
>>>>> this will bring you to this URL:
>>>>> 
>>>>> http://jun2011.archive.ensembl.org/biomart/martview/
>>>>> 
>>>>> You can use this URL in the webExample.pl script if you modify the URL
>>>>> to have "/martservice?" at the end of the URL, like this:
>>>>> 
>>>>> http://jun2011.archive.ensembl.org/biomart/martservice?
>>>>> 
>>>>> This will allow you to query the Ensembl release 63 marts. I hope this
>>>>> helps but please don't hesitate to contact me if you have further
>>>>> questions.
>>>>> Regards
>>>>> Rhoda
>>>>> 
>>>>> 
>>>>> 
>>>>> On 13 Dec 2011, at 12:37, vincent ranwez wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> We are using XML BioMart queries (and a small Perl script to launch
>>>>>> them) to collect some Ensembl information. I understand that Ensembl
>>>>>> and BioMart are versioned separately, but I would like to know which
>>>>>> Ensembl version is queried when using an XML query, and how to query
>>>>>> a particular version of Ensembl. It seems to me that my XML queries
>>>>>> are run against Ensembl v64. Is there a way to query v65 by modifying
>>>>>> either the XML file (with the virtualSchemaName attribute?) or the
>>>>>> Perl script (provided at the end of this mail)?
>>>>>> 
>>>>>> The BioMart website provides an example for checking the default
>>>>>> configuration:
>>>>>> http://www.biomart.org/biomart/martservice?type=configuration&dataset=hsapiens_gene_ensembl
>>>>>> but this only provides the genome version and not the Ensembl
>>>>>> version. For instance, this web page indicates that Homo sapiens genes
>>>>>> (GRCh37.p5) is used, but this is common to both versions 64 and 65 of
>>>>>> Ensembl, which give different results for a simple query such as the
>>>>>> list of human gene IDs and transcript IDs (v64: 178,538 results; v65:
>>>>>> 181,745). Moreover, this does not explain how to use a specific
>>>>>> configuration pointing to a given Ensembl release.
>>>>>> 
>>>>>> I hope you can help me to solve this problem.
>>>>>> 
>>>>>> sincerely,
>>>>>> 
>>>>>> Vincent Ranwez
>>>>>> 
>>>>>> 
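>>>>>> ###################################
>>>>>> Perl script used to run XML query files generated via the Ensembl web
>>>>>> interface of BioMart
>>>>>> ###################################
>>>>>> 
>>>>>> use strict;
>>>>>> use warnings;
>>>>>> use LWP::UserAgent;
>>>>>> use HTTP::Request;
>>>>>> use HTTP::Headers;
>>>>>> 
>>>>>> die("\nUsage: perl webExample.pl Query.xml outputFile\n\n") unless @ARGV == 2;
>>>>>> 
>>>>>> # Read the whole XML query file (first argument).
>>>>>> open(my $in, '<', $ARGV[0]) or die("Cannot read $ARGV[0]: $!\n");
>>>>>> my $xml;
>>>>>> while (<$in>) {
>>>>>>     $xml .= $_;
>>>>>> }
>>>>>> close($in);
>>>>>> 
>>>>>> # Open the output file (second argument) for appending.
>>>>>> my $fileRes = $ARGV[1];
>>>>>> open(my $out, '>>', $fileRes) or die("Cannot write to $fileRes: $!\n");
>>>>>> 
>>>>>> # The release that is queried depends on this martservice URL.
>>>>>> my $path = "http://www.biomart.org/biomart/martservice?";
>>>>>> my $request =
>>>>>>     HTTP::Request->new("POST", $path, HTTP::Headers->new(), 'query=' . $xml . "\n");
>>>>>> my $ua = LWP::UserAgent->new;
>>>>>> 
>>>>>> # Save the response body to a temporary file and stop on HTTP errors.
>>>>>> my $tmp = "${fileRes}_tmp";
>>>>>> my $response = $ua->request($request, $tmp);
>>>>>> die("Request failed: " . $response->status_line . "\n") unless $response->is_success;
>>>>>> 
>>>>>> # Append the temporary file to the output file, then delete it.
>>>>>> open(my $res, '<', $tmp) or die("Cannot read $tmp: $!\n");
>>>>>> print {$out} $_ while <$res>;
>>>>>> close($res);
>>>>>> close($out);
>>>>>> unlink($tmp);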
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> List admin (including subscribe/unsubscribe):
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>> 
>>>>> Rhoda Kinsella Ph.D.
>>>>> Ensembl Production Project Leader,
>>>>> European Bioinformatics Institute (EMBL-EBI),
>>>>> Wellcome Trust Genome Campus,
>>>>> Hinxton
>>>>> Cambridge CB10 1SD,
>>>>> UK.
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 




