[ensembl-dev] versioning of ensembl & biomart
vincent ranwez
vincent.ranwez at univ-montp2.fr
Wed Dec 14 12:10:49 GMT 2011
Hi,
I have adapted my code to handle the special case of the Tasmanian devil, so my first script now works. However, I have run into a problem when trying to collect the canonical transcript ID. This attribute was available in release 64:
<Dataset name = "hsapiens_gene_ensembl" interface = "default" >
<Attribute name = "ensembl_gene_id" />
<Attribute name = "canonical_transcript_stable_id" />
</Dataset>
but it seems to have disappeared from release 65 (my script returns an error). I checked the Ensembl BioMart web interface: this attribute was present in the graphical interface of v64 but is missing from the interface of release 65. Is this a bug, or is the same information available elsewhere under a new name?
Thank you for your help,
Vincent
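A script can catch a renamed or removed attribute like this before running the full query by comparing the attribute names it needs with the dataset's attribute listing. The following is a minimal sketch, assuming the listing has already been fetched from the martservice (for example with `type=attributes&dataset=hsapiens_gene_ensembl`) and that it is tab-separated with the internal attribute name in the first column; the helper `missing_attributes` and the sample listing are hypothetical.

```perl
use strict;
use warnings;

# Return the requested attribute names that do not appear in a BioMart
# attribute listing. Assumption: one attribute per line, tab-separated,
# internal attribute name in the first column.
sub missing_attributes {
    my ($listing, @wanted) = @_;
    my %available;
    for my $line (split /\n/, $listing) {
        next if $line =~ /^\s*$/;          # skip blank lines
        my ($name) = split /\t/, $line;    # first column = internal name
        $available{$name} = 1;
    }
    return grep { !$available{$_} } @wanted;
}

# Hypothetical listing standing in for a release 65 response.
my $listing = "ensembl_gene_id\tEnsembl Gene ID\n"
            . "ensembl_transcript_id\tEnsembl Transcript ID\n";

my @missing = missing_attributes($listing,
    'ensembl_gene_id', 'canonical_transcript_stable_id');
print "missing: @missing\n";   # prints "missing: canonical_transcript_stable_id"
```

Running such a check first turns an opaque query error into an explicit list of attribute names to look up.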
On 14 Dec 2011, at 11:24, rhoda at ebi.ac.uk wrote:
> Hi Vincent
> I am glad that your script now works. With regard to the attribute name
> for the Tasmanian devil, this is just an internal name for this attribute
> in the website configuration and does not make the data retrieved from
> this column in the mart database any less reliable. We recently
> discussed this with the Ensembl Compara team and are planning to tidy up
> the configuration for release 66, and I will make sure that the internal
> naming is more consistent from release 66 onward. Thank you for your
> feedback.
> Regards
> Rhoda
>
>
>> Hi,
>>
>> I launched my script this morning on your server and it works fine except
>> for the Tasmanian devil. The orthology fields for this species do not
>> follow the same naming convention as those for other species:
>> <Dataset name = "hsapiens_gene_ensembl" interface = "default" >
>> <Attribute name = "ensembl_gene_id" />
>> <Attribute name = "ensembl_transcript_id" />
>> <Attribute name = "devil_ensembl_gene" />
>> <Attribute name = "homolog_shar__dm_description_4014" />
>> <Attribute name = "cat_ensembl_gene" />
>> <Attribute name = "cat_orthology_type" />
>> <Attribute name = "chimp_ensembl_gene" />
>> <Attribute name = "chimp_orthology_type" />
>> </Dataset>
>> So I have to treat this species as a special case. This is a minor
>> problem, but I was wondering whether it has a special meaning (e.g. a less
>> reliable homology prediction?).
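Until the naming is made consistent, the special case can live in one small table rather than being scattered through the script. A sketch, assuming the `<prefix>_ensembl_gene` / `<prefix>_orthology_type` convention visible in the query above, with the devil's `homolog_shar__dm_description_4014` attribute treated as the counterpart of `<prefix>_orthology_type` (its position in the query suggests this, but that mapping is an assumption); `%override` and `orthology_attributes` are hypothetical names.

```perl
use strict;
use warnings;

# Per-species exceptions to the naming convention. The devil entry copies the
# attribute name from the query above; it may change in later releases.
my %override = (
    devil => { orthology_type => 'homolog_shar__dm_description_4014' },
);

# Return (gene attribute, orthology-type attribute) for a species prefix.
sub orthology_attributes {
    my ($prefix) = @_;
    my %attr = (
        gene           => "${prefix}_ensembl_gene",
        orthology_type => "${prefix}_orthology_type",
    );
    my $special = $override{$prefix} || {};
    @attr{keys %$special} = values %$special;   # apply per-species overrides
    return @attr{qw(gene orthology_type)};
}

# Build the <Attribute> lines for several species, in query order.
my @lines = map { qq(<Attribute name = "$_" />) }
            map { orthology_attributes($_) } qw(devil cat chimp);
print "$_\n" for @lines;
```

When the configuration is tidied up, only the `%override` entry needs to change.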
>>
>> Thanks for your help,
>>
>> Vincent
>>
>> On 14 Dec 2011, at 09:05, rhoda at ebi.ac.uk wrote:
>>
>>> Hi Vincent
>>> Unfortunately, someone has been hitting our mart servers with a lot of
>>> queries over the past few days and we are trying to resolve the
>>> connectivity issue. Can you try your query again today and let me know
>>> if you can retrieve your results? You will need to keep an eye on
>>> www.biomart.org to determine when it is updated with the new release 65
>>> databases, or perhaps you could subscribe to the BioMart users mailing
>>> list (users at biomart.org). The BioMart team are generally quite quick
>>> to update these databases once we have added them to the public MySQL
>>> server, and they usually let me know when they have been added to the
>>> BioMart central portal. I then email the BioMart users mailing list and
>>> the Bioconductor mailing list to let everyone know about the fixes and
>>> additions in the new marts. I hope that helps.
>>> Regards
>>> Rhoda
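Because the slowdown here came from heavy query load, a script that retries at once only adds to it. A generic technique, not one prescribed by the Ensembl or BioMart teams, is to retry with exponentially growing delays. In this sketch the request is passed in as a code reference returning true on success (in webExample.pl it could wrap `$ua->request(...)` and return `$response->is_success`), so the example itself needs no network access; `with_retries` and its parameters are hypothetical.

```perl
use strict;
use warnings;

# Call $attempt (a code ref returning true on success) up to $max_tries
# times, sleeping $base_delay * 2**(n-1) seconds after the n-th failure.
# Returns the number of attempts used, or dies if every attempt fails.
sub with_retries {
    my ($attempt, $max_tries, $base_delay) = @_;
    for my $n (1 .. $max_tries) {
        return $n if $attempt->();
        last if $n == $max_tries;          # no pause after the final failure
        sleep($base_delay * 2 ** ($n - 1));
    }
    die "giving up after $max_tries attempts\n";
}

# Demo: a fake request that fails twice, then succeeds (base delay 0 s).
my $calls = 0;
my $used  = with_retries(sub { ++$calls >= 3 }, 5, 0);
print "succeeded after $used attempts\n";   # prints "succeeded after 3 attempts"
```

With a base delay of, say, 30 seconds, five attempts spread over several minutes instead of hammering the server.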
>>>
>>>
>>>> Hi,
>>>>
>>>> Thank you very much for this answer. I tried setting the path to
>>>> "http://www.ensembl.org/biomart/martservice?" but got the error message
>>>> "too many connection". It is thus probably wiser to wait a couple of days
>>>> until "http://www.biomart.org/biomart/martservice?" is updated and the
>>>> Ensembl server is less loaded. Is there a way to know when this server
>>>> has been updated (apart from launching a request that gives different
>>>> results on Ensembl v64 and v65)?
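One way to answer this without comparing result counts might be to read the release number from the server's registry, which the martservice returns for `type=registry`. This sketch assumes the registry XML describes the Ensembl gene mart with a `displayName` containing the release number, such as `ENSEMBL GENES 64 (SANGER UK)`; that format, the sample fragment and the helper name are assumptions rather than anything confirmed in this thread.

```perl
use strict;
use warnings;

# Extract the Ensembl release number from martservice registry XML.
# Assumption: the gene mart's displayName reads like "ENSEMBL GENES 64 (...)".
# Returns undef if no such entry is found.
sub ensembl_release_from_registry {
    my ($registry_xml) = @_;
    return $registry_xml =~ /displayName\s*=\s*"ENSEMBL GENES (\d+)/i ? $1 : undef;
}

# Hypothetical registry fragment standing in for a real response.
my $registry = '<MartURLLocation name="ensembl" '
             . 'displayName="ENSEMBL GENES 64 (SANGER UK)" />';
my $release = ensembl_release_from_registry($registry);
print defined $release ? "release $release\n" : "release not found\n";   # prints "release 64"
```

Polling the registry occasionally and acting once the number changes would avoid running full queries just to detect the update.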
>>>>
>>>> Thank you again. I really appreciate the responsiveness of the Ensembl
>>>> team on this forum; I think this is part of Ensembl's success.
>>>>
>>>> Sincerely,
>>>>
>>>> Vincent
>>>>
>>>>
>>>> On 13 Dec 2011, at 13:53, Rhoda Kinsella wrote:
>>>>
>>>>> Hi Vincent
>>>>> In your webExample.pl script you are pointing to
>>>>> "http://www.biomart.org/biomart/martservice?", which should always point
>>>>> to the most recent Ensembl release. As it has only been a few days since
>>>>> Ensembl release 65, the www.biomart.org central portal has not yet been
>>>>> updated to include the new databases. I expect that these will be
>>>>> updated some time this week. If you would like to use the Ensembl
>>>>> release 65 mart databases, you should set your path to
>>>>> "http://www.ensembl.org/biomart/martservice?" and then run your query.
>>>>> To obtain various archive releases, first determine the URL for the
>>>>> archive you wish to access using the following link:
>>>>>
>>>>> http://www.ensembl.org/info/website/archives/index.html
>>>>>
>>>>> If you select Ensembl release 63 from the list on the right-hand side of
>>>>> the screen and click on the BioMart link at the top of the page, this
>>>>> will bring you to this URL:
>>>>>
>>>>> http://jun2011.archive.ensembl.org/biomart/martview/
>>>>>
>>>>> You can use this URL in the webExample.pl script if you modify it to end
>>>>> in "/martservice?" instead of "/martview/", like this:
>>>>>
>>>>> http://jun2011.archive.ensembl.org/biomart/martservice?
>>>>>
>>>>> This will allow you to query the Ensembl release 63 marts. I hope this
>>>>> helps, but please don't hesitate to contact me if you have further
>>>>> questions.
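The URL change described above is mechanical, so it can be scripted. A minimal sketch with a hypothetical helper `martservice_url`, assuming the copied URL ends in `/martview/` (with or without the final slash):

```perl
use strict;
use warnings;

# Turn a BioMart web-interface URL ending in /martview/ (as copied from the
# archive site) into the martservice endpoint used by webExample.pl.
sub martservice_url {
    my ($martview_url) = @_;
    (my $url = $martview_url) =~ s{/martview/?$}{/martservice?}
        or die "not a martview URL: $martview_url\n";
    return $url;
}

print martservice_url('http://jun2011.archive.ensembl.org/biomart/martview/'), "\n";
# prints http://jun2011.archive.ensembl.org/biomart/martservice?
```

Dying on an unexpected URL is deliberate: posting a query to a web page rather than the martservice would fail less clearly.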
>>>>> Regards
>>>>> Rhoda
>>>>>
>>>>>
>>>>>
>>>>> On 13 Dec 2011, at 12:37, vincent ranwez wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are using XML BioMart queries (and a small Perl script to launch
>>>>>> them) to collect some Ensembl information. I understand that Ensembl and
>>>>>> BioMart are versioned separately, but I would like to know which
>>>>>> Ensembl version is queried when using an XML query, and how to query a
>>>>>> particular version of Ensembl. It seems to me that my XML queries are
>>>>>> run against Ensembl v64. Is there a way to query v65 by modifying
>>>>>> either the XML file (with the virtualSchemaName attribute?) or the Perl
>>>>>> script (provided at the end of this mail)?
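As the replies above explain, the release is chosen by the martservice host the query is sent to, not by the XML, so one XML query can be reused against several releases. This sketch builds the query and keeps the endpoints mentioned in this thread (December 2011) in a table; the `%endpoint` keys and `build_query` are hypothetical, and the `<Query>` header attributes follow what the BioMart web interface usually exports, which is an assumption here.

```perl
use strict;
use warnings;

# martservice endpoints mentioned in this thread (December 2011).
# Which release each host serves changes over time.
my %endpoint = (
    current_portal => 'http://www.biomart.org/biomart/martservice?',   # lags new releases
    ensembl_65     => 'http://www.ensembl.org/biomart/martservice?',   # release 65 at the time
    ensembl_63     => 'http://jun2011.archive.ensembl.org/biomart/martservice?',
);

# Build an XML query for one dataset and a list of attribute names.
# The <Query> header attributes are assumed, not taken from this thread.
sub build_query {
    my ($dataset, @attributes) = @_;
    my $attrs = join '', map { qq(\t\t<Attribute name = "$_" />\n) } @attributes;
    return qq(<?xml version="1.0" encoding="UTF-8"?>\n)
         . qq(<!DOCTYPE Query>\n)
         . qq(<Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >\n)
         . qq(\t<Dataset name = "$dataset" interface = "default" >\n)
         . $attrs
         . qq(\t</Dataset>\n)
         . qq(</Query>\n);
}

my $xml = build_query('hsapiens_gene_ensembl', 'ensembl_gene_id', 'ensembl_transcript_id');
print "POST to $endpoint{ensembl_63}\n$xml";
```

Only the chosen `%endpoint` entry differs between runs, which keeps results from different releases directly comparable.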
>>>>>>
>>>>>> The BioMart web site provides an example for checking the default
>>>>>> configuration:
>>>>>> http://www.biomart.org/biomart/martservice?type=configuration&dataset=hsapiens_gene_ensembl
>>>>>> However, this only provides the genome assembly version, not the
>>>>>> Ensembl release. For instance, this page indicates that Homo sapiens
>>>>>> genes (GRCh37.p5) is used, but this assembly is common to both releases
>>>>>> 64 and 65 of Ensembl, which give different results for a simple query
>>>>>> such as the list of human gene IDs and transcript IDs (v64: 178,538
>>>>>> results; v65: 181,745). Moreover, this does not explain how to use a
>>>>>> specific configuration pointing to a given Ensembl release.
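Counts such as the 178,538 and 181,745 results quoted above can be recomputed from the TSV output, and counting distinct gene IDs alongside rows shows whether a difference comes from new genes or from new transcripts of existing genes. A sketch, assuming tab-separated output with no header line and the gene ID in the first column; `count_rows_and_genes` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Count result rows and distinct gene IDs in BioMart TSV output read from $fh.
# Assumes no header line and the gene ID in the first column.
sub count_rows_and_genes {
    my ($fh) = @_;
    my ($rows, %genes) = (0);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';               # ignore blank lines
        $rows++;
        my ($gene) = split /\t/, $line;
        $genes{$gene} = 1;
    }
    return ($rows, scalar keys %genes);
}

# In-memory sample standing in for a results file.
my $sample = "G1\tT1\nG1\tT2\nG2\tT3\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my ($rows, $genes) = count_rows_and_genes($fh);
print "$rows rows, $genes genes\n";   # prints "3 rows, 2 genes"
```

For a real comparison, open each release's output file instead of the in-memory sample.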
>>>>>>
>>>>>> I hope you can help me solve this problem.
>>>>>>
>>>>>> Sincerely,
>>>>>>
>>>>>> Vincent Ranwez
>>>>>>
>>>>>>
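>>>>>> ###################################
>>>>>> # Perl script used to run XML query files generated via the Ensembl
>>>>>> # BioMart web interface
>>>>>> ###################################
>>>>>>
>>>>>> use strict;
>>>>>> use warnings;
>>>>>> use LWP::UserAgent;
>>>>>> use HTTP::Request;
>>>>>> use HTTP::Headers;
>>>>>>
>>>>>> my $usage = "\nUsage: perl webExample.pl Query.xml outputFile\n\n";
>>>>>> my ($queryFile, $fileRes) = @ARGV;
>>>>>> die $usage unless defined $queryFile && defined $fileRes;
>>>>>>
>>>>>> # Read the whole XML query into a single string.
>>>>>> open(my $fh, '<', $queryFile) or die "Cannot read $queryFile: $!$usage";
>>>>>> my $xml = do { local $/; <$fh> };
>>>>>> close($fh);
>>>>>>
>>>>>> # Fail early if the output file cannot be appended to.
>>>>>> open(my $out, '>>', $fileRes) or die "Cannot append to $fileRes: $!$usage";
>>>>>>
>>>>>> # The martservice host determines which Ensembl release is queried.
>>>>>> my $path = "http://www.biomart.org/biomart/martservice?";
>>>>>> my $request = HTTP::Request->new("POST", $path, HTTP::Headers->new(),
>>>>>>     'query=' . $xml . "\n");
>>>>>> my $ua = LWP::UserAgent->new;
>>>>>>
>>>>>> # Save the response body to a temporary file, then append it to the output.
>>>>>> my $tmp = "${fileRes}_tmp";
>>>>>> my $response = $ua->request($request, $tmp);
>>>>>> unless ($response->is_success) {
>>>>>>     unlink $tmp;
>>>>>>     die "BioMart request failed: ", $response->status_line, "\n";
>>>>>> }
>>>>>> open(my $in, '<', $tmp) or die "Cannot read $tmp: $!";
>>>>>> print {$out} $_ while <$in>;
>>>>>> close($in);
>>>>>> close($out);
>>>>>> unlink $tmp;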
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list Dev at ensembl.org
>>>>>> List admin (including subscribe/unsubscribe):
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>> Rhoda Kinsella Ph.D.
>>>>> Ensembl Production Project Leader,
>>>>> European Bioinformatics Institute (EMBL-EBI),
>>>>> Wellcome Trust Genome Campus,
>>>>> Hinxton
>>>>> Cambridge CB10 1SD,
>>>>> UK.
>>>>>