[ensembl-dev] recalculated BLAST results

Matthieu Muffato muffato at ebi.ac.uk
Wed Oct 30 17:55:46 GMT 2013


Dear Matthew and Inti,

We usually provide this data only on request, and I'd like to make sure 
that someone is actually going to use it before proceeding.

Let me emphasize that the table is very large (about 1 billion
rows) and has a very limited format:
   sequence_identifier1 sequence_identifier2 log(evalue)
Sequence identifiers are numeric and have to be linked to another
Compara table, and only 1 hit is stored for each pair. As you can see,
most of the blast output is deliberately thrown away.
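
As a rough illustration of that three-column format, here is a minimal
Python sketch for reading such rows. It assumes whitespace-separated
columns and a base-10 logarithm, neither of which is stated above, and
the names Hit, parse_hits and evalue are hypothetical, not part of any
Ensembl API:

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One row of the table: two numeric sequence ids and log(evalue)."""
    seq_id1: int
    seq_id2: int
    log_evalue: float


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield a Hit per well-formed line; skip blank or malformed lines."""
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) != 3:
            continue
        yield Hit(int(fields[0]), int(fields[1]), float(fields[2]))


def evalue(hit: Hit, base: float = 10.0) -> float:
    """Recover the e-value; the logarithm base is an assumption."""
    return base ** hit.log_evalue


# Made-up sample rows, for illustration only.
sample = ["101 202 -50.0", "101 303 -3.0", ""]
hits = list(parse_hits(sample))
```

Reading the rows lazily matters here: with about 1 billion rows, the
file should be streamed rather than loaded into memory in one go.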

This table is currently an intermediate step of our pipelines and we're
looking into ways of removing that step because of the heavy computing
power that it requires. If it is of confirmed use to anyone, we can
make a dump of the current data (e73) and distribute it. But be aware
that there is no guarantee we can provide an updated version in the
future.

Best regards,
Matthieu

On 30/10/13 16:43, Healy, Matthew wrote:
>
> If you could make this available on your ftp site, I think it might be of wider usefulness.
>
> ________________________________________
> From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] On Behalf Of Matthieu Muffato [muffato at ebi.ac.uk]
> Sent: Wednesday, October 30, 2013 12:38 PM
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] recalculated BLAST results
>
> Dear Inti
>
> The comparative genomics team computes all-vs-all blastp for the Ensembl
> and UniProt (SwissProt + TREMBL) proteins. I believe this partly covers
> the NCBI nr protein database.
> However, we only store the e-values, and the data is held internally and
> not released. Please let us know if you are interested in a dump of this
> table.
>
> Best regards,
> Matthieu
>
> On 29/10/13 22:34, Inti pedroso wrote:
>> Dear,
>> Are there pre-calculated BLAST (or other sequence comparison software) search results available for any species in Ensembl against the NCBI nr database? If there are, how can I access these results in bulk or programmatically?
>>
>> BW,
>> Inti Pedroso
>


-- 
Matthieu Muffato, Ph.D.
Ensembl Developer and Ensembl Compara Manager
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom



