[ensembl-dev] Question about gene build pipeline - similarity build

Fri Jun 10 10:02:01 BST 2011

Hi Wenkai,

We still use this method for Genewise alignments of UniProt proteins.
Genscan tends to overpredict gene structures, which means the likelihood of
missing a region that contains a gene is fairly low.
For other Genewise alignments, we still target specific regions of the
genome, but we select them with tools other than Genscan, for example Pmatch.

We also use other alignment tools such as Exonerate, which are run
genome-wide.

By combining different approaches, we try to make sure that what is
missed by one is recovered by another.
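
As a rough illustration of that idea (a sketch only, not the actual
Ensembl pipeline code), candidate regions from several sources can be
pooled and overlapping ones merged before targeted alignment. The
function name merge_regions, the (name, start, end) tuple layout and the
example coordinates are all hypothetical, chosen just for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical region layout: (sequence name, start, end), 1-based inclusive.
Region = Tuple[str, int, int]


def merge_regions(sources: Iterable[Iterable[Region]],
                  padding: int = 0) -> List[Region]:
    """Pool regions from every source and merge overlapping or adjacent ones."""
    padded = sorted(
        (name, max(1, start - padding), end + padding)
        for regions in sources
        for name, start, end in regions
    )
    merged: List[Region] = []
    for name, start, end in padded:
        # Merge when the region is on the same sequence and overlaps or
        # directly abuts the previously kept region.
        if merged and merged[-1][0] == name and start <= merged[-1][2] + 1:
            prev = merged[-1]
            merged[-1] = (name, prev[1], max(prev[2], end))
        else:
            merged.append((name, start, end))
    return merged


# Made-up coordinates standing in for regions seeded by two different tools.
genscan_hits = [("chr1", 100, 500), ("chr1", 2000, 2600)]
pmatch_hits = [("chr1", 450, 900), ("chr2", 10, 300)]

combined = merge_regions([genscan_hits, pmatch_hits])
print(combined)
```

A region found by only one source (here the chr2 one) still ends up in
the combined list, which is the point of running more than one approach.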

Hope that helps,
mag

On Fri, 10 Jun 2011 08:51:29 +0000, 江JWK <biology0046 at hotmail.com> wrote:
> Hi, all,
> I have read documentation relevant to gene structure prediction jobs.
> 
> For the similarity build section,
> The doc said that genomes were first scanned for putative gene loci
> using genscan,
> then evidence such as proteins from uniprot is blasted against these
> genscan-derived gene peptides.
> 
> This 'genscan->uniprot blast to genscan locus->genewise' approach was
> supposed to save pipeline running time.
> But this approach may miss several true gene loci, as genscan (or other
> ab initio methods) could not identify them.
> 
> To overcome this 'low predictive power', and with the growth in
> computing power, at least
> during recent years, I have found that most genomes were annotated
> through this procedure:
> uniprot->blast to genome -> target region genewise.
> 
> Does the current genebuild pipeline have modules or routines that can
> do this?
> 
> Or are genes currently built by ensembl still using the traditional
> routine (genscan->uniprot blast to genscan locus->genewise)?
> 
> best regards!
> 
> Wenkai