[ensembl-dev] Confused by the target/query core database in whole genome alignment based gene build

Zhang Di aureliano.jz at gmail.com
Tue Apr 17 12:47:14 BST 2012


Hi,

I'm using the Ensembl pipeline for a projection gene build.

When I read the document low_coverage_gene_build.txt, I was confused by the
terms "target" and "query" genome.

It calls our newly sequenced genome the target and the reference genome
the query.

This is the opposite of the lastz convention, where the target is the
reference and the query is our sequence.

That is fine as long as I stick to this convention.

However, the whole genome alignment section of the same document says:

    "each bp in the target genome should be represented at most once."

What does "target" mean here?

lastz-chain-net produces alignments in which the lastz "target genome" has
this property.
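To make the quoted property concrete: a minimal sketch, assuming alignment blocks are given as hypothetical (sequence name, start, end) tuples in 1-based inclusive coordinates on one genome. It checks whether every base pair of that genome is covered by at most one block, which is what the doc requires of the target side.

```python
from collections import defaultdict


def covered_at_most_once(blocks):
    """Return True if no base pair is covered by more than one block.

    blocks: iterable of (seq_name, start, end) tuples, 1-based inclusive
    coordinates on a single genome (hypothetical representation).
    """
    by_seq = defaultdict(list)
    for name, start, end in blocks:
        by_seq[name].append((start, end))
    for intervals in by_seq.values():
        intervals.sort()
        # After sorting by start, blocks are disjoint exactly when each one
        # starts after the previous one ends.
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start <= prev_end:
                return False
    return True
```

A chained-and-netted alignment should pass this check on the target side, while the query side may legitimately fail it, since one query region can align to several target regions.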

Does it mean that I should set my genome as the reference genome, and the
Ensembl genome (e.g. human) as the non-reference genome in the
Compara/hive pipeline?

Can I then project human genes onto my genome with this somewhat unusual
setup in the next wga2genes step?

Some slices of the human genome containing genes may appear several times
in the compara_db; how can the gene projection be correct in that case?


Thanks

Best regards

-- 
Zhang Di