[ensembl-dev] Confused by the target/query core database in whole genome alignment based gene build

Dan Barrell db8 at sanger.ac.uk
Wed Apr 18 14:43:11 BST 2012


Hi,

That's right, the Atlantic Cod was the last low coverage genome that we
projected genes onto. For the time being, though, we are too busy with
the high coverage genomes, including species that were previously low
coverage and are now arriving as high coverage.

Dan


On 18/04/12 13:16, Zhang Di wrote:
> Thank you, Dan.
>
> You mentioned that you no longer build on low coverage genomes in
> Ensembl. What do you do with genomes assembled from short reads
> (such as Illumina GA II/HiSeq)? As far as I know, the Atlantic Cod
> genome, which was published in Nature last summer, was built using
> the projection genebuild.
>
> Best Regards
>
> On Wed, Apr 18, 2012 at 4:43 PM, Dan Barrell <db8 at sanger.ac.uk> wrote:
>
>     Hi,
>
>     The document low_coverage_gene_build.txt is quite old and possibly
>     very out of date as we no longer build on low coverage genomes in
>     Ensembl. As far as I know, the reason that the semantics of the
>     reference and target terms got swapped is to do with the
>     importance of directionality in a Net. When dealing with the low
>     coverage genomes the idea was that they wanted the species they
>     were projecting onto as the reference because it is important that
>     each bp in the target species aligns to at most one location in
>     the reference species.
>
>     I would suggest you also look at the Ensembl Compara documentation
>     which is maintained here:
>
>     ensembl-compara/docs/README-low-coverage-genome-aligner
>
>     Dan
>
>     On 17/04/12 12:47, Zhang Di wrote:
>>     Hi,
>>
>>     I'm using the Ensembl pipeline for a projection genebuild.
>>
>>     When I read the doc low_coverage_gene_build.txt, I was confused
>>     by the target/query genome terms.
>>
>>     It calls our newly sequenced genome the target and the
>>     reference genome the query.
>>
>>     This is contrary to the lastz terms, where target means the
>>     reference and query means our sequences.
>>
>>     That is fine as long as I stick to this convention.
>>
>>     However, the whole genome alignment section of the same doc
>>     says:
>>
>>         "each bp in the target genome should be represented at most
>>         once."
>>
>>     Which genome does "target" mean here?
>>
>>     lastz-chain-net gives this property to the genome that lastz
>>     calls the "target genome".
>>
>>     Does that mean I should set my genome as the reference genome
>>     and the Ensembl genome, such as human, as the non-reference in
>>     the compara/hive pipeline?
>>
>>     Can I still project human genes onto my genome with this somewhat
>>     odd setting in the next wga2genes step?
>>
>>     Some slices of the human genome that contain genes may then occur
>>     several times in the compara_db. How can gene projection work
>>     correctly in that case?
>>
>>
>>     Thanks
>>
>>     Best Regards
>>
>>     -- 
>>     Zhang Di
>>
>>
>>     _______________________________________________
>>     Dev mailing list    Dev at ensembl.org
>>     List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>     Ensembl Blog: http://www.ensembl.info/
>
> -- 
> Zhang Di
