[ensembl-dev] Confused by the target/query core database in whole genome alignment based gene build

Zhang Di aureliano.jz at gmail.com
Wed Apr 18 16:42:23 BST 2012


Hi,

So how do you now deal with scaffolding errors in high coverage genomes
during the gene build process?

It seems that contigs built from next-generation sequencing data can be
very reliable, while scaffolds may contain a non-trivial amount of incorrect
linking information.

On Wed, Apr 18, 2012 at 9:43 PM, Dan Barrell <db8 at sanger.ac.uk> wrote:

>  Hi,
>
> That's right, the Atlantic Cod was the last case of a low coverage genome
> that we projected genes onto. For the time being we are too busy with the
> high coverage genomes though, including the low coverage genomes that are
> now arriving as high coverage.
>
> Dan
>
>
>
> On 18/04/12 13:16, Zhang Di wrote:
>
> Thank you, Dan.
>
>  You mentioned that you no longer build on low coverage genomes in
> Ensembl. What do you do with genomes assembled from short reads (such as
> Illumina GA II/HiSeq)? As far as I know, the Atlantic Cod genome, which
> was published in Nature last summer, was built using the projection
> genebuild.
>
> Best Regards
>
> On Wed, Apr 18, 2012 at 4:43 PM, Dan Barrell <db8 at sanger.ac.uk> wrote:
>
>>  Hi,
>>
>> The document low_coverage_gene_build.txt is quite old and possibly very
>> out of date as we no longer build on low coverage genomes in Ensembl. As
>> far as I know, the reason that the semantics of the reference and target
>> terms got swapped is to do with the importance of directionality in a Net.
>> When dealing with the low coverage genomes the idea was that they wanted
>> the species they were projecting onto as the reference because it is
>> important that each bp in the target species aligns to at most one location
>> in the reference species.
>>
>> I would suggest you also look at the Ensembl Compara documentation which
>> is maintained here:
>>
>> ensembl-compara/docs/README-low-coverage-genome-aligner
>>
>> Dan
>>
>> On 17/04/12 12:47, Zhang Di wrote:
>>
>>  Hi,
>>
>>  I'm using the Ensembl pipeline for a projection genebuild.
>>
>>  When I read the doc low_coverage_gene_build.txt, I was confused by the
>> target/query genome terms.
>>
>>  It calls our newly sequenced genome the target and the reference
>> genome the query.
>>
>>  This is contrary to the lastz terms, where target means the reference
>> and query means our sequences.
>>
>>  That is fine as long as I stick to this convention.
>>
>>  However, in the whole genome alignment section of the same doc, it
>> says:
>>
>>      "each bp in the target genome should be represented at most once."
>>
>>  What does "target" mean here?
>>
>>  lastz-chain-net gives this property to the genome that lastz calls the
>> "target genome".
>>
>>  Does that mean I should set my genome as the reference genome, and the
>> Ensembl genome, such as "human", as the non-reference in the
>> compara/hive pipeline?
>>
>>  Can I still project human genes onto my genome with this somewhat weird
>> setting in the next wga2genes step?
>>
>>  Some slices of the human genome that contain genes may occur several
>> times in the compara_db; how can gene projection work correctly in that
>> case?
>>
>>
>>  Thanks
>>
>>  Best Regards
>>
>>  --
>> Zhang Di
>>
>>
>>  _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>
>
>  --
> Zhang Di
>
>


-- 
Zhang Di
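
The property quoted in the thread, "each bp in the target genome should be
represented at most once", can be illustrated with a small check. This is
only a minimal sketch and not part of the Ensembl or Compara code: the
function name find_multiply_covered and the (sequence name, start, end)
tuple layout are hypothetical, and coordinates are assumed to be 1-based and
inclusive. It reports every region of a sequence that is covered by more
than one alignment block.

```python
from collections import defaultdict


def find_multiply_covered(blocks):
    """Return regions covered by more than one alignment block.

    `blocks` is an iterable of (seq_name, start, end) tuples describing where
    alignment blocks land on one genome (hypothetical layout, 1-based
    inclusive coordinates). An empty result means every base pair is
    represented at most once.
    """
    by_seq = defaultdict(list)
    for seq_name, start, end in blocks:
        by_seq[seq_name].append((start, end))

    overlaps = []
    for seq_name, intervals in by_seq.items():
        intervals.sort()
        max_end = None  # furthest end seen so far on this sequence
        for start, end in intervals:
            if max_end is not None and start <= max_end:
                # Bases from `start` to min(end, max_end) were already covered
                # by an earlier block.
                overlaps.append((seq_name, start, min(end, max_end)))
            max_end = end if max_end is None else max(max_end, end)
    return overlaps


# Example data (made up): the second list adds a block that overlaps others.
clean = [("scaffold_1", 1, 100), ("scaffold_1", 101, 200), ("scaffold_2", 50, 80)]
messy = clean + [("scaffold_1", 90, 120)]
clean_result = find_multiply_covered(clean)
messy_result = find_multiply_covered(messy)
```

Running the check on the coordinates of whichever genome is required to be
single-coverage gives an empty list when the property holds.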