[ensembl-dev] Confused by the target/query core database in whole genome alignment based gene build
Zhang Di
aureliano.jz at gmail.com
Wed Apr 18 16:42:23 BST 2012
Hi,
So how do you now deal with scaffolding errors in the high coverage
genomes during the gene build process?
It seems that contigs built from next-generation sequencing data can be
really solid, while scaffolds may contain non-trivial bad linking
information.
On Wed, Apr 18, 2012 at 9:43 PM, Dan Barrell <db8 at sanger.ac.uk> wrote:
> Hi,
>
> That's right, the Atlantic Cod was the last case of a low coverage genome
> that we projected genes onto. For the time being we are too busy with the
> high coverage genomes though, including the low coverage that are now
> arriving as high coverage.
>
> Dan
>
>
>
> On 18/04/12 13:16, Zhang Di wrote:
>
> Thank you, Dan.
>
> You mentioned that you no longer build on low coverage genomes in
> Ensembl. What do you do with genomes assembled from short reads (such as
> Illumina GA II/HiSeq)? As far as I know, the Atlantic Cod genome, which
> was published last summer in Nature, was built using the projection
> genebuild.
>
> Best Regards
>
> On Wed, Apr 18, 2012 at 4:43 PM, Dan Barrell <db8 at sanger.ac.uk> wrote:
>
>> Hi,
>>
>> The document low_coverage_gene_build.txt is quite old and possibly very
>> out of date as we no longer build on low coverage genomes in Ensembl. As
>> far as I know, the reason that the semantics of the reference and target
>> terms got swapped is to do with the importance of directionality in a Net.
>> When dealing with the low coverage genomes the idea was that they wanted
>> the species they were projecting onto as the reference because it is
>> important that each bp in the target species aligns to at most one location
>> in the reference species.
>>
>> I would suggest you also look at the Ensembl Compara documentation which
>> is maintained here:
>>
>> ensembl-compara/docs/README-low-coverage-genome-aligner
>>
>> Dan
>>
>>
>> On 17/04/12 12:47, Zhang Di wrote:
>>
>> Hi,
>>
>> I'm using the Ensembl pipeline for a projection genebuild.
>>
>> When I read the doc low_coverage_gene_build.txt, I was confused by the
>> target/query genome terms.
>>
>> It calls our newly sequenced genome the target and the reference
>> genome the query.
>>
>> This is contrary to the lastz terms, where target means the reference and
>> query means our sequences.
>>
>> That is OK as long as I stick to this convention.
>>
>> However, in the whole genome alignment section of the same doc, it says:
>>
>> "each bp in the target genome should be represented at most once."
>>
>> What does "target" mean here?
>>
>> lastz-chain-net produces this property for what lastz calls the "target
>> genome".
>>
>> Does that mean I should set my genome as the reference genome, and the
>> Ensembl genome (such as human) as the non-reference genome in the
>> compara/hive pipeline?
>>
>> Can I still project human genes onto my genome with this somewhat weird
>> setting in the next wga2genes step?
>>
>> Some slices of the human genome containing genes may then occur several
>> times in the compara_db. How can gene projection work correctly in that
>> case?
>>
>>
>> Thanks
>>
>> Best Regards
>>
>> --
>> Zhang Di
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>
>
> --
> Zhang Di
>
>
>
--
Zhang Di
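
A minimal sketch of the "at most once" property discussed in the quoted messages above: given alignment blocks expressed in the coordinates of the genome that must be covered at most once, report any region hit by more than one block. The block tuple layout and the function name are illustrative assumptions; they are not part of the Ensembl pipeline, the compara schema, or the lastz/chain/net output formats.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Hypothetical block representation (an assumption for illustration):
# (sequence name, start, end), 1-based inclusive coordinates on the genome
# whose bases should each be aligned at most once.
Block = Tuple[str, int, int]


def find_multiply_covered(blocks: Iterable[Block]) -> List[Block]:
    """Return regions that are covered by more than one alignment block."""
    by_seq = defaultdict(list)
    for name, start, end in blocks:
        by_seq[name].append((start, end))

    overlaps: List[Block] = []
    for name in sorted(by_seq):
        max_end = 0  # coordinates are 1-based, so 0 means "nothing seen yet"
        for start, end in sorted(by_seq[name]):
            if start <= max_end:
                # This block starts inside a region that is already covered.
                overlaps.append((name, start, min(end, max_end)))
            max_end = max(max_end, end)
    return overlaps


blocks = [
    ("scaffold_1", 1, 100),
    ("scaffold_1", 101, 200),   # adjacent, no overlap
    ("scaffold_2", 50, 150),
    ("scaffold_2", 120, 180),   # overlaps the previous block
]
overlaps = find_multiply_covered(blocks)
print(overlaps)
```

An empty result would mean the block set satisfies the property on that genome. Nothing is checked on the other genome, where a region may legitimately appear in several blocks.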