[ensembl-dev] Confused by the target/query core database in whole genome alignment based gene build

Zhang Di aureliano.jz at gmail.com
Wed Apr 18 16:47:40 BST 2012


And what about these unscaffolded contigs? They may be included in genes by
the so-called low coverage projection.

I'm really curious to know.

On Wed, Apr 18, 2012 at 11:42 PM, Zhang Di <aureliano.jz at gmail.com> wrote:

> Hi,
>
> So how do you now deal with scaffolding errors in high coverage genomes
> during the gene build process?
>
> It seems that contigs built from next-generation sequencing data can be
> very solid, while scaffolds may contain a non-trivial number of incorrect
> links.
>
> On Wed, Apr 18, 2012 at 9:43 PM, Dan Barrell <db8 at sanger.ac.uk> wrote:
>
>>  Hi,
>>
>> That's right, the Atlantic cod was the last low coverage genome that we
>> projected genes onto. For the time being, though, we are too busy with
>> the high coverage genomes, including formerly low coverage genomes that
>> are now arriving as high coverage assemblies.
>>
>> Dan
>>
>>
>>
>> On 18/04/12 13:16, Zhang Di wrote:
>>
>> Thank you, Dan.
>>
>>  You mentioned that you no longer build on low coverage genomes in
>> Ensembl. What do you do with genomes assembled from short reads (such as
>> Illumina GA II/HiSeq)? As far as I know, the Atlantic cod genome published
>> last summer in Nature had its gene set built with the projection genebuild.
>>
>> Best regards
>>
>> On Wed, Apr 18, 2012 at 4:43 PM, Dan Barrell <db8 at sanger.ac.uk> wrote:
>>
>>>  Hi,
>>>
>>> The document low_coverage_gene_build.txt is quite old and possibly well
>>> out of date, as we no longer build on low coverage genomes in Ensembl. As
>>> far as I know, the reason that the meanings of the reference and target
>>> terms got swapped has to do with the importance of directionality in a
>>> net. When dealing with the low coverage genomes, the idea was that they
>>> wanted the species they were projecting onto to be the reference, because
>>> it is important that each bp in the target species aligns to at most one
>>> location in the reference species.
>>>
>>> I would suggest you also look at the Ensembl Compara documentation which
>>> is maintained here:
>>>
>>> ensembl-compara/docs/README-low-coverage-genome-aligner
>>>
>>> Dan
>>>
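To make the directionality point in the reply above concrete, here is a minimal Python sketch (an illustration only, not Ensembl or Compara code). It assumes a hypothetical representation in which each aligned block is a pair of 0-based, half-open intervals, (target_start, target_end) and (query_start, query_end), on single sequences, and checks the net-style property that every target base is covered by at most one block. Here "target" only means the side whose bases must be covered at most once; which genome plays that role is exactly the terminology question discussed in this thread. The same query interval may still appear in several blocks, which is the situation raised in the original question about a human slice appearing several times.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: 0-based, half-open (start, end) intervals
# on a single sequence. Not an Ensembl/Compara data structure.
Interval = Tuple[int, int]
# One aligned block: (target interval, query interval).
AlignedBlock = Tuple[Interval, Interval]


def covered_at_most_once(intervals: Iterable[Interval]) -> bool:
    """Return True if no base lies inside more than one interval."""
    max_end = None
    for start, end in sorted(intervals):
        if start >= end:
            raise ValueError(f"empty or inverted interval: {(start, end)}")
        # Intervals are sorted by start, so this one overlaps an earlier
        # interval exactly when it starts before the furthest end seen so far.
        if max_end is not None and start < max_end:
            return False
        max_end = end if max_end is None else max(max_end, end)
    return True


def net_style_check(blocks: List[AlignedBlock]) -> Tuple[bool, bool]:
    """Check single coverage on the target side and on the query side."""
    targets = [t for t, _ in blocks]
    queries = [q for _, q in blocks]
    return covered_at_most_once(targets), covered_at_most_once(queries)


if __name__ == "__main__":
    # One query region (500-600) aligned to two different target locations,
    # e.g. a gene duplicated in the target: the query side is covered twice,
    # but every target base is still covered at most once.
    blocks = [((0, 100), (500, 600)), ((200, 300), (500, 600))]
    print(net_style_check(blocks))  # (True, False)
```

Sorting by start and tracking the furthest end seen keeps the check at O(n log n). A real net also decides which chains to keep when they compete for the same target bases; this sketch only verifies the resulting property and does not attempt that selection.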
>>>
>>> On 17/04/12 12:47, Zhang Di wrote:
>>>
>>>  Hi,
>>>
>>>  I'm using the Ensembl pipeline for a projection genebuild.
>>>
>>>  When I read the document low_coverage_gene_build.txt, I was confused by
>>> the target/query genome terminology.
>>>
>>>  It calls our newly sequenced genome the target and the reference
>>> genome the query.
>>>
>>>  This is the opposite of the lastz terminology, where the target is the
>>> reference and the query is our sequence.
>>>
>>>  That's fine as long as I stick to this convention.
>>>
>>>  However, the whole genome alignment section of the same document says:
>>>
>>>      "each bp in the target genome should be represented at most once."
>>>
>>>  What does it mean by "target" here?
>>>
>>>  lastz-chain-net gives this property to what lastz calls the "target
>>> genome".
>>>
>>>  Does this mean that I should set my genome as the reference genome, and
>>> the Ensembl genome (such as human) as the non-reference, in the
>>> compara/hive pipeline?
>>>
>>>  Can I then project human genes onto my genome with this somewhat odd
>>> setup in the next wga2genes step?
>>>
>>>  A slice of the human genome containing genes may appear several times
>>> in the compara_db. How can a correct gene projection be produced here?
>>>
>>>
>>>  Thanks
>>>
>>>  Best regards
>>>
>>>  --
>>> Zhang Di
>>>
>>>
>>>  _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>
>>
>>  --
>> Zhang Di
>
>
> --
> Zhang Di
>



-- 
Zhang Di