[ensembl-dev] question about chain-net results in table 'genomic_align'

Javier Herrero jherrero at ebi.ac.uk
Tue Jan 17 09:51:25 GMT 2012


Dear Zhang

Yes, the alignments can span an assembly gap. The gap is represented as 
N's in the sequence, much like a hard-masked sequence.
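
As a rough illustration only (this is not part of the Ensembl or Compara 
API, and the helper names find_gaps and overlap are hypothetical), the 
sketch below takes the asm_start/asm_end values and the alignment block 
from the message quoted below, derives the uncovered intervals of the 
scaffold, and reports which part of the block falls inside a gap.

```python
def find_gaps(components):
    """Return 1-based inclusive intervals not covered by any component.

    `components` is a list of (asm_start, asm_end) pairs, as found in the
    core `assembly` table for a single assembled seq_region.
    """
    gaps = []
    prev_end = 0
    for start, end in sorted(components):
        if start > prev_end + 1:
            gaps.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)
    return gaps


def overlap(a, b):
    """Return the intersection of two inclusive intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# Values copied from the quoted message (scaffold_2621).
components = [(488, 717), (1, 419), (718, 1086)]
block = (486, 567)  # dnafrag_start, dnafrag_end of the CHAIN/NET record

gaps = find_gaps(components)
gap_hits = [hit for hit in (overlap(block, g) for g in gaps) if hit]

print(gaps)      # [(420, 487)]
print(gap_hits)  # [(486, 487)]
```

With these numbers, only positions 486-487 of the block lie in the gap; 
the rest of the block lies on the component starting at 488.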

Could you please explain when and how you get the exception about not 
finding the sequence-level pieces?

Kind regards

Javier

On 04/12/11 13:11, Zhang Di wrote:
> Hi,
>
>     Finally I got the compara pipeline for whole genome alignment to work.
>
>     The results of RAW, CHAIN, and NET are all stored in the table 
> 'genomic_align', distinguished by different method_link_species_set_id 
> values.
>     I found that some CHAIN and NET records contain a few base 
> pairs that belong to a gap region of their scaffold.
>
> e.g.
>
>     mysql> select method_link_species_set_id, dnafrag_id, dnafrag_start, dnafrag_end
>            from genomic_align where dnafrag_id = 4465 and dnafrag_start = 486;
>     +----------------------------+------------+---------------+-------------+
>     | method_link_species_set_id | dnafrag_id | dnafrag_start | dnafrag_end |
>     +----------------------------+------------+---------------+-------------+
>     |                          2 |       4465 |           486 |         567 |
>     |                          3 |       4465 |           486 |         567 |
>     +----------------------------+------------+---------------+-------------+
>
>
> In my core database, dnafrag_id = 4465 is scaffold_2621 
> (seq_region_id = 429785):
>
>     mysql> select * from assembly where asm_seq_region_id = 429785;
>     +-------------------+-------------------+-----------+---------+-----------+---------+-----+
>     | asm_seq_region_id | cmp_seq_region_id | asm_start | asm_end | cmp_start | cmp_end | ori |
>     +-------------------+-------------------+-----------+---------+-----------+---------+-----+
>     |            429785 |            181573 |       488 |     717 |         1 |     230 |  -1 |
>     |            429785 |            191688 |         1 |     419 |         1 |     419 |   1 |
>     |            429785 |            220761 |       718 |    1086 |         1 |     369 |   1 |
>     +-------------------+-------------------+-----------+---------+-----------+---------+-----+
>
>
>
> The 420-487 interval is not covered by any component, so it is a gap.
>
> Is this a normal result of CHAIN-NET?
>
> It is quite annoying because I want to use the compara_db for a 
> low-coverage gene build, and it complains:
>
> EXCEPTION:
>      Could not find sequence-level pieces for scaffold_2621/486-744
>
> Best regards
>
> -- 
> Zhang Di
>
>

-- 
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
