[ensembl-dev] assembly with multiple mapping paths

Sharon Wei weix at cshl.edu
Thu Jan 13 20:18:40 GMT 2011


Hi Ian,

Thanks for the reply. Case 2 is what we have here. So the solution
is to create dummy scaffolds for those contigs.
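
For the archive, here is a rough sketch of how the chromosome AGP could be
rewritten so that every directly placed contig is wrapped in a dummy
scaffold. It is only an illustration in Python, not part of the Ensembl API.
The function name, the "dummy_" prefix and the is_contig test are all
assumptions made for this example, and it assumes each contig is placed at
most once on the chromosome.

```python
def wrap_contigs_in_dummy_scaffolds(chr_agp_lines, is_contig, prefix="dummy_"):
    """Return (new_chr_agp, extra_scaffold_agp) as lists of AGP lines.

    chr_agp_lines: chromosome-level AGP lines (no trailing newlines).
    is_contig:     hypothetical predicate, True if a component id is a contig.
    prefix:        hypothetical naming scheme for the dummy scaffolds.
    """
    new_chr, scaffold_agp = [], []
    for line in chr_agp_lines:
        fields = line.split()
        # Keep comments, blank lines, gap lines (N/U) and real scaffolds as-is.
        if (not fields or line.startswith("#")
                or fields[4] in ("N", "U") or not is_contig(fields[5])):
            new_chr.append(line)
            continue
        obj, obj_beg, obj_end, part, ctype, comp, beg, end, orient = fields[:9]
        length = int(end) - int(beg) + 1
        dummy = prefix + comp
        # The chromosome now points at the dummy scaffold, same orientation.
        new_chr.append("\t".join(
            [obj, obj_beg, obj_end, part, ctype, dummy, "1", str(length), orient]))
        # The dummy scaffold is built from the contig range, forward strand.
        scaffold_agp.append("\t".join(
            [dummy, "1", str(length), "1", ctype, comp, beg, end, "+"]))
    return new_chr, scaffold_agp


# Example with one scaffold line, one gap and one contig line from our AGP.
chr_agp = [
    "O.brachyantha_V1.0\t1\t152187\t1\tW\tObrachyantha03S_1.scaffold00624\t1\t152187\t+",
    "O.brachyantha_V1.0\t152188\t152287\t2\tN\t100\tfragment\tno",
    "O.brachyantha_V1.0\t152288\t159378\t3\tW\tObrachyantha03S_1.contig00345\t1\t7091\t-",
]
new_chr, extra = wrap_contigs_in_dummy_scaffolds(
    chr_agp, is_contig=lambda name: ".contig" in name)
print("\n".join(new_chr))
print("\n".join(extra))
```

The extra lines would be appended to the scaffold AGP before loading, so that
the only paths are chromosome -> scaffold -> contig. Presumably the
chromosome|contig entry in assembly.mapping is then no longer needed, but I
have not verified that yet.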


Sharon


On 1/13/11 5:40 AM, ian Longden wrote:
> The mapping code can only go from one coordinate system to another
> via a single path; otherwise, how would it know which path is the
> proper one?
>
> In the mapping data, do you have the chromosomes assembled from contigs and,
> separately, from scaffolds (case 1)?
>
>
>                                          Chr A
>                                  ---------------------------
>
>                                         contig 1
>                                  ---------------------------
>
>
>
> and also
>
>
>                                          Chr A
>                                  ---------------------------
>
>                                         scaffold 1
>                                  ---------------------------
>
>
>
> or is it more like (case 2)
>
>
>
>                                          Chr A
>                                  ---------------------------
>
>                                   contig 1  scaffold 1
>                                  ------------   ---------------
>
>                                                   contig 2
>                                                 ---------------
>
> or  (case 3)
>
>
>
>                                          Chr A
>                                  ---------------------------
>
>                                         scaffold 1
>                                  ---------------------------
>                                         contig 1
>                                  ---------------------------
>
>
>
> Hopefully the email reader will not mess up all the spaces....
>
> Cases 1 (though not very good) and 3 are doable, but case 2 is not.
>
> In case 2 you would have to add a dummy scaffold to make it work:
>
>
>
>                                          Chr A
>                                  ---------------------------
>
>                                  dummy 1   scaffold 1
>                                  ------------   ---------------
>
>                                   contig 1    contig 2
>                                  -------------- ---------------
>
>
> So that the path is always chromosome ->  scaffold ->  contig.
>
> Does this make sense? If I could get a better idea of the data, I
> would be able to help more.
>
>
> -Ian
> Ensembl Developer.
>
>
> On Thu, Jan 13, 2011 at 2:02 AM, Sharon Wei<weix at cshl.edu>  wrote:
>> Dear ensemblers,
>>
>> Does anyone know how to use the Ensembl core API to fetch sequences from
>> an assembly with multiple mapping paths? I have trouble getting correct
>> sequences with $SliceAdaptor->fetch_by_region( $cs, $seq_region_name) when
>> the tiling path is made up of 2 different component coordinate systems.
>>
>> In this genome, there are 3 coordinate systems: chromosome, scaffold,
>> contig. The sequence-level coordinate system is contig. There are two AGP
>> files: one contains the scaffold tiling path from contigs, the other
>> contains the chromosome tiling path from both scaffolds and contigs. Both
>> tiling paths were loaded into the assembly table. In the meta table,
>> multiple mapping paths were assigned to "assembly.mapping", including
>> "chromosome|scaffold" and "chromosome|contig"; see the following tables.
>>
>> mysql>  select * from coord_system;
>> coord_system_id  species_id  name        version          rank  attrib
>> 1                1           chromosome  454.2pools.2009  1     default_version
>> 2                1           scaffold    454.2pools.2009  2     default_version
>> 3                1           contig      454.2pools.2009  3     default_version,sequence_level
>>
>>
>> mysql>  select * from meta where meta_key='assembly.mapping';
>> meta_id  species_id  meta_key          meta_value
>> 30       1           assembly.mapping  chromosome:454.2pools.2009|scaffold:454.2pools.2009
>> 87       1           assembly.mapping  chromosome:454.2pools.2009|scaffold:454.2pools.2009|contig:454.2pools.2009
>> 29       1           assembly.mapping  scaffold:454.2pools.2009|contig:454.2pools.2009
>> 117      1           assembly.mapping  chromosome:454.2pools.2009|contig:454.2pools.2009
>>
>> An excerpt of the chromosome AGP file follows (notice that there are both
>> scaffolds and contigs):
>> ...
>> O.brachyantha_V1.0      1       152187  1       W       Obrachyantha03S_1.scaffold00624 1       152187  +
>> O.brachyantha_V1.0      152188  152287  2       N       100     fragment        no
>> O.brachyantha_V1.0      152288  159378  3       W       Obrachyantha03S_1.contig00345   1       7091    -
>> O.brachyantha_V1.0      159379  159478  4       N       100     fragment        no
>> O.brachyantha_V1.0      159479  383477  5       W       Obrachyantha03S_1.scaffold00031 1       223999  +
>> O.brachyantha_V1.0      383478  383577  6       N       100     fragment        no
>> O.brachyantha_V1.0      383578  404096  7       W       Obrachyantha03S_1.scaffold00086 1       20519   -
>> O.brachyantha_V1.0      404097  404196  8       N       100     fragment        no
>> ...
>>
>> However, when I use $SliceAdaptor->fetch_by_region( 'chromosome',
>> $seq_region_name) to retrieve chromosome sequences, all the
>> scaffold-contributed regions are returned as Ns; only regions assembled
>> directly from contigs have actual sequence. I also get the warning:
>> "Meta table specifies multiple mapping paths between coord systems
>> chromosome and contig.
>>   Choosing shorter path arbitrarily.
>> "
>> If I delete meta_id=117 (the chromosome|contig mapping path), the warning
>> disappears, but the regions assembled directly from contigs become Ns.
>>
>> It seems there is no way to fetch the complete, correct chromosome sequence
>> made up of both scaffolds and contigs. The AGP specification has no
>> restriction that prevents multiple component coordinate systems, so this
>> should be a legitimate case.
>>
>> My question is, is this a potential bug in the API? Is there any way to make
>> it work by playing with the mapping paths in the meta or assembly table?
>>
>> Any help is appreciated.
>>
>> Thanks,
>>
>>
>> Sharon
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev
>>




