[ensembl-dev] project()->to_Slice()->start()

Kieron Taylor ktaylor at ebi.ac.uk
Fri Jun 14 15:52:13 BST 2013


Hi Martin,

Thank you for your detailed description. The situation is complex, so I 
should say up front that I may have misunderstood your needs. Your code 
is fine, but some of your expectations about what certain values mean 
are only partially correct.

from_start() is the position in the original Slice at which the current 
Projection Segment starts. It will always be 1 if there is only one 
Segment. from_end() is also relative to the original Slice.

To express from_start() and from_end() as positions on the original 
sequence region, add the original $slice->start() and subtract 1.
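
As a minimal sketch of that arithmetic (the helper name and the example 
numbers are made up for illustration; they are not part of the API):

```perl
use strict;
use warnings;

# Hypothetical helper: convert a slice-relative position, such as the
# value returned by from_start() or from_end(), into a position on the
# sequence region that the original Slice was taken from.
sub slice_to_seq_region {
    my ($slice_start, $relative_pos) = @_;
    return $relative_pos + $slice_start - 1;
}

# Illustrative numbers only: a Slice that starts at base 5001 of its
# sequence region, with a segment reported at 112001-208000.
my $slice_start = 5001;
printf "%d-%d\n",
    slice_to_seq_region($slice_start, 112001),
    slice_to_seq_region($slice_start, 208000);
```

When the original Slice covers a whole sequence region, start() is 1 and 
the values come out unchanged.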

Once a Slice has been projected, it no longer has any knowledge of the 
original Slice, but is intended to be used in the context of the 
original Slice. It is true that you cannot retrieve the asm_start from 
the projected Slice, because its coordinates are now in its own frame 
of reference, that of its individual sequence region.

The highest level of representation ('toplevel') is what dictates the 
asm coordinates, and given that many assemblies are made of more than 
clones and contigs, the asm coordinates are not typically 'correct' at 
the contig level. Therefore in the general case, one must project to 
toplevel to get the asm coordinates.

I believe that in your case, you could compute your asm_coordinates as a 
composite of original contig Slice coordinates, and the projected clone 
Slices, but the officially correct method of deriving asm coordinates is 
by projecting from where you are to top level. This may sound circular 
for your simple case, but it is necessary in the grander scheme of 
things. This is why the Slice object is unable to return the information 
you require.
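
A rough sketch of that approach, assuming $slice is a 
Bio::EnsEMBL::Slice (for example one of your clone Slices). The 
subroutine name is my own invention, and I have not run this against 
your database:

```perl
use strict;
use warnings;

# Hypothetical helper: report where each piece of $slice lands on the
# top-level coordinate system. $slice is assumed to behave like a
# Bio::EnsEMBL::Slice, i.e. project() returns an array reference of
# projection segments, each of which offers to_Slice().
sub toplevel_coords {
    my ($slice) = @_;
    my @rows;
    foreach my $segment (@{ $slice->project('toplevel') }) {
        my $top = $segment->to_Slice();
        # start() and end() of the top-level Slice are the asm coordinates.
        push @rows, join(':',
            $top->seq_region_name(),
            $top->start() . '-' . $top->end(),
            $top->strand());
    }
    return @rows;
}

# Possible usage inside your existing loop (not executed here):
#   print "$_\n" for toplevel_coords($cloneProj->to_Slice());
```

Each row has the form name:start-end:strand, in top-level coordinates.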

I believe we can afford to improve our documentation on these methods.


-- 
Kieron Taylor PhD.
Ensembl Core team
EBI

On 14/06/2013 09:43, Martin Ayling (TGAC) wrote:
> Hi,
>
> I'm having trouble retrieving start and end positions of clones within a
> contig, using project() in ensembl release 62.
>
> I have a relatively empty database (only meta, assembly, coord_system
> and seq_region tables are non-empty) which contains the details of an
> FPC generated physical map. The clones have defined start and end
> positions on their respective contigs, but no respective sequence data.
> They are not truncated; the cmp_start and cmp_end values are always the
> beginning and end of the clone, and the asm_start/end positions of a
> clone are permitted to overlap those of other clones.
>
> When I project a contig back onto the 'clone' coordinate level, I
> receive a tiling path for that contig which is consistent with the
> asm_start values of each clone within the contig. However, if I try to
> retrieve the start position of any clone (to_Slice()->start()), it
> returns the wrong value (always '1') and the end position is always the
> length of the clone in question. The values returned by from_start() and
> from_end() are also incorrect (although they do at least increase
> with each clone in the path, and are consistent with the length of a
> given clone).
>
> ...
> my $fpcCtg = $sa->fetch_by_region('fpc_ctg',$current_ctg);
> foreach my $cloneProj (@{$fpcCtg->project('clone')}){
>          my $clone = $cloneProj->to_Slice();
>          print $fpcCtg->seq_region_name(), ':',
>              $cloneProj->from_start(), '-',
>              $cloneProj->from_end(), ' -> ',
>              $clone->seq_region_name(), ':',
>              $clone->start(), '-',$clone->end(), '-',
>              $clone->strand(), "\n";
> }
> ...
> Output:
> ctg6:1-97000 -> 3DL140_K09:1-97000-1
> ctg6:112001-208000 -> 3DL041_A17:1-96000-1
> ctg6:239001-331000 -> 3DL106_D16:1-92000-1
> ctg6:380001-536000 -> 3DL140_K10:1-156000-1
> ctg6:641001-731000 -> 3DL031_J17:1-90000-1
> ...
>
> Is there any way to retrieve the values of asm_start/end using
> project()? Or should I directly retrieve this using a mysql query?
>
> Thanks,
>
> Martin Ayling
>
>