[ensembl-dev] Question Regarding Bio::EnsEMBL::Mapper method fastmap.

Ian Longden ianlongden at gmail.com
Thu Apr 26 13:59:21 BST 2012


Will,
You may want to project all your features to the top level; Ensembl did
this some years ago to speed up the mapping process, and it greatly
improved the speed of all the feature routines.

-Ian.
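For a single ungapped block of the assembly, projecting a contig feature to the top level comes down to a constant offset or, for a reverse-oriented component, a mirror of the coordinates plus a strand flip. The Python sketch below is a toy model of that arithmetic, not the Ensembl Perl API: `AssemblyBlock`, `project_to_top_level` and the region names are hypothetical, and the field names only loosely follow the columns of the Ensembl `assembly` table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (seq_region, start, end, strand), 1-based inclusive coordinates
Location = Tuple[str, int, int, int]


@dataclass(frozen=True)
class AssemblyBlock:
    """Hypothetical ungapped block: component (contig) span -> assembled (top-level) span."""
    asm_seq_region: str
    cmp_seq_region: str
    asm_start: int
    asm_end: int
    cmp_start: int
    cmp_end: int
    ori: int  # 1 = same orientation, -1 = component is reverse complemented


def project_to_top_level(block: AssemblyBlock, loc: Location) -> Optional[Location]:
    """Project a feature lying wholly inside the block; return None otherwise."""
    region, start, end, strand = loc
    if region != block.cmp_seq_region or start < block.cmp_start or end > block.cmp_end:
        return None
    if block.ori == 1:
        # Same orientation: shift both ends by a constant offset.
        offset = block.asm_start - block.cmp_start
        return (block.asm_seq_region, start + offset, end + offset, strand)
    # Reverse orientation: mirror the span inside the block and flip the strand.
    new_start = block.asm_end - (end - block.cmp_start)
    new_end = block.asm_end - (start - block.cmp_start)
    return (block.asm_seq_region, new_start, new_end, -strand)


forward = AssemblyBlock("chr_example", "contig_example", 1001, 1100, 1, 100, 1)
reverse = AssemblyBlock("chr_example", "contig_example", 1001, 1100, 1, 100, -1)
print(project_to_top_level(forward, ("contig_example", 10, 20, 1)))  # ('chr_example', 1010, 1020, 1)
print(project_to_top_level(reverse, ("contig_example", 10, 20, 1)))  # ('chr_example', 1081, 1091, -1)
```

Doing this once and storing the top-level coordinates is the kind of one-off conversion Ian describes; leaving features on the contig means the equivalent work is repeated at every fetch.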

On Thu, Apr 26, 2012 at 1:51 PM, Will Chow <wc2 at sanger.ac.uk> wrote:

> Thanks Andy for your detailed email.
>
> We have features that are built on various coord systems in the
> dna_align_feature table, so I'm not sure that changing the
> dna_align_featurebuild.level key will have the desired effect.
>
> Regarding your second point about using the project_to_slice method, I think I
> will look into this. Currently I use the generic
> DnaAlignFeatureAdaptor to fetch all the features, but as you said, I may
> have to alter a few things to get it to return location-specific
> results.
>
> thanks again for your help.
>
> Will
>
>
>
> On 26 Apr 2012, at 11:29, Andy Yates wrote:
>
> > Hi Will,
> >
> > As you suspected, Ensembl does something slightly different here. We
> place our features on the top level; you can see this from the meta table,
> where we have the %build.level% keys. When this level is different from the
> one you have retrieved a feature for, that's when we start the remapping
> process. Since our features are placed on the top level, we do not see these
> issues when a contig has been mapped to more than one sequence region in a
> coordinate system, as we look at the issue top down, not bottom up.
> >
> > The API can deal with these situations, but you cannot rely on it
> figuring this out automatically. All features have a project_to_slice()
> method to which you can give the target slice you want the feature projected
> to. This does mean you have to find the Slices your contig in question will
> project to, filter them to those where $slice->is_reference() is true, and then give
> this slice to the project_to_slice() method.
> >
> > I would also change the dna_align_featurebuild.level key in your meta
> table to the coordinate system level your align features are on (probably
> contig or seqlevel). That way you avoid the API attempting to remap to the
> top level.
> >
> > Hope some of this has helped you out,
> >
> > Andy
> >
> > Andrew Yates                   Ensembl Core Software Project Leader
> > EMBL-EBI                       Tel: +44-(0)1223-492538
> > Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> > Cambridge CB10 1SD, UK         http://www.ensembl.org/
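Andy's recipe (find the slices the contig projects to, keep those where `$slice->is_reference()` is true, then give that slice to `project_to_slice()`) can be modelled in Python as below. This is a hypothetical sketch, not the Perl API: `TargetSlice` and `choose_reference_target` stand in for Ensembl Slice objects and your own selection code, the region names are made up, and the rule that a contig region has at most one reference target is an assumption the code checks rather than a documented guarantee.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TargetSlice:
    """Hypothetical stand-in for a slice that the contig projects to."""
    name: str
    is_reference: bool  # False for patches and haplotypes


def choose_reference_target(candidates: List[TargetSlice]) -> Optional[TargetSlice]:
    """Keep only reference slices and return the single one, if any."""
    refs = [s for s in candidates if s.is_reference]
    # Assumption: a contig region lies on at most one reference slice.
    if len(refs) > 1:
        raise ValueError("more than one reference slice: " + ", ".join(s.name for s in refs))
    return refs[0] if refs else None


candidates = [
    TargetSlice("chromosome:chr_example", True),
    TargetSlice("chromosome:patch_example", False),
]
target = choose_reference_target(candidates)
print(target.name if target else "no reference target")  # chromosome:chr_example
```

In the real API the chosen slice would then be passed to each feature's project_to_slice(); the sketch stops at the selection step.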
> >
> > On 25 Apr 2012, at 12:26, Will Chow wrote:
> >
> >> Hi Andy,
> >>
> >> Many thanks for your explanation; it is very helpful.
> >>
> >> I am in exactly that situation. I have features on the
> contig (sequence) level and need to project them to the top level, where the
> contig assembles to a chromosome and a patch. I have been getting
> differing results in terms of the features fetched, so I have been following
> through all the code you mention and narrowed it down to the fastmap method.
> Perhaps there is something in the schema (meta table) I am missing; the
> assembly exceptions table should be nearly identical to what is seen in the
> core database, but I should double-check.
> >>
> >> In the core db, are the features in a patch region placed on
> the top level, i.e. unlike my situation, where they have to be
> projected?
> >>
> >> again thanks for your help.
> >>
> >> Will
> >>
> >> On 25 Apr 2012, at 12:04, Andy Yates wrote:
> >>
> >>> Hi Will,
> >>>
> >>> The code you have pointed out will only be executed if we are
> projecting between coordinate systems.
> DnaAlignFeatureAdaptor::_objs_from_sth() requires the mapper to be passed into
> it from BaseAdaptor::generic_fetch(). This does happen in
> BaseFeatureAdaptor::_slice_fetch(), but only when a feature's coordinate
> system is not the same as the querying slice's coordinate system. The
> schema stores patches/haplotypes as assembly exceptions, and they are therefore
> held as mappings between the reference chromosome and the exception. We do
> not store the relationship of, say, chromosome 21 from NCBI36 to
> HSCHR21_2_CTG1_1 in GRCh37.p6 (a coordinate system mapping), so there are no
> multiple mappings.
> >>>
> >>> Hope this helps,
> >>>
> >>> Andy
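The rule Andy describes, that the mapper only comes into play when the feature's coordinate system differs from that of the querying slice, can be paraphrased as the Python toy below. It is not a transcription of BaseFeatureAdaptor::_slice_fetch(); `CoordSystem`, `fetch_on_slice`, the `mapper` callable and the region names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Location = Tuple[str, int, int, int]  # (seq_region, start, end, strand)


@dataclass(frozen=True)
class CoordSystem:
    name: str
    version: Optional[str] = None


def fetch_on_slice(
    features: List[Location],
    feature_cs: CoordSystem,
    slice_cs: CoordSystem,
    mapper: Callable[[Location], Optional[Location]],
) -> List[Location]:
    """Return features in the slice's coordinate system, calling the mapper only if needed."""
    if feature_cs == slice_cs:
        # Same coordinate system: no projection, so the mapping code is never reached.
        return list(features)
    projected = []
    for loc in features:
        result = mapper(loc)
        if result is not None:  # in this toy, features that fail to map are dropped
            projected.append(result)
    return projected


calls: List[Location] = []


def shift_mapper(loc: Location) -> Optional[Location]:
    """Hypothetical mapper: records each call and shifts by a fixed offset."""
    calls.append(loc)
    _, start, end, strand = loc
    return ("chr_example", start + 1000, end + 1000, strand)


feats = [("contig_example", 10, 20, 1)]
contig, chrom = CoordSystem("contig"), CoordSystem("chromosome", "GRCh37")
print(fetch_on_slice(feats, contig, contig, shift_mapper), len(calls))  # [('contig_example', 10, 20, 1)] 0
print(fetch_on_slice(feats, contig, chrom, shift_mapper), len(calls))   # [('chr_example', 1010, 1020, 1)] 1
```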
> >>>
> >>> On 25 Apr 2012, at 11:27, Will Chow wrote:
> >>>
> >>>> Hi dev'ers,
> >>>>
> >>>> Regarding the Bio::EnsEMBL::Mapper::fastmap method and its use in
> DnaAlignFeatureAdaptor.pm:
> >>>>
> >>>> From what I gather, its purpose is to return the seq_region_id, start, end and
> strand of a location in one coordinate system from its location in another
> coordinate system (i.e. to project it). However, fastmap seems to return only one set of
> information, as seen in the code.
> >>>>
> >>>> I was wondering: if there are multiple mappings (as with the human
> patches), which set of information (seq_region_id) will be returned?
> >>>>
> >>>> Perhaps I'm missing something else in the code that explains
> this; if so, could you point me to it?
> >>>>
> >>>> Many thanks
> >>>>
> >>>> Will
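To make Will's question concrete: a single-answer lookup (the behaviour Will attributes to fastmap) can report only one target even when a contig region sits in two places, whereas a lookup that returns every containing block exposes both. The Python toy below is hypothetical and is not the Bio::EnsEMBL::Mapper code: the pair layout and names are made up, and "which one wins" here is simply list order. Per Andy's reply above, the core schema sidesteps this case by storing patches as assembly exceptions rather than as a second coordinate-system mapping.

```python
from typing import List, Optional, Tuple

Location = Tuple[str, int, int, int]  # (seq_region, start, end, strand)
# Hypothetical pair layout: (from_region, from_start, from_end, to_region, to_start, to_end, ori)
Pair = Tuple[str, int, int, str, int, int, int]


def _project(pair: Pair, start: int, end: int, strand: int) -> Location:
    """Project [start, end] through one ungapped pair, honouring orientation."""
    _, f_start, _, t_region, t_start, t_end, ori = pair
    if ori == 1:
        return (t_region, t_start + (start - f_start), t_start + (end - f_start), strand)
    return (t_region, t_end - (end - f_start), t_end - (start - f_start), -strand)


def _contains(pair: Pair, region: str, start: int, end: int) -> bool:
    return pair[0] == region and pair[1] <= start and end <= pair[2]


def map_first(pairs: List[Pair], loc: Location) -> Optional[Location]:
    """Single-answer lookup: the first pair wholly containing loc, or None."""
    region, start, end, strand = loc
    for pair in pairs:
        if _contains(pair, region, start, end):
            return _project(pair, start, end, strand)
    return None


def map_every(pairs: List[Pair], loc: Location) -> List[Location]:
    """Return a projection for every pair wholly containing loc."""
    region, start, end, strand = loc
    return [_project(p, start, end, strand) for p in pairs if _contains(p, region, start, end)]


pairs = [
    ("contig_example", 1, 100, "chr_example", 1001, 1100, 1),
    ("contig_example", 1, 100, "patch_example", 501, 600, 1),
]
feature = ("contig_example", 10, 20, 1)
print(map_first(pairs, feature))  # ('chr_example', 1010, 1020, 1)
print(map_every(pairs, feature))  # [('chr_example', 1010, 1020, 1), ('patch_example', 510, 520, 1)]
```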
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Dev mailing list    Dev at ensembl.org
> >>>> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> >>>> Ensembl Blog: http://www.ensembl.info/
> >>>