[ensembl-dev] Question Regarding Bio::EnsEMBL::Mapper method fastmap.

Monika Komorowska monika at ebi.ac.uk
Wed Apr 25 12:41:18 BST 2012


Hi Will

I recently discovered that the ChainedAssemblyMapper has trouble dealing with mappings where the same component region (a contig in your case)
maps to multiple assembled regions, like a chromosome and a patch region.
Using # to separate coordinate systems in the meta_value for meta_key 'assembly.mapping' forces the use of the ChainedAssemblyMapper module.
This results in incorrect mappings being cached where multiple assembled options exist for the same component region.
If you replace # with |, AssemblyMapper will be used instead of the ChainedAssemblyMapper.
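
As a rough illustration of the separator rule above, here is a minimal sketch in Python. It is a toy model, not Ensembl API code: the function names mapper_for and use_plain_mapper and the example meta_value are made up for illustration, and the only behaviour it encodes is the one described above (# selects the ChainedAssemblyMapper, | selects the AssemblyMapper).

```python
# Illustrative model only; this is not part of the Ensembl Perl API.


def mapper_for(meta_value: str) -> str:
    """Name the mapper module selected for an 'assembly.mapping'
    meta_value, judged only by the separator it uses."""
    if "#" in meta_value:
        return "ChainedAssemblyMapper"
    if "|" in meta_value:
        return "AssemblyMapper"
    raise ValueError(f"no coordinate-system separator in {meta_value!r}")


def use_plain_mapper(meta_value: str) -> str:
    """Swap every '#' for '|', the change suggested above."""
    return meta_value.replace("#", "|")


# Hypothetical meta_value; the coordinate system names are assumptions.
original = "chromosome:GRCh37#contig"
rewritten = use_plain_mapper(original)

print(mapper_for(original))   # ChainedAssemblyMapper
print(rewritten)              # chromosome:GRCh37|contig
print(mapper_for(rewritten))  # AssemblyMapper
```

In a real database the change would be made to the meta_value column of the relevant meta table row; check which rows use # before editing them.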

I'm in the process of developing documentation on configuring the AssemblyMapper.
I will email the dev list with the location of the doc once it's ready.

Hope this helps

Monika

On 25 Apr 2012, at 12:26, Will Chow wrote:

> Hi Andy,
> 
> Much thanks for your explanation, it is very helpful.
> 
> I am in exactly that situation.  I have features on the contig (sequence) level and need to project them to top level, where the contig assembles to both a chromosome and a patch.  I have been getting differing results in terms of features fetched, so I have been following through all the code you mention and narrowed it down to the fastmap method.  Perhaps there is something in the schema (meta table) I am missing; the assembly exceptions table should be nearly identical to what is seen on the core database, but I should double check.
> 
> Are the features involved in a patch region, in the core db, mapped on the top level, i.e. different from my situation where things will be projected?
> 
> again thanks for your help.
> 
> Will 
> 
> On 25 Apr 2012, at 12:04, Andy Yates wrote:
> 
>> Hi Will,
>> 
>> The code you have pointed out will only be executed if we are projecting between coordinate systems. DnaAlignFeatureAdaptor::_objs_from_sth() requires the mapper to be passed into it from BaseAdaptor::generic_fetch(). This does happen in BaseFeatureAdaptor::_slice_fetch(), but only when a feature's coordinate system is not the same as the querying slice's coordinate system. The schema stores patches/haplotypes as assembly exceptions, so they are held as mappings between the reference chromosome and the exception. We do not store the relationship of, say, chromosome 21 from NCBI36 to HSCHR21_2_CTG1_1 in GRCh37.p6 (a coordinate system mapping), so there are no multiple mappings.
>> 
>> Hope this helps,
>> 
>> Andy
>> 
>> Andrew Yates                   Ensembl Core Software Project Leader
>> EMBL-EBI                       Tel: +44-(0)1223-492538
>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>> Cambridge CB10 1SD, UK         http://www.ensembl.org/
>> 
>> On 25 Apr 2012, at 11:27, Will Chow wrote:
>> 
>>> Hi dev'ers,
>>> 
>>> Regarding Bio::EnsEMBL::Mapper::fastmap method and its use in the DnaAlignFeatureAdaptor.pm.
>>> 
>>> from what I gather, the use is to return seq_region_id, start, end, strand information of one coordinate system from information of another coord sys (i.e. project).  However fastmap seems to only return one set of information, as seen in the code.
>>> 
>>> I was wondering: if there are multiple mappings (like the human patches), which set of information (seq_region_id) will be returned?
>>> 
>>> perhaps I'm missing something else from the code, which explains this, if so perhaps you can point me to this.
>>> 
>>> much thanks
>>> 
>>> Will
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/

Monika Komorowska
EnsEMBL Software Developer

European Bioinformatics Institute (EMBL-EBI)
tel: +44(0) 1233 494 409




