[ensembl-dev] database versions and genome builds

Patrick Meidl pmeidl at cemm.oeaw.ac.at
Fri Nov 19 09:50:02 GMT 2010


On Thu, Nov 18 2010, Andrea Edwards <edwardsa at cs.man.ac.uk> wrote:

> Is there some information in a gene record that says something like
> 
> 'the position of this gene is bases 100,000-200,000 on chromosome 7
> but in the last database build it was actually at bases 75,000-175,000
> because it was mapped to a different genome build'

if you need this sort of information: there is a set of scripts which
generate a mapping between two assemblies and store it in an Ensembl
database. you can then use the core API to project features from one
assembly to another.
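
to make the idea of "projecting" concrete, here is a minimal sketch in
Python. it is not the Ensembl API and not what the scripts do internally;
it only illustrates the principle, assuming a mapping can be reduced to
ungapped blocks that are shared by both assemblies. the names
MappingBlock and project_to_old and all block coordinates are made up for
illustration; only the gene coordinates are taken from the question above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MappingBlock:
    """One ungapped block present in both assemblies (1-based, inclusive)."""
    chrom: str
    new_start: int  # block start on the new assembly
    new_end: int    # block end on the new assembly
    old_start: int  # old-assembly coordinate aligned to new_start


def project_to_old(chrom: str, start: int, end: int,
                   blocks: List[MappingBlock]) -> List[Tuple[int, int, int, int]]:
    """Project a new-assembly interval onto the old assembly.

    Returns (new_start, new_end, old_start, old_end) tuples, one per
    overlapping block. Parts of the interval not covered by any block
    are simply absent from the result.
    """
    segments = []
    for block in blocks:
        if block.chrom != chrom:
            continue
        # Clip the query interval to this block.
        lo = max(start, block.new_start)
        hi = min(end, block.new_end)
        if lo > hi:
            continue  # no overlap with this block
        # Within an ungapped block both assemblies differ by a fixed offset.
        offset = block.old_start - block.new_start
        segments.append((lo, hi, lo + offset, hi + offset))
    return sorted(segments)


# Hypothetical mapping for chromosome 7: the new assembly gained 25,000
# bases at the start, plus a 500-base insertion after position 1,000,000.
blocks = [
    MappingBlock("7", 25_001, 1_000_000, 1),
    MappingBlock("7", 1_000_501, 2_000_000, 975_001),
]

# The gene from the question: 100,000-200,000 on the new assembly.
gene_segments = project_to_old("7", 100_000, 200_000, blocks)
print(gene_segments)  # -> [(100000, 200000, 75000, 175000)]

# A feature spanning the insertion comes back as two segments.
split_segments = project_to_old("7", 999_901, 1_000_600, blocks)
print(split_segments)
```

note that a projection returns a list of segments rather than a single
interval, because a feature can span a region that was inserted, deleted
or rearranged between the two assemblies.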

look in ensembl/misc-scripts/assembly in the core API cvs checkout. the
README describes how to generate the mapping, whereas
EXAMPLE.use_mapping.pl describes how to project features between
assemblies once you have such a mapping.

the Ensembl core team used to generate these mappings for at least human
and mouse when a new assembly was released, so for these species you
could use things like $gene->project('<old_assembly_name>')
out-of-the-box. I don't know if this is still the case.

also note that I don't know if the scripts mentioned still work (I wrote
them several years ago and don't know if they are still maintained).

as an alternative, UCSC also has a program (called "liftover" or similar
IIRC) which does such a projection of coordinates across assemblies.

HTH

    patrick

-- 
Patrick Meidl, Mag.
Bioinformatician

Ce-M-M-
Research Centre for Molecular Medicine
of the Austrian Academy of Science

Lazarettgasse 14 / AKH BT 25.3
Vienna, Austria

room 02.205
phone +43 1 40160 70016
email pmeidl at cemm.oeaw.ac.at
web http://www.cemm.at/




