[ensembl-dev] Adding an Alternate Assembly for a Species

Bronwen Aken ba1 at sanger.ac.uk
Wed Sep 25 16:39:55 BST 2013


Hi Ben,

It is possible to load more than one assembly into one Ensembl database and we have a few examples, including:
 human (homo_sapiens_core_73_37) - four assemblies 
 mouse (mus_musculus_core_73_38) - three assemblies
 dog (canis_familiaris_core_73_31) - two assemblies
 pig (sus_scrofa_core_73_102) - two assemblies
 rat (rattus_norvegicus_core_73_5) - two assemblies

Are you looking to view two assemblies side-by-side as in our Region Comparison view? Here is pig vs human:
  http://www.ensembl.org/Sus_scrofa/Share/3c8f1202250321838e0bc96d50392e89111929907

and here is a human assembly patch (HG79_PATCH) vs human primary assembly (chromosome 9):
  http://www.ensembl.org/Homo_sapiens/Share/ad39ce274b81b9a9247ee1095ec946ad111929907


Below is some additional information on multiple assemblies in one database.

Let's take pig as an example. If you look here:
  mysql -uanonymous -hensembldb.ensembl.org -P3306 -Dsus_scrofa_core_73_102 -e "select * from coord_system"
you will see that we have two different coordinate system versions: Sscrofa9 and Sscrofa10.2. Each of these coordinate system versions represents a different assembly.


The coord_system table links to the seq_region table, which gives a list of all the sequences and components in each assembly. 
  mysql -uanonymous -hensembldb.ensembl.org -P3306 -Dsus_scrofa_core_73_102 -e "select count(*),cs.name,cs.version from seq_region sr, coord_system cs where sr.coord_system_id=cs.coord_system_id group by cs.version,cs.name"
+----------+------------+-------------+
| count(*) | name       | version     |
+----------+------------+-------------+
|   186661 | contig     | NULL        |
|       21 | chromosome | Sscrofa10.2 |
|     9905 | scaffold   | Sscrofa10.2 |
|       20 | chromosome | Sscrofa9    |
+----------+------------+-------------+
(We don't usually give the contigs a coord_system.version because they are often shared between multiple assemblies.)
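
If you'd like to try that join without connecting to our public server, here is a minimal sketch in Python using an in-memory SQLite database. The two tables are cut down to the columns the query needs, and the scaffold/contig names and the "v1"/"v2" versions are made up for illustration; they are not taken from any real Ensembl database.

```python
import sqlite3


def build_toy_db():
    """Create a tiny stand-in for the coord_system and seq_region tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE coord_system (
            coord_system_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            version TEXT
        );
        CREATE TABLE seq_region (
            seq_region_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            coord_system_id INTEGER NOT NULL,
            length INTEGER NOT NULL
        );
    """)
    # Hypothetical data: two scaffold-level assemblies sharing unversioned contigs.
    conn.executemany("INSERT INTO coord_system VALUES (?, ?, ?)", [
        (1, "scaffold", "v1"),
        (2, "scaffold", "v2"),
        (3, "contig", None),
    ])
    conn.executemany("INSERT INTO seq_region VALUES (?, ?, ?, ?)", [
        (1, "scaffold_1", 1, 5000),
        (2, "scaffold_2", 1, 3000),
        (3, "scaffold_1", 2, 8000),
        (4, "contig_a", 3, 2500),
        (5, "contig_b", 3, 2500),
        (6, "contig_c", 3, 3000),
    ])
    return conn


def count_seq_regions(conn):
    """Count seq_region rows per coordinate system name and version."""
    query = """
        SELECT COUNT(*), cs.name, cs.version
        FROM seq_region sr
        JOIN coord_system cs ON sr.coord_system_id = cs.coord_system_id
        GROUP BY cs.version, cs.name
        ORDER BY cs.name, cs.version
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for count, name, version in count_seq_regions(build_toy_db()):
        print(count, name, version)
```

The same scaffold name can appear once per coord_system, which is what lets two assemblies live side by side in one seq_region table.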


It sounds like your assemblies have only one coord_system, scaffolds, whereas pig's Sscrofa9 assembly has contigs and chromosomes, and pig's Sscrofa10.2 assembly has contigs, scaffolds and chromosomes. The links between contigs, scaffolds and chromosomes are stored in the assembly table.
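
To make the assembly table a little more concrete: each row says that a stretch of a component (for example a contig) sits at a given position and orientation on an assembled sequence (for example a scaffold). Below is a rough Python sketch of how one row can be used to convert a component position into an assembled position. The field names follow the assembly table columns, except that I use sequence names instead of seq_region_id values for readability; the example rows are invented.

```python
from typing import NamedTuple


class AssemblyRow(NamedTuple):
    """One row of the assembly table (names used in place of seq_region_ids)."""
    asm_seq_region: str  # assembled sequence, e.g. a scaffold
    cmp_seq_region: str  # component sequence, e.g. a contig
    asm_start: int
    asm_end: int
    cmp_start: int
    cmp_end: int
    ori: int  # 1 = same strand, -1 = component is reverse-complemented


def component_to_assembled(row, cmp_pos):
    """Convert a 1-based position on the component to the assembled sequence."""
    if not row.cmp_start <= cmp_pos <= row.cmp_end:
        raise ValueError("position is outside the mapped part of the component")
    offset = cmp_pos - row.cmp_start
    if row.ori == 1:
        return row.asm_start + offset
    # Reverse orientation: the component start lands on the assembled end.
    return row.asm_end - offset


if __name__ == "__main__":
    # Hypothetical rows: two contigs placed end to end on one scaffold.
    forward = AssemblyRow("scaffold_1", "contig_a", 1, 2500, 1, 2500, 1)
    reverse = AssemblyRow("scaffold_1", "contig_b", 2501, 5000, 1, 2500, -1)
    print(component_to_assembled(forward, 10))
    print(component_to_assembled(reverse, 1))
```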


We also store which parts of the assemblies 'map' to one another. Within one assembly, the mapping between its contigs, scaffolds and chromosomes is provided in an "AGP" file given with the assembly. The chromosome-to-chromosome mapping (top line in the query output below) was an additional step added by our Core team to link the two assemblies. We (Ensembl Core team) will be able to give you more information on how to do that.
  mysql -uanonymous -hensembldb.ensembl.org -P3306 -Dsus_scrofa_core_73_102 -e "select meta_value from meta where meta_key = 'assembly.mapping'"
+---------------------------------------------+
| meta_value                                  |
+---------------------------------------------+
| chromosome:Sscrofa10.2#chromosome:Sscrofa9  |
| chromosome:Sscrofa10.2#contig               |
| chromosome:Sscrofa10.2|scaffold:Sscrofa10.2 |
| scaffold:Sscrofa10.2#contig                 |
+---------------------------------------------+
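
If you want to read these values in a script, each one is a list of coordinate systems written as name:version (the version is left off for our unversioned contigs), separated by either '#' or '|'. Here is a small Python sketch that splits a value into its parts. The function name is my own, and the sketch deliberately treats both separators the same way without attaching any meaning to the difference between them.

```python
import re


def parse_assembly_mapping(meta_value):
    """Split an assembly.mapping value into (name, version) tuples.

    The version is None when the coordinate system has no version.
    """
    path = []
    for part in re.split(r"[|#]", meta_value.strip()):
        name, _, version = part.partition(":")
        path.append((name, version or None))
    return path


if __name__ == "__main__":
    for value in (
        "chromosome:Sscrofa10.2#chromosome:Sscrofa9",
        "chromosome:Sscrofa10.2|scaffold:Sscrofa10.2",
        "scaffold:Sscrofa10.2#contig",
    ):
        print(parse_assembly_mapping(value))
```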

In order to load your two assemblies into one database, you'd need to give them two different coord_system.version values. (Alternatively, you could just load each assembly into its own database.)
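
As a rough illustration of why the version matters, here is a Python sketch using an in-memory SQLite table that imitates part of coord_system. It assumes, as in the core schema, that the combination of name, version and species_id must be unique and that each rank is used only once per species. The real table has further columns that I've left out, and "v1"/"v2" are placeholder version names.

```python
import sqlite3

# Cut-down imitation of coord_system; the real table has more columns.
SCHEMA = """
CREATE TABLE coord_system (
    coord_system_id INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL DEFAULT 1,
    name TEXT NOT NULL,
    version TEXT,
    "rank" INTEGER NOT NULL,
    UNIQUE ("rank", species_id),
    UNIQUE (name, version, species_id)
);
"""


def make_db():
    """Return an in-memory database holding the toy coord_system table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_coord_system(conn, name, version, rank):
    """Insert one coordinate system and return its new coord_system_id."""
    cur = conn.execute(
        'INSERT INTO coord_system (name, version, "rank") VALUES (?, ?, ?)',
        (name, version, rank),
    )
    return cur.lastrowid


if __name__ == "__main__":
    conn = make_db()
    # Two scaffold-level assemblies can coexist because their versions differ.
    add_coord_system(conn, "scaffold", "v1", 1)
    add_coord_system(conn, "scaffold", "v2", 2)
    try:
        # Reusing an existing name/version pair is rejected.
        add_coord_system(conn, "scaffold", "v1", 3)
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```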

If the next step you're after is to view these two assemblies side-by-side, as in our Region Comparison view, additional information is required. Our Compara team usually align the two assemblies (LASTZ) and store the information in our 'compara' schema database. This information is used by the web code to generate the Region Comparison view. If you already know the scaffold-to-scaffold mapping for your two assemblies, then I don't think you'd need to do the LASTZ alignment, but I do think you'd still need to store the links in a compara-schema database. We (Ensembl Compara team) can help with that too.

Hope that helps as a start,
Bronwen (Genebuild team)



On 9 Sep 2013, at 02:18, Ben Warren <Ben.Warren at plantandfood.co.nz> wrote:

> Hi,
>  
> I am running a local Ensembl instance using species data stored in a local database. Is it possible to add an additional ‘alternate’ assembly to an Ensembl species?
>  
> i.e.
>  
> 1. I have created an assembly of scaffolds
> 2. I loaded that assembly into Ensembl as a new species
> 3. I have revised the assembly, altering the scaffolds
> 4. I would like to view the alternate assembly ‘against’ the original assembly
>  
> Is this possible? If so is there an example of this situation in an existing species on ensembl.org?
>  
>  
>  
> Thanks
>  
> Ben Warren
> Research Technologist
>  
> 
>  
> F: +64 9 925 7001
> ben.warren at plantandfood.co.nz
> www.plantandfood.co.nz
> The New Zealand Institute for Plant & Food Research Limited
>  
> Postal Address: Plant & Food Research Mt Albert
> Private Bag 92169, Auckland, 1142, New Zealand
> Physical Address: Plant & Food Research Mt Albert
> 120 Mt Albert Road, Sandringham, Auckland, 1025, New Zealand
>  
