[ensembl-dev] assembly table in core database

Bronwen Aken ba1 at sanger.ac.uk
Mon Mar 14 12:06:35 GMT 2011


Hi Andrea,

The assembly data for human and all other species are generated by groups external to Ensembl. We import the data as it is provided to us.

In the case of human, the assembly is maintained by the Genome Reference Consortium:
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml

If you go to their FTP site, you can find that the synonym for HSCHR13_CTG1 is GL000111.1.
ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37/Primary_Assembly/localID2acc

Now, to find how this scaffold is assembled from the contigs, go to the relevant AGP file and search for AL445212.9 or GL000111.1:
ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37/Primary_Assembly/placed_scaffolds/AGP/chr13.placed.scaf.agp.gz
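
If you would rather script the search than grep by hand, something along these lines should work. It is only a rough sketch: it assumes you have already downloaded the gzipped AGP file locally, and the function name find_agp_lines is made up for this example. It relies on the standard AGP layout, where column 1 is the object (scaffold) and column 6 is the component (contig) on non-gap rows.

```python
import gzip


def find_agp_lines(path, identifiers):
    """Return AGP rows (as lists of columns) that mention any identifier.

    'path' is a locally downloaded, gzipped AGP file (an assumption of
    this sketch); 'identifiers' is any iterable of scaffold/contig names.
    """
    wanted = set(identifiers)
    hits = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip header comments and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            # Column 1 is the scaffold; column 6 is the contig on
            # non-gap rows (gap rows hold a gap length there instead).
            in_object = columns[0] in wanted
            in_component = len(columns) > 5 and columns[5] in wanted
            if in_object or in_component:
                hits.append(columns)
    return hits
```

Calling find_agp_lines("chr13.placed.scaf.agp.gz", ["AL445212.9"]) would return only the rows where that contig is the component, whereas searching for GL000111.1 returns every row of the scaffold.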

You will find the following lines:
GL000111.1	13670584	13786049	122	F	AL138692.26	101	115566	+
GL000111.1	13786050	13952606	123	F	AL445212.9	101	166657	+
Many contigs start at base 101 in the AGP file. In your example, it simply means that contig AL445212.9 overlaps the previous contig, AL138692.26, by 100 bases, and that the sequence from AL138692.26 rather than AL445212.9 was chosen to contribute to the scaffold HSCHR13_CTG1 (GL000111.1) across that overlap.
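
To make the arithmetic concrete, here is a small sketch that parses the two AGP rows for AL138692.26 and AL445212.9 on GL000111.1 and checks the coordinates. The field names follow the standard AGP column order; the class and function names (AgpComponent, parse_agp_component) are just illustrative and not part of any Ensembl or NCBI tool.

```python
from typing import NamedTuple


class AgpComponent(NamedTuple):
    """One non-gap AGP row (illustrative field names)."""
    scaffold: str
    scaffold_start: int
    scaffold_end: int
    part_number: int
    component_type: str
    component_id: str
    component_start: int
    component_end: int
    orientation: str


def parse_agp_component(line):
    """Split a tab-separated, non-gap AGP row into typed fields."""
    f = line.rstrip("\n").split("\t")
    return AgpComponent(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                        f[5], int(f[6]), int(f[7]), f[8])


# The two rows from chr13.placed.scaf.agp for the contigs in question.
agp_lines = [
    "GL000111.1\t13670584\t13786049\t122\tF\tAL138692.26\t101\t115566\t+",
    "GL000111.1\t13786050\t13952606\t123\tF\tAL445212.9\t101\t166657\t+",
]
rows = [parse_agp_component(line) for line in agp_lines]

for row in rows:
    # Coordinates are 1-based and inclusive on both sides.
    scaffold_span = row.scaffold_end - row.scaffold_start + 1
    component_span = row.component_end - row.component_start + 1
    # Bases at the start of the contig that are not used in the scaffold.
    unused_leading = row.component_start - 1
    print(row.component_id, scaffold_span, component_span, unused_leading)

# The second contig is placed immediately after the first on the scaffold.
print(rows[1].scaffold_start == rows[0].scaffold_end + 1)
```

For both rows the span on the scaffold equals the span taken from the contig, and the first 100 bases of each contig are left out, which is why the assembly table maps the clone from base 101 onwards.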

Cheers,
Bronwen


On 11 Mar 2011, at 19:36, Andrea Edwards wrote:

> Hello
> 
> Please could you tell me why, for example, the clone AL445212.9 (seq region id = 22114) only overlaps chromosome 13 (seq region id = 27513) from base 101 in the assembly table, rather than from its first base. There is also nothing in this table about its overlap with its neighbouring clones or its supercontig HSCHR13_CTG1. If you look at the clone, its annotated region runs from base 101 onwards. I don't understand the significance of base 101.
> 
> thanks
> 
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev
