[ensembl-dev] assembly table in core database

Ewan Birney birney at ebi.ac.uk
Mon Mar 14 12:10:35 GMT 2011


Just an extra piece of history - in the Human Genome sequencing days,
the "standard" overlap between BACs was 100bp, hence all these 101s at
the switch points (there were reasonably complex tricks employed to
minimise the shotgun reads on overlapping BACs whilst still getting
this overlap).


Switch points are the points where the reference genome switches between
the underlying clones. The so-called golden path of bases from clones
came originally from Phil Green's Phrap program, and the idea was
imported into the human assemblies. This was the old way of doing
BAC-by-BAC assemblies.
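
For anyone who wants to check the arithmetic, here is a minimal Python
sketch (not part of any Ensembl or GRC tool) that parses the two AGP
component lines quoted further down in this thread. The class and
function names are made up for illustration, and it assumes the usual
nine tab-separated columns of an AGP component line; gap lines are not
handled. It shows that each clone contributes from its base 101, so its
first 100 bases are left out, and that the span on the scaffold has the
same length as the span taken from the clone.

```python
from dataclasses import dataclass


@dataclass
class AgpComponent:
    """One component (non-gap) line of an AGP file. Illustrative names."""
    object_id: str        # scaffold accession, e.g. GL000111.1
    object_start: int     # 1-based start on the scaffold
    object_end: int       # 1-based inclusive end on the scaffold
    part_number: int
    component_type: str   # e.g. "F" for finished sequence
    component_id: str     # clone accession, e.g. AL445212.9
    component_start: int  # first clone base that is used
    component_end: int    # last clone base that is used
    orientation: str


def parse_agp_line(line: str) -> AgpComponent:
    """Parse one tab-separated component line (assumes nine columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    return AgpComponent(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
        fields[4], fields[5], int(fields[6]), int(fields[7]), fields[8],
    )


def skipped_leading_bases(record: AgpComponent) -> int:
    """Number of bases at the start of the clone that are not used."""
    return record.component_start - 1


def spans_agree(record: AgpComponent) -> bool:
    """True if the scaffold span and the clone span have equal length."""
    object_len = record.object_end - record.object_start + 1
    component_len = record.component_end - record.component_start + 1
    return object_len == component_len


# The two lines quoted from chr13.placed.scaf.agp.gz in this thread.
AGP_LINES = [
    "GL000111.1\t13670584\t13786049\t122\tF\tAL138692.26\t101\t115566\t+",
    "GL000111.1\t13786050\t13952606\t123\tF\tAL445212.9\t101\t166657\t+",
]

for agp_line in AGP_LINES:
    rec = parse_agp_line(agp_line)
    print(rec.component_id,
          "skips", skipped_leading_bases(rec), "leading bases;",
          "spans agree:", spans_agree(rec))
```

Note that the AGP file only records which bases of each clone are used.
That the 100 skipped bases are the overlap with the previous clone is
the convention described above, not something the file states.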


This is also the origin of the phrase "AGP", which stands for
"Accessioned Golden Path".




On 14 Mar 2011, at 12:06, Bronwen Aken wrote:

> Hi Andrea,
>
> The assembly data for human and all other species are generated by
> groups external to Ensembl. We import the data as it is provided to us.
>
> In the case of human, the assembly is maintained by the Genome  
> Reference Consortium:
> http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml
>
> If you go to their FTP site, you can find that the synonym for  
> HSCHR13_CTG1 is GL000111.1.
> ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37/Primary_Assembly/localID2acc
>
> Now, to find how this scaffold is assembled from the contigs, go to  
> the relevant AGP file and search for AL445212.9 or GL000111.1:
> ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37/Primary_Assembly/placed_scaffolds/AGP/chr13.placed.scaf.agp.gz
>
> You will find the following lines:
> GL000111.1	13670584	13786049	122	F	AL138692.26	101	115566	+
>
> GL000111.1	13786050	13952606	123	F	AL445212.9	101	166657	+
>
> Many contigs start from 101 in the AGP file. In your example, it
> just means that contig AL445212.9 overlaps with the previous contig
> AL138692.26 by 100 bases, and that they have chosen to use the
> sequence from AL138692.26 instead of AL445212.9 for those bases in
> the scaffold HSCHR13_CTG1 sequence.
>
> Cheers,
> Bronwen
>
>
> On 11 Mar 2011, at 19:36, Andrea Edwards wrote:
>
>> Hello
>>
>> Please could you tell me why, for example, the clone AL445212.9
>> (seq region id = 22114) only has overlap from base 101 with
>> chromosome 13 (seq region id = 27513) in the assembly table rather
>> than from its first base. There is nothing in this table either
>> about its overlap with its neighbouring clones or its supercontig
>> HSCHR13_CTG1. If you look at the clone, the annotated region of the
>> clone is from base 101 onwards. I don't understand the significance
>> of base 101.
>>
>> thanks
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev