[ensembl-dev] assembly table in core database
Ewan Birney
birney at ebi.ac.uk
Mon Mar 14 12:10:35 GMT 2011
Just an extra piece of history - in the Human Genome sequencing days,
the "standard" overlap between BACs was 100bp, hence all these 101 on
the switch points (there were reasonably complex tricks employed to
optimise how one minimised the shotgun reads on overlapping BACs whilst
still getting this overlap).
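
As an illustration of that arithmetic, here is a minimal Python sketch that
reads the two AGP rows quoted in Bronwen's message further down and reports
how many leading bases of each clone are left out of the scaffold. The class
and function names are hypothetical, the field names follow the column names
of the AGP format, and the sketch assumes only sequence-component rows (gap
rows have a different layout and are not handled).

```python
from typing import NamedTuple

# The two rows quoted from chr13.placed.scaf.agp in the thread.
# Real AGP files are tab-delimited; split() copes with tabs or spaces.
AGP_LINES = [
    "GL000111.1 13670584 13786049 122 F AL138692.26 101 115566 +",
    "GL000111.1 13786050 13952606 123 F AL445212.9 101 166657 +",
]


class AgpComponent(NamedTuple):
    """One sequence-component row of an AGP file (hypothetical helper type)."""
    object_id: str
    object_beg: int
    object_end: int
    part_number: int
    component_type: str
    component_id: str
    component_beg: int
    component_end: int
    orientation: str


def parse_component_row(line: str) -> AgpComponent:
    """Parse a nine-column component row; gap rows are out of scope here."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    return AgpComponent(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
        fields[4], fields[5], int(fields[6]), int(fields[7]), fields[8],
    )


def skipped_leading_bases(row: AgpComponent) -> int:
    """Bases at the start of the clone that do not contribute to the scaffold."""
    return row.component_beg - 1


def placed_length(row: AgpComponent) -> int:
    """Number of clone bases that are placed in the scaffold."""
    return row.component_end - row.component_beg + 1


rows = [parse_component_row(line) for line in AGP_LINES]
for row in rows:
    print(row.component_id, skipped_leading_bases(row), placed_length(row))
```

For AL445212.9 this gives 100 skipped bases and 166557 placed bases, which
matches the length of its span on GL000111.1 (13786050 to 13952606).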
Switch points are the points where the reference genome switches between
the underlying clones (the so-called golden path of bases from clones
came originally from Phil Green's Phrap program, and the idea was
imported into the human assemblies). This was the old way of doing
BAC-by-BAC assemblies.
This is also the origin of the phrase "AGP" which is
"Accessioned Golden Path".
On 14 Mar 2011, at 12:06, Bronwen Aken wrote:
> Hi Andrea,
>
> The assembly data for human and all other species are generated by
> groups external to Ensembl. We import the data as it is provided to
> us.
>
> In the case of human, the assembly is maintained by the Genome
> Reference Consortium:
> http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml
>
> If you go to their FTP site, you can find that the synonym for
> HSCHR13_CTG1 is GL000111.1.
> ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37/Primary_Assembly/localID2acc
>
> Now, to find how this scaffold is assembled from the contigs, go to
> the relevant AGP file and search for AL445212.9 or GL000111.1:
> ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37/Primary_Assembly/placed_scaffolds/AGP/chr13.placed.scaf.agp.gz
>
> You will find the following lines :
> GL000111.1 13670584 13786049 122 F AL138692.26 101 115566 +
>
> GL000111.1 13786050 13952606 123 F AL445212.9 101 166657 +
>
> Many contigs start from 101 in the AGP file. In your example, it
> simply means that contig AL445212.9 overlaps the previous contig
> AL138692.26 by 100 bases, and that the sequence from AL138692.26
> rather than AL445212.9 was chosen to contribute to the scaffold
> HSCHR13_CTG1 sequence.
>
> Cheers,
> Bronwen
>
>
> On 11 Mar 2011, at 19:36, Andrea Edwards wrote:
>
>> Hello
>>
>> Please could you tell me why, for example, the clone AL445212.9
>> (seq region id = 22114) only has overlap from base 101 with
>> chromosome 13 (seq region id = 27513) in the assembly table rather
>> than from its first base. There is nothing in this table either
>> about its overlap with its neighbouring clones or its supercontig
>> HSCHR13_CTG1. If you look at the clone, the annotated region of the
>> clone is from base 101 onwards. I don't understand the significance
>> of base 101.
>>
>> thanks
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev