[ensembl-dev] Centromere regions

Monika Komorowska monika at ebi.ac.uk
Tue Feb 14 16:10:18 GMT 2012


Hi Henrikki

Yes, there should always be two records with the same seq_region_id and stain = 'acen'.
Regards
Monika

On 14 Feb 2012, at 14:57, Henrikki Almusa wrote:

> On 2012-02-14 13:59, Monika Komorowska wrote:
>> Hi Henrikki
>> 
>> You can use the fetch_all_by_chr_name method in
>> Bio::EnsEMBL::DBSQL::KaryotypeBandAdaptor to get all KaryotypeBand
>> objects for a chromosome and iterate through the objects until you get 2
>> objects with stain = 'acen'. Their coordinates will give you the
>> location of a chromosome's centromere.
>> 
>> More information on the above objects can be found in the Core API
>> documentation:
>> 
>> http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1KaryotypeBand.html
> 
> Great, this is exactly what I need. Thanks.
> 
>> This is an example MySQL query to get the centromere for chromosome X
>> 
>> select seq_region_start, seq_region_end from karyotype k inner join
>> seq_region sr on k.seq_region_id = sr.seq_region_id inner join
>> coord_system cs on sr.coord_system_id = cs.coord_system_id where cs.name
>> = 'chromosome' and cs.version = 'GRCh37' and sr.name = 'X'
>> and stain = 'acen' order by seq_region_start;
>> 
>> +------------------+----------------+
>> | seq_region_start | seq_region_end |
>> +------------------+----------------+
>> | 58100001 | 60600000 |
>> | 60600001 | 63000000 |
>> +------------------+----------------+
>> 2 rows in set (0.00 sec)
>> 
>> The seq_region_start in the first row is the start co-ordinate of the
>> centromere (58100001); the seq_region_end in the second row is the end
>> co-ordinate (63000000).
> 
> Just to be sure: can I assume that I will get two rows for each chromosome?
> 
> Thanks,
> 
>> Hope this helps
>> 
>> Monika
>> 
>> On 14 Feb 2012, at 11:27, Daniel Lawson wrote:
>> 
>>> Dear Henrikki,
>>> 
>>> Centromeres are amongst the hardest part of a genome to sequence and
>>> assemble as they tend to be highly repetitive. I do not have personal
>>> knowledge of the availability of centromeres in the vertebrate
>>> assemblies but my expectation is that they will be poorly represented.
>>> Someone from the genebuild team or helpdesk may be able to provide
>>> more information.
>>> 
>>> regards
>>> Dan
>>> 
>>> On 14 February 2012 10:18, Henrikki Almusa
>>> <henrikki.almusa at helsinki.fi> wrote:
>>> 
>>>    Hi all,
>>> 
>>>    I would like to retrieve centromere regions for Ensembl genomes, but
>>>    I can't seem to find anything on how they are marked in the database.
>>>    I will use the Perl API to retrieve them from a local copy. How are
>>>    these marked in the database?
>>> 
>>>    Regards,
>>>    --
>>>    Henrikki Almusa
>>> 
>>>    _________________________________________________
>>>    Dev mailing list Dev at ensembl.org
>>>    List admin (including subscribe/unsubscribe):
>>>    http://lists.ensembl.org/mailman/listinfo/dev
>>>    Ensembl Blog: http://www.ensembl.info/
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Ensembl Genomes | VectorBase | i5K insect genome initiative
>> 
>> Monika Komorowska
>> EnsEMBL Software Developer
>> 
>> European Bioinformatics Institute (EMBL-EBI)
>> tel: +44(0) 1233 494 409
>> 
> 
> 
> -- 
> Henrikki Almusa

Monika Komorowska
EnsEMBL Software Developer

European Bioinformatics Institute (EMBL-EBI)
tel: +44(0) 1233 494 409
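
As a footnote to the thread: the two 'acen' rows returned for a chromosome
can be merged into a single centromere interval with a few lines of code.
The following is only a minimal sketch in Python and is not part of the
Ensembl API. The function name centromere_span and the (start, end) tuple
layout are assumptions made for illustration; the input values are the
chromosome X rows from the query output quoted above.

```python
def centromere_span(acen_rows):
    """Merge the 'acen' karyotype rows of one chromosome into one interval.

    acen_rows: iterable of (seq_region_start, seq_region_end) tuples
    (hypothetical layout, mirroring the two columns of the quoted query).
    """
    rows = sorted(acen_rows)
    # The thread states there should always be two 'acen' records per
    # seq_region_id; fail loudly if that assumption does not hold.
    if len(rows) != 2:
        raise ValueError(f"expected 2 'acen' rows, got {len(rows)}")
    start = rows[0][0]  # start of the first (leftmost) band
    end = rows[1][1]    # end of the second (rightmost) band
    return start, end


# Chromosome X rows (GRCh37) as printed in the quoted query output.
x_rows = [(58100001, 60600000), (60600001, 63000000)]
print(centromere_span(x_rows))  # (58100001, 63000000)
```

Raising an error when the row count is not two is a deliberate choice in
this sketch: an assembly without karyotype data for a chromosome would
otherwise silently yield a wrong interval.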