[ensembl-dev] Centromere regions

Wed Feb 15 08:18:15 GMT 2012

On 2012-02-14 18:10, Monika Komorowska wrote:
> Hi Henrikki
>
> Yes, there should always be two records with the same seq_region_id and stain = 'acen'

Great, thanks.

> Regards
> Monika
>
> On 14 Feb 2012, at 14:57, Henrikki Almusa wrote:
>
>> On 2012-02-14 13:59, Monika Komorowska wrote:
>>> Hi Henrikki
>>>
>>> You can use the fetch_all_by_chr_name method in
>>> Bio::EnsEMBL::DBSQL::KaryotypeBandAdaptor to get all KaryotypeBand
>>> objects for a chromosome and iterate through the objects until you get 2
>>> objects with stain = 'acen'. Their coordinates will give you the
>>> location of a chromosome's centromere.
>>>
>>> More information on the above objects can be found in the Core API
>>> documentation:
>>>
>>> http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1KaryotypeBand.html
>>
>> Great, this is exactly what I need. Thanks.
>>
>>> This is an example MySQL query to get the centromere for chromosome X
>>>
>>> SELECT k.seq_region_start, k.seq_region_end
>>> FROM karyotype k
>>> INNER JOIN seq_region sr ON k.seq_region_id = sr.seq_region_id
>>> INNER JOIN coord_system cs ON sr.coord_system_id = cs.coord_system_id
>>> WHERE cs.name = 'chromosome' AND cs.version = 'GRCh37'
>>>   AND sr.name = 'X' AND k.stain = 'acen'
>>> ORDER BY k.seq_region_start;
>>>
>>> +------------------+----------------+
>>> | seq_region_start | seq_region_end |
>>> +------------------+----------------+
>>> |         58100001 |       60600000 |
>>> |         60600001 |       63000000 |
>>> +------------------+----------------+
>>> 2 rows in set (0.00 sec)
>>>
>>> The seq_region_start in the first row is the start coordinate of the
>>> centromere (58100001); the seq_region_end in the second row is the end
>>> coordinate (63000000).
>>
>> Just to be sure: can I assume that I will get two rows for each chromosome?
>>
>> Thanks,
>>
>>> Hope this helps
>>>
>>> Monika
>>>
>>> On 14 Feb 2012, at 11:27, Daniel Lawson wrote:
>>>
>>>> Dear Henrikki,
>>>>
>>>> Centromeres are amongst the hardest parts of a genome to sequence and
>>>> assemble as they tend to be highly repetitive. I do not have personal
>>>> knowledge of the availability of centromeres in the vertebrate
>>>> assemblies but my expectation is that they will be poorly represented.
>>>> Someone from the genebuild team or helpdesk may be able to provide
>>>> more information.
>>>>
>>>> regards
>>>> Dan
>>>>
>>>> On 14 February 2012 10:18, Henrikki Almusa
>>>> <henrikki.almusa at helsinki.fi>  wrote:
>>>>
>>>>     Hi all,
>>>>
>>>>     I would like to retrieve centromere areas for Ensembl genomes, but
>>>>     can't seem to find anything on how they are marked in the database. I
>>>>     will use the Perl API to retrieve them from a local copy. How are these
>>>>     marked in the database?
>>>>
>>>>     Regards,
>>>>     --
>>>>     Henrikki Almusa
>>>>
>>>>     _________________________________________________
>>>>     Dev mailing list Dev at ensembl.org
>>>>     List admin (including subscribe/unsubscribe):
>>>>     http://lists.ensembl.org/mailman/listinfo/dev
>>>>     Ensembl Blog: http://www.ensembl.info/
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ensembl Genomes | VectorBase | i5K insect genome initiative
>>>
>>> Monika Komorowska
>>> EnsEMBL Software Developer
>>>
>>> European Bioinformatics Institute (EMBL-EBI)
>>> tel: +44(0) 1233 494 409
>>>
>>
>>
>> --
>> Henrikki Almusa
>

-- 
Henrikki Almusa
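
For reference, the rule described in the thread (take the start of the first 'acen' band and the end of the second) can be sketched in a few lines. This is only an illustration in Python, not Ensembl API code: the Band type and the centromere_span function are made-up names, and the input is assumed to be band records already fetched by the query or by the Perl API. Since centromeres may not be annotated in every assembly, the sketch returns None when no 'acen' bands are present and raises an error for any count other than two.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Band(NamedTuple):
    """One karyotype band (hypothetical type): inclusive coordinates plus stain."""
    start: int
    end: int
    stain: str


def centromere_span(bands: Iterable[Band]) -> Optional[Tuple[int, int]]:
    """Return (start, end) spanning the two 'acen' bands, or None if there are none."""
    # Keep only the centromeric bands and order them by start coordinate.
    acen = sorted(b for b in bands if b.stain == "acen")
    if not acen:
        return None
    if len(acen) != 2:
        raise ValueError(f"expected 2 'acen' bands, found {len(acen)}")
    # Start of the first band, end of the second band.
    return acen[0].start, acen[1].end


# The two 'acen' rows returned for chromosome X (GRCh37) in the thread.
chr_x = [Band(58100001, 60600000, "acen"), Band(60600001, 63000000, "acen")]
print(centromere_span(chr_x))  # (58100001, 63000000)
```

The same function works on the full band list of a chromosome, because non-'acen' bands are filtered out before the two coordinates are combined.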