[ensembl-dev] Centromere regions
Monika Komorowska
monika at ebi.ac.uk
Tue Feb 14 16:10:18 GMT 2012
Hi Henrikki
Yes, there should always be two records per chromosome, both with the same seq_region_id and stain = 'acen'.
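
As a minimal sketch of that check (this is not part of the Ensembl API), the Python function below takes the karyotype rows for one chromosome as (seq_region_start, seq_region_end, stain) tuples, keeps the 'acen' bands, and returns the span from the start of the first band to the end of the second. The function name centromere_span and the tuple layout are assumptions made for illustration; the example rows are the two chromosome X (GRCh37) rows from the query output quoted below.

```python
from typing import Iterable, Tuple


def centromere_span(bands: Iterable[Tuple[int, int, str]]) -> Tuple[int, int]:
    """Return (start, end) covering the two 'acen' bands of one chromosome.

    Each band is a (seq_region_start, seq_region_end, stain) tuple
    (hypothetical layout, chosen for this sketch).
    """
    # Keep only the centromeric bands, ordered by start coordinate.
    acen = sorted((start, end) for start, end, stain in bands if stain == "acen")
    if len(acen) != 2:
        raise ValueError(f"expected 2 'acen' bands, found {len(acen)}")
    # Start of the first band, end of the second band.
    return acen[0][0], acen[1][1]


# The two 'acen' rows returned for chromosome X on GRCh37.
x_bands = [(58100001, 60600000, "acen"), (60600001, 63000000, "acen")]
print(centromere_span(x_bands))  # prints (58100001, 63000000)
```

The function raises ValueError when a chromosome does not have exactly two 'acen' records, so an assembly without annotated centromeres is easy to spot.
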
Regards
Monika
On 14 Feb 2012, at 14:57, Henrikki Almusa wrote:
> On 2012-02-14 13:59, Monika Komorowska wrote:
>> Hi Henrikki
>>
>> You can use the fetch_all_by_chr_name method in
>> Bio::EnsEMBL::DBSQL::KaryotypeBandAdaptor to get all KaryotypeBand
>> objects for a chromosome and iterate through the objects until you get 2
>> objects with stain = 'acen'. Their coordinates will give you the
>> location of a chromosome's centromere.
>>
>> More information on the above objects can be found in the Core API
>> documentation:
>>
>> http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1KaryotypeBand.html
>
> Great, this is exactly what I need. Thanks.
>
>> This is an example MySQL query to get the centromere for chromosome X
>>
>> select seq_region_start, seq_region_end from karyotype k inner join
>> seq_region sr on k.seq_region_id = sr.seq_region_id inner join
>> coord_system cs on sr.coord_system_id = cs.coord_system_id where cs.name
>> = 'chromosome' and cs.version = 'GRCh37' and sr.name = 'X' and
>> stain = 'acen' order by seq_region_start;
>>
>> +------------------+----------------+
>> | seq_region_start | seq_region_end |
>> +------------------+----------------+
>> | 58100001 | 60600000 |
>> | 60600001 | 63000000 |
>> +------------------+----------------+
>> 2 rows in set (0.00 sec)
>>
>> The seq_region_start in the first row is the start co-ordinate of the
>> centromere (58100001), and the seq_region_end in the second row is the
>> end co-ordinate (63000000).
>
> Just to be sure: can I assume that I will get two rows for each chromosome?
>
> Thanks,
>
>> Hope this helps
>>
>> Monika
>>
>> On 14 Feb 2012, at 11:27, Daniel Lawson wrote:
>>
>>> Dear Henrikki,
>>>
>>> Centromeres are amongst the hardest part of a genome to sequence and
>>> assemble as they tend to be highly repetitive. I do not have personal
>>> knowledge of the availability of centromeres in the vertebrate
>>> assemblies but my expectation is that they will be poorly represented.
>>> Someone from the genebuild team or helpdesk may be able to provide
>>> more information.
>>>
>>> regards
>>> Dan
>>>
>>> On 14 February 2012 10:18, Henrikki Almusa
>>> <henrikki.almusa at helsinki.fi> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to retrieve centromere regions for Ensembl genomes, but
>>> I can't find how they are marked in the database. I will use the
>>> Perl API to retrieve them from a local copy. How are these marked
>>> in the database?
>>>
>>> Regards,
>>> --
>>> Henrikki Almusa
>>>
>>> _________________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe):
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>> --
>>> Ensembl Genomes | VectorBase | i5K insect genome initiative
>>
>> Monika Komorowska
>> EnsEMBL Software Developer
>>
>> European Bioinformatics Institute (EMBL-EBI)
>> tel: +44(0) 1233 494 409
>>
>
>
> --
> Henrikki Almusa
Monika Komorowska
EnsEMBL Software Developer
European Bioinformatics Institute (EMBL-EBI)
tel: +44(0) 1233 494 409