[ensembl-dev] using the GRCh37 coordinate system inside current ensembl human core

Vivek Iyer vvi at sanger.ac.uk
Fri Apr 17 11:56:50 BST 2015


Hi all,

I have a set of genomic positions in GRCh37 (virus insertion sites, found by someone else by mapping genomic sequence to the GRCh37 assembly). I am simply annotating these insertion sites with human Ensembl genes, using some suitable flanking distance.
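
The annotation step described above could be sketched roughly as follows. This is only an illustration in Python with made-up gene records: `Gene`, `flanking_window` and `genes_near` are hypothetical names, not part of the Ensembl API, and in practice the genes would come from slices loaded through the API rather than a hard-coded list.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    gene_id: str
    chrom: str
    start: int
    end: int


def flanking_window(pos: int, flank: int) -> Tuple[int, int]:
    """Return the (start, end) window around pos, clipped at position 1."""
    return max(1, pos - flank), pos + flank


def genes_near(chrom: str, pos: int, flank: int, genes: List[Gene]) -> List[Gene]:
    """Return genes on chrom that overlap the flanking window around pos."""
    lo, hi = flanking_window(pos, flank)
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Made-up example data, for illustration only.
genes = [
    Gene("geneA", "1", 1000, 5000),
    Gene("geneB", "1", 20000, 30000),
    Gene("geneC", "2", 1000, 5000),
]

# Insertion site at 1:6000 with a 2 kb flank on each side.
hits = genes_near("1", 6000, 2000, genes)
print([g.gene_id for g in hits])
```

The only point here is the overlap test on a window of +/- the flanking distance; which assembly the gene coordinates come from is exactly the question below.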

I think I have two choices, which seem slightly different:

(1) connect with my API to Ensembl release 75 (the last GRCh37 release) and do everything inside that universe. Load slices given by my reference positions, read off the genes, Bob’s your uncle.

(2) connect with the API to the current Ensembl release (79, GRCh38).
Load slices in coordinate_system = chromosome, version = ‘GRCh37’ around my reference positions.
Then read off the genes on those slices as I need.

So I think the difference is that (2) will get me annotations actually done on GRCh38 and ‘pulled back’ into GRCh37, whereas (1) will get me annotations actually done on GRCh37 from end to end. Is this correct? I’d prefer to use (2) unless someone tells me that’s a baaad idea.

Thanks,

Vivek


