[ensembl-dev] transform swissprot protein coordinates into genome coordinates

Andy Yates ayates at ebi.ac.uk
Mon Sep 3 14:42:39 BST 2012


Hi Stephane,

I don't have any easily accessible example code, but I'm happy to give you a few pointers on where to look and which methods to use. You will have to:

1). Convert the UniProtKB accession into an Ensembl transcript (even though these are protein coordinates, the mappings are handled through the Transcript object)

my $adaptor = Bio::EnsEMBL::Registry->get_adaptor('mouse', 'core', 'transcript');
# fetch_all_by_external_name returns an array reference, so take the first hit
my ($transcript) = @{ $adaptor->fetch_all_by_external_name('Q9JKB1', 'UniProt%') };

#This, at the time of writing, will return ENSMUST00000002289 (translation ID ENSMUSP00000002289)

2). Convert the coordinates to genomic

my @locs = $transcript->pep2genomic(5, 215);

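The example protein region I've used spans 9 exons, so you will get 9 coordinates back, one for each exon the region overlaps, with the breaks falling at the exon boundaries. Each one is a Bio::EnsEMBL::Mapper::Coordinate object with start, end and strand methods.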
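
For a whole Pfam file, the two steps could be strung together roughly as below. This is only a minimal sketch under a few assumptions: the helper names parse_pfam_line and genomic_segments are made up for illustration, the input is the tab-separated format from your message with the accession, alignment start and alignment end in the first three columns, and the connection goes to the public server at ensembldb.ensembl.org as the anonymous user. Coordinates come back on whatever assembly the database you connect to uses, so check that it matches the one you need.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: parse one tab-separated Pfam region line.
# Returns undef for comment and blank lines.
sub parse_pfam_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return if $line =~ /^\s*(?:#|$)/;
    my ($acc, $start, $end) = split /\t/, $line;
    return { acc => $acc, start => $start, end => $end };
}

# Hypothetical helper: map a peptide range on one transcript to
# genomic segments, skipping any Gap objects pep2genomic returns.
sub genomic_segments {
    my ($transcript, $start, $end) = @_;
    my @segments;
    for my $loc ($transcript->pep2genomic($start, $end)) {
        next if $loc->isa('Bio::EnsEMBL::Mapper::Gap');
        push @segments, [ $loc->start, $loc->end, $loc->strand ];
    }
    return @segments;
}

# Only contact the database when a Pfam file is given on the command line.
if (@ARGV) {
    require Bio::EnsEMBL::Registry;
    Bio::EnsEMBL::Registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $adaptor = Bio::EnsEMBL::Registry->get_adaptor('mouse', 'core', 'transcript');

    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$fh>) {
        my $rec = parse_pfam_line($line) or next;
        # An accession may map to zero, one or several transcripts.
        my $transcripts = $adaptor->fetch_all_by_external_name($rec->{acc}, 'UniProt%');
        for my $transcript (@$transcripts) {
            my $chr = $transcript->seq_region_name;
            for my $seg (genomic_segments($transcript, $rec->{start}, $rec->{end})) {
                print join("\t", $rec->{acc}, $transcript->stable_id, $chr, @$seg), "\n";
            }
        }
    }
    close $fh;
}
```

Run it with the Pfam file as the only argument. Each output line holds the accession, the transcript stable ID, the sequence region name, and the start, end and strand of one genomic segment. Looping over every transcript rather than taking the first hit is a deliberate choice here, since one accession can be attached to more than one transcript.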

Hope this helps,

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 31 Aug 2012, at 15:39, Stéphane Plaisance wrote:

> Dear All,
> 
> I have downloaded the PFAM full list for mm9 and would like to obtain the genomic coordinates for each record (from cols 2+3 below )
> 
> Is there someone with recyclable code for doing so?
> 
> My input is in Swissprot format (relative to ATG in the relative Acc), as shown below:
>> #Pfam-A regions from Pfam version 26.0 for ncbi taxid 10090 'Mus musculus (strain C57BL/6)'						
>> #Total number of proteins in proteome: 47211										
>> #<seq id> <alignment start> <alignment end> <envelope start> <envelope end> <hmm acc> <hmm name> <type> <hmm start> <hmm end> <hmm length> <bit score> <E-value> <clan>
>> Q9JKB1	5	215	5	216	PF01088	Peptidase_C12	Domain	1	213	214	271.10	4.6e-78	CL0125
>> E9Q751	46	129	46	129	PF04822	Takusan	Family	1	84	84	96.90	4.5e-25	No_clan
>> D3YVR1	2	161	2	161	PF00743	FMO-like	Family	1	160	532	325.60	4.1e-94	CL0063
>> B1AQR8	223	351	223	352	PF00337	Gal-bind_lectin	Domain	1	132	133	134.30	1.6e-36	CL0004
> 
> This looks feasible and probably needs some external ref and slice magic. I have not used this for several years now and am a bit rusty.
> 
> Thanks for any piece of code to start with
> 
> Stephane
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




