[ensembl-dev] transform swissprot protein coordinates into genome coordinates
Andy Yates
ayates at ebi.ac.uk
Mon Sep 3 14:42:39 BST 2012
Hi Stephane,
I don't have any easy-to-access example code, but I'm happy to give you a few pointers on where to look and which methods you should be using. You will have to:
1) Convert the UniProtKB accession into an Ensembl transcript (even though these are protein coordinates, the mapping is done through the Transcript)
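use Bio::EnsEMBL::Registry;
Bio::EnsEMBL::Registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous');
my $adaptor = Bio::EnsEMBL::Registry->get_adaptor('mouse', 'core', 'transcript');
# fetch_all_by_external_name returns an array reference; take the first match (there may be more than one)
my ($transcript) = @{ $adaptor->fetch_all_by_external_name('Q9JKB1', 'UniProt%') };
# At the time of writing, this returns ENSMUST00000002289 (translation ID ENSMUSP00000002289)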
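2) Convert the protein coordinates to genomic coordinates
# Returns Bio::EnsEMBL::Mapper::Coordinate objects (and Gap objects for any unmappable part)
my @locs = $transcript->pep2genomic(5, 215);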
The example protein I've used spans 9 exons, so @locs will hold 9 coordinates, one for each block the region is split into at the exon boundaries.
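To see why step 2 returns one coordinate per exon, here is a pure-Perl sketch (no Ensembl API involved) of the underlying arithmetic under simplifying assumptions: residue n is encoded by CDS nucleotides 3n-2 to 3n, so residues 5 to 215 become CDS positions 13 to 645, and that range is then cut at the exon boundaries. The function pep_range_to_genomic and the three-exon structure are hypothetical, and only a forward-strand transcript described by its coding-only exon segments is handled; the real pep2genomic also deals with the reverse strand, UTR offsets and gaps.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: map a peptide range onto genomic blocks for a
# FORWARD-strand transcript. $cds_exons holds the coding part of each exon,
# in transcript order, as [genomic_start, genomic_end] (1-based, inclusive).
sub pep_range_to_genomic {
    my ($pep_start, $pep_end, $cds_exons) = @_;

    # Residue n is encoded by CDS nucleotides 3n-2 .. 3n.
    my $cds_start = 3 * $pep_start - 2;
    my $cds_end   = 3 * $pep_end;

    my @blocks;
    my $offset = 0;    # number of CDS nucleotides in the preceding exons
    for my $exon (@$cds_exons) {
        my ($g_start, $g_end) = @$exon;
        my $len = $g_end - $g_start + 1;
        # CDS positions covered by this exon
        my ($e_first, $e_last) = ($offset + 1, $offset + $len);

        # Overlap between the requested CDS range and this exon's CDS span
        my $lo = $cds_start > $e_first ? $cds_start : $e_first;
        my $hi = $cds_end   < $e_last  ? $cds_end   : $e_last;
        if ($lo <= $hi) {
            push @blocks, [ $g_start + ($lo - $e_first), $g_start + ($hi - $e_first) ];
        }
        $offset += $len;
    }
    return @blocks;
}

# Made-up three-exon coding structure, for illustration only
my @cds_exons = ([1000, 1099], [2000, 2299], [3000, 3599]);
for my $block (pep_range_to_genomic(5, 215, \@cds_exons)) {
    print join("\t", @$block), "\n";
}
```

For the made-up exons this prints three blocks (1012-1099, 2000-2299 and 3000-3244), 633 nucleotides in total, which is 211 residues × 3.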
Hope this helps,
Andy
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/
On 31 Aug 2012, at 15:39, Stéphane Plaisance wrote:
> Dear All,
>
> I have downloaded the full Pfam list for mm9 and would like to obtain the genomic coordinates for each record (from columns 2 and 3 below).
>
> Does anyone have reusable code for doing this?
>
> My input is in SwissProt coordinates (relative to the ATG of the corresponding accession), as shown below:
>> #Pfam-A regions from Pfam version 26.0 for ncbi taxid 10090 'Mus musculus (strain C57BL/6)'
>> #Total number of proteins in proteome: 47211
>> #<seq id> <alignment start> <alignment end> <envelope start> <envelope end> <hmm acc> <hmm name> <type> <hmm start> <hmm end> <hmm length> <bit score> <E-value> <clan>
>> Q9JKB1 5 215 5 216 PF01088 Peptidase_C12 Domain 1 213 214 271.10 4.6e-78 CL0125
>> E9Q751 46 129 46 129 PF04822 Takusan Family 1 84 84 96.90 4.5e-25 No_clan
>> D3YVR1 2 161 2 161 PF00743 FMO-like Family 1 160 532 325.60 4.1e-94 CL0063
>> B1AQR8 223 351 223 352 PF00337 Gal-bind_lectin Domain 1 132 133 134.30 1.6e-36 CL0004
>
> This looks feasible and probably needs some external-ref and slice magic; I have not used this for several years and am a bit rusty.
>
> Thanks for any piece of code to start with
>
> Stephane
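Stephane's input is the Pfam-A regions file, in which column 1 is the UniProtKB accession and columns 2 and 3 are the alignment start and end. A minimal pure-Perl sketch for extracting those three fields follows, assuming whitespace-separated columns and '#' header lines; parse_pfam_regions is a hypothetical helper name. Each resulting triple would then be passed to fetch_all_by_external_name and pep2genomic as in Andy's reply; that part needs a live Ensembl connection and is not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read Pfam-A region lines from a filehandle and return
# [accession, alignment_start, alignment_end] for each record.
sub parse_pfam_regions {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip header/comment and blank lines
        # split ' ' splits on any run of whitespace (spaces or tabs)
        my ($acc, $aln_start, $aln_end) = split ' ', $line;
        push @records, [ $acc, $aln_start, $aln_end ];
    }
    return @records;
}

# Two sample lines copied from the question, read via an in-memory filehandle
my $sample = <<'END';
#<seq id> <alignment start> <alignment end> <envelope start> <envelope end>
Q9JKB1 5 215 5 216 PF01088 Peptidase_C12 Domain 1 213 214 271.10 4.6e-78 CL0125
E9Q751 46 129 46 129 PF04822 Takusan Family 1 84 84 96.90 4.5e-25 No_clan
END
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
for my $rec (parse_pfam_regions($fh)) {
    print join("\t", @$rec), "\n";
}
close $fh;
```

With the two sample lines it prints Q9JKB1 5 215 and E9Q751 46 129, tab-separated; for the real file, open it by name instead of using the in-memory string.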