[ensembl-dev] question on protein domains

Andy Yates ayates at ebi.ac.uk
Mon Jul 29 16:24:42 BST 2013


Hi Nathalie,

On 11 Jul 2013, at 12:48, Nathalie Conte <nconte at ebi.ac.uk> wrote:

> 
> Hi,
> I am trying to retrieve the domain IDs corresponding to a slice (chromosome, start, end) using the Ensembl API.
> Here is how I get my domains. My starting point is a slice, from which I get all the genes, then the transcripts, then the translations and domains.
> 
> my $mouse_query_slice = $mouse_slice_adaptor->fetch_by_region('chromosome',$non_ref_seq_region,
>    $non_ref_start,$non_ref_end);
> my $all_genes=$gene_adaptor->fetch_all_by_Slice($mouse_query_slice);
> 
> if (scalar(@$all_genes)) {
>     print_genes($all_genes);
>     foreach my $ovegen (@{$all_genes}) {
>         print "\t",
>         my @transcripts = @{ $ovegen->get_all_Transcripts };
>         my $transcript;
>         foreach $transcript (@transcripts) {
>             my $translation = $transcript->translation();
>             if ($translation) {
>                 my @domain_feats = @{$translation->get_all_DomainFeatures};
>                 my $dom;
>                 foreach $dom (@domain_feats) {
>                     print 'transcript'.$transcript->stable_id.'-'.'domain ID'.$dom->hseqname.",";
>                 }
>                 print "\n";
>             } else {
>                 print 'transcript'.$transcript->stable_id.'-'."Pseudogene\n";
>             }
> 
>         }
>     }
> 
> and the output looks like this:
> Bio::EnsEMBL::Transcript=HASH(0x49ae4b0)transcriptENSMUST00000118364-domain ID PS50853,transcriptENSMUST00000118364-domain ID SSF49265,
> transcriptENSMUST00000118364-domain ID SSF49265,transcriptENSMUST00000118364-domain ID PF09240,
> 
> 
> From this I have two questions:
> 1- First, in the output I print $dom->hseqname(), which displays the ID PS50853; this is the domain ID from the PROSITE database. I was wondering if there is a method to display the description of this domain, such as "Fibronectin type-III domain profile"?

We can offer the InterPro display name, which is available from $dom->idesc(). In your example it would have returned "Fibronectin_type3". Should you want something more descriptive, you can retrieve the DBEntry for this InterPro entry and ask for its description:

my $desc;
if($dom->interpro_ac()) {
  my $dbentry_adaptor = $registry->get_adaptor('mouse', 'core', 'dbentry');
  my $dbentry = $dbentry_adaptor->fetch_by_db_accession('Interpro', $dom->interpro_ac());
  $desc = $dbentry->description();
}

This would return "Fibronectin, type III".
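
A slightly more defensive variant of this lookup, as a sketch only. The helper name domain_description is hypothetical and not part of the Ensembl API; it assumes nothing beyond the calls already used in this thread (interpro_ac, idesc and hseqname on the feature, fetch_by_db_accession on the adaptor). It falls back to the short display name, and then to the hit ID, when there is no InterPro entry or the entry has no description.

```perl
use strict;
use warnings;

# Hypothetical helper (not an Ensembl method): return the best available
# description for a domain feature.
sub domain_description {
    my ($dom, $dbentry_adaptor) = @_;

    my $acc = $dom->interpro_ac();
    if ($acc) {
        my $dbentry = $dbentry_adaptor->fetch_by_db_accession('Interpro', $acc);
        # The lookup may return nothing, or an entry without a description.
        if ($dbentry && $dbentry->description()) {
            return $dbentry->description();
        }
    }

    # Fall back to the short InterPro display name, then to the hit ID.
    return $dom->idesc() || $dom->hseqname();
}
```

Inside the domain loop it could be called once per feature, with the DBEntry adaptor fetched a single time outside the loop rather than on every iteration.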

> 2- Secondly, my starting point is a slice, and I want only the domains that lie within it. I currently reach the domains by getting all the genes on the slice, then their transcripts, then the translations and their domains.
> The problem with this is that I get all the domains of each protein, not only the ones corresponding to this particular slice (which could be 1 bp in size).
> 
> Could you suggest something?

Firstly, if you do not want any Gene information then do not bother querying for genes; go straight to the TranscriptAdaptor. I would still use your loops but add a test to ensure that each protein domain overlaps the bounds of the original Slice. You will need to translate your genomic coordinates to peptide coordinates like so:

my @peptide_locations = $transcript->genomic2pep($non_ref_start, $non_ref_end, $requested_strand);

This returns a list of Coordinate & Gap entries. You can get multiple entries in this list because of exon boundaries. Grep the list for those objects which are Bio::EnsEMBL::Mapper::Coordinate instances like so (note the array on the left-hand side; assigning the result of grep to a scalar would only give you the number of matches):

my @mappings = grep { $_->isa('Bio::EnsEMBL::Mapper::Coordinate') } @peptide_locations;

Then for each domain check whether its bounds ($dom->start and $dom->end, which are in peptide coordinates) overlap at least one element in this array. If they do, it is a domain you are interested in.
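
The overlap test can be sketched as below. Both helper names are hypothetical; the sketch assumes only that the domain features and the mapped Coordinate objects respond to start and end in peptide coordinates, and it treats both ranges as closed intervals.

```perl
use strict;
use warnings;

# Two closed intervals overlap when each one starts before the other ends.
sub ranges_overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    return $s1 <= $e2 && $s2 <= $e1;
}

# Hypothetical helper: keep the domains that overlap at least one mapped
# peptide range. Both arguments are array refs of objects with start/end.
sub domains_in_region {
    my ($domains, $mappings) = @_;

    my @hits;
    for my $dom (@{$domains}) {
        # grep in scalar context counts the mapped ranges this domain touches.
        my $overlapping = grep {
            ranges_overlap($dom->start, $dom->end, $_->start, $_->end)
        } @{$mappings};
        push @hits, $dom if $overlapping;
    }
    return @hits;
}
```

For each translation, pass references to its domain features and to the Coordinate objects kept from genomic2pep; an empty result means no domain of that protein touches the requested region.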

Hope this helps,

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/


> Many thanks
> Nathalie
> 
> 
> 
> 
> -- 
> Nathalie Conte, PhD
> Bioinformatician BMB (WP3, WP7)
> Functional Genomics group
> EMBL-EBI,UK
> 01223 492562
> 
> 