[ensembl-dev] question on protein domains
Andy Yates
ayates at ebi.ac.uk
Mon Jul 29 16:24:42 BST 2013
Hi Nathalie,
On 11 Jul 2013, at 12:48, Nathalie Conte <nconte at ebi.ac.uk> wrote:
>
> Hi,
> I am trying to retrieve the domain IDs corresponding to a slice (chromosome, start, end) using the Ensembl API.
> Here is how I get my domains using the Ensembl API. My starting point is a slice, from which I get all the genes, then the transcripts, then the translations and domains.
>
> my $mouse_query_slice = $mouse_slice_adaptor->fetch_by_region('chromosome',$non_ref_seq_region,
> $non_ref_start,$non_ref_end);
> my $all_genes=$gene_adaptor->fetch_all_by_Slice($mouse_query_slice);
>
> if (scalar(@$all_genes)) {
> print_genes($all_genes);
> foreach my $ovegen(@{$all_genes}){
> print "\t",
> my @transcripts = @{ $ovegen->get_all_Transcripts };
> my $transcript;
> foreach $transcript (@transcripts){
> my $translation = $transcript->translation();
> if ($translation) {
> my @domain_feats = @{$translation->get_all_DomainFeatures};
> my $dom;
> foreach $dom(@domain_feats){
> print 'transcript'.$transcript->stable_id.'-'.'domain ID'.$dom->hseqname.",";
> }
> print "\n";
> } else {
> print 'transcript'.$transcript->stable_id.'-'."Pseudogene\n";
> }
>
> }
> }
>
> and the output looks like this:
> Bio::EnsEMBL::Transcript=HASH(0x49ae4b0)transcriptENSMUST00000118364-domain ID PS50853,transcriptENSMUST00000118364-domain ID SSF49265,
> transcriptENSMUST00000118364-domain ID SSF49265,transcriptENSMUST00000118364-domain ID PF09240,
>
>
> From this I have 2 questions:
> 1-First, in the output I get $dom->hseqname(), which displays the ID PS50853; this is the domain ID from the PROSITE database. I was wondering if there is a method to display the description of this domain, like "Fibronectin type-III domain profile"?
We can offer the InterPro display name, which is available from $dom->idesc(). In your example it would have returned "Fibronectin_type3". Should you want something more descriptive, you can retrieve the DBEntry for this InterPro entry and ask for its description:
my $desc;
if ($dom->interpro_ac()) {
  my $dbentry_adaptor = $registry->get_adaptor('mouse', 'core', 'dbentry');
  my $dbentry = $dbentry_adaptor->fetch_by_db_accession('Interpro', $dom->interpro_ac());
  # Guard against accessions with no matching DBEntry
  $desc = $dbentry->description() if $dbentry;
}
This would return "Fibronectin, type III".
> 2-Secondly, my starting point is a slice that I want my domains to come from. I am accessing the domains through the genes: I get all the genes on the slice, then the transcripts, then the translations and domains.
> The problem with this is that I am going to get all the domains of each protein, not only the ones corresponding to this particular slice (which could be 1 bp in size).
>
> Could you suggest something?
Firstly, if you do not want any Gene information then do not bother querying for genes; go straight to the TranscriptAdaptor. I would still use your loops but add a test to ensure the protein domain you have overlaps the bounds of the original Slice. You will need to translate your genomic coordinates to protein coordinates like so:
my @peptide_locations = $transcript->genomic2pep($non_ref_start, $non_ref_end, $requested_strand);
This returns a list of Coordinate and Gap entries. You will get multiple entries in this list because of exon boundaries. Grep the list for those objects which implement Bio::EnsEMBL::Mapper::Coordinate like so:
my @mappings = grep { $_->isa('Bio::EnsEMBL::Mapper::Coordinate') } @peptide_locations;
Then, for each domain, check whether its bounds are overlapped by at least one element in this array. If so, it is a domain you are interested in.
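The overlap test on the mapped peptide coordinates could be sketched roughly as follows. This is a minimal illustration rather than code from the Ensembl API: ranges_overlap and domains_in_region are hypothetical helper names, and the sketch assumes only that both the domain features and the mapped Coordinate objects respond to start() and end() in peptide coordinates. The small Span package is a stand-in so the snippet runs without a database connection, and the positions used are made up.

```perl
use strict;
use warnings;

# Stand-in for any object exposing start() and end(), such as a domain
# feature or a Bio::EnsEMBL::Mapper::Coordinate (assumption: both report
# peptide coordinates here).
package Span;
sub new   { my ($class, $start, $end) = @_; return bless { start => $start, end => $end }, $class }
sub start { return $_[0]{start} }
sub end   { return $_[0]{end} }

package main;

# Two closed intervals overlap when each starts before the other ends.
sub ranges_overlap {
    my ($a_start, $a_end, $b_start, $b_end) = @_;
    return $a_start <= $b_end && $b_start <= $a_end;
}

# Keep only the domains overlapped by at least one mapped coordinate.
sub domains_in_region {
    my ($domains, $mappings) = @_;
    my @kept;
    for my $dom (@{$domains}) {
        for my $coord (@{$mappings}) {
            if (ranges_overlap($dom->start, $dom->end, $coord->start, $coord->end)) {
                push @kept, $dom;
                last;    # one overlapping coordinate is enough
            }
        }
    }
    return \@kept;
}

# Hypothetical peptide positions, for illustration only.
my @domains  = (Span->new(10, 50), Span->new(120, 200));
my @mappings = (Span->new(40, 60));
my $hits     = domains_in_region(\@domains, \@mappings);
printf "%d-%d\n", $_->start, $_->end for @{$hits};    # prints 10-50
```

In the real loop you would pass the domain features of each translation together with the filtered Coordinate entries for that transcript in place of the Span objects.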
Hope this helps,
Andy
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/
> Many thanks
> Nathalie
>
>
>
>
> --
> Nathalie Conte, PhD
> Bioinformatician BMB (WP3, WP7)
> Functional Genomics group
> EMBL-EBI,UK
> 01223 492562
>
>