[ensembl-dev] genomic2pep and pep2genomic

Wed Mar 9 11:59:32 GMT 2011

The Ensembl API attempts to predict the amino acid where a codon is incomplete. In this case the first two bases, TC, encode serine whichever of the four possible nucleotides occupies the third position.
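
To illustrate the idea, here is a minimal sketch in plain Perl. It is not the Ensembl implementation: the function name translate_partial_codon is hypothetical, it assumes the standard genetic code, and it returns 'X' whenever the possible completions of a partial codon disagree.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Standard genetic code, with each codon position ordered T, C, A, G.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Build the codon => amino acid lookup table.
my %codon_table;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Hypothetical helper: translate a complete or partial codon (1-3 bases).
# Returns the one-letter amino acid if every possible completion agrees,
# otherwise 'X'.
sub translate_partial_codon {
    my ($partial) = @_;
    $partial = uc $partial;
    die "Expected 1-3 bases from TCAG, got '$partial'\n"
        unless $partial =~ /^[TCAG]{1,3}$/;

    # Extend the partial codon with every possible base until it is 3 long.
    my @candidates = ($partial);
    while (length($candidates[0]) < 3) {
        my @extended;
        for my $prefix (@candidates) {
            push @extended, "$prefix$_" for @bases;
        }
        @candidates = @extended;
    }

    # Collect the distinct amino acids encoded by the candidate codons.
    my %seen;
    $seen{ $codon_table{$_} } = 1 for @candidates;
    my @amino_acids = keys %seen;

    return @amino_acids == 1 ? $amino_acids[0] : 'X';
}

print "$_ -> ", translate_partial_codon($_), "\n" for qw(TC CT GA);
```

TC and CT both resolve (to S and L respectively) because all four completions agree, whereas GA could be either D or E and so comes back as X.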

regards
Dan

-- 
Daniel Lawson
VectorBase | Ensembl Genomes
On Wednesday, 9 March 2011 at 10:24, jana wrote: 
> Hi Hardip,
> 
> > However, normally a peptide should represent three nucleotides, but I am getting a two nucleotide codon (TC) for this position 
> The evidence for that transcript is incomplete and the CDS ends with only two nucleotides TC in the last annotated codon.
> You can check the CDS in the browser:
> http://www.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000177000;r=1:11863076-11863097;t=ENST00000413656
> 
> Cheers,
> Jana (Ensembl helpdesk)
> 
> 
> 
> On 9 Mar 2011, at 00:50, Hardip Patel wrote:
> > Hi All
> > 
> >  I am trying to identify the codon triplet that contains a single-nucleotide genomic position.
> > 
> >  To elaborate:
> > 
> >  I have the genomic locus chr1:11863085:11863085:-1 (chromosome, start, end and strand).
> >  This locus overlaps transcript ID ENST00000413656.
> >  The locus is in the protein-coding (CDS) region of the transcript/cDNA.
> > 
> >  Therefore I first fetch the transcript object by stable ID using the transcript adaptor.
> >  Then I identify the peptide coordinates for the genomic locus by calling the TranscriptMapper.
> >  Once I get the peptide coordinate, I get the genomic coordinates for that peptide.
> > 
> >  Below is the script to do so.
> > 
> >  my $transcript_adaptor = $reg->get_adaptor("Human", "Core", "Transcript");
> >  my $slice_adaptor = $reg->get_adaptor("Human", "Core", "Slice");
> >  my $transcriptObject = $transcript_adaptor->fetch_by_stable_id("ENST00000413656");
> >  my $transcriptMapper = $transcriptObject->get_TranscriptMapper();
> >  my @pepCoords = $transcriptMapper->genomic2pep(11863085, 11863085, -1);
> >  foreach my $pepinfo (@pepCoords){
> >      my @pep2genomic = $transcriptMapper->pep2genomic($pepinfo->start, $pepinfo->end);
> >      print "Peptide = ". $pepinfo->start."\t". $pepinfo->end."\t". $pepinfo->id."\t". $pepinfo->length."\t". $pepinfo->strand."\n";
> >      foreach my $gc (@pep2genomic){
> >          my $codonSlice = $slice_adaptor->fetch_by_region("toplevel", 1 , $gc->start, $gc->end, $gc->strand);
> >          print "Genomic = ".$gc->start."\t".$gc->end."\t".$gc->id."\t".$gc->length."\t".$gc->strand."\t".$codonSlice->seq."\n";
> >      }
> >  }
> > 
> >  However, normally a peptide should represent three nucleotides, but I am getting a two nucleotide codon (TC) for this position, and my script dies with the following error message:
> > 
> >  Can't locate object method "id" via package "Bio::EnsEMBL::Mapper::Gap".
> > 
> >  Could somebody please explain to me what these gaps are, and how to overcome this issue of one- or two-nucleotide codons?
> > 
> >  Any help is greatly appreciated
> > 
> >  Thanking you
> > 
> >  Hardip
> >  _______________________________________________
> > Dev mailing list
> > Dev at ensembl.org
> > http://lists.ensembl.org/mailman/listinfo/dev
> 
> Jana Vandrovcova, Ph.D.
> User Support Officer
> 
> EMBL - European Bioinformatics Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridge CB10 1SD
> United Kingdom
> 