[ensembl-dev] genomic2pep and pep2genomic

Wed Mar 9 10:24:41 GMT 2011

Hi Hardip,

> However, normally a peptide should represent three nucleotides, but I am getting a two nucleotide codon (TC) for this position 

The evidence for that transcript is incomplete, so the CDS ends with only two nucleotides (TC) in the last annotated codon.
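
One way to confirm this from a script is to check whether the CDS length is a multiple of three. The sketch below uses a hypothetical helper, trailing_partial_codon (not part of the Ensembl API). It assumes the CDS is available as a plain string, for example from $transcript->translateable_seq, and that the CDS starts in frame at its first base, which may not hold for transcripts with an incomplete 5' end.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the Ensembl API.
# Returns the bases left over after the last complete codon,
# or an empty string if the CDS length is a multiple of three.
# Assumes the CDS starts in frame at its first base.
sub trailing_partial_codon {
    my ($cds) = @_;
    my $remainder = length($cds) % 3;
    return $remainder ? substr($cds, -$remainder) : '';
}

# Possible use with a transcript object (requires the Ensembl API):
#   my $partial = trailing_partial_codon($transcriptObject->translateable_seq);
#   print "Incomplete last codon: $partial\n" if length $partial;
```
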
You can check the CDS in the browser:
http://www.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000177000;r=1:11863076-11863097;t=ENST00000413656
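
As for the error message: the TranscriptMapper methods return a list that can contain both Bio::EnsEMBL::Mapper::Coordinate and Bio::EnsEMBL::Mapper::Gap objects. A gap marks a part of the requested range that could not be mapped (here, presumably the missing third base of the last codon), and, as the error shows, it has no id method. A minimal sketch of one way to handle this is to separate the two kinds of object before using them. The helper name split_mapper_results is hypothetical, not part of the Ensembl API.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper, not part of the Ensembl API.
# Splits a list of mapper results into mapped coordinates and gaps,
# so that methods such as id and strand are only called on coordinates.
sub split_mapper_results {
    my @results = @_;
    my (@coords, @gaps);
    foreach my $r (@results) {
        if (blessed($r) && $r->isa('Bio::EnsEMBL::Mapper::Gap')) {
            push @gaps, $r;      # unmapped part of the requested range
        }
        else {
            push @coords, $r;    # treated as a mapped coordinate
        }
    }
    return (\@coords, \@gaps);
}

# Possible use inside the original loop (requires the Ensembl API):
#   my ($coords, $gaps) = split_mapper_results(
#       $transcriptMapper->pep2genomic($pepinfo->start, $pepinfo->end));
#   foreach my $gc (@$coords) { ... fetch the slice and print as before ... }
#   warn "Codon is incomplete\n" if @$gaps;
```

The same check applies to the output of genomic2pep, which can also contain gaps when the genomic position lies outside the CDS.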

Cheers,
Jana (Ensembl helpdesk)

On 9 Mar 2011, at 00:50, Hardip Patel wrote:

> Hi All
> 
> I am trying to identify the codon triplet that contains a single-nucleotide genomic position.
> 
> In more detail:
> 
> I have the genomic locus chr1:11863085:11863085:-1 (chromosome, start, end and strand).
> This locus overlaps transcript ID ENST00000413656.
> The locus is in the protein-coding (CDS) region of the transcript/cDNA.
> 
> I therefore first fetch the transcript object by stable ID using the transcript adaptor.
> Then I obtain the peptide coordinates for the genomic locus from the TranscriptMapper.
> Once I have the peptide coordinates, I map them back to genomic coordinates.
> 
> Below is the script to do so.
> 
> my $transcript_adaptor = $reg->get_adaptor("Human", "Core", "Transcript");
> my $slice_adaptor = $reg->get_adaptor("Human", "Core", "Slice");
> my $transcriptObject = $transcript_adaptor->fetch_by_stable_id("ENST00000413656");
> my $transcriptMapper = $transcriptObject->get_TranscriptMapper();
> my @pepCoords = $transcriptMapper->genomic2pep(11863085, 11863085, -1);
> foreach my $pepinfo (@pepCoords){
>   my @pep2genomic = $transcriptMapper->pep2genomic($pepinfo->start, $pepinfo->end);
>   print "Peptide = ". $pepinfo->start."\t". $pepinfo->end."\t". $pepinfo->id."\t". $pepinfo->length."\t". $pepinfo->strand."\n";
>   foreach my $gc (@pep2genomic){
>     my $codonSlice = $slice_adaptor->fetch_by_region("toplevel", 1 , $gc->start, $gc->end, $gc->strand);
>     print "Genomic = ".$gc->start."\t".$gc->end."\t".$gc->id."\t".$gc->length."\t".$gc->strand."\t".$codonSlice->seq."\n";
>   }
> }
> 
> However, normally a peptide should represent three nucleotides, but I am getting a two-nucleotide codon (TC) for this position, and my script dies with the following error message:
> 
> Can't locate object method "id" via package "Bio::EnsEMBL::Mapper::Gap".
> 
> Could somebody please explain to me what these gaps are, and how to handle codons with only one or two nucleotides?
> 
> Any help is greatly appreciated.
> 
> Thanking you
> 
> Hardip

Jana Vandrovcova, Ph.D.
User Support Officer

EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
