[ensembl-dev] cDNA and CDS lack of total matching for some genes

Manuel Tardáguila Sancho manueltar at hotmail.com
Tue Dec 16 16:27:16 GMT 2014


Thanks Kieron, 
I have solved the problem. I just wanted to know the extent of the upstream and downstream UTRs, which is why I was comparing the CDS against the cDNA. Perhaps the convention of padding with N's should be described in the README file, as the header fields are. Best,
Manuel
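
For anyone running the same check, here is a minimal sketch of the comparison discussed in this thread. It assumes, following Kieron's explanation below, that a CDS may begin with up to two N's of phase padding which are not present in the cDNA. The function names (strip_phase_padding, locate_cds, codons) are hypothetical helpers made up for illustration, not part of any Ensembl tool. The test sequence is just the first 30 bases of the ENST00000426203 record quoted below.

```python
def strip_phase_padding(cds, max_pad=2):
    """Return (number of leading N's removed, remaining sequence).

    Assumption: phase padding is at most two N's (phase 1 or 2),
    so no more than max_pad leading N's are treated as padding.
    """
    n_pad = 0
    while n_pad < max_pad and n_pad < len(cds) and cds[n_pad] in "Nn":
        n_pad += 1
    return n_pad, cds[n_pad:]


def locate_cds(cdna, cds):
    """Find the unpadded CDS inside the cDNA.

    Returns (upstream_len, downstream_len, n_pad), or None when the
    CDS is not a substring of the cDNA. The flanking lengths are only
    UTR lengths when the CDS is annotated as complete.
    """
    n_pad, core = strip_phase_padding(cds)
    start = cdna.upper().find(core.upper())
    if start < 0:
        return None
    return start, len(cdna) - start - len(core), n_pad


def codons(seq):
    """Split a sequence into complete triplets, dropping any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# First 30 bases of the cDNA quoted in this thread, and the same
# bases with the two leading N's that appear in the CDS file.
cdna = "AGAGGCACAAAGGAAACTTGCCCCGAGTCC"
cds = "NN" + cdna

print(locate_cds(cdna, cds))   # (0, 0, 2)
print(codons(cds)[:3])         # ['NNA', 'GAG', 'GCA']
```

With the padding kept, the first codon is the incomplete 'NNA' and the following codons are read in the annotated frame. With the padding stripped, the CDS matches the cDNA from position 0, which is consistent with this transcript being annotated as having an incomplete CDS at the 5' end.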
 

> Date: Tue, 16 Dec 2014 16:18:58 +0000
> From: ktaylor at ebi.ac.uk
> To: dev at ensembl.org
> Subject: Re: [ensembl-dev] cDNA and CDS lack of total matching for some genes
> 
> Dear Manuel,
> 
> The N's present in the CDS are standard procedure for Ensembl. They 
> exist to ensure that translation is in the correct phase when there is 
> ambiguity. The transcript you highlight begins with a phase of 2, hence 
> two N's are required to keep the protein codons correct. It has also 
> been manually annotated as having incomplete CDS up and downstream, 
> which more or less tells us the same thing.
> 
> http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000007350;r=X:154295671-154330350;t=ENST00000426203
> 
> It might be useful for you to tell us what you're aiming for, as we 
> might have better tools for your task.
> 
> 
> Regards,
> 
> 
> Kieron Taylor
> Ensembl Core
> 
> 
> On 16/12/2014 13:20, Manuel Tardáguila Sancho wrote:
> > Hello Ensembl Dev team,
> >
> > I am currently working with two files from Ensembl release
> > 75; Homo_sapiens.GRCh37.75.cds.all.fa, with all the CDS from the human
> > release, and Homo_sapiens.GRCh37.75.cdna.all.fa with all the cDNA.
> >
> > As part of one of my scripts I was checking that each CDS is contained
> > in its cDNA, and it routinely is, except for some genes for which the
> > CDS begins with one or two N's (see example below).
> >
> > Once these N's are removed the CDS matches the cDNA.
> >
> > All of the CDS that I have checked lack a proper ATG as start codon.
> >
> > I don't know if these N's are a code to denote transcripts with no
> > canonical start codon, I have checked the accompanying README files and
> > they don't mention them. Best,
> >
> > Manuel Tardaguila
> >
> >  >ENST00000426203 cdna:putative
> > chromosome:GRCh37:X:153533396:153539497:1 gene:ENSG00000007350
> > gene_biotype:protein_coding transcript_biotype:protein_coding
> >
> > AGAGGCACAAAGGAAACTTGCCCCGAGTCCACGGTGCTCTGCGGTTAGGAGCTGGCCTCA
> > CTGTGCACAGGGGGAGGGGTGCCACCCTACATCATGTAGCAGTTCTTCTGAGATCATGTC
> > TGTGCTGTTCTTCTACATCATGAGGTACAAGCAGTCAGATCCAGAGAATCCGGACAACGA
> > CCGATTTGTCCTCGCAAAGAGACTGTCGTTTGTGGATGTGGCAACAGGATGGCTCGGACA
> > AGGACTGGGAGTTGCATGTGGAATGGCATATACTGGCAAGTACTTCGACAGGGCCAGCTA
> > CCGGGTGTTCTGCCTCATGAGTGATGGCGAGTCCTCAGAAGGCTCTGTCTGGGAGGCAAT
> > GGCCTTTGCTTCCTACTACAGTCTGGACAATCTTGTGGCAATCTTTGATGTGAACCGCCT
> > GGGACACAGTGGTGCATTGCCCGCCGAGCACTGCATAAACATCTATCAGAGGCGCTGCGA
> > AGCCTTTGGGTGGAACACTTATGTGGTGGACGGCCGGGACGTGGA
> >
> >  >ENST00000426203 cds:putative
> > chromosome:GRCh37:X:153533396:153539497:1 gene:ENSG00000007350
> > gene_biotype:protein_coding transcript_biotype:protein_coding
> > *NN*AGAGGCACAAAGGAAACTTGCCCCGAGTCCACGGTGCTCTGCGGTTAGGAGCTGGCCT
> > CACTGTGCACAGGGGGAGGGGTGCCACCCTACATCATGTAGCAGTTCTTCTGAGATCATG
> > TCTGTGCTGTTCTTCTACATCATGAGGTACAAGCAGTCAGATCCAGAGAATCCGGACAAC
> > GACCGATTTGTCCTCGCAAAGAGACTGTCGTTTGTGGATGTGGCAACAGGATGGCTCGGA
> > CAAGGACTGGGAGTTGCATGTGGAATGGCATATACTGGCAAGTACTTCGACAGGGCCAGC
> > TACCGGGTGTTCTGCCTCATGAGTGATGGCGAGTCCTCAGAAGGCTCTGTCTGGGAGGCA
> > ATGGCCTTTGCTTCCTACTACAGTCTGGACAATCTTGTGGCAATCTTTGATGTGAACCGC
> > CTGGGACACAGTGGTGCATTGCCCGCCGAGCACTGCATAAACATCTATCAGAGGCGCTGC
> > GAAGCCTTTGGGTGGAACACTTATGTGGTGGACGGCCGGGACGTGGA
> >
> >
> >
> >
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/
> >