[ensembl-dev] cDNA and CDS lack of total matching for some genes

Manuel Tardáguila Sancho manueltar at hotmail.com
Tue Dec 16 13:20:24 GMT 2014


Hello Ensembl Dev team,
I am currently working with two files from Ensembl release 75: Homo_sapiens.GRCh37.75.cds.all.fa, with all the CDS from the human release, and Homo_sapiens.GRCh37.75.cdna.all.fa, with all the cDNA.
As part of one of my scripts I check that each CDS is contained in the corresponding cDNA. It routinely is, except for some genes whose CDS begins with one or two N's (see example below).
Once these N's are removed the CDS matches the cDNA.
All of the affected CDS that I have checked lack a proper ATG start codon.
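
The check described above can be sketched roughly as follows. This is only an illustration, not the actual script: the function names (read_fasta, check_cds, starts_with_atg) are hypothetical, and it assumes that the first word of each FASTA header is the transcript ID shared by the cDNA and CDS files.

```python
def read_fasta(lines):
    """Parse FASTA lines into {transcript_id: sequence}.

    Assumption: the ID is the first whitespace-delimited word of the header.
    """
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {tid: "".join(parts) for tid, parts in records.items()}


def strip_leading_n(seq):
    """Return the sequence without leading N's, and how many were removed."""
    stripped = seq.lstrip("Nn")
    return stripped, len(seq) - len(stripped)


def check_cds(cdna, cds):
    """Classify how a CDS relates to the cDNA of the same transcript."""
    if cds in cdna:
        return "match"
    stripped, removed = strip_leading_n(cds)
    if removed and stripped in cdna:
        return f"match_after_removing_{removed}_N"
    return "no_match"


def starts_with_atg(cds):
    """True if the CDS, ignoring any leading N's, begins with ATG."""
    stripped, _ = strip_leading_n(cds)
    return stripped.upper().startswith("ATG")
```

Running check_cds over every transcript ID present in both files would list the CDS that only match once the leading N's are removed, and starts_with_atg would show which of those lack an ATG at the start.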
I don't know whether these N's are a code to denote transcripts without a canonical start codon; I have checked the accompanying README files and they don't mention them.
Best,
Manuel Tardaguila
>ENST00000426203 cdna:putative chromosome:GRCh37:X:153533396:153539497:1 gene:ENSG00000007350 gene_biotype:protein_coding transcript_biotype:protein_coding
AGAGGCACAAAGGAAACTTGCCCCGAGTCCACGGTGCTCTGCGGTTAGGAGCTGGCCTCACTGTGCACAGGGGGAGGGGTGCCACCCTACATCATGTAGCAGTTCTTCTGAGATCATGTCTGTGCTGTTCTTCTACATCATGAGGTACAAGCAGTCAGATCCAGAGAATCCGGACAACGACCGATTTGTCCTCGCAAAGAGACTGTCGTTTGTGGATGTGGCAACAGGATGGCTCGGACAAGGACTGGGAGTTGCATGTGGAATGGCATATACTGGCAAGTACTTCGACAGGGCCAGCTACCGGGTGTTCTGCCTCATGAGTGATGGCGAGTCCTCAGAAGGCTCTGTCTGGGAGGCAATGGCCTTTGCTTCCTACTACAGTCTGGACAATCTTGTGGCAATCTTTGATGTGAACCGCCTGGGACACAGTGGTGCATTGCCCGCCGAGCACTGCATAAACATCTATCAGAGGCGCTGCGAAGCCTTTGGGTGGAACACTTATGTGGTGGACGGCCGGGACGTGGA
>ENST00000426203 cds:putative chromosome:GRCh37:X:153533396:153539497:1 gene:ENSG00000007350 gene_biotype:protein_coding transcript_biotype:protein_coding
NNAGAGGCACAAAGGAAACTTGCCCCGAGTCCACGGTGCTCTGCGGTTAGGAGCTGGCCTCACTGTGCACAGGGGGAGGGGTGCCACCCTACATCATGTAGCAGTTCTTCTGAGATCATGTCTGTGCTGTTCTTCTACATCATGAGGTACAAGCAGTCAGATCCAGAGAATCCGGACAACGACCGATTTGTCCTCGCAAAGAGACTGTCGTTTGTGGATGTGGCAACAGGATGGCTCGGACAAGGACTGGGAGTTGCATGTGGAATGGCATATACTGGCAAGTACTTCGACAGGGCCAGCTACCGGGTGTTCTGCCTCATGAGTGATGGCGAGTCCTCAGAAGGCTCTGTCTGGGAGGCAATGGCCTTTGCTTCCTACTACAGTCTGGACAATCTTGTGGCAATCTTTGATGTGAACCGCCTGGGACACAGTGGTGCATTGCCCGCCGAGCACTGCATAAACATCTATCAGAGGCGCTGCGAAGCCTTTGGGTGGAACACTTATGTGGTGGACGGCCGGGACGTGGA

 		 	   		  