[ensembl-dev] Some RefSeq transcripts seem broken
João Eiras
joao.eiras at gmail.com
Tue Sep 27 03:48:42 BST 2016
Hi.
I wrote a small VEP plugin that outputs the wild-type protein sequence
from the database together with its annotations, so that I can then
extract some k-mers around the annotations.
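To make the k-mer step concrete, here is a minimal sketch in Python
(not the plugin's actual Perl code). The function name kmers_around
and the 0-based position convention are assumptions made for
illustration only.

```python
def kmers_around(sequence, position, k):
    """Return every k-mer of `sequence` that overlaps `position`.

    `position` is a 0-based index into the protein sequence
    (an assumed convention, not something VEP prescribes).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not 0 <= position < len(sequence):
        raise ValueError("position is outside the sequence")
    # A k-mer starting at s covers s .. s+k-1, so it overlaps `position`
    # when position-k+1 <= s <= position; clamp to the sequence bounds.
    first_start = max(0, position - k + 1)
    last_start = min(position, len(sequence) - k)
    return [sequence[s:s + k] for s in range(first_start, last_start + 1)]


# Start of the protein sequence quoted in this message.
print(kmers_around("MPGGPGAPSS", 4, 3))
```

If the sequence is shorter than k, the list is simply empty.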
I was a bit confused to see that the amino-acid sequence of some RefSeq
transcripts contains many stop codons. One such example is the pair of
transcripts ENSMUST00000114099 and NM_172709.3, affected by variant
rs223913170.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr5 38300289 rs223913170 TG T 5755.73 . . .
The correct sequence is:
MPGGPGAPSSPAASSGSSRAAPSGIAACPLSPPPLARGSPQASGPRRGASVPQKLAETLSSQYGLNVFVA
GLLFLLAWAVHATGVGKSDLLCVLTALMLLQLLWMLWYVGRSYMQRRLIRPKDTHAGARWLRGSITLFAF
ITVVLGCLKVAYFIGFSECLSATEGVFPVTHAVHTLLQVYFLWGHAKDIIMSFKTLERFGVIHSVFTNLL
LWANSVLNESKHQLNEHKERLITLGFGNITIVLDDHTPQCNCTPPALCSALSHGIYYLYPFNIEYQILAS
TMLYVLWKNIGRRVDSSQHQKMQCRFDGVLVGSVLGLTVLAATIAVVVVYMIHIGRSKSKSESALIMFYL
YAITVLLLMGAAGLVGSWIYRVDEKSLDESKNPARKLDVDLLVATGSGSWLLSWGSILAIACAETRPPYT
WYNLPYSVLVIVEKYVQNIFIIESVHLEPEGVPEDVRTLRVVTVCSSEAAALAASTLGSQGMAQDGSPAV
NGNLCLQQRCGKEDQESGWEGATGTTRCLDFLQGGMKRRLLRNITAFLFLCNISLWIPPAFGCRPEYDNG
LEEIVFGFEPWIIVVNLAMPFSIFYRMHAAAALFEVYCKI
while VEP returns (differences in lower case):
MPGGPGAPSSPAASSGSSRAAPSGIAACPLSPPPLARGSPQASGPRRGASVPQKLAETLSSQYGLNVFVA
GLLFLLAWAVHATGVGKSDLLCVLTALMLLQLLWMLWYVGRSYMQRRLIRPKDTHAGARWLRGSITLFAF
ITVVLGCLKVAYFIGFSECLSATEGVFPVTHAVHTLLQVYFLWGHAKDIIMSFKTLERFGVIHSVFTNLL
LWANSVLNESKHQLNEHKERLITLGFGNITIVLDDHTPQCNCTPPALCSALSHGIYYLYPFNIEYQILAS
TMLYVLWKNIGRRVDSSQHQKMQCRFDGVLVGSVLGLTVLAATIAVVVVYMIHIGRSKSKSESALIMFYL
YAITVLLLMGAAGLVGSWIYRVDEKSLDESKNPARKLDVDLLVATGSGSWLLSWGSILAIACAETRPPYT
WYNLPYSVLVIVEKYVQNIFIIESVHLEPEGVPEDVRTLRVVTV lqqrgcrtgcihsrepgdgpgwvtcc
qwksvsaaevwergpgvwlgrsygdnpmsglpsgrheeeasqkhhglsvslqhlaldspclwlpsrv*qr
iggnclwl*tldncgqpghalfhflpdarsccpl*gll*dl
This was not the only case I saw, but I didn't gather any other
examples. It shouldn't be too hard to write a script that finds RefSeq
transcripts starting at the same position as some Ensembl transcript
and compares the AA sequences, but my Perl-fu is weak.
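The comparison step itself could look roughly like the sketch below,
written in Python rather than Perl. It assumes both amino-acid
sequences have already been fetched as plain strings; fetching them
from the database is not shown, and the function names
first_difference and compare_translations are hypothetical.

```python
def first_difference(expected, observed):
    """Return the 0-based index where the sequences first differ, or None."""
    for index, (left, right) in enumerate(zip(expected, observed)):
        # Compare case-insensitively, since differences may be marked
        # in lower case.
        if left.upper() != right.upper():
            return index
    if len(expected) != len(observed):
        # One sequence is a strict prefix of the other.
        return min(len(expected), len(observed))
    return None


def compare_translations(expected, observed):
    """Summarise how `observed` deviates from `expected`."""
    # Drop whitespace so wrapped or hand-annotated sequences can be pasted in.
    expected = "".join(expected.split())
    observed = "".join(observed.split())
    # A single trailing '*' is a normal terminator; any other '*' is internal.
    body = observed[:-1] if observed.endswith("*") else observed
    return {
        "first_difference": first_difference(expected, observed),
        "internal_stops": body.count("*"),
    }


# Fragments copied from the two sequences quoted above.
report = compare_translations("RVVTVCSSEAAALAAS", "RVVTV lqqrgcrtgc")
print(report)
```

Running this over every RefSeq/Ensembl pair that shares a start
position and keeping the pairs with a non-zero internal_stops count
would give a list of suspect transcripts.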
What's up with this?
Thank you.