[ensembl-dev] Transcript names in the cache

João Eiras joao.eiras at gmail.com
Mon Oct 3 16:06:01 BST 2016


Hi.

I see that the common transcript name is not included in the VEP cache,
and as such Bio::EnsEMBL::Transcript->external_name returns nothing.

For instance, transcript ENSMUST00000036136 has the name Colec-201 [1].
However, in the cache only its gene has a name assigned ("Colec").

This variant on GRCm38 falls inside that transcript:
#CHROM POS ID REF ALT QUAL FILTER INFO
chr12 28612863 . C T 999 . .

Or, if you want to poke at the cache:
$ gzip -dc <path to vep cache>/mus_musculus_merged/85_GRCm38/12/28000001-29000000.gz | \
  perl -e 'use Storable qw(fd_retrieve); use Data::Dumper; print Dumper(fd_retrieve(STDIN));' | \
  grep -a -C 6 -e ENSMUST00000036136 -e Colec

Is there a reason why the name is not included, given that the
information is available in the GTF files [2]?
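
To check the GTF side, something along these lines could be used. It is
only a rough sketch: gtf_transcript_name is a made-up helper name (not
part of VEP or the Ensembl API), and it assumes the usual GTF layout of
tab-separated columns with transcript_id "..." and transcript_name "..."
attributes in the ninth column of "transcript" rows.

```shell
# Hypothetical helper: print the transcript_name attribute recorded for one
# transcript ID, reading GTF text from stdin.
gtf_transcript_name() {
  awk -F '\t' -v id="$1" '
    # Only look at transcript rows whose attributes mention the requested ID.
    $3 == "transcript" && index($9, "transcript_id \"" id "\"") {
      if (match($9, /transcript_name "[^"]*"/)) {
        # Skip the 17-character prefix and drop the closing quote.
        print substr($9, RSTART + 17, RLENGTH - 18)
      }
    }'
}

# Usage, assuming the file from [2] has been downloaded to the current directory:
#   gzip -dc Mus_musculus.GRCm38.85.gtf.gz | gtf_transcript_name ENSMUST00000036136
```

If that prints a name while the cache dump above shows none for the same
transcript, the name is present in the GTF but missing from the cache.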

Thank you.

[1] http://www.ensembl.org/Mus_musculus/Transcript/Summary?db=core;g=ENSMUSG00000036655;r=12:28594173-28623290;t=ENSMUST00000036136

[2] ftp://ftp.ensembl.org/pub/release-85/gtf/mus_musculus/Mus_musculus.GRCm38.85.gtf.gz, line 1323644



