[ensembl-dev] Transcript names in the cache

Will McLaren wm2 at ebi.ac.uk
Mon Oct 3 16:34:38 BST 2016


Hi Joao,

The transcript objects stored in the cache are "stripped" after being
loaded from the database and before being serialized to disk. This serves
two purposes: first, to reduce the size of the cache by removing
unnecessary or redundant information; second, to exclude components that
cannot be serialized (such as database connections/adaptors).
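
To illustrate the second point, here is a minimal Perl sketch. It is not
the VEP code. A plain hash stands in for a transcript object, and a code
reference stands in for a database adaptor, because Storable refuses to
serialize CODE refs by default. The hash keys are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-in for a transcript object. The code ref plays the
# role of a database adaptor/connection, which Storable cannot serialize.
my $transcript = {
    stable_id     => 'ENSMUST00000036136',
    external_name => 'Colec-201',
    adaptor       => sub { return 'db handle' },
};

# Serializing the unstripped structure fails: Storable croaks on CODE refs.
my $frozen = eval { freeze($transcript) };
my $error  = $@;
print "unstripped: ", (defined $frozen ? "stored\n" : "failed: $error");

# "Strip" the structure by dropping the unserializable adaptor, then store it.
delete $transcript->{adaptor};
$frozen = freeze($transcript);
my $restored = thaw($frozen);
print "stripped: restored $restored->{stable_id}\n";
```

The same applies to real adaptor objects holding live database handles:
they have to be removed before the object can be written to disk.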

The external transcript name is not used by the VEP code, so it falls into
the unnecessary category and is stripped out. You can see which fields are
stripped in the clean_transcript() subroutine of VEP.pm:

https://github.com/Ensembl/ensembl-variation/blob/master/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm#L4907-L4936
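
A rough sketch of the same idea applied to the external name follows. It
assumes a hypothetical @strip_keys list (the real list is in
clean_transcript() at the link above). It also uses Storable's
nstore/retrieve on a temporary file rather than the gzipped chunk files the
real cache uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical list of keys to remove; NOT the actual list from VEP.pm.
my @strip_keys = qw(external_name);

# Delete the unwanted keys in place and return the same hashref.
sub strip_transcript {
    my ($tr) = @_;
    delete $tr->{$_} for @strip_keys;
    return $tr;
}

# Hypothetical plain-hash stand-in for a transcript object.
my $transcript = {
    stable_id     => 'ENSMUST00000036136',
    external_name => 'Colec-201',
};

# Write the stripped object to a temporary "cache" file, then read it back.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'cache_chunk.storable');
nstore(strip_transcript($transcript), $file);

my $cached = retrieve($file);
printf "%s external_name: %s\n", $cached->{stable_id},
    defined $cached->{external_name} ? $cached->{external_name} : '(none)';
```

Anything read back from such a cache simply has no value for the stripped
field. That is why external_name returns nothing on cached transcripts.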

Regards

Will McLaren
Ensembl Variation

On 3 October 2016 at 16:06, João Eiras <joao.eiras at gmail.com> wrote:

> Hi.
>
> I see the common transcript name is not included in the cache, and as
> such Bio::EnsEMBL::Transcript->external_name returns nothing.
>
> For instance, transcript ENSMUST00000036136 has the name Colec-201 [1].
> However, only its gene has the name assigned ("Colec").
>
> This variant on GRCm38 falls inside that transcript:
> #CHROM POS ID REF ALT QUAL FILTER INFO
> chr12 28612863 . C T 999 . .
>
> Or, if you want to poke the cache:
> $ gzip -dc <path to vep cache>/mus_musculus_merged/85_GRCm38/12/28000001-29000000.gz | \
>   perl -e 'use Storable qw(fd_retrieve); use Data::Dumper; print Dumper(fd_retrieve(\*STDIN));' | \
>   grep -a -C 6 -e ENSMUST00000036136 -e Colec
>
> Is there a reason why the name is not included, given that this
> information is available in the GTF files? [2]
>
> Thank you.
>
> [1] http://www.ensembl.org/Mus_musculus/Transcript/Summary?db=core;g=ENSMUSG00000036655;r=12:28594173-28623290;t=ENSMUST00000036136
>
> [2] ftp://ftp.ensembl.org/pub/release-85/gtf/mus_musculus/Mus_musculus.GRCm38.85.gtf.gz
> (line 1323644)

