[ensembl-dev] gtf2vep / cache / API

Will McLaren wm2 at ebi.ac.uk
Tue Sep 1 16:06:27 BST 2015


Hi Sabrina,

There is indeed a way, though it requires a little knowledge of the
internals of the VEP to get it working.

Retrieving the data from disk is simple enough, but some method calls you
can normally make on a transcript object (e.g. translateable_seq() to get
the CDS sequence) won't work. The best thing to do is to step through the
script with the Perl debugger and explore the contents of the $tr object.
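
If the debugger is not convenient, listing the keys of the internal cache
gives a similar overview. This is only a rough sketch: cache_keys() is a
hypothetical helper, and the blessed hash is a stand-in so that the example
runs without a VEP cache. The keys actually present will depend on your VEP
version and on how the cache was built.

```perl
use strict;
use warnings;

# Hypothetical helper: list the keys stored in a transcript's internal
# VEP cache, to see which values can be read directly from the object.
sub cache_keys {
  my ($tr) = @_;
  my $cache = $tr->{_variation_effect_feature_cache} || {};
  return sort keys %$cache;
}

# Stand-in for a transcript loaded from the cache (not a real API object).
my $tr = bless {
  stable_id => 'TRANSCRIPT_1',
  _variation_effect_feature_cache => { translateable_seq => 'ATGGCCTGA' },
}, 'Demo::Transcript';

print "$_\n" for cache_keys($tr);
```

With a real transcript from the cache, pass that object to cache_keys()
in place of the stand-in.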

Here's a script that retrieves the transcripts in a given 1MB region and
dumps the CDS sequences as FASTA. You'll need to modify the $config hash to
point to your own cache directory, and change the chromosome and region
coordinates to match the region you want to dump.

Regards

Will McLaren
Ensembl Variation

###
### BEGIN SCRIPT
###
use strict;
use warnings;
use Bio::EnsEMBL::Variation::Utils::VEP qw(load_dumped_transcript_cache);

# config hash defines a few things needed for method to work
my $config = {
  # required; try 'zcat' instead if this doesn't work
  'compress' => 'gzip -dc',
  # full path to VEP cache including species, version, assembly
  'dir'      => '/Users/will/.vep/homo_sapiens/80_GRCh38',
  # not required but stops splurge
  'quiet'    => 1,
};

# define region
# regions in the cache are of fixed size (1MB) with one cache file per MB
# files are named after these region names under chromosome sub-directories
# in the VEP cache directory, so it's easy to loop over them if you read
# the contents of the directory
my $chr = 1;
my $region_start = 1000001; # region start must be (r * 10^6) + 1
my $region_end   = 2000000; # region end must be (r + 1) * 10^6

# this returns a hashref with one member keyed on chromosome name
my $trs = load_dumped_transcript_cache(
  $config, $chr, $region_start . '-' . $region_end
);

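foreach my $tr (@{$trs->{$chr}}) {
  # some values must be read from the internal cache
  my $cds = $tr->{_variation_effect_feature_cache}->{translateable_seq};
  next unless $cds; # skip transcripts with no cached CDS sequence

  printf(
    ">%s\n%s\n",
    $tr->stable_id, # some methods can be called verbatim
    $cds
  );
}
###
### END SCRIPT
###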
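
The comments in the script describe how cache files are named after their
1MB region. A minimal sketch of that naming scheme follows, with two
hypothetical helpers that are not part of the VEP API: region_for_position()
and list_regions(). It assumes the transcript files are named
"<start>-<end>.gz" inside each chromosome sub-directory, which you should
check against your own cache.

```perl
use strict;
use warnings;

# Name of the 1MB cache region that contains a 1-based position.
sub region_for_position {
  my ($pos) = @_;
  my $size = 1_000_000;
  my $r = int(($pos - 1) / $size);
  return sprintf('%d-%d', $r * $size + 1, ($r + 1) * $size);
}

# Region names found in one chromosome sub-directory, sorted by start.
# Assumes files are named "<start>-<end>.gz"; anything else is ignored.
sub list_regions {
  my ($chr_dir) = @_;
  opendir(my $dh, $chr_dir) or die "Cannot open $chr_dir: $!";
  my @regions = map { /^(\d+-\d+)\.gz$/ ? $1 : () } readdir($dh);
  closedir($dh);
  my @sorted = sort { (split /-/, $a)[0] <=> (split /-/, $b)[0] } @regions;
  return @sorted;
}

print region_for_position(1_500_000), "\n"; # prints 1000001-2000000
```

Each name returned by list_regions() has the same "start-end" form as the
third argument passed to load_dumped_transcript_cache() in the script, so
looping over the names would dump a whole chromosome.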

On 31 August 2015 at 08:44, Sabrina Legoueix Rodriguez <
sabrina.rodriguez at toulouse.inra.fr> wrote:

> Dear all,
>
> I am working on a species whose reference genome is not publicly available.
> I have a .gtf file for CDS annotations and a fasta file for the genome
> sequence.
>
> I am using gtf2vep.pl to generate my .vep file from my .vcf file.
>
> I would like to obtain the transcript nucleotide sequences of my CDSs
> (coding sequences of my genes without UTRs...).
> Is there a way to connect to the cache file generated with gtf2vep.pl
> (instead of the database registry) and use Ensembl API objects to get the
> coding sequences of my genes?
>
> Thanks in advance for your answer.
>
> Best regards,
>
> --
>
> Sabrina
>