[ensembl-dev] ensembl-variation: import_vcf OK, but linking variations to transcripts?

Wed Dec 19 09:40:48 GMT 2012

Hi Jan,

External users are probably better off sticking to using the "--add_tables
transcript_variation" option.

It is possible (though currently experimental/beta) to vastly speed up this
process by using a VEP cache file to retrieve transcript information.

To do this, you need an unpacked VEP cache (as described at
http://www.ensembl.org/info/docs/variation/vep/vep_script.html#pre) and a
FASTA file (as described at
http://www.ensembl.org/info/docs/variation/vep/vep_script.html#fasta,
though note that you should use the "primary_assembly" rather than the
"toplevel" FASTA file).

You then add the following options when running import_vcf.pl:

--fasta /path/to/fasta/file.fa --cache ~/.vep/homo_sapiens/69/

assuming you unpacked your cache file to the recommended directory. The
unpacked cache file could be either one of the pre-built ones, or one you
have created from your own core database using "--build all" in the VEP.
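To make the above concrete, here is a minimal bash sketch that appends the cache options to an import_vcf.pl run alongside "--add_tables transcript_variation". Only --add_tables, --fasta and --cache come from this thread; the helper name run_import, the DRY_RUN switch and the placeholder paths are my own assumptions, and your usual import_vcf.pl options (input file, DB connection, etc.) are simply passed through unchanged:

```shell
#!/usr/bin/env bash
# Sketch only: wraps import_vcf.pl with the VEP-cache options described above.
set -euo pipefail

# Placeholder locations (assumptions): override them via the environment.
FASTA="${FASTA:-/path/to/fasta/file.fa}"               # primary_assembly FASTA, not toplevel
CACHE_DIR="${CACHE_DIR:-$HOME/.vep/homo_sapiens/69}"   # recommended unpack location

# run_import is a hypothetical helper; "$@" carries your usual import_vcf.pl
# options, which are not shown here.
run_import() {
  local cmd=(perl import_vcf.pl "$@"
    --add_tables transcript_variation
    --fasta "$FASTA"
    --cache "$CACHE_DIR/")

  # DRY_RUN=1 prints the assembled command instead of running it
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi

  # Fail early if the FASTA file or the unpacked cache directory is missing
  [[ -f "$FASTA" ]] || { echo "FASTA not found: $FASTA" >&2; return 1; }
  [[ -d "$CACHE_DIR" ]] || { echo "VEP cache not found: $CACHE_DIR" >&2; return 1; }

  "${cmd[@]}"
}
```

Running "DRY_RUN=1 run_import <your usual options>" first lets you check the full command line before starting what can be a long import.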

If you want SIFT and PolyPhen predictions, you will also need a copy of the
protein_function_predictions and translation_md5 tables from the current
human variation DB in the database you are populating. Note that these will
still work even if you have custom transcripts, as the predictions are based
on the translated sequence rather than on any stable ID.
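One way to copy just those two tables is a mysqldump-to-mysql pipe; the bash sketch below assumes that approach. The connection settings, the helper name copy_prediction_tables and the example DB names are hypothetical placeholders (check the Ensembl documentation for the public server's port and the exact variation DB name for your release):

```shell
#!/usr/bin/env bash
# Sketch only: copies the two prediction tables from a reference Ensembl
# variation DB into the DB you populated with import_vcf.pl.
set -euo pipefail

# Hypothetical connection settings: replace with your own. For the public
# server, add the --port given in the Ensembl documentation for your release.
SRC_CONN=(--host=ensembldb.ensembl.org --user=anonymous)
DST_CONN=(--host=localhost --user=myuser --password)

copy_prediction_tables() {
  local src_db="$1" dst_db="$2"
  # mysqldump emits DROP/CREATE/INSERT statements for just these two tables;
  # --skip-lock-tables avoids needing LOCK TABLES on read-only accounts.
  mysqldump "${SRC_CONN[@]}" --skip-lock-tables "$src_db" \
      protein_function_predictions translation_md5 \
    | mysql "${DST_CONN[@]}" "$dst_db"
}

# Example (both DB names are placeholders):
# copy_prediction_tables homo_sapiens_variation_69_37 my_variation_db
```

Because the dump drops and recreates each table, any empty copies created from the schema in your target DB are simply replaced.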

There is currently no way to limit which transcripts are picked up; you'd
have to root around in Utils/VEP.pm (sub whole_genome_fetch_transcript) and
skip the unwanted transcripts manually there.

Hope this helps

Will McLaren
Ensembl Variation

On 18 December 2012 17:53, Jan Vogel <jan.vogel at gmail.com> wrote:

> Hello,
>
> I have imported a VCF file into an ensembl-variation database which I
> created from scratch, but I failed to link the variations / alleles /
> variation features to transcript_variations.
>
> The documentation on the import_vcf.pl script can be found here:
>
>
> http://uswest.ensembl.org/info/docs/variation/import_vcf.html#transcript_variation
>
> In the documentation ("transcript_variation" section) it says:
>
> "It may also be faster to do this once the VCF import is finished using
> the standard transcript_variation pipeline."
>
> Can someone point me to some documentation on the standard
> transcript-variation pipeline and how to run it?
>
> I've tried to use the parallel_transcript_variation.pl script, but it
> seems to work with an old schema (v58?).
>
> Any hints, draft documentation or command history would be welcome.
> Also, is there a way to limit the linking of variations to genes/transcripts
> to specific logic names? I have various gene sets in my core DB and would
> like to limit the consequences / transcript variations to only the
> canonical transcripts of one gene set.
>
>
> Thanks,
>
>    Jan
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>