[ensembl-dev] ./vep offline problem

Will McLaren wm2 at ebi.ac.uk
Mon May 22 17:02:54 BST 2017


Hi Sabrina,

There are a few issues with your GTF; if you correct them, it should work.

1) IDs should not be shared by transcripts and genes. In your example, I
fixed this by prefixing the gene ID with "g_" and the transcript ID with
"t_".

2) Transcript entries need a valid biotype; typically this will be
"protein_coding" (see
http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gff)

3) The phase field (column 8) must be correctly set for CDS entries; in
your example it is "." rather than 0, 1 or 2.

These points also apply if you use a GFF format file.
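
As a rough sketch of the three fixes above (not part of VEP): the helper
names fix_gtf_line and cds_phases are made up for illustration, and the
attribute name "biotype" is an assumption that should be checked against
the documentation linked in point 2. The phase calculation also assumes
that the first CDS segment of each transcript starts on a complete codon.

```python
import re


def fix_gtf_line(line, biotype="protein_coding"):
    """Return one GTF line with distinct gene/transcript IDs and a biotype."""
    text = line.rstrip("\n")
    cols = text.split("\t")
    if text.startswith("#") or len(cols) != 9:
        return text  # leave comments and malformed rows untouched
    attrs = cols[8]
    # Point 1: prefix the IDs so a gene and its transcript never share one.
    attrs = re.sub(r'gene_id "([^"]+)"', r'gene_id "g_\1"', attrs)
    attrs = re.sub(r'transcript_id "([^"]+)"', r'transcript_id "t_\1"', attrs)
    # Point 2: add a biotype to transcript rows that do not mention one yet.
    # The attribute name "biotype" is an assumption, not taken from the thread.
    if cols[2] == "transcript" and "biotype" not in attrs:
        attrs = attrs.rstrip("; ") + f'; biotype "{biotype}";'
    cols[8] = attrs
    return "\t".join(cols)


def cds_phases(segments, strand):
    """Point 3: compute the phase of each CDS segment of one transcript.

    segments is a list of (start, end) pairs, 1-based and inclusive.
    Returns a dict mapping each (start, end) pair to its phase (0, 1 or 2).
    """
    # Walk the segments in transcription order: ascending on "+",
    # descending on "-".
    ordered = sorted(segments, reverse=(strand == "-"))
    phases = {}
    phase = 0  # assumption: the coding sequence starts on a complete codon
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases to skip at the start of the next segment to reach a codon start.
        phase = (3 - (length - phase) % 3) % 3
    return phases
```

Applied to the transcript row of the example, fix_gtf_line gives the
attributes gene_id "g_A";transcript_id "t_A"; biotype "protein_coding";.
The single CDS in the example spans 231183-234374, which is 3192 bases (a
multiple of 3), so cds_phases assigns it phase 0; that value would then be
written into column 8 of the CDS row.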

Hope that helps

Will McLaren
Ensembl Variation

On 22 May 2017 at 12:36, Sabrina Legoueix-Rodriguez <
sabrina.legoueix at inra.fr> wrote:

> Dear all,
>
> I have installed your recent VEP API locally on my machine, in order to
> annotate SNPs against a home-made genome.
>
> I used the instructions on these pages:
> http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#offline
> http://www.ensembl.org/info/docs/tools/vep/script/index.html
>
> My inputs are:
> -> a home-made reference genome in a FASTA file
> -> a VCF file listing the SNPs on that genome
> -> a GTF file with the genome annotations
>
> My goal is to use VEP to generate a .vep file with functional annotations
> of my SNPs.
>
> For instance:
>
> my gtf is:
> tig00000004_pilon_pilon    Pacbio    gene    231183    234374    .    +    .    gene_id "A";
> tig00000004_pilon_pilon    Pacbio    transcript    231183    234374    .    +    .    gene_id "A";transcript_id "A";
> tig00000004_pilon_pilon    Pacbio    CDS    231183    234374    .    +    .    gene_id "A";transcript_id "A";
> tig00000004_pilon_pilon    Pacbio    exon    231183    234374    .    +    .    gene_id "A";transcript_id "A";
>
> ( I also tried with a .gff)
>
> my vcf is:
> ##...
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  A_ATTACTCG
> tig00000004_pilon_pilon 232205  .       G       A       9881.15 .       AC=8;AF=0.800;AN=10;DP=245;FS=0.000;MLEAC=8;MLEAF=0.800;MQ=60.05;QD=25.82;SOR=0.983       GT:AD:DP:GQ:PL  0:9,0:9:99:0,247
>
> => this snp should be found in the gene "A"
>
> To prepare the gtf (or also .gff), I used:
> grep -v "^#" test.gtf  | sort -k1,1 -k4,4n -k5,5n | bgzip -c > test.gtf.gz
> tabix -p gtf test.gtf.gz
>
> my command line is:
> ./vep -i test.vcf -gtf test.gtf.gz -fasta ref.fasta --force_overwrite
> or
> ./vep -i test.vcf -gff test.gff.gz -fasta ref.fasta --force_overwrite
>
> The result file is:
> #Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      Extra
> .       tig00000004_pilon_pilon:232205  A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
> variant_effect_output.txt (END)
>
>
> It does not work: it retrieves only intergenic variants, which is wrong as
> I have some SNPs in genes.
>
> When I try the tool on data that I processed with gtf2vep.pl a few years
> ago, it does not work either.
>
> Could you please help me and tell me if I am doing something wrong?
>
> Thank you in advance.
>
> Best regards,
>
> Sabrina