[ensembl-dev] ./vep offline problem

Will McLaren wm2 at ebi.ac.uk
Mon Jun 12 16:04:48 BST 2017


Hi Sabrina,

The biotype is critical - if a transcript is to be interpreted at a
sequence level it must be protein-coding. For other transcript types VEP
typically reports only that the variant falls within (or near) that
transcript.
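
As a rough illustration of that rule, the Python sketch below lists each transcript line in a GTF with its biotype, so that transcripts which are not protein-coding (or have no biotype at all) can be spotted before running VEP. It assumes the biotype is stored in an attribute named "biotype". The function name and the sample records are hypothetical and are not part of VEP.

```python
import re

# GTF attributes look like: key "value";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')

def transcript_biotypes(gtf_lines):
    """Return (transcript_id, biotype) pairs for every transcript line.

    Assumption: the biotype lives in an attribute named "biotype";
    a missing attribute is reported as None.
    """
    pairs = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        pairs.append((attrs.get("transcript_id"), attrs.get("biotype")))
    return pairs

# Hypothetical records: one protein-coding transcript, one without a biotype.
records = [
    'contig1\tsrc\ttranscript\t100\t500\t.\t+\t.\t'
    'gene_id "g_A"; transcript_id "t_A"; biotype "protein_coding";',
    'contig1\tsrc\ttranscript\t600\t900\t.\t+\t.\t'
    'gene_id "g_B"; transcript_id "t_B";',
]
for transcript_id, biotype in transcript_biotypes(records):
    note = "ok" if biotype == "protein_coding" else "check biotype"
    print(transcript_id, biotype, note)
```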

Regards

Will

On 12 June 2017 at 15:49, Sabrina Legoueix-Rodriguez <
sabrina.legoueix at inra.fr> wrote:

> Hi Will,
>
> Thanks for your answer.
> How important is the biotype for the predictions?
>
> Best regards,
>
> Sabrina
>
>
> On 22/05/2017 18:02, Will McLaren wrote:
>
> Hi Sabrina,
>
> There are a few issues with your GTF; if you correct them then it should
> work.
>
> 1) IDs should not be shared by transcripts and genes. In your example, I
> fixed this by prefixing the gene ID with "g_" and the transcript ID with
> "t_"
>
> 2) Transcript entries need a valid biotype; typically this will be
> "protein_coding" (see
> http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gff)
>
> 3) The phase field must be correctly set for CDS entries.
>
> These points also apply if you use a GFF format file.
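
The three corrections above can be sketched in Python roughly as follows. This is a minimal illustration, not the script used in this thread. It assumes a well-formed, tab-separated GTF, that the biotype attribute is named "biotype", and that the first CDS segment of each transcript starts on a complete codon (phase 0). The helper names (parse_attrs, cds_phases, fix_gtf) are hypothetical.

```python
import re
from collections import defaultdict

ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')

def parse_attrs(text):
    """Parse a GTF attribute column into a dict (insertion order is kept)."""
    return dict(ATTR_RE.findall(text))

def cds_phases(segments, strand):
    """Map each (start, end) CDS segment of one transcript to its phase.

    Assumption: the first segment in transcription order starts on a
    complete codon (phase 0). Later phases follow from the bases consumed.
    """
    phases, consumed = {}, 0
    for start, end in sorted(segments, reverse=(strand == "-")):
        phases[(start, end)] = (3 - consumed % 3) % 3
        consumed += end - start + 1
    return phases

def fix_gtf(lines, default_biotype="protein_coding"):
    """Return GTF lines with distinct IDs, a transcript biotype and CDS phases."""
    rows = [line.rstrip("\n").split("\t") for line in lines
            if line.strip() and not line.startswith("#")]
    # Pass 1: collect the CDS segments of every transcript.
    cds = defaultdict(list)
    for f in rows:
        if f[2] == "CDS":
            key = (parse_attrs(f[8]).get("transcript_id"), f[6])
            cds[key].append((int(f[3]), int(f[4])))
    phases = {key: cds_phases(segs, key[1]) for key, segs in cds.items()}
    # Pass 2: rewrite each record.
    fixed = []
    for f in rows:
        attrs = parse_attrs(f[8])
        if f[2] == "CDS" and f[7] == ".":
            key = (attrs.get("transcript_id"), f[6])
            f[7] = str(phases[key][(int(f[3]), int(f[4]))])
        # 1) Make gene and transcript IDs distinct by prefixing them.
        if "gene_id" in attrs:
            attrs["gene_id"] = "g_" + attrs["gene_id"]
        if "transcript_id" in attrs:
            attrs["transcript_id"] = "t_" + attrs["transcript_id"]
        # 2) Give transcript lines a biotype (attribute name is an assumption).
        if f[2] == "transcript":
            attrs.setdefault("biotype", default_biotype)
        f[8] = " ".join(f'{k} "{v}";' for k, v in attrs.items())
        fixed.append("\t".join(f))
    return fixed

# The four records from the original message, with tabs restored.
prefix = "tig00000004_pilon_pilon\tPacbio\t{}\t231183\t234374\t.\t+\t.\t"
example = [
    prefix.format("gene") + 'gene_id "A";',
    prefix.format("transcript") + 'gene_id "A";transcript_id "A";',
    prefix.format("CDS") + 'gene_id "A";transcript_id "A";',
    prefix.format("exon") + 'gene_id "A";transcript_id "A";',
]
fixed = fix_gtf(example)
print("\n".join(fixed))
```

The rewritten file would still need to be sorted, bgzipped and tabix-indexed, as in the original message, before being passed to VEP.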
>
> Hope that helps
>
> Will McLaren
> Ensembl Variation
>
> On 22 May 2017 at 12:36, Sabrina Legoueix-Rodriguez <
> sabrina.legoueix at inra.fr> wrote:
>
>> Dear all,
>>
>> I have installed your recent VEP API locally on my machine so that I can
>> annotate SNPs against a home-made genome.
>>
>> I used the instructions on these pages:
>> http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#offline
>> http://www.ensembl.org/info/docs/tools/vep/script/index.html
>>
>> My inputs are:
>> -> a home-made reference genome as a FASTA file
>> -> a VCF file listing SNPs called on that genome
>> -> a GTF file with the genome annotation
>>
>> My goal is to use VEP to generate a .vep file with functional
>> annotations of my SNPs.
>>
>> For instance:
>>
>> my gtf is:
>> tig00000004_pilon_pilon    Pacbio    gene    231183    234374    .    +    .    gene_id "A";
>> tig00000004_pilon_pilon    Pacbio    transcript    231183    234374    .    +    .    gene_id "A";transcript_id "A";
>> tig00000004_pilon_pilon    Pacbio    CDS    231183    234374    .    +    .    gene_id "A";transcript_id "A";
>> tig00000004_pilon_pilon    Pacbio    exon    231183    234374    .    +    .    gene_id "A";transcript_id "A";
>>
>> (I also tried with a .gff file.)
>>
>> my vcf is:
>> ##...
>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  A_ATTACTCG
>> tig00000004_pilon_pilon 232205  .       G       A       9881.15 .       AC=8;AF=0.800;AN=10;DP=245;FS=0.000;MLEAC=8;MLEAF=0.800;MQ=60.05;QD=25.82;SOR=0.983     GT:AD:DP:GQ:PL  0:9,0:9:99:0,247
>>
>> => this SNP (position 232205) should fall within gene "A" (231183-234374)
>>
>> To prepare the gtf (or also .gff), I used:
>> grep -v "^#" test.gtf  | sort -k1,1 -k4,4n -k5,5n | bgzip -c > test.gtf.gz
>> tabix -p gtf test.gtf.gz
>>
>> my command line is:
>> ./vep -i test.vcf -gtf test.gtf.gz -fasta ref.fasta --force_overwrite
>> or
>> ./vep -i test.vcf -gff test.gff.gz -fasta ref.fasta --force_overwrite
>>
>> The result file is:
>> #Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      Extra
>> .       tig00000004_pilon_pilon:232205  A       -       -       -       *intergenic_variant*      -       -       -       -       -       -       IMPACT=MODIFIER
>> variant_effect_output.txt (END)
>>
>>
>> It does not work: it retrieves only intergenic variants, which is wrong
>> because some of my SNPs are in genes.
>>
>> When I run the tool on data that I processed with gtf2vep.pl a few
>> years ago, it does not work either.
>>
>> Could you please help me and tell me if I am doing something wrong?
>>
>> Thank you in advance.
>>
>> Best regards,
>>
>> Sabrina
>> --
>>
>> Sabrina
>>
>> *Sabrina LEGOUEIX RODRIGUEZ*
>> Bioinformatics Platform Manager
>>
>> Tel.: +33 (0) 5 61 28 57 92
>> sabrina.legoueix at toulouse.inra.fr
>> www.toulouse-white-biotechnology.com
>>
>> LinkedIn <https://www.linkedin.com/company/2757525h>    Twitter
>> <https://twitter.com/TWB_Biotech>
>> TWB - Parc technologique du canal • Bâtiment NAPA CENTER B • 3, rue
>> Ariane • 31520 Ramonville Saint-Agne
>> This message and the attachments are strictly personal. They may
>> contain confidential information. If you have received this message in
>> error, please notify the sender and delete the message and the attachments.
>> Any use of this communication received in error is prohibited.
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>