[ensembl-dev] Fwd: Custom annotation with vep

Will McLaren wm2 at ebi.ac.uk
Thu Sep 19 11:17:40 BST 2013


Hello Miguel,

Using a GTF file as a custom annotation will not affect the consequence
type called - all that the VEP can do with this is to annotate whether or
not your input variants overlap with any of the features in your GTF file.
If there is an overlap, it should appear in the Extra column and look
something like

myFeatures=[feature]_[chr]:[start]-[end]
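
If you want to pull these overlaps out of the output afterwards, something along these lines might do as a starting point. It is only a sketch: it assumes the Extra column is a semicolon-separated list of key=value pairs, the function name parse_extra is made up for illustration, and the sample value is invented rather than taken from real VEP output.

```python
def parse_extra(extra: str) -> dict:
    """Split an Extra column value into a key -> value dictionary."""
    pairs = {}
    for item in extra.split(";"):
        # Skip empty items and the "-" placeholder used for an empty column.
        if not item or item == "-":
            continue
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


# Hypothetical Extra value, for illustration only.
extra = "DISTANCE=1200;myFeatures=exon_1:21476379-21477448"
print(parse_extra(extra).get("myFeatures"))
```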

It looks like you are doing the right thing; however, I think there is an
issue with the way tabix indexes GTF files. I presume the example data you
give is for a gene on the reverse strand, since start > end. Tabix doesn't
seem to like this (even though it indexes the file without issue) and
doesn't return any overlapping data. You could get around this by swapping
start and end for those features on the reverse strand in your GTF file.
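
As a rough sketch of that swap, a small script like the one below could work. It assumes the GTF is tab-delimited with start in column 4 and end in column 5, and it swaps the two whenever start > end rather than checking the strand column; the function name swap_reversed_coords is just an illustrative choice.

```python
from typing import Iterable, Iterator


def swap_reversed_coords(lines: Iterable[str]) -> Iterator[str]:
    """Yield GTF lines with start/end swapped wherever start > end.

    Assumes tab-delimited GTF: column 4 is start, column 5 is end.
    Comment lines and lines that do not parse are passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 5:
            yield line
            continue
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            yield line
            continue
        if start > end:
            fields[3], fields[4] = str(end), str(start)
        yield "\t".join(fields) + ("\n" if line.endswith("\n") else "")


# One row from the pasted example, rebuilt with tabs for illustration.
row = "1\tprotein_coding\texon\t21477448\t21476379\t.\t-\t.\tgene_id GRM1-201\n"
print("".join(swap_reversed_coords([row])), end="")
```

To run it over a whole file, pass an open file handle instead of the one-row list and write the result to a new file. The sort, bgzip and tabix steps would then need to be rerun on the corrected file, since the sort order depends on the start column.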

It is also possible to create a VEP cache file using a GTF file, and in
this way you can annotate your variants with consequences according to the
gene structures specified in the GTF file; see
http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gtf . Note
the restrictions on GTF formatting - judging by the format you have pasted,
you may need to reformat yours.

Hope this helps

Will McLaren
Ensembl Variation


On 17 September 2013 21:28, Miguel Perez-Enciso <Miguel.Perez at uab.cat> wrote:

>
>
> Dear developers,
> I am trying to use variant_effect_predictor.pl with a custom GTF file. I
> ran the commands below, but although it seems to read my GTF file, it does
> not recognize the genes and returns all variants as intergenic. I also
> include the first rows of my GTF file. I have installed the latest
> version (73). Could you help? Thanks a lot!
> Miguel Perez Enciso
>
> sort -k1,1 -k4,4n -k5,5n GFF_NDJ_TR.gtf | bgzip > GFF_NDJ_TR.gtf.gz
> tabix -p gff GFF_NDJ_TR.gtf.gz
>
> file=in.vcf
>
> perl variant_effect_predictor.pl \
>   --custom GFF_NDJ_TR.gtf.gz,myFeatures,gff,overlap,0 \
>   -i $file --species sus_scrofa -o $file.annot --format vcf --offline \
>   --force_overwrite --cache
>
> # gtf file
> 1       protein_coding  stop_codon      21476516        21476514        .       -       .       gene_id GRM1-201
> 1       protein_coding  exon            21477448        21476379        .       -       .       gene_id GRM1-201
> 1       protein_coding  CDS             21477448        21476517        .       -       .       gene_id GRM1-201
>
> ...
>
> The first lines of the annot file are below (so it does read my annotation file):
>
> ## ENSEMBL VARIANT EFFECT PREDICTOR v73
> ## Output produced at 2013-09-17 22:04:13
> ## Connected to
> ## Using cache in /home/miguel/.vep/sus_scrofa/73
> ## Using API version 73, DB version ?
> ## Extra column keys:
> ## DISTANCE : Shortest distance from variant to transcript
> ## myFeatures : /home/miguel/Documents/ngs/GFF_NDJ_TR.gtf.gz (overlap)
> #Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      Extra
> 5_63894199_C/T  5:63894199      T       -       -       -       intergenic_variant      -       -       -       -       -       -
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>