[ensembl-dev] Fwd: Custom annotation with vep
Will McLaren
wm2 at ebi.ac.uk
Thu Sep 19 11:17:40 BST 2013
Hello Miguel,
Using a GTF file as a custom annotation will not affect the consequence
type called; all the VEP can do with it is annotate whether or not your
input variants overlap any of the features in your GTF file.
If there is an overlap, it should appear in the Extra column and look
something like
myFeatures=[feature]_[chr]:[start]-[end]
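Below is a minimal sketch for pulling these values back out of the output file. It assumes the default tab-delimited VEP output: header lines start with '#', Extra is the last column, and Extra holds ';'-separated KEY=VALUE pairs. The function name extract_custom is hypothetical and is not part of the VEP.

```shell
# Print "<Uploaded_variation><TAB><value>" for every KEY=value entry in the
# Extra column whose KEY matches the first argument (e.g. myFeatures).
# Assumes default tab-delimited VEP output: '#' header lines are skipped,
# Extra is the last column, and its entries are ';'-separated KEY=VALUE pairs.
extract_custom() {
  awk -v key="$1" 'BEGIN { FS = "\t" }
    /^#/ { next }
    {
      n = split($NF, pairs, ";")
      for (i = 1; i <= n; i++) {
        # keep only entries that start with "KEY="
        if (index(pairs[i], key "=") == 1) {
          print $1 "\t" substr(pairs[i], length(key) + 2)
        }
      }
    }' "$2"
}
```

For example, extract_custom myFeatures in.vcf.annot would print one line per variant/feature overlap. Rows without a myFeatures entry are skipped.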
It looks like you are doing the right thing; however, I think there is an
issue with the way tabix indexes GTF files. I presume the example data you
give is for a gene on the reverse strand, since start > end. Tabix doesn't
seem to like this (even though it indexes the file without complaint) and
doesn't return any overlapping data. You could get around this by swapping
start and end for the features on the reverse strand in your GTF file.
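Below is a minimal sketch of that swap using awk. It assumes a tab-separated GTF with start in column 4 and end in column 5. The function name fix_gtf_coords and the .fixed.gtf.gz output name are hypothetical. The strand column is left untouched.

```shell
# Swap start (column 4) and end (column 5) on any GTF record where start > end,
# so every feature satisfies start <= end before sorting and indexing with tabix.
# Assumes a tab-separated GTF; lines beginning with '#' are passed through unchanged.
fix_gtf_coords() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }
       ($4 + 0) > ($5 + 0) { tmp = $4; $4 = $5; $5 = tmp }
       { print }' "$1"
}
```

It would slot in front of the sort in the pipeline quoted below: fix_gtf_coords GFF_NDJ_TR.gtf | sort -k1,1 -k4,4n -k5,5n | bgzip > GFF_NDJ_TR.fixed.gtf.gz, followed by tabix -p gff GFF_NDJ_TR.fixed.gtf.gz.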
It is also possible to create a VEP cache file from a GTF file; this way
you can annotate your variants with consequences according to the gene
structures specified in the GTF file. See
http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gtf . Note
the restrictions on the GTF formatting; judging by the format you have
pasted, you may need to reformat yours.
Hope this helps
Will McLaren
Ensembl Variation
On 17 September 2013 21:28, Miguel Perez-Enciso <Miguel.Perez at uab.cat> wrote:
>
>
> Dear developers
> I am trying to use variant_effect_predictor.pl with a custom GTF file. I
> ran the commands below, but although it reads my GTF file, it does not
> recognise the genes and reports every variant as intergenic. I also attach
> the first rows of my GTF file. I have installed the latest version (73).
> Could you help? Thanks a lot!
> Miguel Perez Enciso
>
> sort -k1,1 -k4,4n -k5,5n GFF_NDJ_TR.gtf | bgzip > GFF_NDJ_TR.gtf.gz
> tabix -p gff GFF_NDJ_TR.gtf.gz
>
> file=in.vcf
>
> perl variant_effect_predictor.pl --custom GFF_NDJ_TR.gtf.gz,myFeatures,gff,overlap,0 \
>   -i $file --species sus_scrofa -o $file.annot --format vcf --offline \
>   --force_overwrite --cache
>
> # gtf file
> 1 protein_coding stop_codon 21476516 21476514 . - . gene_id GRM1-201
> 1 protein_coding exon 21477448 21476379 . - . gene_id GRM1-201
> 1 protein_coding CDS 21477448 21476517 . - . gene_id GRM1-201
>
> ...
>
> The first lines of the output file are below (so it does read my annotation file):
>
> ## ENSEMBL VARIANT EFFECT PREDICTOR v73
> ## Output produced at 2013-09-17 22:04:13
> ## Connected to
> ## Using cache in /home/miguel/.vep/sus_scrofa/73
> ## Using API version 73, DB version ?
> ## Extra column keys:
> ## DISTANCE : Shortest distance from variant to transcript
> ## myFeatures : /home/miguel/Documents/ngs/GFF_NDJ_TR.gtf.gz (overlap)
> #Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
> 5_63894199_C/T 5:63894199 T - - - intergenic_variant - - - - - -
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>