[ensembl-dev] how to force VEP to only return unique Gene IDs per variant

Will McLaren wm2 at ebi.ac.uk
Thu Jun 25 15:07:50 BST 2015


Hello,

You are seeing multiple results for the same gene because that gene has
multiple alternative transcripts (splice variants); by default, VEP
produces one result "set" per transcript. Most genes have more than one
transcript.
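
Since each repeated entry corresponds to one transcript of the same gene, a list like the GID/GS values quoted below collapses to a single value once duplicates are dropped. A minimal sketch of that post-processing, assuming plain comma-separated values; dedup_csv is a hypothetical helper name, not part of VEP:

```shell
# Hypothetical helper (not part of VEP): collapse a comma-separated list
# to its unique values, keeping the order of first appearance.
dedup_csv() {
    printf '%s\n' "$1" |
        tr ',' '\n' |          # one value per line
        awk '!seen[$0]++' |    # keep only the first occurrence of each value
        paste -sd, -           # join the lines back with commas
}

dedup_csv "6790,6790,6790,6790"   # prints: 6790
```

Note that this only tidies the output after the fact; it discards the per-transcript detail rather than changing what VEP reports.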

There are several ways to reduce this to one result per variant or one
per gene; see the documentation here:
http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick

Though please do heed the warnings! I'd guess --per_gene would be of most
use to you.
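
As a rough sketch, your original command with --per_gene appended might look like the following. The script path and the other flags are copied from your message; vep_per_gene is just a hypothetical wrapper name, and this has not been checked against your local cache setup:

```shell
# Sketch only: the invocation from the original message with --per_gene added.
# vep_per_gene is a hypothetical wrapper name; the script path and all other
# flags are taken unchanged from the original command.
vep_per_gene() {
    perl ~/variant_effect_predictor.pl -i "$1" \
        --cache --refseq --vcf --offline \
        --per_gene
}

# Usage (requires a local VEP install and cache):
#   vep_per_gene inputfile.vcf
```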

Regards

Will McLaren
Ensembl Variation

On 25 June 2015 at 14:49, Wibo Pipping <wibo at thehyve.nl> wrote:

> Hi Ensembl team,
>
> I have annotated a VCF file with Entrez Gene IDs using VEP 79 (command
> used: perl ~/variant_effect_predictor.pl -i inputfile.vcf --cache
> --refseq --vcf --offline).
>
> The output I get back has some weird results. Basically it is doing this:
>
>
> GID=6790,6790,6790,6790,6790,6790,6790,6790,6790;GS=AURKA,AURKA,AURKA,AURKA,AURKA,AURKA,AURKA,AURKA,AURKA.
>
> This is for one variant. Is there a way I can tell it to not report
> duplicates in the same variant?
>
> Thank you!
>
>
> Regards,
>
> Wibo Pipping
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>