[ensembl-dev] how to force VEP to only return unique Gene IDs per variant
Will McLaren
wm2 at ebi.ac.uk
Thu Jun 25 15:07:50 BST 2015
Hello,
You are seeing multiple results for the same gene because that gene has
multiple alternate transcripts (splice variants); by default VEP produces
one result "set" per transcript that the variant overlaps. Most genes have
more than one alternate transcript.
There are several ways to reduce this to one result per variant or one per
gene; see the documentation here:
http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick
Though please do heed the warnings! I'd guess --per_gene would be of most
use to you.
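
If re-running VEP is not convenient, the repeated values can also be
collapsed after the fact. This is only a minimal sketch, assuming the GID
and GS keys from your example are plain comma-separated key=value entries
separated by semicolons (they are not standard VEP field names, and the
function name dedupe_info_values is made up for illustration):

```python
def dedupe_info_values(info: str) -> str:
    """Collapse repeated comma-separated values in each key=value entry."""
    fields = []
    for entry in info.split(";"):
        if "=" not in entry:
            # Flag-style entry with no value: keep unchanged.
            fields.append(entry)
            continue
        key, value = entry.split("=", 1)
        # dict.fromkeys drops duplicates while preserving first-seen order.
        unique = list(dict.fromkeys(value.split(",")))
        fields.append(f"{key}={','.join(unique)}")
    return ";".join(fields)


# Shortened version of the example from the original question.
example = "GID=6790,6790,6790;GS=AURKA,AURKA,AURKA"
print(dedupe_info_values(example))
```

Note that this discards the per-transcript correspondence between values,
so it is only appropriate when you need the set of genes per variant and
nothing transcript-specific.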
Regards
Will McLaren
Ensembl Variation
On 25 June 2015 at 14:49, Wibo Pipping <wibo at thehyve.nl> wrote:
> Hi Ensembl team,
>
> I have annotated a VCF file with Entrez Gene IDs using VEP 79. (command
> used: perl ~/variant_effect_predictor.pl -i inputfile.vcf --cache
> --refseq --vcf --offline)
>
> The output I get back has some weird results. Basically it is doing this:
>
>
> GID=6790,6790,6790,6790,6790,6790,6790,6790,6790;GS=AURKA,AURKA,AURKA,AURKA,AURKA,AURKA,AURKA,AURKA,AURKA.
>
> This is for one variant. Is there a way I can tell it not to report
> duplicates for the same variant?
>
> Thank you!
>
>
> Regards,
>
> Wibo Pipping
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>