[ensembl-dev] VEP - protein "domains" results with the REST API and the command line tool

Pedro Almeida pedro.almeida at heartgenetics.com
Tue Jun 21 18:01:16 BST 2022

Hi all,

I've been trying to use VEP to get information on the protein domains
overlapping a variant, but the REST API seems to return more domains than
the command line tool. By "domains" I mean the output of the command line
switch `--domains`, which, as far as I can tell, is equivalent to
`domains=1` in the `GET vep/:species/id/:id` API request.
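To make the comparison reproducible, the REST request can be assembled with
the Python standard library alone. This is a minimal sketch. It assumes the
public server at `https://rest.ensembl.org` and JSON output requested through
the `content-type` query parameter. The helper names `vep_id_url` and
`fetch_vep` are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: the public Ensembl REST server; replace with a mirror if needed.
REST_SERVER = "https://rest.ensembl.org"


def vep_id_url(species: str, variant_id: str, domains: bool = True) -> str:
    """Build the URL for GET vep/:species/id/:id, optionally adding domains=1."""
    path = f"/vep/{quote(species)}/id/{quote(variant_id)}"
    params = {"content-type": "application/json"}
    if domains:
        params["domains"] = 1  # same intent as --domains on the command line
    return f"{REST_SERVER}{path}?{urlencode(params)}"


def fetch_vep(url: str) -> list:
    """Fetch the URL and decode the JSON response (performs a network request)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

For example, `fetch_vep(vep_id_url("human", "<variant id>"))` returns the
parsed list of results, where `<variant id>` stands for whatever identifier
is being tested.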

For example, for the single variant I'm using for testing, EGFR T790M, the
GET request above returns a `domains` list in each `transcript_consequences`
object that includes several ENSP_mappings entries as well as entries from
CDD, Pfam, PROSITE_profiles, and others. I'm mostly interested in the Pfam
information, which in this case corresponds to the protein tyrosine and
serine/threonine kinase domain, PF07714.
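The Pfam entries can be pulled out of the parsed response with a short
filter. This sketch assumes that each `transcript_consequences` entry may
carry a `domains` list of objects with `db` and `name` keys, which matches the
JSON output described above. The function name `domains_by_db` is
hypothetical.

```python
from typing import Iterable, Set


def domains_by_db(vep_results: Iterable[dict], db: str = "Pfam") -> Set[str]:
    """Collect domain names from one source database across all transcripts.

    Assumes each result may have a 'transcript_consequences' list whose
    entries may have a 'domains' list of {'db': ..., 'name': ...} objects.
    """
    found = set()
    for result in vep_results:
        for tc in result.get("transcript_consequences", []):
            for dom in tc.get("domains", []):
                if dom.get("db") == db:
                    found.add(dom.get("name"))
    return found
```

Applied to the REST response for this variant, it should return the Pfam
accessions listed there, such as PF07714.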

However, when I run the same variant through the command line tool (using a
VCF file containing only this variant as input), I only get the
ENSP_mappings entries; all the other databases appear to be missing. This is
the command I used:

    vep --domains --dir_cache /opt/bioResources/vep_106/ --fasta \
        --input_file T790M.vcf --output_file T790M.vep.json \
        --cache --offline --json --force_overwrite
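To see exactly which domain sources differ between the two outputs, you can
diff the `db` labels. This sketch assumes that the `--json` output file holds
one JSON object per line and that the REST response has already been saved
and parsed. The helper names are hypothetical.

```python
import json
from typing import Iterable, List, Set


def read_cli_json(lines: Iterable[str]) -> List[dict]:
    """Parse VEP --json output, assumed to hold one JSON object per line."""
    return [json.loads(line) for line in lines if line.strip()]


def domain_sources(vep_results: Iterable[dict]) -> Set[str]:
    """Return every 'db' label seen in any transcript's 'domains' list."""
    return {
        dom.get("db")
        for result in vep_results
        for tc in result.get("transcript_consequences", [])
        for dom in tc.get("domains", [])
    }


def missing_sources(rest_results: Iterable[dict],
                    cli_results: Iterable[dict]) -> Set[str]:
    """Databases reported by the REST API but absent from the CLI output."""
    return domain_sources(rest_results) - domain_sources(cli_results)
```

For example, `with open("T790M.vep.json") as fh: cli = read_cli_json(fh)`
loads the command line results. `missing_sources(rest, cli)` then lists the
databases (CDD, Pfam, PROSITE_profiles, ...) that only the REST output
contains.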

Does anyone know whether this is expected, or how to get the same list of
protein domains from the command line tool as from the REST API? Are custom
annotations needed in these cases?

Many thanks,