[ensembl-dev] VEP - protein "domains" results with the REST API and the command line tool

Likhitha Surapaneni likhithas at ebi.ac.uk
Wed Jun 22 09:12:24 BST 2022


Hi Pedro,

I am sorry to hear that you are having an issue with the VEP command line tool.

Could you please confirm whether you were using the RefSeq cache? The 
RefSeq cache lacks some classes of data present in the Ensembl transcript 
cache, one of them being protein domains 
(https://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#refseq). 
Could you please try with the Ensembl transcript cache and see whether 
you still face the same issue?
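
As a rough sketch only, a rerun against the Ensembl transcript cache might look like the following. It assumes the Ensembl cache for release 106 (GRCh38) is installed under the same `--dir_cache` in the default `homo_sapiens/106_GRCh38` layout. The FASTA path and the output file name `T790M.ensembl.vep.json` are assumptions for illustration and are not taken from your setup, so please adjust them.

```shell
#!/bin/sh
# Sketch: rerun the original command against the Ensembl transcript cache.
# Assumption: the Ensembl (non-RefSeq) cache for release 106 / GRCh38 is
# installed under this directory in the default homo_sapiens/106_GRCh38 layout.
CACHE_DIR="/opt/bioResources/vep_106"

# Assumed FASTA location inside the Ensembl cache directory (hypothetical path).
FASTA="${CACHE_DIR}/homo_sapiens/106_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz"

# Neither --refseq nor --merged is passed, so VEP reads the Ensembl
# transcript cache. The output file name is a hypothetical choice.
set -- vep --cache --offline --domains \
  --species homo_sapiens --assembly GRCh38 --cache_version 106 \
  --dir_cache "$CACHE_DIR" --fasta "$FASTA" \
  --input_file T790M.vcf --output_file T790M.ensembl.vep.json \
  --json --force_overwrite

# Run VEP only if it is installed and the input file exists;
# otherwise just print the command that would be run.
if command -v vep >/dev/null 2>&1 && [ -f T790M.vcf ]; then
  "$@"
else
  printf 'Would run: %s\n' "$*"
fi
```

If the run completes, something like `grep -c Pfam T790M.ensembl.vep.json` should show whether Pfam entries now appear in the output.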

Hope this helps and please let me know if you have further questions.

Thanks and regards,

Likhitha

On 21/06/2022 18:01, Pedro Almeida wrote:
> Hi all,
>
> I've been trying to get information on overlapping protein domains for 
> one variant using VEP, but it looks as if the REST API returns more 
> domains than the command line tool. Domains here means the output of 
> the command line switch `--domains`, which, as far as I can tell, is 
> the same as `domains=1` with the `GET vep/:species/id/:id` API request.
>
> For example, for this single variant I'm using for testing, EGFR 
> T790M, with the GET method above 
> `https://rest.ensembl.org/vep/human/id/rs121434569?domains=1&content-type=application/json` 
> the `domains` list of the `transcript_consequences` object lists 
> several ENSP_mappings and also information from CDD, Pfam, 
> PROSITE_profiles, and others. I'm more interested in the Pfam 
> information, which in this case corresponds to a protein tyrosine and 
> serine/threonine kinase, PF07714.
>
> However, when I run this same variant in the command line (using a VCF 
> file with this single variant as input), I can only obtain information 
> from the ENSP_mappings, but all other databases appear to be missing. 
> The command used was the following:
>
> ```
> vep --domains --dir_cache /opt/bioResources/vep_106/ --fasta 
> /opt/bioResources/vep_106/homo_sapiens_refseq/106_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz 
> --input_file T790M.vcf --output_file T790M.vep.json --cache --offline 
> --json --force_overwrite
> ```
>
> Does anyone know if this is expected, or how to get the same output as 
> the REST API (regarding the list of protein domains) when using the 
> command line tool? Are custom annotations needed for these cases?
>
> Many thanks,
> Pedro
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/


