[ensembl-dev] VEP command line

Irina Armean iarmean at ebi.ac.uk
Wed Sep 18 10:11:43 BST 2019


Hi Margaret,


Just to clarify: there is no stats_html command. VEP collects the summary 
statistics and writes them to the summary file while it annotates the 
variants and writes the results to the results file.


The field "Variants Processed" in the summary html file represents the 
total number of variants that were read.

I assume by "Variants remaining after filtering" you mean "Variants 
filtered out". This count represents the input variants that were 
filtered out by the VEP --freq_filter and --freq_freq options 
(filters on allele frequency).
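
As a rough sketch of how these options fit into a single run, the Python 
snippet below only assembles a command line; it does not execute VEP. The 
file names are hypothetical, and the population and threshold values are 
illustrative assumptions. The flags are the ones listed on the VEP options 
page, so please check them against the VEP version you have installed.

```python
# Sketch only: builds (but does not run) a VEP command line.
# File names, population and threshold are illustrative assumptions.
vep_cmd = [
    "vep",
    "--input_file", "input.vcf",                 # hypothetical input
    "--output_file", "annotated.txt",            # hypothetical results file
    "--stats_file", "annotated_summary.html",    # summary written during the same run
    "--cache",
    "--check_frequency",                         # switch on frequency filtering
    "--freq_pop", "1KG_ALL",                     # population to compare against
    "--freq_freq", "0.01",                       # allele frequency threshold
    "--freq_gt_lt", "gt",                        # match frequencies greater than the threshold
    "--freq_filter", "exclude",                  # drop the matching variants
]

print(" ".join(vep_cmd))
```

With settings like these, variants dropped by the frequency filter would be 
the ones counted under "Variants filtered out" in the summary file.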


The same VEP command will process all VCFs in the same way. There are 
multiple options for filtering and QC in VEP; they are described on the 
following page: 
https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#filt


Kind regards,

Irina


On 17/09/2019 22:51, Linan, Margaret wrote:
>
> Hi Irina,
>
>
> Thanks. I also forgot to ask, regarding the stats_html command: there 
> are two fields in the output html, 'Variants Processed' and 'Variants 
> remaining after filtering'.
>
>
> What processes are normally used to process and filter these VCFs? 
> Does VEP use the same processing approach for all VCFs?
>
>
> Best Regards,
>
>
> *****Margaret Linan, MPH MS*****
> Independent Consultant
> Serving the CBIPM @ Icahn School of Medicine at Mount Sinai
> Margaret.Linan at mssm.edu
>
> ------------------------------------------------------------------------
> *From:* Irina Armean <iarmean at ebi.ac.uk>
> *Sent:* Tuesday, September 17, 2019 10:01:26 AM
> *To:* Linan, Margaret; Ensembl developers list
> *Subject:* Re: [ensembl-dev] VEP command line
>
> Hi Margaret,
>
>
> To get an accurate count for overlapping transcripts you would need to 
> count the unique identifiers in the 'Feature' column for the records 
> that have 'Transcript' in the 'Feature_type' column.
>
> A similar count would be needed for regulatory features: counting the 
> unique identifiers in the 'Feature' column for the records that have 
> 'RegulatoryFeature' in the 'Feature_type' column.
>
> Counting the occurrences of "Transcript" in the 'Feature_type' column 
> will result in double counting any transcript that is affected by more 
> than one of the input variants.
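
As a rough illustration of the de-duplicated counting described above, 
here is a minimal Python sketch. It assumes the default tab-delimited VEP 
output (a '#Uploaded_variation' header line preceded by '##' meta lines), 
not VCF output, where the same fields are packed into the CSQ INFO entry. 
The function name and the sample identifiers are made up for the example, 
and the result need not match the summary file exactly, since that depends 
on the run options.

```python
import io
from collections import defaultdict


def count_overlapped_features(handle):
    """Count unique Gene IDs and unique Feature IDs per Feature_type."""
    genes = set()
    features = defaultdict(set)  # Feature_type -> set of Feature IDs
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and '##' meta lines
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")  # column names
            continue
        if header is None:
            raise ValueError("no header line found before the data rows")
        row = dict(zip(header, line.split("\t")))
        if row.get("Gene", "-") != "-":
            genes.add(row["Gene"])
        if row.get("Feature", "-") != "-":
            features[row.get("Feature_type", "-")].add(row["Feature"])
    return {
        "genes": len(genes),
        "transcripts": len(features["Transcript"]),
        "regulatory_features": len(features["RegulatoryFeature"]),
    }


# Made-up rows: one transcript is hit by two different variants.
COLUMNS = ["#Uploaded_variation", "Location", "Allele", "Gene",
           "Feature", "Feature_type", "Consequence"]
ROWS = [
    ["var1", "1:100", "A", "ENSG00000000001", "ENST00000000001", "Transcript", "missense_variant"],
    ["var2", "1:200", "T", "ENSG00000000001", "ENST00000000001", "Transcript", "synonymous_variant"],
    ["var2", "1:200", "T", "ENSG00000000001", "ENST00000000002", "Transcript", "intron_variant"],
    ["var3", "1:300", "G", "-", "ENSR00000000001", "RegulatoryFeature", "regulatory_region_variant"],
]
SAMPLE = "\n".join(
    ["## illustrative meta line"] + ["\t".join(r) for r in [COLUMNS] + ROWS]
) + "\n"

counts = count_overlapped_features(io.StringIO(SAMPLE))
print(counts)
```

In the sample, the string "Transcript" appears on three rows, but only two 
distinct transcript identifiers are present, which is the difference 
between the naive count and the unique count.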
>
>
> 'sense_overlapping' indicates that the variant is on a long non-coding 
> transcript that contains a protein coding gene within its intronic 
> sequence on the same strand, with no overlap of exonic sequence.
>
>
>
> Kind regards,
>
> Irina
>
>
>
>
> On 16/09/2019 18:25, Linan, Margaret wrote:
>>
>> Thanks Irina,
>>
>>
>> Regarding the counting of the overlapped transcripts and regulatory 
>> features (using stats_html), should I just count how many times the 
>> string "transcript" or "regulatory features" appears in the 'Feature' 
>> column?
>>
>> Also, what string would I be searching for in the 'Feature_type' 
>> column? In an example VEP annotated VCF, the only relevant string 
>> was: 'sense_overlapping'
>>
>>
>> Best regards,
>>
>>
>> *****Margaret Linan, MPH MS*****
>> Independent Consultant
>> Serving the CBIPM @ Icahn School of Medicine at Mount Sinai
>> Margaret.Linan at mssm.edu
>>
>> ------------------------------------------------------------------------
>> *From:* Irina Armean <iarmean at ebi.ac.uk>
>> *Sent:* Monday, September 16, 2019 8:22:53 AM
>> *To:* Ensembl developers list; Linan, Margaret
>> *Subject:* Re: [ensembl-dev] VEP command line
>>
>> Hi Margaret,
>>
>>
>> Sorry for the delay.
>>
>> The stats written out in stats_html are collected internally 
>> simultaneously with the VEP annotation and therefore are not 
>> generated based on the VCF columns of the output file.
>>
>>
>> Depending on what VEP run options were selected, the counts could be 
>> reproduced based on the output file. For example the number of 
>> overlapped genes corresponds to the unique count of ENSG identifiers 
>> in the 'Gene' output column. The number of overlapped transcripts and 
>> regulatory features could be computed based on the 'Feature' and 
>> 'Feature_type' columns.
>>
>>
>>
>> Kind regards,
>>
>> Irina
>>
>>
>> On 12/09/2019 19:27, Linan, Margaret wrote:
>>>
>>> Hi -
>>>
>>>
>>> Does anyone know how the VEP command line program's stats_html 
>>> utility calculates the following (i.e., what VCF columns and 
>>> operations it uses)?
>>>
>>>     - VCF file pre-processing
>>>
>>>     - Number of overlapped genes
>>>
>>>     - Number of overlapped transcripts
>>>
>>>     - Number of overlapped regulatory features
>>>
>>>
>>> Thank you,
>>>
>>> Margaret
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>> Ensembl Blog: http://www.ensembl.info/
