[ensembl-dev] VEP command line

Irina Armean iarmean at ebi.ac.uk
Tue Sep 17 15:01:26 BST 2019


Hi Margaret,


To get an accurate count of overlapping transcripts, you would need to 
count the unique identifiers in the 'Feature' column for the records 
that have 'Transcript' in the 'Feature_type' column.

A similar count would be needed for regulatory features: counting the 
unique identifiers in the 'Feature' column for the records that have 
'RegulatoryFeature' in the 'Feature_type' column.

Simply counting the occurrences of 'Transcript' in the 'Feature_type' 
column would double count any transcript that is affected by more than 
one of the input variants.
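As a rough illustration, the counting rules above (together with the unique 'Gene' count described in the earlier reply quoted below) can be sketched in Python. This is a minimal sketch, not how stats_html computes its figures internally. It assumes an uncompressed VEP-annotated VCF whose annotations sit in the CSQ INFO field, with the sub-field names listed after "Format:" in the `##INFO=<ID=CSQ,...>` header line. The function names `parse_csq_format` and `count_overlaps` are hypothetical. It also tallies the raw number of records per Feature_type, so the double counting becomes visible.

```python
import re
from collections import Counter, defaultdict


def parse_csq_format(header_line):
    """Return the CSQ sub-field names from the ##INFO=<ID=CSQ,...> header line."""
    # Assumes VEP's usual header text: Description="... Format: Allele|Consequence|...">
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' list found in the CSQ header line")
    return match.group(1).split("|")


def count_overlaps(lines):
    """Count unique genes, and unique Feature IDs per Feature_type, in VEP VCF lines."""
    fields = None
    genes = set()
    features = defaultdict(set)  # Feature_type -> set of unique Feature IDs
    raw = Counter()              # Feature_type -> number of CSQ records (double counts)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ,"):
            fields = parse_csq_format(line)
            continue
        if not line or line.startswith("#"):
            continue  # other meta-information lines and the #CHROM header
        if fields is None:
            raise ValueError("CSQ header line not found before the data lines")
        info = line.split("\t")[7]  # INFO is the 8th VCF column
        for entry in info.split(";"):
            if not entry.startswith("CSQ="):
                continue
            # Comma-separated records, each a pipe-separated list of sub-fields
            for record in entry[len("CSQ="):].split(","):
                csq = dict(zip(fields, record.split("|")))
                if csq.get("Gene"):
                    genes.add(csq["Gene"])
                # Exact values such as "Transcript" or "RegulatoryFeature"
                ftype, feature = csq.get("Feature_type"), csq.get("Feature")
                if ftype and feature:
                    features[ftype].add(feature)
                    raw[ftype] += 1
    return {
        "genes": len(genes),
        "unique": {ftype: len(ids) for ftype, ids in features.items()},
        "raw": dict(raw),
    }
```

For example, `with open("output.vcf") as fh: print(count_overlaps(fh))`, where the file name is a placeholder; for a bgzipped file, `gzip.open(path, "rt")` yields text lines in the same way. The "unique" figures are the counts described above, while "raw" shows the double-counted totals. As the earlier reply notes, whether these numbers match stats_html depends on the VEP run options.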


'sense_overlapping' is a biotype (reported in the 'BIOTYPE' field), not 
a 'Feature_type' value. It indicates that the variant is on a long 
non-coding transcript that contains a protein-coding gene within its 
intronic sequence on the same strand, with no overlap of exonic sequence.



Kind regards,

Irina




On 16/09/2019 18:25, Linan, Margaret wrote:
>
> Thanks Irina,
>
>
> Regarding the counting of the overlapped transcripts and regulatory 
> features (using stats_html), should I just count how many times the 
> string "transcript" or "regulatory features" appears in the 'Feature' 
> column?
>
> Also, what string would I be searching for in the 'Feature_type' 
> column? In an example VEP annotated VCF, the only relevant string was: 
> 'sense_overlapping'
>
>
> Best regards,
>
>
> *Margaret Linan, MPH MS*
> Independent Consultant
> Serving the CBIPM @ Icahn School of Medicine at Mount Sinai
> Margaret.Linan at mssm.edu
>
> ------------------------------------------------------------------------
> *From:* Irina Armean <iarmean at ebi.ac.uk>
> *Sent:* Monday, September 16, 2019 8:22:53 AM
> *To:* Ensembl developers list; Linan, Margaret
> *Subject:* Re: [ensembl-dev] VEP command line
>
> Hi Margaret,
>
>
> Sorry for the delay.
>
> The stats written out in stats_html are collected internally while 
> VEP annotates the variants, and are therefore not derived from the 
> VCF columns of the output file.
>
>
> Depending on which VEP run options were selected, the counts could be 
> reproduced from the output file. For example, the number of 
> overlapped genes corresponds to the number of unique ENSG identifiers 
> in the 'Gene' output column. The numbers of overlapped transcripts and 
> regulatory features could be computed from the 'Feature' and 
> 'Feature_type' columns.
>
>
>
> Kind regards,
>
> Irina
>
>
> On 12/09/2019 19:27, Linan, Margaret wrote:
>>
>> Hi -
>>
>>
>> Does anyone know how the VEP command line program's stats_html 
>> utility calculates the following (i.e., what VCF columns and 
>> operations it uses)?
>>
>>     - VCF file pre-processing
>>
>>     - Number of overlapped genes
>>
>>     - Number of overlapped transcripts
>>
>>     - Number of overlapped regulatory features
>>
>>
>> Thank you,
>>
>> Margaret
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/
> -- 
