[ensembl-dev] VEP command line

Linan, Margaret margaret.linan at mssm.edu
Tue Sep 17 22:51:22 BST 2019


Hi Irina,


Thanks. I also forgot to ask about the stats_html output: the HTML report has two fields, 'Variants Processed' and 'Variants remaining after filtering'.


What steps does VEP normally use to process and filter these VCFs? Does it apply the same processing to all VCFs?


Best Regards,


Margaret Linan, MPH MS
Independent Consultant
Serving the CBIPM @ Icahn School of Medicine at Mount Sinai
Margaret.Linan at mssm.edu

________________________________
From: Irina Armean <iarmean at ebi.ac.uk>
Sent: Tuesday, September 17, 2019 10:01:26 AM
To: Linan, Margaret; Ensembl developers list
Subject: Re: [ensembl-dev] VEP command line


Hi Margaret,


To get an accurate count for overlapping transcripts you would need to count the unique identifiers in the 'Feature' column for the records that have 'Transcript' in the 'Feature_type' column.

A similar count would be needed for regulatory features: counting the unique identifiers in the 'Feature' column for the records that have 'RegulatoryFeature' in the 'Feature_type' column.

Simply counting the occurrences of "Transcript" in the 'Feature_type' column will double count any transcript that is affected by more than one of the input variants.
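The counting described above can be sketched in Python. This is only an illustration, not how stats_html itself is implemented: it assumes VEP's default tab-delimited output, where lines starting with "##" are metadata and a single "#"-prefixed line names the columns. The function name and all identifiers in the sample data (ENST_DEMO_1 and so on) are hypothetical.

```python
from collections import defaultdict

def count_unique_features(lines):
    """Return {Feature_type: number of distinct Feature IDs}."""
    header = None
    seen = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or metadata
        if line.startswith("#"):
            # Column header line, e.g. "#Uploaded_variation<TAB>Location<TAB>..."
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("no column header line found before the data")
        row = dict(zip(header, line.split("\t")))
        feature = row.get("Feature", "-")
        feature_type = row.get("Feature_type", "-")
        # "-" marks an empty value, e.g. for intergenic variants
        if feature != "-" and feature_type != "-":
            seen[feature_type].add(feature)
    return {ftype: len(ids) for ftype, ids in seen.items()}

# Hypothetical sample: ENST_DEMO_1 is hit by two different variants.
SAMPLE_LINES = [
    "## ENSEMBL VARIANT EFFECT PREDICTOR",
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence",
    "var1\t1:100\tA\tENSG_DEMO_1\tENST_DEMO_1\tTranscript\tmissense_variant",
    "var2\t1:200\tT\tENSG_DEMO_1\tENST_DEMO_1\tTranscript\tintron_variant",
    "var2\t1:200\tT\tENSG_DEMO_2\tENST_DEMO_2\tTranscript\tupstream_gene_variant",
    "var3\t1:300\tG\t-\tENSR_DEMO_1\tRegulatoryFeature\tregulatory_region_variant",
    "var4\t2:50\tC\t-\t-\t-\tintergenic_variant",
]

print(count_unique_features(SAMPLE_LINES))
# {'Transcript': 2, 'RegulatoryFeature': 1}
```

The sample has three rows with 'Transcript' in 'Feature_type' but only two distinct transcripts, which is the double counting that collecting the IDs in a set avoids.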


'sense_overlapping' indicates that the variant is on a long non-coding transcript that contains a protein coding gene within its intronic sequence on the same strand, with no overlap of exonic sequence.


Kind regards,

Irina




On 16/09/2019 18:25, Linan, Margaret wrote:

Thanks Irina,


Regarding the counting of the overlapped transcripts and regulatory features (using stats_html), should I just count how many times the string "transcript" or "regulatory features" appears in the 'Feature' column?

Also, what string would I be searching for in the 'Feature_type' column? In an example VEP annotated VCF, the only relevant string was: 'sense_overlapping'


Best regards,


Margaret Linan, MPH MS
Independent Consultant
Serving the CBIPM @ Icahn School of Medicine at Mount Sinai
Margaret.Linan at mssm.edu

________________________________
From: Irina Armean <iarmean at ebi.ac.uk>
Sent: Monday, September 16, 2019 8:22:53 AM
To: Ensembl developers list; Linan, Margaret
Subject: Re: [ensembl-dev] VEP command line

Hi Margaret,


Sorry for the delay.

The stats written out in stats_html are collected internally while VEP annotates the input, so they are not derived from the columns of the output file.


Depending on what VEP run options were selected, the counts could be reproduced based on the output file. For example the number of overlapped genes corresponds to the unique count of ENSG identifiers in the 'Gene' output column. The number of overlapped transcripts and regulatory features could be computed based on the 'Feature' and 'Feature_type' columns.
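For VEP output written in VCF format, the same fields sit inside the CSQ entry of the INFO column rather than in separate columns. The sketch below counts unique 'Gene' values under that assumption: it reads the field order from the "Format:" text in the CSQ header line and assumes the default "CSQ" key. It is an illustration only, not the stats_html implementation; the function name and the identifiers in the sample (ENSG_DEMO_1 and so on) are hypothetical.

```python
import re

def unique_genes_from_vcf(lines):
    """Return the set of distinct, non-empty Gene values found in CSQ."""
    fields = None
    genes = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            # Field order is listed after "Format: " in the Description
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                fields = match.group(1).strip().split("|")
            continue
        if not line or line.startswith("#"):
            continue  # other header lines
        if fields is None:
            raise ValueError("no CSQ header line found before the data")
        info = line.split("\t")[7]  # INFO is the 8th VCF column
        for entry in info.split(";"):
            if not entry.startswith("CSQ="):
                continue
            # One comma-separated annotation per variant/feature pair
            for annotation in entry[len("CSQ="):].split(","):
                record = dict(zip(fields, annotation.split("|")))
                gene = record.get("Gene", "")
                if gene:
                    genes.add(gene)
    return genes

# Hypothetical sample with a shortened CSQ format for readability.
VCF_SAMPLE = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|Gene|Feature_type|Feature">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\tvar1\tG\tA\t.\t.\tCSQ=A|missense_variant|ENSG_DEMO_1|Transcript|ENST_DEMO_1",
    "1\t200\tvar2\tC\tT\t.\t.\tDP=10;CSQ=T|intron_variant|ENSG_DEMO_1|Transcript|ENST_DEMO_1,"
    "T|upstream_gene_variant|ENSG_DEMO_2|Transcript|ENST_DEMO_2",
    "2\t50\tvar4\tA\tC\t.\t.\tCSQ=C|intergenic_variant|||",
]

print(len(unique_genes_from_vcf(VCF_SAMPLE)))
# 2
```

Whether such a count matches the figure in the stats_html report depends on the run options, as noted above.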



Kind regards,

Irina


On 12/09/2019 19:27, Linan, Margaret wrote:

Hi -


Does anyone know how the VEP command line program's stats_html utility calculates the following (i.e., what VCF columns and operations it uses)?

- VCF file pre-processing

- Number of overlapped genes

- Number of overlapped transcripts

- Number of overlapped regulatory features


Thank you,

Margaret



_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
Ensembl Blog: http://www.ensembl.info/

