[ensembl-dev] VEP command line
Irina Armean
iarmean at ebi.ac.uk
Tue Sep 17 15:01:26 BST 2019
Hi Margaret,
To get an accurate count for overlapping transcripts you would need to
count the unique identifiers in the 'Feature' column for the records
that have 'Transcript' in the 'Feature_type' column.
A similar count would be needed for regulatory features: counting the
unique identifiers in the 'Feature' column for the records that have
'RegulatoryFeature' in the 'Feature_type' column.
Simply counting the occurrences of 'Transcript' in the 'Feature_type'
column will double count any transcript that is affected by more than
one of the input variants.
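
As a rough illustration of the unique-identifier counting described
above, here is a minimal Python sketch. It assumes the default
tab-delimited VEP output (meta lines starting with '##', then a column
header line starting with '#'); for VCF output the CSQ field in INFO
would have to be split into its sub-fields first. The function name and
the identifiers in the sample data are made up for the example, and the
result is not guaranteed to match stats_html exactly, since that depends
on the run options.

```python
from collections import defaultdict

# Hypothetical sample in the default tab-delimited VEP layout (IDs are invented).
SAMPLE = "\n".join([
    "## ENSEMBL VARIANT EFFECT PREDICTOR",
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence",
    "var1\t1:100\tA\tENSG00000000001\tENST00000000001\tTranscript\tmissense_variant",
    "var2\t1:200\tT\tENSG00000000001\tENST00000000001\tTranscript\tintron_variant",
    "var2\t1:200\tT\tENSG00000000001\tENST00000000002\tTranscript\tintron_variant",
    "var2\t1:200\tT\t-\tENSR00000000001\tRegulatoryFeature\tregulatory_region_variant",
])


def count_unique_features(lines):
    """Return ({Feature_type: distinct Feature IDs}, distinct Gene IDs)."""
    header = None
    features = defaultdict(set)
    genes = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta lines
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")  # column header line
            continue
        if header is None:
            raise ValueError("column header line not found")
        row = dict(zip(header, line.split("\t")))
        # VEP writes '-' for empty fields, so those are ignored.
        if row.get("Feature", "-") != "-":
            features[row["Feature_type"]].add(row["Feature"])
        if row.get("Gene", "-") != "-":
            genes.add(row["Gene"])
    counts = {ftype: len(ids) for ftype, ids in features.items()}
    return counts, len(genes)


if __name__ == "__main__":
    print(count_unique_features(SAMPLE.splitlines()))
```

In the sample, the first transcript is hit by two variants, so three
rows carry 'Transcript' in 'Feature_type' but only two distinct
transcripts are counted. For a real file, passing an open file handle
instead of SAMPLE.splitlines() should work the same way.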
'sense_overlapping' is a transcript biotype rather than a 'Feature_type'
value. It indicates that the variant is on a long non-coding transcript
that contains a protein-coding gene within its intronic sequence on the
same strand, with no overlap of exonic sequence.
Kind regards,
Irina
On 16/09/2019 18:25, Linan, Margaret wrote:
>
> Thanks Irina,
>
>
> Regarding the counting of the overlapped transcripts and regulatory
> features (using stats_html), should I just count how many times the
> string "transcript" or "regulatory features" appears in the 'Feature'
> column?
>
> Also, what string would I be searching for in the 'Feature_type'
> column? In an example VEP annotated VCF, the only relevant string was:
> 'sense_overlapping'
>
>
> Best regards,
>
>
> Margaret Linan, MPH MS
> Independent Consultant
> Serving the CBIPM @ Icahn School of Medicine at Mount Sinai
> Margaret.Linan at mssm.edu
>
> ------------------------------------------------------------------------
> *From:* Irina Armean <iarmean at ebi.ac.uk>
> *Sent:* Monday, September 16, 2019 8:22:53 AM
> *To:* Ensembl developers list; Linan, Margaret
> *Subject:* Re: [ensembl-dev] VEP command line
>
> Hi Margaret,
>
>
> Sorry for the delay.
>
> The stats written out in stats_html are collected internally
> simultaneously with the VEP annotation and therefore are not generated
> based on the VCF columns of the output file.
>
>
> Depending on what VEP run options were selected, the counts could be
> reproduced based on the output file. For example the number of
> overlapped genes corresponds to the unique count of ENSG identifiers
> in the 'Gene' output column. The number of overlapped transcripts and
> regulatory features could be computed based on the 'Feature' and
> 'Feature_type' columns.
>
>
>
> Kind regards,
>
> Irina
>
>
> On 12/09/2019 19:27, Linan, Margaret wrote:
>>
>> Hi -
>>
>>
>> Does anyone know how the VEP command line program's stats_html
>> utility calculates the following (i.e., what VCF columns and
>> operations it uses)?
>>
>> - VCF file pre-processing
>>
>> - Number of overlapped genes
>>
>> - Number of overlapped transcripts
>>
>> - Number of overlapped regulatory features
>>
>>
>> Thank you,
>>
>> Margaret
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/