[ensembl-dev] loading VEP output into db

Wed Jul 27 17:22:40 BST 2016

Hi Cyriac,

I don't have any statistics to back this up, but I'd guess most users use
VCF as it's the only one that is standard(ish) and portable.

All have their uses though, and we try to support all equally well. The
only format we produce that is somewhat disadvantaged compared to the
others is GVF.

Personally, unless I needed to send the data through other things that
required VCF, I'd use tab-delimited format. It's human readable, you can
chuck it straight into a DB or spreadsheet, and it doesn't suffer from the
delimiter overload and forced delimiter translation that happen in VCF.
But this is just a personal opinion, and not a recommendation.
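
As a rough illustration of the "straight into a DB" point, here is a minimal
sketch in Python that loads tab-delimited VEP output into SQLite. It assumes
the file has "##" metadata lines followed by a single header line starting
with "#" (e.g. "#Uploaded_variation"). The function name load_vep_tab and
the table name vep_output are hypothetical, and every column is stored as
TEXT for simplicity.

```python
import sqlite3


def load_vep_tab(lines, conn, table="vep_output"):
    """Load tab-delimited VEP output into an SQLite table.

    Assumes "##" metadata lines, then one "#"-prefixed header line,
    then one tab-separated row per variant/feature combination.
    Returns the number of data rows inserted.
    """
    columns = None
    insert_sql = None
    n_rows = 0
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and "##" metadata lines.
        if not line or line.startswith("##"):
            continue
        if columns is None:
            # The first remaining line must be the "#"-prefixed header.
            if not line.startswith("#"):
                raise ValueError("expected a header line starting with '#'")
            columns = line.lstrip("#").split("\t")
            col_defs = ", ".join('"{}" TEXT'.format(c) for c in columns)
            conn.execute(
                'CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, col_defs)
            )
            placeholders = ", ".join("?" for _ in columns)
            insert_sql = 'INSERT INTO "{}" VALUES ({})'.format(
                table, placeholders
            )
            continue
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError("row has {} fields, header has {}".format(
                len(fields), len(columns)))
        conn.execute(insert_sql, fields)
        n_rows += 1
    conn.commit()
    return n_rows
```

Something like load_vep_tab(open("vep_output.txt"), sqlite3.connect("vep.db"))
would then give one row per output line; the file names here are placeholders.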

Regards

Will

On 27 July 2016 at 17:00, Cyriac Kandoth <kandothc at mskcc.org> wrote:

> Hi Will. Out of curiosity - do you know what is the most popular output
> format among VEP users? And which one do you prefer we all use?
>
> ~C
>
> On Wed, Jul 27, 2016 at 7:41 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
>> Hi Nikolas,
>>
>> As I mentioned in reply to your previous question, you can use --tab with
>> VEP 84 or later, or in earlier versions use --fields to have all output
>> written to separate tab-separated columns.
>>
>> The limitation of using --fields is that you must specify them manually,
>> but you could do this once and save it to a configuration file [1] or as an
>> ENV variable. You then have the benefit of knowing which columns you would
>> need to create in your DB.
>>
>> Another option would be to use VCF output and a simple shell script to
>> create one line per transcript.
>>
>> Regards
>>
>> Will
>>
>> [1] :
>> http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_config
>>
>>
>>
>> On 27 July 2016 at 12:24, Nikolas Pontikos <n.pontikos at ucl.ac.uk> wrote:
>> I'm looking for an easy way (with minimal reformatting) of loading the
>> output of VEP into a database.
>>
>> So far I have tried loading the JSON output straight into mongo but
>> was wondering if there was a simpler tab separated format with one
>> transcript per line.
>>
>> Maybe I should resort to only retrieving the most severe transcript
>> consequence per variant.
>>
>> I'd be happy to hear your suggestion.
>>
>> Many Thanks,
>>
>> Nikolas.
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>