[ensembl-dev] variation database usage

Will McLaren wm2 at ebi.ac.uk
Thu Feb 16 11:49:45 GMT 2012


Hi Hardip,

This looks like a bug, although it is difficult to tell with your local DB setup.

You might be able to work around it by adding the following line to
ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm,
in the subroutine check_frequencies:

    return 0 unless defined($v);

after the line

    my $v = $config->{va}->fetch_by_name($var_name);
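
To illustrate why the guard helps, here is a minimal, self-contained sketch. MockVariationAdaptor, MockVariation and count_alleles_guarded are hypothetical stand-ins, not part of the Ensembl API, and the function is not the real body of check_frequencies. The one assumption carried over is that fetch_by_name returns undef for a name it cannot find, which is what the error message suggests.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the adaptor held in $config->{va}; it only
# mimics fetch_by_name returning undef for an unknown variation name.
package MockVariationAdaptor {
    sub new {
        my ($class, %known) = @_;
        return bless { known => { %known } }, $class;
    }
    sub fetch_by_name {
        my ($self, $name) = @_;
        return $self->{known}{$name};    # undef when the name is absent
    }
}

# Hypothetical stand-in for a variation object with a list of alleles.
package MockVariation {
    sub new {
        my ($class, @alleles) = @_;
        return bless { alleles => [@alleles] }, $class;
    }
    sub get_all_Alleles { return $_[0]{alleles} }
}

package main;

# Simplified illustration of the guarded lookup (not the real VEP code).
sub count_alleles_guarded {
    my ($config, $var_name) = @_;
    my $v = $config->{va}->fetch_by_name($var_name);
    return 0 unless defined($v);         # the suggested guard
    return scalar @{ $v->get_all_Alleles };
}

my $config = {
    va => MockVariationAdaptor->new(rs123 => MockVariation->new('A', 'G')),
};

print count_alleles_guarded($config, 'rs123'), "\n";
print count_alleles_guarded($config, 'rs_missing'), "\n";
```

This prints 2 for the known name and 0 for the missing one. Without the guard, the second call would die with the same "Can't call method "get_all_Alleles" on an undefined value" error as in your log.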

Will

On 16 February 2012 11:30, Hardip Patel <hardip.patel at anu.edu.au> wrote:
> Hi Will
>
> Thank you for your quick response. I ran the command given below and it failed with the following message.
>
> 2012-02-16 22:13:34 - Starting...
> 2012-02-16 22:13:36 - Read 54138 variants into buffer
> 2012-02-16 22:14:40 - Checking for existing variations
> [==================================================================================================================================]  [ 100% ]
> 2012-02-16 22:14:56 - Analyzing chromosome 22
> 2012-02-16 22:14:57 - Reading transcript data from cache and/or database
> [==================================================================================================================================]  [ 100% ]
> 2012-02-16 22:16:29 - Retrieved 4466 transcripts (0 mem, 0 cached, 8575 DB, 4109 duplicates)
> 2012-02-16 22:16:29 - Analyzing variants
> [==================================================================================================================================]  [ 100% ]
> 2012-02-16 22:18:07 - Reading regulatory data from cache and/or database
> [==================================================================================================================================]  [ 100% ]
> 2012-02-16 22:18:54 - Retrieved 19317 regulatory features (0 mem, 0 cached, 22312 DB, 2995 duplicates)
> 2012-02-16 22:18:54 - Analyzing RegulatoryFeatures
> [==================================================================================================================================]  [ 100% ]
> 2012-02-16 22:18:55 - Analyzing MotifFeatures
> [==================================================================================================================================]  [ 100% ]
> 2012-02-16 22:19:01 - Calculating and writing output
> [>                                                                                                                                 ]    [ 0% ]
> Can't call method "get_all_Alleles" on an undefined value at /home/depressed/ensembl-api/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm line 3400, <GEN0> line 54165.
>
>
>
> COMMAND:
>
> perl5.14.2 variant_effect_predictor.pl --output_file chr22.vep --species homo_sapiens --host cg.anu.edu.au --user compgen --password compgen --port 3306 --db_version 65 --format vcf --buffer 1000000000 --terms ensembl --canonical --hgnc --regulatory --protein --gene --condel b --polyphen b --sift b --force_overwrite --input_file chr22.1000.vcf --check_existing --check_frequency --check_alleles --freq_pop any --per_gene --freq_freq 0 --freq_filter include --freq_gt_lt gt
>
> Could you please let me know what I am doing wrong?
>
> Kind regards
>
>
> Hardip R. Patel, PhD
> Post-doctoral Research Fellow
>
> Genome Discovery Unit and RNA Biology Lab
> Genome Biology Department
> The John Curtin School of Medical Research
> College of Medicine, Biology and Environment
> The Australian National University
> Building 131, Garran Road, ANU Campus, Acton - 0200, ACT, Australia
> Email: hardip.patel at anu.edu.au, patelhardip at gmail.com
> Phone Number: (+61) 0449 180 715
>
>
>
>
> On 16/02/2012, at 9:45 PM, Will McLaren wrote:
>
>> Hi Hardip,
>>
>> You may find it easier to use the VEP for this as it wraps up a lot of
>> the functionality you are interested in already. You could get it to
>> check for phenotypes by creating a bed file or similar of
>> phenotype-associated loci, tabix indexing it and using it as a custom
>> data source for the VEP (see
>> http://www.ensembl.org/info/docs/variation/vep/vep_script.html#custom).
>>
>> The VEP can also compare to existing variations, and their alleles,
>> using --check_existing and --check_alleles.
>>
>> If you do want to continue with the API, here's some code that should
>> get you started - I'm assuming you have your VF object created in
>> $new_vf, and that you are connected to the database already.
>>
>> Cheers
>>
>> Will McLaren
>> Ensembl Variation
>>
>> # attach a slice to the VF, it probably doesn't have one
>> my $sa = $reg->get_adaptor("human","core","slice");
>> my $slice = $sa->fetch_by_region("chromosome", $new_vf->{chr});
>> $new_vf->{slice} = $slice;
>>
>> # get overlapping existing VFs from the variation database by fetching
>> # from the feature slice of the new VF
>> foreach my $existing_vf(@{$new_vf->feature_Slice->get_all_VariationFeatures}) {
>>
>>  # compare alleles
>>  print "New alleles!\n"
>>    if $new_vf->allele_string ne $existing_vf->allele_string;
>>
>>  # get phenotype annotations via the variation object
>>  foreach my $va(@{$existing_vf->variation->get_all_VariationAnnotations}) {
>>     print $existing_vf->variation_name, " is associated with phenotype ",
>>       $va->phenotype_description, "\n";
>>  }
>> }
>>
>> On 16 February 2012 10:21, Hardip Patel <hardip.patel at anu.edu.au> wrote:
>>> Dear all
>>>
>>> I have vcf files generated for individual chromosomes from a human
>>> resequencing project. I was wondering if somebody could get me started with
>>> ways to use the variation api.
>>>
>>> I am mainly interested in finding out the following from my vcf files:
>>>
>>> Is the variation in my vcf file found in dbSNP and, if yes, does it have the
>>> same genotype as the one in my vcf file?
>>> Is the variation implicated in the NHGRI_GWAS catalogue or not?
>>>
>>> I have tried reading the documentation on the variation api and I am not able
>>> to come up with a way to do this.
>>>
>>> I am able to use the parse_vcf subroutine to parse a vcf line and get a
>>> variation feature object. I am getting stuck after that, in that I am not
>>> sure how to use the variationfeature to ask the above questions.
>>>
>>> Any help is greatly appreciated.
>>>
>>>
>>> Kind regards
>>>
>>>
>>> Hardip R. Patel, PhD
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe):
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>
>
>
