[ensembl-dev] Fwd: Re: 1000 Genomes SNPS
Andrea Edwards
edwardsa at cs.man.ac.uk
Wed Mar 2 15:05:49 GMT 2011
You could get the variations by variation set, I think.
You might then have to check the variations to make sure they are SNPs
and not indels (note that my code below does not do this).
These are the names of the different sets with 1000 genomes in the title.
-------------------------------------------
1000 genomes - High coverage - Trios
1000 genomes - High coverage - Trios - CEU
1000 genomes - High coverage - Trios - YRI
1000 genomes - High coverage exons
1000 genomes - High coverage exons - CEU
1000 genomes - High coverage exons - CHB
1000 genomes - High coverage exons - CHD
1000 genomes - High coverage exons - JPT
1000 genomes - High coverage exons - LWK
1000 genomes - High coverage exons - TSI
1000 genomes - High coverage exons - YRI
1000 genomes - Low coverage
1000 genomes - Low coverage - CEU
1000 genomes - Low coverage - CHB+JPT
1000 genomes - Low coverage - YRI
However, I have never had any luck retrieving variations this way with
the API because of how long it takes (the API masks lots of joins on the
database); my code has always timed out mid-run. Rather than wait, I have
always looked for alternative sources of the data. But I have posted the
code below. Perhaps you or the Ensembl team can spot a better way.
I recently tried to get all Watson variations and all human OMIM
variations, and both timed out. I tried two different approaches, both of
which are below.
I don't have a local copy of the human variation database; it might be
faster for you if you have a local copy.
==============
common code to both approaches
===============
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $species = 'human';
my $reg = 'Bio::EnsEMBL::Registry';
# Connect to the public Ensembl database server as the anonymous user
$reg->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);
my $vfa = $reg->get_adaptor($species, 'variation', 'variationfeature');
my $slice_adaptor = $reg->get_adaptor($species, 'core', 'slice');
my $transcript_adaptor = $reg->get_adaptor($species, 'core', 'Transcript');
my $gene_adaptor = $reg->get_adaptor($species, 'core', 'Gene');
my $variation_set_adaptor = $reg->get_adaptor($species, 'variation', 'variationset');
# "OMIM" is the set I tried; one of the 1000 genomes set names listed
# above could be passed here instead
my $omim_set = $variation_set_adaptor->fetch_by_name("OMIM");
=========================
Approach 1
Get variation features by slice
=========================
# split_Slices is exported on request by Bio::EnsEMBL::Utils::Slice
use Bio::EnsEMBL::Utils::Slice qw(split_Slices);

my @unsorted_slices = @{ $slice_adaptor->fetch_all('chromosome', undef, 0, 1) };
# by_num_then_letter is a user-defined sort sub and must be defined separately
my @sorted_slices = sort by_num_then_letter @unsorted_slices;
# Base pair overlap between returned slices
my $overlap = 0;
# Maximum size of returned slices, in base pairs
my $max_size = 10000;
# Break chromosomal slices into smaller component slices of at most $max_size bp
my @sub_slices = @{ split_Slices(\@sorted_slices, $max_size, $overlap) };

foreach my $slice (@sub_slices) {
    my @vfs = @{ $omim_set->get_all_VariationFeatures_by_Slice($slice) };
    my @sorted_vfs = sort { $a->start() <=> $b->start() } @vfs;
    foreach my $vf (@sorted_vfs) {
        # process each variation feature here
    }
}
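
Approach 1 calls a sort sub named by_num_then_letter that is not defined
anywhere in this post. The following is only a sketch of what such a
comparator might look like, assuming the intent is to order numeric
chromosome names numerically and put the lettered ones (X, Y, MT) after
them. The helper cmp_chr_names is a hypothetical name introduced here so
the ordering can be tried on plain strings without a database connection.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two chromosome names. Purely numeric names
# sort numerically and come first; all other names sort alphabetically.
sub cmp_chr_names {
    my ($x, $y) = @_;
    my $x_num = $x =~ /^\d+$/;
    my $y_num = $y =~ /^\d+$/;
    return $x <=> $y if $x_num && $y_num;
    return -1 if $x_num;
    return 1  if $y_num;
    return $x cmp $y;
}

# Sort sub for Slice objects, keyed on their seq_region_name().
# Assumed behaviour; the original definition was not posted.
sub by_num_then_letter {
    return cmp_chr_names($a->seq_region_name, $b->seq_region_name);
}

# Try the ordering on plain names
my @names = sort { cmp_chr_names($a, $b) } qw(X 10 2 MT 1 Y);
print "@names\n";    # prints "1 2 10 MT X Y"
```

With this in the same package as the sort call, the line
"sort by_num_then_letter @unsorted_slices" would work as written.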
===========================
Approach 2
Get all variations for variation set
===========================
my @variations = @{ $omim_set->get_all_Variations() };
foreach my $variation (@variations) {
    my @vfs = @{ $variation->get_all_VariationFeatures() };
    foreach my $vf (@vfs) {
        # process each variation feature here
    }
}
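
Neither approach above filters out indels. One possible check, sketched
here under the assumption that the allele string of a variation feature
is slash-separated (for example "A/G") and that indels show up as "-" or
as multi-base alleles, is to accept only strings in which every allele is
a single nucleotide. The helper name looks_like_snp is hypothetical and
not part of the API; the var_class() method on the variation feature may
be a more direct test, though the class names it returns can differ
between releases.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every allele in a slash-separated allele
# string is a single nucleotide, i.e. the variation looks like a SNP.
sub looks_like_snp {
    my ($allele_string) = @_;
    return 0 unless defined $allele_string;
    my @alleles = split m{/}, $allele_string;
    return 0 if @alleles < 2;
    for my $allele (@alleles) {
        return 0 unless $allele =~ /^[ACGT]$/i;
    }
    return 1;
}

# Try it on a few example allele strings
for my $s ('A/G', 'C/T/G', '-/A', 'AT/-', 'G/GA') {
    printf "%-6s %s\n", $s, looks_like_snp($s) ? 'SNP' : 'not a SNP';
}
```

Inside either inner loop this could be used as
"next unless looks_like_snp($vf->allele_string);".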
HTH
andrea
On 02/03/2011 14:08, cj5 at sanger.ac.uk wrote:
> Hi,
> Is it possible using the variations API to get a list of SNPS which have
> been submitted from the 1000 Genomes project?
>
> I have a vague idea that it should be possible to retrieve such a list
> using the SS (submission) ID and/or the validation status, however I am
> unsure of the details and what version of the API should be used.
>
> The latest 1000 Genomes pilot release (2010_07) would be great, but any
> earlier release would also be useful.
>
> Thanks
> Chris