[ensembl-dev] Fwd: Re: 1000 Genomes SNPS

Andrea Edwards edwardsa at cs.man.ac.uk
Wed Mar 2 15:05:49 GMT 2011



You could get the variations by variation set, I think. You might then
have to check the variations to make sure they are SNPs and not indels
(note that my code below does not do this).
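A minimal sketch of such a check, assuming you test each feature's allele string. In the API, a VariationFeature's `allele_string()` returns strings such as `A/G`, with `-` standing for the missing allele of an insertion or deletion. The helper `is_snp_allele_string` is hypothetical and deliberately strict: only single A/C/G/T bases count, so indels, longer substitutions and ambiguity codes are all rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every allele in an allele string
# such as "A/G" is a single nucleotide, 0 otherwise. Indels carry a "-"
# allele (e.g. "-/AT"), so they fail the test.
sub is_snp_allele_string {
    my ($allele_string) = @_;
    my @alleles = split m{/}, $allele_string;
    return 0 if @alleles < 2;    # need at least two alleles
    for my $allele (@alleles) {
        return 0 unless $allele =~ /^[ACGT]$/i;
    }
    return 1;
}

# With real data you would pass $vf->allele_string() instead
for my $as ('A/G', 'C/T/G', '-/AT', 'ACG/A') {
    printf "%-6s %s\n", $as, is_snp_allele_string($as) ? 'SNP' : 'not SNP';
}
```

Inside either loop further down, `next unless is_snp_allele_string($vf->allele_string());` would skip the non-SNP features.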

These are the names of the variation sets with "1000 genomes" in their name:
-------------------------------------------
  1000 genomes - High coverage - Trios
  1000 genomes - High coverage - Trios - CEU
  1000 genomes - High coverage - Trios - YRI
  1000 genomes - High coverage exons
  1000 genomes - High coverage exons - CEU
  1000 genomes - High coverage exons - CHB
  1000 genomes - High coverage exons - CHD
  1000 genomes - High coverage exons - JPT
  1000 genomes - High coverage exons - LWK
  1000 genomes - High coverage exons - TSI
  1000 genomes - High coverage exons - YRI
  1000 genomes - Low coverage
  1000 genomes - Low coverage - CEU
  1000 genomes - Low coverage - CHB+JPT
  1000 genomes - Low coverage - YRI
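To pick which of these to pass to `fetch_by_name`, the names can be filtered in plain Perl. A small sketch, assuming you only want the per-population low coverage sets; the names are copied from the list above rather than fetched from the database (with a live connection you could instead read them from each VariationSet object's `name()` method).

```perl
use strict;
use warnings;

# Set names copied from the list above (hard-coded, not fetched)
my @set_names = (
    '1000 genomes - High coverage - Trios',
    '1000 genomes - High coverage - Trios - CEU',
    '1000 genomes - High coverage - Trios - YRI',
    '1000 genomes - High coverage exons',
    '1000 genomes - High coverage exons - CEU',
    '1000 genomes - High coverage exons - CHB',
    '1000 genomes - High coverage exons - CHD',
    '1000 genomes - High coverage exons - JPT',
    '1000 genomes - High coverage exons - LWK',
    '1000 genomes - High coverage exons - TSI',
    '1000 genomes - High coverage exons - YRI',
    '1000 genomes - Low coverage',
    '1000 genomes - Low coverage - CEU',
    '1000 genomes - Low coverage - CHB+JPT',
    '1000 genomes - Low coverage - YRI',
);

# Keep only the per-population low coverage sets (the trailing " - "
# excludes the combined "Low coverage" set)
my @low_cov_pop = grep { /^1000 genomes - Low coverage - / } @set_names;

# Each name could then be passed to $variation_set_adaptor->fetch_by_name($name)
print "$_\n" for @low_cov_pop;
```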


However, I have never had any luck retrieving variations this way with
the API because of the time it takes (the API hides a lot of joins on the
database); my code has always timed out mid-run. Rather than wait, I have
always looked for alternative sources of the data. I have posted the code
below anyway; perhaps you or the Ensembl team can spot a better way.

I recently tried to get all Watson variations and all human OMIM
variations, and both timed out. I tried two different approaches, both of
which are below.

I don't have a local copy of the human variation database; it might be
faster for you if you have a local copy.


==============
Common code for both approaches
===============

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $species = 'human';
my $reg     = 'Bio::EnsEMBL::Registry';
$reg->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

my $vfa                = $reg->get_adaptor($species, 'variation', 'variationfeature');
my $slice_adaptor      = $reg->get_adaptor($species, 'core', 'slice');
my $transcript_adaptor = $reg->get_adaptor($species, 'core', 'Transcript');
my $gene_adaptor       = $reg->get_adaptor($species, 'core', 'Gene');

my $variation_set_adaptor = $reg->get_adaptor($species, 'variation', 'variationset');
# Swap "OMIM" for one of the 1000 genomes set names listed above
my $omim_set = $variation_set_adaptor->fetch_by_name("OMIM")
    or die "Variation set 'OMIM' not found\n";



=========================
Approach 1
Get variation features by slice
=========================


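use Bio::EnsEMBL::Utils::Slice qw(split_Slices);

# Order chromosomes 1..22 numerically, then lettered ones (MT, X, Y).
# by_num_then_letter was not shown in the original post; this is an
# assumed implementation.
sub by_num_then_letter {
    my ($x, $y) = ($a->seq_region_name(), $b->seq_region_name());
    my $x_num = $x =~ /^\d+$/;
    my $y_num = $y =~ /^\d+$/;
    return $x <=> $y if $x_num && $y_num;
    return -1 if $x_num;
    return 1  if $y_num;
    return $x cmp $y;
}

# fetch_all(coord_system, version, include_non_reference, include_duplicates)
my @unsorted_slices = @{ $slice_adaptor->fetch_all('chromosome', undef, 0, 1) };
my @sorted_slices = sort by_num_then_letter @unsorted_slices;

## Base pair overlap between returned slices
my $overlap = 0;
## Maximum size of returned slices (10 kb)
my $max_size = 10000;
## Break chromosomal slices into component slices of at most $max_size bp
my @sub_slices = @{ split_Slices(\@sorted_slices, $max_size, $overlap) };

foreach my $slice (@sub_slices) {
    my @vfs = @{ $omim_set->get_all_VariationFeatures_by_Slice($slice) };
    my @sorted_vfs = sort { $a->start() <=> $b->start() } @vfs;

    foreach my $vf (@sorted_vfs) {
        # Process each variation feature here, e.g.
        # print join("\t", $vf->variation_name(), $vf->seq_region_name(), $vf->start()), "\n";
    }
}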
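For a sense of what `split_Slices` is doing in approach 1, here is a plain-Perl sketch of the window arithmetic on bare coordinates. The `windows` helper is hypothetical and is not the Ensembl implementation; the real function works on Slice objects and may treat edge cases differently.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a region of $length bp (1-based, inclusive
# coordinates) into windows of at most $max_size bp, each overlapping
# the next by $overlap bp. Returns a reference to [start, end] pairs.
sub windows {
    my ($length, $max_size, $overlap) = @_;
    die "overlap must be smaller than max_size\n" if $overlap >= $max_size;
    my @windows;
    my $start = 1;
    while ($start <= $length) {
        my $end = $start + $max_size - 1;
        $end = $length if $end > $length;    # clip the last window
        push @windows, [ $start, $end ];
        last if $end == $length;
        $start = $end - $overlap + 1;        # step back by the overlap
    }
    return \@windows;
}

my $w = windows(25_000, 10_000, 0);
print join(', ', map { "$_->[0]-$_->[1]" } @$w), "\n";
```

With `$max_size = 10000`, a genome of roughly 3.1 Gb comes out at around 310,000 windows, each needing its own query against the public server, which goes some way to explaining why the run is so slow.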


===========================
Approach 2
Get all variations for variation set
===========================


my @variations = @{ $omim_set->get_all_Variations() };
foreach my $variation (@variations) {
    # A variation can map to more than one location
    my @vfs = @{ $variation->get_all_VariationFeatures() };
    foreach my $vf (@vfs) {
        # Process each variation feature here
    }
}
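Neither approach hands back one tidy, genome-ordered list: approach 1 can return the same feature twice when `$overlap` is non-zero, and approach 2 is not guaranteed to come back in positional order. A sketch of de-duplicating and sorting collected results, assuming you first copy each feature's name, chromosome and start into a plain hash; the records below are made-up toy values, not real variants.

```perl
use strict;
use warnings;

# Toy records (made-up names and positions) standing in for values read
# from each feature via $vf->variation_name(), $vf->seq_region_name()
# and $vf->start().
my @records = (
    { name => 'var_a', chr => 'X',  start => 500 },
    { name => 'var_b', chr => '2',  start => 300 },
    { name => 'var_c', chr => '10', start => 100 },
    { name => 'var_b', chr => '2',  start => 300 },    # same feature seen twice
    { name => 'var_d', chr => '2',  start => 50 },
);

# Keep the first occurrence of each (name, chromosome, start) combination
my %seen;
my @unique = grep { !$seen{"$_->{name}:$_->{chr}:$_->{start}"}++ } @records;

# Numbered chromosomes first (numerically), then lettered ones
# (alphabetically), then by start position within a chromosome
my @sorted = sort {
    my ($a_num, $b_num) = ($a->{chr} =~ /^\d+$/ ? 1 : 0, $b->{chr} =~ /^\d+$/ ? 1 : 0);
    ($b_num <=> $a_num)
        || ($a_num && $b_num ? $a->{chr} <=> $b->{chr} : $a->{chr} cmp $b->{chr})
        || ($a->{start} <=> $b->{start})
} @unique;

print "$_->{chr}:$_->{start}\t$_->{name}\n" for @sorted;
```

Keying on name, chromosome and start (rather than name alone) keeps the separate mappings of a multiply-mapped variation while still dropping exact repeats.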

HTH
andrea


On 02/03/2011 14:08, cj5 at sanger.ac.uk wrote:
>  Hi,
>  Is it possible, using the variation API, to get a list of SNPs that have
>  been submitted by the 1000 Genomes project?
>
>  I have a vague idea that it should be possible to retrieve such a list
>  using the SS (submission) ID and/or the validation status, however I am
>  unsure of the details and what version of the API should be used.
>
>  The latest 1000 Genomes pilot release (2010_07) would be great, but any
>  earlier release would also be useful.
>
>  Thanks
>  Chris