[ensembl-dev] 1000 Genomes SNPS

Pontus Larsson Pontus.Larsson at ebi.ac.uk
Wed Mar 2 17:43:27 GMT 2011


Hi Andrea,

The Iterator code is indeed a new addition, introduced in release 61 to
handle cases of this kind, which would otherwise typically break the API.
So far we have implemented iterators in only a few places, but we will keep
adding them where we think they are warranted and wanted.

I can't say for certain which of your approaches is best; it may depend on
your downstream application as well as on the size of the set. If, for
example, you need the mappings of the variations and it makes sense for you
to retrieve them by position along the chromosome, your first approach
might be best. That is, unless you expect the set to be relatively small or
unevenly distributed across the genome (this would not typically be the case
for the Watson or 1000 genomes data). Currently, you can only use the
iterator for the second approach.
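
To illustrate why iterating helps with memory, here is a minimal sketch of
the general idea in Python. It is not Ensembl API code (the Ensembl API is
Perl), and every name in it (Variation, fetch_rows, get_all_variations,
get_variation_iterator, count_variations) is a hypothetical stand-in that
only mirrors the two styles of access discussed in this thread.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Variation(NamedTuple):
    """Hypothetical stand-in for a variation record (not the Ensembl class)."""
    name: str


def fetch_rows(n: int) -> Iterator[str]:
    """Pretend database cursor: produces one variation name per row."""
    for i in range(n):
        yield f"rs{i + 1}"


def get_all_variations(n: int) -> List[Variation]:
    """Eager style: builds every object before returning, so memory grows with n."""
    return [Variation(name) for name in fetch_rows(n)]


def get_variation_iterator(n: int) -> Iterator[Variation]:
    """Lazy style: creates one object at a time, only when the caller asks for it."""
    for name in fetch_rows(n):
        yield Variation(name)


def count_variations(variations: Iterable[Variation]) -> int:
    """Example downstream step that never needs the whole set in memory at once."""
    return sum(1 for _ in variations)
```

With the lazy version, a call such as
count_variations(get_variation_iterator(1000)) holds only one record at a
time, whereas the eager version keeps the full list alive until the caller
is finished with it.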

/Pontus


2011/3/2 Andrea Edwards <edwardsa at cs.man.ac.uk>

>  Hi Pontus
>
> I did in fact run out of memory every time! I wasn't aware of any sort of
> iterator in the API. Are they new? I don't remember them from when I read
> the tutorials late last year.
>
> Do you have any comments about which of my 2 approaches is best? I presume
> you can use an iterator on the second approach as that is basically the code
> you supplied (minus an iterator). Could you use an iterator on the first
> approach?
>
> thanks
>
>
> On 02/03/2011 15:22, Pontus Larsson wrote:
>
> Hi Chris,
>
>  Andrea is right, we have grouped these variations together into various
> variation sets. For example, you can get all variations belonging to the
> different pilots from the sets '1000 genomes - Low coverage', '1000 genomes
> - High coverage - Trios' and '1000 genomes - High coverage exons' for pilots
> 1, 2 and 3, respectively. You'll need to use the VariationSet and
> VariationSetAdaptor modules for this. It is not possible to retrieve the
> variations conditional on submission date.
>
>  As Andrea points out, if you call the 'get_all_Variations' method on a
> VariationSet object, the API will create all variation objects and return
> them. For large sets like these, this can easily cause you to run out of
> memory but you can use the 'get_Variation_Iterator' method to get an
> Iterator object and iterate over the variations instead.
>
>  /Pontus
>
>
>
> 2011/3/2 <cj5 at sanger.ac.uk>
>
>> Hi,
>> Is it possible using the variations API to get a list of SNPs which have
>> been submitted from the 1000 Genomes project?
>>
>> I have a vague idea that it should be possible to retrieve such a list
>> using the SS (submission) ID and/or the validation status, however I am
>> unsure of the details and what version of the API should be used.
>>
>> The latest 1000 Genomes pilot release (2010_07) would be great, but any
>> earlier release would also be useful.
>>
>> Thanks
>> Chris
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev
>>
>