[ensembl-dev] getting gene exons and transcripts that overlap only the original slice
Pablo Marin-Garcia
pg4 at sanger.ac.uk
Wed Jan 12 15:58:04 GMT 2011
On Wed, 12 Jan 2011, Andrea Edwards wrote:
> Hello
>
> When i was looking at just exons I used to use exactly the same approach as
> Alison. Now i want to annotate my snps to store their relationships to the
> exons / genes / transcripts they affect I have decided to approach the
> problem from the other side as it were.
>
> Pablo, the idea about flattening the data once per release is brilliant.
But this should be done in a very polite way. In my case, when using
ensembldb.ensembl.org to extract whole-genome data once, I sleep(1) between
genes and sleep(600) between chromosomes, so it takes 10 hours or so (in
waiting alone). I don't know whether it is still necessary to be so careful, because the
current Ensembl servers seem to be powerful, but better safe than sorry.
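
As an illustration of the throttling described above, here is a minimal Python sketch (not the script mentioned in this thread). The names walk_genome_politely, fetch_genes and handle_gene are hypothetical; fetch_genes stands in for whatever call returns the genes of one chromosome from the server. The pause lengths default to the 1 s and 600 s values given above.

```python
import time


def walk_genome_politely(chromosomes, fetch_genes, handle_gene,
                         gene_pause=1.0, chrom_pause=600.0, sleep=time.sleep):
    """Visit every gene, pausing between genes and between chromosomes.

    fetch_genes(chrom) and handle_gene(chrom, gene) are supplied by the
    caller (hypothetical placeholders for the real database access).
    The sleep function is injectable so the loop can be dry-run quickly.
    """
    for i, chrom in enumerate(chromosomes):
        if i > 0:
            sleep(chrom_pause)      # long pause between chromosomes
        for j, gene in enumerate(fetch_genes(chrom)):
            if j > 0:
                sleep(gene_pause)   # short pause between genes
            handle_gene(chrom, gene)
```

Passing a no-op as the sleep argument lets you check the traversal order without waiting; leave the default in place when running against a shared server.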
> shall definitely adopt that approach in the long term. Would you be willing
> to post your script to the group? I'm glad I asked now.
I will try to find time next week to upload it to GitHub. If you send me
a personal reminder next week I will tell you where to find it.
> I bet flattening the
> data takes hours off the run time?
Well, genome-wide approaches are going to take a long time unless you are able to
parallelize. For single-user bioinformatics on a single multicore computer, it is still
safe to use mysql with 26 concurrent scripts (one per autosome, X, XY, Y and Mt),
but this depends on how powerful your machine is (you can also split
tables per chromosome). Remember that large-scale parallelization against the public
server is not permitted. To speed things up, one way to go is to use
parallelization and local copies of the data in mysql or, better, in-memory hashes
built from flat files. If you cannot parallelize at all and your computer is not
powerful, you will not see much difference, I would say, but YMMV.
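
To make the "memory hashes from flat files" idea concrete, here is a small Python sketch. The file layout (tab-separated chromosome, start, end, exon ID) and the exon IDs are made up for the example and are not the format of any real dump; load_exons and exons_at are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict


def load_exons(handle):
    """Read 'chrom<TAB>start<TAB>end<TAB>exon_id' lines into a dict
    keyed by chromosome (assumed layout, see the note above)."""
    exons = defaultdict(list)
    for chrom, start, end, exon_id in csv.reader(handle, delimiter="\t"):
        exons[chrom].append((int(start), int(end), exon_id))
    return exons


def exons_at(exons, chrom, pos):
    """Return the IDs of exons on chrom whose [start, end] range contains pos."""
    return [eid for start, end, eid in exons.get(chrom, [])
            if start <= pos <= end]


# Made-up demo data standing in for a flat file written once per release.
FLAT = "1\t100\t200\texonA\n1\t150\t300\texonB\nX\t10\t50\texonC\n"
exons = load_exons(io.StringIO(FLAT))
print(exons_at(exons, "1", 160))
```

Because the hash is keyed by chromosome, each chromosome can be handled by its own script or process against the local data, which is where the parallel speed-up would come from. For very large inputs a sorted list with binary search or an interval tree would replace the linear scan used here.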
-Pablo
------------------------
Pablo Marin-Garcia
Team: EGA (vertebrate genomics)
European Bioinformatics Institute.
Cambridge (UK)