[ensembl-dev] Retrieve sequence coverage data for strain (resequencing results)
Jason Caravas
jacaravas at gmail.com
Tue Oct 9 20:48:09 BST 2012
Hello,
I'm attempting to build alignments of resequencing results from the 1000
genomes project. The end result I desire should closely resemble this
view from the genome browser
http://browser.1000genomes.org/Homo_sapiens/Location/SequenceAlignment?db=core;g=ENSG00000099984;gene=ENSG00000099984;r=22:24322339-24326106
I am using Perl API version 65 to match the database.
How do I obtain information on whether sequence data exists for a
particular strain? Using code like this
my $sa = $registry->get_adaptor("human", "core", "slice");
my $slice = $sa->fetch_by_region('chromosome', $chrom, $start, $end);
my $str_slice = $slice->get_by_strain("AK1");
my $indiv_seq = $str_slice -> seq ();
simply returns the reference sequence for the region even though the
online genome browser shows missing data for that particular
individual. It also doesn't complement variations on the -1 strand, so
I'm assuming the seq() method is just a bulk application of
SeqEdit::apply_edit() or AlleleFeature::apply_edit() methods to the
reference sequence rather than a function to actually retrieve strain
sequence data.
So my question is twofold. First, is going through the variations API
and adding allele features to the reference the correct approach for
reconstructing strain data? Or is there a pre-existing method to do
this that I am not aware of? I am relatively new to the API and the
presence of a view identical to my desired result in the online genome
browser suggests that this could be the case. Second, if this is the
correct approach and I have to remove missing sequence data from my
alignment myself, how do I determine where the missing data is for a
given strain?
Thanks in advance,
Jason Caravas
More information about the Dev
mailing list