[ensembl-dev] compara blastz coordinates
Javier Herrero
jherrero at ebi.ac.uk
Wed Apr 11 19:38:55 BST 2012
Hi Elena
On 11/04/12 18:16, Elena Grassi wrote:
> Thank you for your exhaustive reply.
>
> On Wed, Apr 11, 2012 at 12:56 PM, Javier Herrero <jherrero at ebi.ac.uk> wrote:
>> The coordinates are given with respect to the end of the chromosome when the
>> sequence is in the reverse strand.
> If by "are given" you mean the hypothetical output of the script I
> linked, then I've understood. The data stored in the MySQL server and
> shown in the genome browser, on the other hand, always refer to the
> start of the chromosome, "à la Entrez", if I've got this right.
That is correct. The start and end positions are independent of the
strand: they identify a particular locus, and the strand value is only
used when we want to specify one particular strand. This way we can also
refer to features that have no specific strand, such as assembly gaps,
tandem repeats, CpG islands, etc.
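To make the convention concrete, here is a minimal Python sketch. It is not the Ensembl API: `Locus` and `flip_interval` are hypothetical names for illustration. It assumes 1-based, inclusive coordinates, and it assumes strand 0 stands for "no specific strand". `flip_interval` converts between positions counted from the start of the chromosome and positions counted from its end, which is the other convention discussed earlier in the thread for reverse-strand output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Hypothetical locus: start/end always count from the chromosome start.

    Coordinates are assumed 1-based and inclusive. strand is 1 or -1, or 0
    (an assumption here) for features with no specific strand, such as an
    assembly gap or a CpG island.
    """
    seq_region: str
    start: int
    end: int
    strand: int = 0

    def __post_init__(self) -> None:
        if not 1 <= self.start <= self.end:
            raise ValueError("expected 1 <= start <= end")
        if self.strand not in (1, -1, 0):
            raise ValueError("strand must be 1, -1 or 0")

    def length(self) -> int:
        # Both ends are inclusive, hence the +1.
        return self.end - self.start + 1


def flip_interval(start: int, end: int, chrom_length: int) -> tuple[int, int]:
    """Map an inclusive interval between the two counting conventions.

    Position p counted from the chromosome start is position
    chrom_length - p + 1 counted from the end, so the same formula
    converts in either direction (the mapping is its own inverse).
    """
    if not 1 <= start <= end <= chrom_length:
        raise ValueError("expected 1 <= start <= end <= chrom_length")
    return chrom_length - end + 1, chrom_length - start + 1
```

For the human block quoted in this thread, `Locus("22", 16114957, 16115029, 1).length()` gives 73, matching the length of the aligned sequence. Converting the chicken block to end-relative positions would also require the length of chromosome Z, which is not given here.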
> For example:
> hsapiens versus ggallus (compara57)
>
> homo_sapiens:22> chromosome:GRCh37:22:16114957:16115029:1
> gallus_gallus:Z> chromosome:WASHUC2:Z:49606588:49606660:-1
>
> homo_sapiens:22
> GCCTGGTAGTAAAGTTGCCCTATTCCACTTCGTTTCTCTTCCTACATTTCTAAAGAGAAGAGACACAAATAAA
> gallus_gallus:Z
> GCTTGTCAGCAATATTCCTCTATTCAATCACATCTCGCTCTTTACATTTCTAAAAAGGCGAGATAAAAATAGA
>
> Blatting the latter sequence against the ggallus genome brings me here:
> ACTIONS          QUERY    SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START     END       SPAN
> ---------------------------------------------------------------------------------------------------
> browser details  YourSeq  73     1      73   73     100.0%    Z     -       49606588  49606660  73
>
> Sorry for the confusion. I wasn't referring to the output of that
> script (which does not run as it stands) but to the data as stored in
> the database.
Indeed, we store the strand for each of the sequences. Note that for
pairwise alignments there are only two possibilities: either the two
sequences are on the same strand or they are not. When you retrieve
these alignments with the Perl API, they are re-oriented according to
your query sequence (human or chicken in this case). We store the strand
for each sequence because we use the same object model for pairwise and
multiple alignments, and multiple alignments need more than one strand.
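The re-orientation can be illustrated with a small Python sketch. This is a hypothetical illustration, not the Ensembl Perl API: `AlignedSegment`, `revcomp_aligned` and `reorient_to_query` are made-up names. It assumes that re-orienting means putting the query on the +1 strand, and that each aligned sequence is stored as it reads on its own strand, with '-' for gaps. Flipping every segment (reverse-complementing its aligned sequence and negating its strand) preserves the relative orientation, that is, whether the two sequences are on the same strand. Start and end are left untouched because they do not depend on the strand.

```python
from dataclasses import dataclass, replace

# Complement table; characters not listed (such as the gap '-') pass through.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp_aligned(seq: str) -> str:
    """Reverse-complement an aligned sequence, keeping gap columns in place."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class AlignedSegment:
    species: str
    seq_region: str
    start: int        # counted from the chromosome start, 1-based, inclusive
    end: int
    strand: int       # 1 or -1
    aligned_seq: str  # sequence as it reads on `strand`, with '-' for gaps


def reorient_to_query(segments: list[AlignedSegment],
                      query_species: str) -> list[AlignedSegment]:
    """Return the block flipped, if needed, so the query is on the +1 strand."""
    query = next((s for s in segments if s.species == query_species), None)
    if query is None:
        raise ValueError(f"no segment for {query_species!r}")
    if query.strand == 1:
        return list(segments)
    # Flip every segment: coordinates stay, strand and sequence change.
    return [
        replace(s, strand=-s.strand, aligned_seq=revcomp_aligned(s.aligned_seq))
        for s in segments
    ]
```

With the human/chicken block from this thread (human on +1, chicken on -1), asking for chicken as the query returns chicken on +1 and human on -1, with both aligned sequences reverse-complemented and all start/end values unchanged. Asking for human returns the block as it is.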
It is a pity that the script doesn't work, although that is not
surprising: I don't know of anyone who has used or tested it in the
last six years. We should probably remove it.
Kind regards
Javier
> Thanks,
> E.
--
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK