[ensembl-dev] compara blastz coordinates

Javier Herrero jherrero at ebi.ac.uk
Wed Apr 11 19:38:55 BST 2012


Hi Elena

On 11/04/12 18:16, Elena Grassi wrote:
> Thank you for your exhaustive reply.
>
> On Wed, Apr 11, 2012 at 12:56 PM, Javier Herrero <jherrero at ebi.ac.uk> wrote:
>> The coordinates are given with respect to the end of the chromosome when the
>> sequence is in the reverse strand.
> If by "are given" you mean the hypothetical output of the script
> that I linked, then I understand. The data stored in the MySQL
> server and shown on the genome browser, however, always refer to the
> start of the chromosome "a la Entrez", if I've got this right.

That is correct. The start and end positions are independent of the 
strand. In other words, we refer to a particular locus, and the strand 
value is used only when we want to specify one particular strand. This 
way we can also refer to features that don't have any specific strand, 
like assembly gaps, tandem repeats, CpG islands, etc.
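
As an illustration only (this is not Ensembl code), the two conventions
discussed above can be converted into each other with a little arithmetic.
The sketch below assumes 1-based, inclusive coordinates; the names Locus,
to_strand_relative and to_forward are hypothetical, and the sequence length
of 1000 is a toy value, not the real length of any chromosome.

```python
from typing import NamedTuple, Tuple


class Locus(NamedTuple):
    """A 1-based, inclusive region with start <= end whatever the strand."""
    start: int
    end: int
    strand: int  # 1, -1, or 0 for features without a specific strand


def to_strand_relative(locus: Locus, seq_length: int) -> Tuple[int, int]:
    """Count positions from the end of the sequence for reverse-strand loci."""
    if locus.strand == -1:
        return (seq_length - locus.end + 1, seq_length - locus.start + 1)
    return (locus.start, locus.end)


def to_forward(start: int, end: int, strand: int, seq_length: int) -> Locus:
    """Inverse conversion: back to strand-independent start/end."""
    if strand == -1:
        return Locus(seq_length - end + 1, seq_length - start + 1, strand)
    return Locus(start, end, strand)


toy_length = 1000                 # assumed toy sequence length
hit = Locus(101, 173, -1)         # stored form: start/end ignore the strand
relative = to_strand_relative(hit, toy_length)        # (828, 900)
restored = to_forward(*relative, -1, toy_length)      # Locus(101, 173, -1)
```

With the stored form, an unstranded feature (strand 0) needs no special
handling, since start and end never depend on the strand value.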

> For example:
> hsapiens versus ggallus (compara57)
>
> homo_sapiens:22>   	chromosome:GRCh37:22:16114957:16115029:1
> gallus_gallus:Z>   	chromosome:WASHUC2:Z:49606588:49606660:-1
>
> homo_sapiens:22
> GCCTGGTAGTAAAGTTGCCCTATTCCACTTCGTTTCTCTTCCTACATTTCTAAAGAGAAGAGACACAAATAAA
> gallus_gallus:Z
> GCTTGTCAGCAATATTCCTCTATTCAATCACATCTCGCTCTTTACATTTCTAAAAAGGCGAGATAAAAATAGA
>
> Blatting the latter sequence against the ggallus genome brings me here:
> ACTIONS         QUERY    SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START     END       SPAN
> ---------------------------------------------------------------------------------------------------
> browser details YourSeq  73     1      73   73     100.0%    Z     -       49606588  49606660  73
>
> Sorry for the confusion, I wasn't referring to the output of that
> script (which as is does not run) but to the database data as is.

Indeed, we store the strand for each of the sequences. Note that for 
pairwise alignments there are only two possibilities: either the two 
sequences are on the same strand or they are not. When you retrieve 
these alignments with the Perl API, they are re-oriented depending on 
which sequence was your query (human or chicken in this case). We store 
the strand for each sequence because we use the same object model for 
pairwise and multiple alignments, and multiple alignments need more than 
one strand.
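
A rough sketch of what such a re-orientation amounts to, written in Python
rather than with the Perl API: if the query sequence is stored on the
reverse strand, every sequence in the block has its strand flipped and its
aligned string reverse-complemented, while start and end stay untouched.
AlignedSeq and orient_to_query are hypothetical names, and the coordinates
and sequences are toy values.

```python
from typing import List, NamedTuple

# Gap characters ('-') are not in the table, so they pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(aligned: str) -> str:
    """Reverse-complement an aligned string, keeping gaps in place."""
    return aligned.translate(_COMPLEMENT)[::-1]


class AlignedSeq(NamedTuple):
    species: str
    start: int      # strand-independent, 1-based inclusive
    end: int
    strand: int     # 1 or -1
    aligned: str    # aligned sequence, '-' marks a gap


def orient_to_query(block: List[AlignedSeq], query_species: str) -> List[AlignedSeq]:
    """Return the block with the query sequence on the forward strand."""
    query = next(s for s in block if s.species == query_species)
    if query.strand == 1:
        return list(block)
    return [
        s._replace(strand=-s.strand, aligned=reverse_complement(s.aligned))
        for s in block
    ]


toy_block = [
    AlignedSeq("homo_sapiens", 11, 15, 1, "GCC-TG"),
    AlignedSeq("gallus_gallus", 21, 26, -1, "GCTTGT"),
]
as_chicken_query = orient_to_query(toy_block, "gallus_gallus")
# homo_sapiens is now strand -1 with "CA-GGC"; gallus_gallus is strand 1 with "ACAAGC"
```

For a pairwise block this leaves the relative orientation of the two
sequences unchanged, which is why only the "same strand or not" distinction
matters.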

Oh, it is too bad that the script doesn't work, although it is 
understandable as I don't know of anyone who has used or tested that 
script in the last 6 years. We should probably remove it.

Kind regards

Javier

> Thanks,
> E.

-- 
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK




