[ensembl-dev] question

Aijing Starr azhang at ncsu.edu
Fri Apr 1 04:09:00 BST 2011


I need to obtain conservation scores (GERP scores). Following the
instructions on the FAQ page, I downloaded the EMF files for Homo sapiens
chr16 from the Ensembl FTP site.  However, I am having trouble
understanding the data.  The first 25 lines of the EMF file are included
below.  My questions are:

1. What do the numbers on the first line mean?  I imagine that they have
something to do with the position, but I couldn't figure out how.
2. What is the difference between "SCORE aligned Watson reads" and "SCORE
aligned Venter reads"?  It seems that missing data is assigned a score of 0;
is this correct?  There are instances where the Watson sequence is the same
as the Venter sequence, yet they have different scores.  What is the reason?
3. From the references I've read, it seems that GERP scores have decimal
points, yet the scores listed here are all integers.  How are these
calculated?

Many thanks

SEQ human reference 16 60002 80290 1
SEQ human Watson WGS
SEQ human Venter WGS
SCORE aligned Watson reads
SCORE aligned Venter reads
DATA
A ~ A 0 1
A ~ A 0 1
C ~ C 0 1
C ~ C 0 1
C ~ C 0 1
T ~ T 0 1
A ~ A 0 1
A ~ A 0 1
C ~ C 0 1
C ~ C 0 1
C ~ C 0 1
T ~ T 0 1
A ~ A 0 1
A ~ A 0 1
C ~ C 0 1
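
To make the layout of the excerpt above easier to discuss, here is a minimal Python sketch that pairs each DATA column with a header line. It assumes, without confirmation from the EMF documentation, that every DATA row has one column per SEQ line followed by one column per SCORE line, in the order the header lines appear. The function name parse_emf_block and the variables sample and columns are hypothetical, chosen only for illustration.

```python
def parse_emf_block(lines):
    """Map each SEQ/SCORE header label to its column of DATA values.

    Assumption: DATA columns follow the order of the SEQ lines,
    then the order of the SCORE lines.
    """
    seq_labels, score_labels, rows = [], [], []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if in_data:
            rows.append(line.split())
        elif line == "DATA":
            in_data = True
        elif line.startswith("SEQ "):
            seq_labels.append(line[len("SEQ "):])
        elif line.startswith("SCORE "):
            score_labels.append(line[len("SCORE "):])
    labels = seq_labels + score_labels
    # Build one list of values per header label.
    return {label: [row[i] for row in rows] for i, label in enumerate(labels)}


# The header and the first three DATA rows from the excerpt above.
sample = """\
SEQ human reference 16 60002 80290 1
SEQ human Watson WGS
SEQ human Venter WGS
SCORE aligned Watson reads
SCORE aligned Venter reads
DATA
A ~ A 0 1
A ~ A 0 1
C ~ C 0 1
""".splitlines()

columns = parse_emf_block(sample)
for label, values in columns.items():
    print(label, values)
```

Under that assumption, the "~" entries fall in the "human Watson WGS" column and the 0 values in the "aligned Watson reads" column, which is the pattern question 2 asks about.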

-- 
Aijing Zoe Z. Starr
PhD Student
Department of Statistics
North Carolina State University