[ensembl-dev] Effects predictor version 2

Andrea Edwards edwardsa at cs.man.ac.uk
Tue May 17 15:37:27 BST 2011


Hello

Whilst looking into Stuart's question I looked at the variants on
chromosome 1 out of curiosity and found that most of them don't have
SIFT/PolyPhen data.
Is this correct, or have I made a mistake in my understanding of the schema?

variants on chr1 (seq_region_id = 27511)
============================

mysql> select count(*)
       from transcript_variation tv
       inner join homo_sapiens_core_62_37g.transcript_stable_id st
         on st.stable_id = tv.feature_stable_id
       inner join homo_sapiens_core_62_37g.transcript t
         on t.transcript_id = st.transcript_id
       where t.seq_region_id = 27511;
+----------+
| count(*) |
+----------+
|  9633745 |
+----------+
1 row in set (3.34 sec)


variants on chr1 with neither sift nor polyphen
===============================================

mysql> select count(*)
       from transcript_variation tv
       inner join homo_sapiens_core_62_37g.transcript_stable_id st
         on st.stable_id = tv.feature_stable_id
       inner join homo_sapiens_core_62_37g.transcript t
         on t.transcript_id = st.transcript_id
       where t.seq_region_id = 27511
         and tv.sift_prediction is null
         and tv.polyphen_prediction is null;
+----------+
| count(*) |
+----------+
|  9562313 |
+----------+
1 row in set (11.22 sec)


variants on chr1 with both sift and polyphen
============================================

mysql> select count(*)
       from transcript_variation tv
       inner join homo_sapiens_core_62_37g.transcript_stable_id st
         on st.stable_id = tv.feature_stable_id
       inner join homo_sapiens_core_62_37g.transcript t
         on t.transcript_id = st.transcript_id
       where t.seq_region_id = 27511
         and tv.sift_prediction is not null
         and tv.polyphen_prediction is not null;
+----------+
| count(*) |
+----------+
|    67919 |
+----------+
1 row in set (11.19 sec)
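
The three counts above don't quite add up: 9633745 - 9562313 - 67919 leaves 3513 rows that carry only one of the two predictions. A single query with conditional sums would return all four numbers in one pass instead of three separate scans. Below is a minimal sketch of that idea in Python, run against an in-memory SQLite table rather than the real database. The toy rows are made up for illustration, and only the table name and three column names are taken from the queries above. The same SELECT should carry over to MySQL more or less unchanged, but it is untested against the Ensembl databases.

```python
import sqlite3

# Toy stand-in for transcript_variation; these rows are invented for illustration.
rows = [
    ("ENST_A", None, None),
    ("ENST_A", "tolerated", "benign"),
    ("ENST_B", "deleterious", None),
    ("ENST_B", None, "probably damaging"),
    ("ENST_C", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transcript_variation ("
    "feature_stable_id TEXT, sift_prediction TEXT, polyphen_prediction TEXT)"
)
conn.executemany("INSERT INTO transcript_variation VALUES (?, ?, ?)", rows)

# One pass over the table: each SUM counts the rows that match one case.
query = """
SELECT COUNT(*) AS total,
       SUM(CASE WHEN sift_prediction IS NULL
                 AND polyphen_prediction IS NULL THEN 1 ELSE 0 END) AS neither,
       SUM(CASE WHEN sift_prediction IS NOT NULL
                 AND polyphen_prediction IS NOT NULL THEN 1 ELSE 0 END) AS both,
       SUM(CASE WHEN (sift_prediction IS NULL) <> (polyphen_prediction IS NULL)
                THEN 1 ELSE 0 END) AS only_one
FROM transcript_variation
"""
total, neither, both, only_one = conn.execute(query).fetchone()
print(total, neither, both, only_one)

# The same bookkeeping applied to the counts reported in this mail.
reported_total, reported_neither, reported_both = 9633745, 9562313, 67919
reported_only_one = reported_total - reported_neither - reported_both
print(reported_only_one)
```

On the real tables, the joins and the seq_region_id filter from the queries above would go into the FROM/WHERE part unchanged.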



thanks


On 17/05/11 13:59, Stuart Meacham wrote:
> Hello,
>
> Thanks for the reply.
>
> On 17/05/11 13:35, Will McLaren wrote:
>
>>
>> This is strange - are you sure you are checking out the branch and not
>> the head of the API? You should be doing something like:
>>
>> cvs checkout -r branch-ensembl-62 ensembl
>> cvs checkout -r branch-ensembl-62 ensembl-variation
>
> Actually I just used the links from the site here:
>
> http://www.ensembl.org/info/docs/api/api_installation.html
>
> the link(s) resolve to things like:
>
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl.tar.gz?root=ensembl&only_with_tag=branch-ensembl-62&view=tar 
>
>
>>
>>>
>>> The script silently overwrites an existing output file of the same
>>> name. This seems a bit brutal; perhaps the default should be to fail
>>> if the file exists.
>>
>> I think this is pretty standard behaviour for command-line programs. I
>> could change it to only run if an output file name is specified
>> perhaps?
>
> Yes, it's probably standard behaviour. I was just imagining
> accidentally overwriting a file the script had spent 24 hours
> creating...
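
The "fail if the file exists" default discussed here can be sketched in a few lines. This is only an illustration in Python, not how variant_effect_predictor_2.pl (a Perl script) behaves or is implemented; the open_output helper, its force argument, and the file name are all hypothetical.

```python
import os
import tempfile


def open_output(path, force=False):
    """Open path for writing; refuse to clobber an existing file unless force is set."""
    # Mode "x" raises FileExistsError if the file already exists;
    # mode "w" truncates and overwrites it.
    return open(path, "w" if force else "x")


workdir = tempfile.mkdtemp()
out_path = os.path.join(workdir, "vep_output.txt")  # hypothetical file name

# First run: the file does not exist yet, so it is created.
with open_output(out_path) as handle:
    handle.write("first run\n")

# Second run without force: the existing file is left untouched.
try:
    open_output(out_path)
    refused = False
except FileExistsError:
    refused = True

# Explicit opt-in to overwriting.
with open_output(out_path, force=True) as handle:
    handle.write("second run\n")
```

With this default, a long-running job's output survives an accidental re-run, and overwriting requires an explicit opt-in.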
>
>>
>> That's also odd - any variants classified as non-synonymous coding
>> should have a "SIFT=*" entry in the final column. Can you try the
>> attached file as input on your system?
>>
>
> No problem, the command I used was:
>
> perl ./variant_effect_predictor_2.pl -r reg.pl -i ./test.txt -w -b 
> 100000 --sift=p --polyphen=p --failed=0 -terms=so
>
> and the output (no errors but also no predictions) is attached.
>
> Cheers
>
> Stuart
>
>
