[ensembl-dev] Effects predictor version 2

Andrea Edwards edwardsa at cs.man.ac.uk
Tue May 17 16:01:32 BST 2011


Hi, I appreciate you will only have SIFT and PolyPhen predictions for
non-synonymous SNPs in exons. I was just wondering whether my query was
correct, or whether I needed to join to the
sift_prediction/polyphen_prediction/protein_position tables to get more
accurate data. I presumed the predictions from these tables were simply
copied into the polyphen and sift fields of the transcript_variation
table, and that my query is OK.
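
For what it's worth, the three counts quoted below can be produced in one pass, and they do not quite add up: 9562313 + 67919 = 9630232, which is 3513 short of the 9633745 total. Those 3513 rows presumably have one prediction but not the other. Here is a minimal sketch of that check, in Python with an in-memory SQLite table standing in for the real MySQL database. The toy rows are made up, the table is reduced to the two prediction columns (an assumption for illustration only), and the joins to the core database are left out:

```python
import sqlite3

# Toy stand-in for transcript_variation: only the two prediction columns
# used in the thread's queries are modelled, and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transcript_variation ("
    " sift_prediction TEXT, polyphen_prediction TEXT)"
)
rows = [
    (None, None),
    (None, None),
    (None, None),
    ("tolerated", "benign"),
    ("deleterious", None),
    (None, "probably damaging"),
]
conn.executemany("INSERT INTO transcript_variation VALUES (?, ?)", rows)

# One pass, four buckets. Each condition evaluates to 0 or 1,
# so SUM() counts the rows that satisfy it.
query = """
SELECT
    COUNT(*) AS total,
    SUM(sift_prediction IS NULL AND polyphen_prediction IS NULL) AS neither,
    SUM(sift_prediction IS NOT NULL
        AND polyphen_prediction IS NOT NULL) AS both,
    SUM((sift_prediction IS NULL) <> (polyphen_prediction IS NULL)) AS only_one
FROM transcript_variation
"""
total, neither, both, only_one = conn.execute(query).fetchone()

# The three buckets partition the table, so they must sum to the total.
assert total == neither + both + only_one
print(total, neither, both, only_one)

# The same arithmetic applied to the counts reported in the thread.
reported_total, reported_neither, reported_both = 9633745, 9562313, 67919
reported_only_one = reported_total - reported_neither - reported_both
print(reported_only_one)
```

Against the real database, the same SELECT list could presumably replace count(*) in the original join, since MySQL also evaluates such conditions to 0 or 1. That is untested here.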

On 17/05/11 15:42, Graham Ritchie wrote:
> Hi Andrea,
>
> We only have sift and polyphen predictions for variants which are predicted to result in single amino acid substitutions.
>
> Cheers,
>
> Graham
>
> On 17 May 2011, at 15:37, Andrea Edwards wrote:
>
>> Hello
>>
>> Whilst looking into Stuart's question I looked at the variants on chromosome 1 out of curiosity and found that most of them don't have sift/polyphen data.
>> Is this correct, or have I made a mistake in my understanding of the schema?
>>
>> variants on chr1 (seq_region_id = 27511)
>> ============================
>>
>> mysql>  select count(*) from transcript_variation tv inner join
>> homo_sapiens_core_62_37g.transcript_stable_id st on st.stable_id =
>> tv.feature_stable_id inner join homo_sapiens_core_62_37g.transcript t on
>> t.transcript_id = st.transcript_id where t.seq_region_id = 27511;
>> +----------+
>> | count(*) |
>> +----------+
>> |  9633745 |
>> +----------+
>> 1 row in set (3.34 sec)
>>
>>
>> variants on chr1 without sift and polyphen
>> ===========================
>>
>> mysql>  select count(*) from transcript_variation tv inner join
>> homo_sapiens_core_62_37g.transcript_stable_id st on st.stable_id =
>> tv.feature_stable_id inner join homo_sapiens_core_62_37g.transcript t on
>> t.transcript_id = st.transcript_id where t.seq_region_id = 27511 and
>> tv.sift_prediction is null and tv.polyphen_prediction is null;
>> +----------+
>> | count(*) |
>> +----------+
>> |  9562313 |
>> +----------+
>> 1 row in set (11.22 sec)
>>
>>
>> variants on chr1 with sift and polyphen
>> =========================
>>
>> mysql>  select count(*) from transcript_variation tv inner join
>> homo_sapiens_core_62_37g.transcript_stable_id st on st.stable_id =
>> tv.feature_stable_id inner join homo_sapiens_core_62_37g.transcript t on
>> t.transcript_id = st.transcript_id where t.seq_region_id = 27511 and
>> tv.sift_prediction is not null and tv.polyphen_prediction is not null;
>> +----------+
>> | count(*) |
>> +----------+
>> |    67919 |
>> +----------+
>> 1 row in set (11.19 sec)
>>
>>
>>
>> thanks
>>
>>
>> On 17/05/11 13:59, Stuart Meacham wrote:
>>> Hello,
>>>
>>> Thanks for the reply.
>>>
>>> On 17/05/11 13:35, Will McLaren wrote:
>>>
>>>> This is strange - are you sure you are checking out the branch and not
>>>> the head of the API? You should be doing something like:
>>>>
>>>> cvs checkout -r branch-ensembl-62 ensembl
>>>> cvs checkout -r branch-ensembl-62 ensembl-variation
>>> Actually I just used the links from the site here:
>>>
>>> http://www.ensembl.org/info/docs/api/api_installation.html
>>>
>>> the link(s) resolve to things like:
>>>
>>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl.tar.gz?root=ensembl&only_with_tag=branch-ensembl-62&view=tar
>>>
>>>>> The script silently over-writes an existing output file of the same name,
>>>>> this seems a bit brutal, perhaps the default should be to fail if the file
>>>>> exists.
>>>> I think this is pretty standard behaviour for command-line programs. I
>>>> could change it to only run if an output file name is specified
>>>> perhaps?
>>> Yes, probably it's standard behaviour. I was just imagining accidentally overwriting a file the script had spent 24 hours creating . . .
>>>
>>>> That's also odd - any variants classified as non-synonymous coding
>>>> should have a "SIFT=*" entry in the final column. Can you try the
>>>> attached file as input on your system?
>>>>
>>> No problem, the command I used was:
>>>
>>> perl ./variant_effect_predictor_2.pl -r reg.pl -i ./test.txt -w -b 100000 --sift=p --polyphen=p --failed=0 -terms=so
>>>
>>> and the output (no errors but also no predictions) is attached.
>>>
>>> Cheers
>>>
>>> Stuart
>>>
>>> _______________________________________________
>>> Dev mailing list
>>> Dev at ensembl.org
>>>
>>> List admin (including subscribe/unsubscribe):
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>
>>> Ensembl Blog:
>>> http://www.ensembl.info/
