[ensembl-dev] VEP Interpro ID & description

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Thu Jun 27 12:30:37 BST 2013


Hello,

In case someone is using my Interpro plugin for VEP, be aware that the
results are *NOT CORRECT*.
I'm getting InterPro reports for intron_variants, which makes no sense.

e.g.:

AKT1 chr14 105238820 G C ENSG00000142208
v-akt_murine_thymoma_viral_oncogene_homolog_1
ENST00000407796.2:c.1173-31C>G 11/13 - - intron_variant 0.0298 0.0041
0.02 0 0.07 *IPR001849 Pleckstrin_homology* - - rs61761201 Transcript
ENST00000407796 NM_001014431.1 CCDS9994.1 - 1.790

Plugin code is located here: 
https://github.com/guillermomarco/vep_plugins_71/blob/master/Interpro.pm
I would really appreciate it if someone could help me fix this.

Best regards,
Guillermo.

On 06/10/2013 01:02 PM, Guillermo Marco Puche wrote:
> Hello Laurent,
>
> I'll try and report back.
>
> Thank you very much.
>
> Best regards,
> Guillermo.
>
> On 06/10/2013 11:47 AM, Laurent Gil wrote:
>> Hi Guillermo,
>>
>> I think Sarah was talking about line 115 of your plugin:
>>
>> elsif ($interpro_data[0] =~ /$interpro_ac/ || $interpro_data[1] =~ /$idesc/) {
>>
>> You need to replace the code "$interpro_data[1] =~ /$idesc/" with
>> "$interpro_data[1] =~ /\Q$idesc\E/".
>>
>>
>> Cheers,
>>
>> Laurent
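The quoting fix suggested by Sarah and Laurent can be checked with a small stand-alone Perl script. It is a sketch, not the plugin's code: `$idesc` is set by hand to the description from the "Quantifier follows nothing" error quoted in this thread, and no Ensembl API is involved.

```perl
use strict;
use warnings;

# Description taken from the error message quoted in this thread.
my $idesc = '(+)RNA_virus_helicase_core_dom';

# Unescaped interpolation: '(' opens a group and '+' then has nothing to
# quantify, so the pattern fails to compile at run time; eval catches the die.
my $unquoted_ok  = eval { $idesc =~ /$idesc/; 1 } ? 1 : 0;
my $unquoted_err = $@;

# \Q...\E (equivalent to quotemeta) escapes the metacharacters, so the
# description is matched literally.
my $quoted_match = ($idesc =~ /\Q$idesc\E/) ? 1 : 0;

# For an exact comparison, string equality avoids regex semantics entirely.
my $exact_match = ($idesc eq '(+)RNA_virus_helicase_core_dom') ? 1 : 0;

print "unquoted compiled: $unquoted_ok\n";
print "quoted matched:    $quoted_match\n";
print "exact matched:     $exact_match\n";
```

Whether a substring match or `eq` is the right test depends on what the plugin intends, which the thread does not state. Note also that, per perlop, a pattern that interpolates to the empty string reuses the last successfully matched pattern, so an empty `$interpro_ac` or `$idesc` can match unexpectedly; guarding with `length` or using `eq` avoids that. InterPro accessions such as IPR000219 contain no metacharacters, but quoting `$interpro_ac` the same way would do no harm.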
>>
>> On 08/06/2013 09:06, guillermo.marco at sistemasgenomicos.com wrote:
>>> I've been trying that, but I don't know exactly where to use
>>> =~ /\Q$result\E/.
>>> I'm confused.
>>>
>>> I suppose it's where I save the variable $idesc?
>>>
>>>> Hi Guillermo,
>>>>
>>>> If you paste the string from the error message into the InterPro search,
>>>> you will get the result '(+) RNA virus helicase core domain (IPR027351)'.
>>>> The '+' in the string will be interpreted as a special character unless
>>>> you escape it.
>>>>
>>>> Try something like =~ /\Q$result\E/
>>>> Best wishes,
>>>>
>>>> Sarah
>>>> On Fri, Jun 7, 2013 at 7:23 AM, Guillermo Marco Puche <
>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Using my InterPro plugin I got this error I've never seen before:
>>>>>
>>>>> Plugin 'Interpro' went wrong: Quantifier follows nothing in regex;
>>>>> marked by <-- HERE in m/(+ <-- HERE )RNA_virus_helicase_core_dom/ at
>>>>> ./vep_config/Plugins/Interpro.pm line 112.
>>>>>
>>>>>
>>>>> Here's the Interpro plugin code:
>>>>> https://github.com/guillermomarco/vep_plugins_71/blob/master/Interpro.pm 
>>>>>
>>>>>
>>>>> This looks like a regex coding error, but there's no regex on that
>>>>> line:
>>>>>
>>>>> if (!$interpro_data[0] && !$interpro_data[1])
>>>>>
>>>>> Guille.
>>>>>
>>>>>
>>>>> On 05/17/2013 10:17 AM, Guillermo Marco Puche wrote:
>>>>>
>>>>> Hello Will,
>>>>>
>>>>> That seems very logical. But even if I advertise my plugin, I would
>>>>> like to hear the opinion of other devs.
>>>>> I don't want people to use a plugin that isn't working properly or
>>>>> giving wrong information.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Best regards,
>>>>> Guillermo.
>>>>>
>>>>> On 05/17/2013 10:10 AM, Will McLaren wrote:
>>>>>
>>>>> Hi Guillermo,
>>>>>
>>>>> We're currently working on getting some official guidelines for
>>>>> external submissions of code in place.
>>>>>
>>>>> Until that happens, we can't put plugins in the Ensembl VEP_plugins
>>>>> repo. However, feel free to advertise your plugins on your own GitHub,
>>>>> as you have done here!
>>>>>
>>>>> Cheers
>>>>>
>>>>> Will
>>>>>
>>>>>
>>>>> On 17 May 2013 08:22, Guillermo Marco Puche <
>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>
>>>>>> Still waiting for someone to answer before I can push it into the
>>>>>> VEP repo.
>>>>>>
>>>>>>
>>>>>> On 05/15/2013 08:43 AM, Guillermo Marco Puche wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Fixed a bug in the script with empty Interpro_ac and idesc values.
>>>>>> Git code updated:
>>>>>> https://github.com/guillermomarco/vep_plugins_71/blob/master/Interpro.pm 
>>>>>>
>>>>>>
>>>>>> If someone gives me the OK, I'll push it to the official VEP-plugin
>>>>>> repository.
>>>>>>
>>>>>> Best regards,
>>>>>> Guillermo.
>>>>>>
>>>>>> On 05/14/2013 06:15 PM, Guillermo Marco Puche wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I've come up with this plugin:
>>>>>> https://github.com/guillermomarco/vep_plugins_71/blob/master/Interpro.pm 
>>>>>>
>>>>>>
>>>>>> If you could check the code and test it, that would be awesome!
>>>>>>
>>>>>> I'm not 100% sure it's working perfectly.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Best regards,
>>>>>> Guillermo.
>>>>>>
>>>>>> On 05/14/2013 03:38 PM, Will McLaren wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Ensembl contains domains mapped from multiple sources - often these
>>>>>> will be the "same" domain with slightly different coordinates. Here
>>>>>> you can see this on a typical transcript:
>>>>>>
>>>>>>
>>>>>> http://www.ensembl.org/Homo_sapiens/Transcript/ProteinSummary?db=core;g=ENSG00000128573;r=7:114055052-114333823;t=ENST00000403559 
>>>>>>
>>>>>>
>>>>>> You should also check the overlap of your variant with the domains,
>>>>>> as you say, using translation_start/end and $pf->start/end.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Will
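Will's overlap suggestion, which is also what the intron_variant report at the top of the thread calls for, amounts to an interval-overlap test between the variant's protein coordinates and each domain's start/end. This is a stand-alone sketch, assuming that `translation_start`/`translation_end` are undefined for variants outside the coding sequence; `overlapping_domains` is a hypothetical helper, and plain hashrefs stand in for ProteinFeature objects, using the DH-domain values from the Data::Dumper extract in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the domains whose protein coordinates overlap
# the variant's translation_start..translation_end. Each domain is a hashref
# standing in for a ProteinFeature ($pf->start / $pf->end).
sub overlapping_domains {
    my ($tl_start, $tl_end, $domains) = @_;

    # No protein position (assumed for intronic and other non-coding
    # variants): no domain can overlap, so report nothing.
    return () unless defined $tl_start && defined $tl_end;

    # Two closed intervals overlap when each one starts before the other ends.
    return grep { $_->{start} <= $tl_end && $_->{end} >= $tl_start } @{$domains};
}

# Values copied from the Data::Dumper extract in this thread.
my @domains = (
    { interpro_ac => 'IPR000219', idesc => 'DH-domain', start => 803, end => 985 },
);

my @inside   = overlapping_domains(900, 900, \@domains);   # within 803..985
my @outside  = overlapping_domains(100, 100, \@domains);   # before the domain
my @spanning = overlapping_domains(980, 990, \@domains);   # straddles the end
my @intronic = overlapping_domains(undef, undef, \@domains);

printf "inside=%d outside=%d spanning=%d intronic=%d\n",
    scalar @inside, scalar @outside, scalar @spanning, scalar @intronic;
```

In a plugin, the first two arguments would presumably be `$tv->translation_start` and `$tv->translation_end`, with `$pf->start`/`$pf->end` in place of the hashref fields; filling the InterPro columns only for features returned by such a check should stop intron_variant lines from reporting domains such as Pleckstrin_homology.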
>>>>>>
>>>>>>
>>>>>> On 14 May 2013 14:16, Guillermo Marco Puche <
>>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> This makes a lot more sense.
>>>>>>>
>>>>>>> There's something I still don't understand. For each variation I'm
>>>>>>> getting a lot of Interpro_ac and idesc values.
>>>>>>>
>>>>>>> I modified the code to debug it, but it's still not working since the
>>>>>>> code is trying to print undefined values.
>>>>>>>
>>>>>>> Should I compare and verify $tv->translation_start and
>>>>>>> $tv->translation_end against $pf->start and $pf->end to obtain the
>>>>>>> correct Interpro_ac and idesc?
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Best regards.
>>>>>>> Guillermo.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/14/2013 02:16 PM, Will McLaren wrote:
>>>>>>>
>>>>>>> $translation->get_all_ProteinFeatures();
>>>>>>>
>>>>>>> returns a reference to an array of ProteinFeature objects. You'll need
>>>>>>> to iterate over them with something like:
>>>>>>>
>>>>>>> foreach my $pf (@{$translation->get_all_ProteinFeatures}) {
>>>>>>>    $interpro{"INTERPRO_AC"} = $pf->interpro_ac;
>>>>>>>    # etc...
>>>>>>> }
>>>>>>>
>>>>>>> There is a mistake in the method docs that says it returns a single
>>>>>>> object, when actually it returns an arrayref.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Will
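The "unblessed reference" error and Will's reply can be reproduced without a database. In this sketch `MockProteinFeature` is a hypothetical stand-in exposing only `interpro_ac` and `idesc`, and the arrayref plays the role of the value returned by `get_all_ProteinFeatures`; the two accessions are the ones that appear in this thread. It collects every feature into an array, whereas assigning to a single `$interpro{"INTERPRO_AC"}` key inside the loop keeps only the last feature seen.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::EnsEMBL::ProteinFeature with two accessors.
package MockProteinFeature;
sub new         { my ($class, %args) = @_; return bless { %args }, $class; }
sub interpro_ac { return $_[0]{interpro_ac}; }
sub idesc       { return $_[0]{idesc}; }

package main;

# Plays the role of $translation->get_all_ProteinFeatures: an arrayref.
my $features = [
    MockProteinFeature->new(interpro_ac => 'IPR000219', idesc => 'DH-domain'),
    MockProteinFeature->new(interpro_ac => 'IPR001849', idesc => 'Pleckstrin_homology'),
];

# Calling the method on the arrayref itself reproduces the thread's error.
eval { $features->interpro_ac };
my $error = $@;

# Dereference the arrayref and keep one record per feature.
my @interpro;
foreach my $pf (@{$features}) {
    push @interpro, { INTERPRO_AC => $pf->interpro_ac, IDESC => $pf->idesc };
}

print "error: $error";
print "$_->{INTERPRO_AC}\t$_->{IDESC}\n" for @interpro;
```

Inside the plugin, `$features` would be the real `$translation->get_all_ProteinFeatures`; restricting the loop to features whose start/end overlap the variant, as Will suggests, would keep only the relevant records.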
>>>>>>>
>>>>>>>
>>>>>>> On 14 May 2013 12:44, Guillermo Marco Puche <
>>>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Here's the Interpro plugin code:
>>>>>>>> https://github.com/guillermomarco/vep_plugins_71/blob/master/Interpro.pm
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm getting an "unblessed reference" error when trying to extract
>>>>>>>> "interpro_ac" and "idesc" from my $pfeature object.
>>>>>>>>
>>>>>>>> Here's a Data::Dumper extract from $pfeature:
>>>>>>>>
>>>>>>>> 'Bio::EnsEMBL::ProteinFeature' ),
>>>>>>>>            bless( {
>>>>>>>>                     'p_value' => '1.6e-42',
>>>>>>>>                     'coverage' => undef,
>>>>>>>>                     'percent_id' => '0',
>>>>>>>>                     'adaptor' => $VAR1->[0]{'adaptor'},
>>>>>>>>                     'hstrand' => undef,
>>>>>>>>                     'idesc' => 'DH-domain',
>>>>>>>>                     'hdescription' => undef,
>>>>>>>>                     'slice' => undef,
>>>>>>>>                     'dbname' => undef,
>>>>>>>>                     'hspecies' => undef,
>>>>>>>>                     'dbID' => '6415086',
>>>>>>>>                     'strand' => 0,
>>>>>>>>                     'seqname' => '936060',
>>>>>>>>                     'translation_id' => '',
>>>>>>>>                     'external_db_id' => undef,
>>>>>>>>                     'db_display_name' => undef,
>>>>>>>>                     'hend' => 0,
>>>>>>>>                     'hcoverage' => undef,
>>>>>>>>                     'score' => '0',
>>>>>>>>                     'species' => undef,
>>>>>>>>                     'interpro_ac' => 'IPR000219',
>>>>>>>>                     'end' => 985,
>>>>>>>>                     'analysis' => $VAR1->[0]{'analysis'}{'adaptor'}{'_logic_name_cache'}{'superfamily'},
>>>>>>>>                     'hseqname' => 'SSF48065',
>>>>>>>>                     'hstart' => 0,
>>>>>>>>                     'extra_data' => undef,
>>>>>>>>                     'group_id' => undef,
>>>>>>>>                     'level_id' => undef,
>>>>>>>>                     'start' => 803
>>>>>>>>                   },
>>>>>>>>
>>>>>>>> ERROR: Forked process failed
>>>>>>>> Plugin 'Interpro' went wrong: Can't call method "interpro_ac" on
>>>>>>>> unblessed reference at
>>>>>>>> /home/likewise-open/SGNET/gmarco/.vep/Plugins/Interpro.pm line 74
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/13/2013 01:49 PM, Guillermo Marco Puche wrote:
>>>>>>>>
>>>>>>>> OK, I'm going to give it a shot.
>>>>>>>> I installed the latest API, downloaded from the Ensembl website on
>>>>>>>> Friday (10/05/2013), and I'm using a local Ensembl 71 database for
>>>>>>>> VEP, no cache.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Guillermo
>>>>>>>>
>>>>>>>> On 05/13/2013 01:45 PM, Will McLaren wrote:
>>>>>>>>
>>>>>>>> There was a bug in --domains when using the cache that has been
>>>>>>>> recently fixed.
>>>>>>>>
>>>>>>>> Try updating your API and see if that's any better.
>>>>>>>>
>>>>>>>> Will
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13 May 2013 12:38, Guillermo Marco Puche <
>>>>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>>>>
>>>>>>>>> Hello Will,
>>>>>>>>>
>>>>>>>>> Yes, I'm currently running VEP with the --domains flag. It has always
>>>>>>>>> shown empty in the tests I've run on different samples so far.
>>>>>>>>> So the --domains flag is supposed to display the Interpro_ac for
>>>>>>>>> overlapping protein domains?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Guillermo.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/13/2013 01:34 PM, Will McLaren wrote:
>>>>>>>>>
>>>>>>>>> Hi Guillermo,
>>>>>>>>>
>>>>>>>>> Have you tried the --domains flag in the VEP?
>>>>>>>>>
>>>>>>>>> Perhaps this is not enough information for you, but it does provide
>>>>>>>>> the display label of overlapping protein domains.
>>>>>>>>>
>>>>>>>>> The protein object is referred to as a translation object in the
>>>>>>>>> Ensembl API; you can retrieve it from the transcript via
>>>>>>>>> $transcript->translation.
>>>>>>>>>
>>>>>>>>> See
>>>>>>>>> http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Translation.html 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Will
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 13 May 2013 12:15, Guillermo Marco Puche <
>>>>>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> So it seems nobody has done this yet. I'll do it then :)
>>>>>>>>>>
>>>>>>>>>> Does VEP support any kind of "ProteinFeature"? Looking at the other
>>>>>>>>>> scripts, it seems I must use the Transcript feature_type.
>>>>>>>>>>
>>>>>>>>>> Please correct me if I'm wrong; I'm a bit confused since interpro_ac
>>>>>>>>>> is part of the core ProteinFeature.
>>>>>>>>>> (EnsEMBL::ProteinFeature::interpro_ac)
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Guillermo.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/09/2013 04:16 PM, Guillermo Marco Puche wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Has anyone coded a plugin to obtain InterPro IDs and descriptions
>>>>>>>>>> for VEP? I've looked in the VEP repo without luck.
>>>>>>>>>>
>>>>>>>>>> I want to know before I start coding.
>>>>>>>>>>
>>>>>>>>>> Thank you !
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Guillermo.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Dev mailing list Dev at ensembl.org
>>>>>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>>>>> Ensembl Blog: http://www.ensembl.info