[ensembl-dev] VEP annotation of ENSP

Will McLaren wm2 at ebi.ac.uk
Wed Nov 16 14:14:42 GMT 2011


Hi Duarte,

This works for me - if I use the following input:

21      40190405        40190405        G/A     1

I get some nice protein stuff:

#Uploaded_variation  Location  Allele  Gene  Feature  Feature_type  Consequence  cDNA_position  CDS_position  Protein_position  Amino_acids  Codons  Existing_variation  Extra
21_40190405_G/A  21:40190405  A  ENSG00000157557  ENST00000456966  Transcript  NON_SYNONYMOUS_CODING  807  646  216  G/S  Ggc/Agc  -  ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
21_40190405_G/A  21:40190405  A  ENSG00000157557  ENST00000360214  Transcript  NON_SYNONYMOUS_CODING  1106  646  216  G/S  Ggc/Agc  -  ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
21_40190405_G/A  21:40190405  A  ENSG00000160183  ENST00000553129  Transcript  NMD_TRANSCRIPT,INTRONIC  -  -  -  -  -  -  ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
21_40190405_G/A  21:40190405  A  ENSG00000157557  ENST00000432278  Transcript  DOWNSTREAM  -  -  -  -  -  -  ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
21_40190405_G/A  21:40190405  A  ENSG00000157557  ENST00000360938  Transcript  NON_SYNONYMOUS_CODING  936  646  216  G/S  Ggc/Agc  -  ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
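
In case it helps with checking the output, here is a minimal sketch of how the ENSP value could be split back into its parts. It assumes the layout produced by the modified script in this thread (protein ID, then colon-separated blocks of analysis,interpro_ac,idesc,start,end); this is not an official VEP format, and parse_ensp_extra is a hypothetical helper name. It would also break if a description ever contained a comma or a colon.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "ENSP=..." value into the protein ID and a
# list of per-analysis records. The field order is taken from the example
# output in this thread, not from any documented VEP format.
sub parse_ensp_extra {
    my ($extra) = @_;

    # Drop the leading "ENSP=" key if it is present.
    (my $value = $extra) =~ s/^ENSP=//;

    # The first colon-separated item is the protein ID; the rest are blocks.
    my ($protein_id, @blocks) = split /:/, $value;

    my @domains;
    for my $block (@blocks) {
        my ($analysis, $interpro_ac, $idesc, $start, $end) = split /,/, $block;
        push @domains, {
            analysis    => $analysis,
            interpro_ac => $interpro_ac,
            idesc       => $idesc,
            start       => $start,
            end         => $end,
        };
    }
    return ($protein_id, \@domains);
}

my $extra = 'ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170';
my ($protein_id, $domains) = parse_ensp_extra($extra);

# Flag whether each feature contains a given protein position
# (216 is the Protein_position reported for the coding transcripts above).
my $pos = 216;
for my $d (@$domains) {
    my $hit = ($d->{start} <= $pos && $pos <= $d->{end}) ? 'yes' : 'no';
    print join("\t", $protein_id, @{$d}{qw(analysis start end)}, $hit), "\n";
}
```

With the example value, none of the three features contains position 216, since the largest end coordinate is 173. So the string appears to list features found on the protein, not only those that overlap the variant.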

Are you remembering to specify --protein on the command line?

Are you pointing to the public Ensembl DB, to a local DB, or to the
cache? The data isn't currently in the cache, and the script isn't
currently configured to look up annotations in the DB when they are
missing from the cache.

That said, this looks like a good development for the VEP; we were
already considering adding protein domain information in a future
release of the script.

Cheers

Will McLaren
Ensembl Variation

On 16 November 2011 11:39, Duarte Molha <Duarte.Molha at ogt.co.uk> wrote:
> Hi there.
>
> I have been trying to add a bit more functionality to the VEP script and
> have modified it to add more detail to the protein annotation.
>
> I have changed the VEP script at lines 754-757:
>
>     # protein ID
>     if(defined $config->{protein} && $t->translation) {
>         $line->{Extra}->{ENSP} = $t->translation->stable_id;
>     }
>
> To:
>
>     # protein ID
>     if(defined $config->{protein} && $tv->transcript->translation) {
>         my $protein_feature_analysis = get_protein_domains($tv);
>         if ($protein_feature_analysis){
>             chomp $protein_feature_analysis;
>             $line->{Extra}->{ENSP} = $protein_feature_analysis;
>         }
>     }
>
> And included a sub to get a bit more detail about the protein domains it
> overlaps:
>
> sub get_protein_domains{
>     my $tv = shift;
>
>     ############################ protein ID ############################
>     my $translation_id = $tv->transcript->translation->stable_id;
>     my %protein_features = ();
>     my $pfeatures = $tv->transcript->translation->get_all_ProteinFeatures();
>     foreach my $pfeature (@{$pfeatures}){
>         my $logic_name = $pfeature->analysis()->logic_name();
>         if ($pfeature->start >= $tv->transcript->translation->start
>             && $pfeature->end <= $tv->transcript->translation->end){
>             $protein_features{$logic_name}{ENSP}        = $translation_id || "-";
>             $protein_features{$logic_name}{interpro_ac} = $pfeature->interpro_ac() || "-";
>             $protein_features{$logic_name}{idesc}       = $pfeature->idesc() || "-";
>             $protein_features{$logic_name}{start}       = $pfeature->start;
>             $protein_features{$logic_name}{end}         = $pfeature->end;
>         }
>     }
>     my $protein_feature_analysis = $translation_id;
>
>     for my $analysis ( keys %protein_features ){
>         $protein_feature_analysis .= ":".$analysis;
>         $protein_feature_analysis .= ",".$protein_features{$analysis}{interpro_ac};
>         $protein_feature_analysis .= ",".$protein_features{$analysis}{idesc};
>         $protein_feature_analysis .= ",".$protein_features{$analysis}{start};
>         $protein_feature_analysis .= ",".$protein_features{$analysis}{end};
>     }
>     ####################################################################
>
>     return $protein_feature_analysis;
> }
>
> Unfortunately it seems that this annotation is only working for the
> mitochondrial chromosome.
>
> Could you point me to where I might be doing something wrong?
>
> Best regards
>
> Duarte Molha
>



