[ensembl-dev] VEP annotation of ENSP
Will McLaren
wm2 at ebi.ac.uk
Wed Nov 16 14:14:42 GMT 2011
Hi Duarte,
This works for me - if I use the following input:
21 40190405 40190405 G/A 1
I get some nice protein stuff:
#Uploaded_variation	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra
21_40190405_G/A	21:40190405	A	ENSG00000157557	ENST00000456966	Transcript	NON_SYNONYMOUS_CODING	807	646	216	G/S	Ggc/Agc	-	ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
21_40190405_G/A	21:40190405	A	ENSG00000157557	ENST00000360214	Transcript	NON_SYNONYMOUS_CODING	1106	646	216	G/S	Ggc/Agc	-	ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
21_40190405_G/A	21:40190405	A	ENSG00000160183	ENST00000553129	Transcript	NMD_TRANSCRIPT,INTRONIC	-	-	-	-	-	-	ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
21_40190405_G/A	21:40190405	A	ENSG00000157557	ENST00000432278	Transcript	DOWNSTREAM	-	-	-	-	-	-	ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
21_40190405_G/A	21:40190405	A	ENSG00000157557	ENST00000360938	Transcript	NON_SYNONYMOUS_CODING	936	646	216	G/S	Ggc/Agc	-	ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170
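If you need to post-process that Extra column, the ENSP value can be split back into per-domain records. Here is a minimal Perl sketch, assuming the layout shown above (translation ID first, then ":"-separated "analysis,interpro_ac,idesc,start,end" groups, with "-" for missing values). parse_ensp_field is a hypothetical helper, not part of the VEP, and since nothing in the field is escaped, a description that itself contained ":" or "," would break this split.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ENSP value in the format produced by the
# quoted get_protein_domains sub into (translation ID, arrayref of domains).
# Assumes every group has exactly five comma-separated fields.
sub parse_ensp_field {
    my ($field) = @_;
    $field =~ s/^ENSP=//;    # tolerate the "ENSP=" key prefix from the Extra column
    my ($translation_id, @groups) = split /:/, $field;
    my @domains;
    for my $group (@groups) {
        my ($analysis, $interpro_ac, $idesc, $start, $end) = split /,/, $group;
        push @domains, {
            analysis    => $analysis,
            interpro_ac => $interpro_ac eq '-' ? undef : $interpro_ac,  # "-" means missing
            idesc       => $idesc eq '-' ? undef : $idesc,
            start       => $start,
            end         => $end,
        };
    }
    return ($translation_id, \@domains);
}

my ($id, $domains) = parse_ensp_field(
    'ENSP=ENSP00000354194:pfam,-,-,88,169:superfamily,-,-,63,173:smart,IPR003118,SAM_PNT,87,170'
);
printf "%s %s %d-%d\n", $id, $_->{analysis}, $_->{start}, $_->{end} for @$domains;
# prints:
# ENSP00000354194 pfam 88-169
# ENSP00000354194 superfamily 63-173
# ENSP00000354194 smart 87-170
```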
Are you remembering to specify --protein on the command line?
Are you connecting to the public Ensembl database, to a local database,
or using the cache? The protein feature data isn't currently in the
cache, and the script isn't configured to fall back to the database
for annotations that are missing from the cache.
That said, this looks like a good development for the VEP, and we were
considering adding protein domain information in a future release of
the script.
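One thing worth checking in the quoted get_protein_domains sub: it keeps a feature when its start and end fall between $tv->transcript->translation->start and ->end. As far as I know, those Translation methods return offsets within the start and end exons (nucleotide positions), not peptide coordinates, so that test doesn't tell you whether the variant actually hits a domain; in the output above, for instance, none of the three listed features spans residue 216. Keying %protein_features on the logic name also keeps only the last matching feature per analysis. Below is a hypothetical alternative, sketched with plain hashes standing in for ProteinFeature objects: it keeps every feature overlapping the variant's peptide span and sorts them so the output is stable. In the real script I believe the span would come from $tv->translation_start and $tv->translation_end, but treat that as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: build an ENSP string listing every feature whose
# peptide coordinates overlap the variant's peptide span [$pep_start, $pep_end].
# (Assumption: in the VEP the span would come from $tv->translation_start/_end.)
sub domains_overlapping {
    my ($translation_id, $pep_start, $pep_end, @features) = @_;

    # Two closed intervals overlap when each starts no later than the other ends.
    my @hits = grep { $_->{start} <= $pep_end && $_->{end} >= $pep_start } @features;

    # Keep all hits (several per analysis are allowed), sorted for stable output.
    my @parts;
    for my $f (sort { $a->{analysis} cmp $b->{analysis} || $a->{start} <=> $b->{start} } @hits) {
        push @parts, join ',', $f->{analysis},
                               $f->{interpro_ac} // '-',
                               $f->{idesc}       // '-',
                               $f->{start}, $f->{end};
    }
    return join ':', $translation_id, @parts;
}

# Stand-ins for protein features, using the coordinates from the output above.
my @features = (
    { analysis => 'pfam',        interpro_ac => undef,       idesc => undef,     start => 88, end => 169 },
    { analysis => 'superfamily', interpro_ac => undef,       idesc => undef,     start => 63, end => 173 },
    { analysis => 'smart',       interpro_ac => 'IPR003118', idesc => 'SAM_PNT', start => 87, end => 170 },
);

print domains_overlapping('ENSP00000354194', 216, 216, @features), "\n";
# prints "ENSP00000354194" (no feature spans residue 216)
print domains_overlapping('ENSP00000354194', 150, 150, @features), "\n";
# prints "ENSP00000354194:pfam,-,-,88,169:smart,IPR003118,SAM_PNT,87,170:superfamily,-,-,63,173"
```

Sorting is only there so that the same input always yields the same string; the original loop over keys %protein_features returns analyses in Perl's unspecified hash order.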
Cheers
Will McLaren
Ensembl Variation
On 16 November 2011 11:39, Duarte Molha <Duarte.Molha at ogt.co.uk> wrote:
> Hi there.
>
> I have been trying to add a bit more functionality to the VEP script and
> have written some code that adds more detail to the protein annotation...
>
> I have changed the VEP script at lines 754-757:
>
> # protein ID
> if(defined $config->{protein} && $t->translation) {
>     $line->{Extra}->{ENSP} = $t->translation->stable_id;
> }
>
> To:
>
> # protein ID
> if(defined $config->{protein} && $tv->transcript->translation) {
>     my $protein_feature_analysis = get_protein_domains($tv);
>     if ($protein_feature_analysis){
>         chomp $protein_feature_analysis;
>         $line->{Extra}->{ENSP} = $protein_feature_analysis;
>     }
> }
>
> And included a sub to get a bit more detail about the protein domains it
> overlaps:
>
> sub get_protein_domains{
>     my $tv = shift;
>
>     ################################### protein ID ######################################################
>     my $translation_id = $tv->transcript->translation->stable_id;
>     my %protein_features = ();
>     my $pfeatures = $tv->transcript->translation->get_all_ProteinFeatures();
>     foreach my $pfeature (@{$pfeatures}){
>         my $logic_name = $pfeature->analysis()->logic_name();
>         if ($pfeature->start >= $tv->transcript->translation->start &&
>             $pfeature->end <= $tv->transcript->translation->end){
>             $protein_features{$logic_name}{ENSP}        = $translation_id || "-";
>             $protein_features{$logic_name}{interpro_ac} = $pfeature->interpro_ac() || "-";
>             $protein_features{$logic_name}{idesc}       = $pfeature->idesc() || "-";
>             $protein_features{$logic_name}{start}       = $pfeature->start;
>             $protein_features{$logic_name}{end}         = $pfeature->end;
>         }
>     }
>     my $protein_feature_analysis = $translation_id;
>
>     for my $analysis ( keys %protein_features ){
>         $protein_feature_analysis .= ":".$analysis;
>         $protein_feature_analysis .= ",".$protein_features{$analysis}{interpro_ac};
>         $protein_feature_analysis .= ",".$protein_features{$analysis}{idesc};
>         $protein_feature_analysis .= ",".$protein_features{$analysis}{start};
>         $protein_feature_analysis .= ",".$protein_features{$analysis}{end};
>     }
>
>     #######################################################################################################
>
>     return $protein_feature_analysis;
> }
>
> Unfortunately it seems that this annotation is only working for the
> mitochondrial chromosome.
>
> Could you point me to where I might be doing something wrong?
>
> Best regards
>
> Duarte Molha
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/