[ensembl-dev] undefined MAF values

Will McLaren wm2 at ebi.ac.uk
Wed Mar 13 16:15:43 GMT 2013


Hi Jens,

Minor allele frequency data is available only for those variants
genotyped in the 1000 Genomes project. The Ensembl Variation databases
also contain variants from many other sources, which is why you are
seeing entries with no MAF data.

The AA change is a little more subtle. When you make the call to fetch
the variation features, you are asking for those variation features
that have a consequence of missense_variant in at least one transcript.
Many variants overlap several transcripts (often in many genes), and
the same genomic change may cause a different change in each transcript
depending on exon structure and reading frame. In a transcript where
the variant does not fall in coding sequence there is no peptide change
or protein position to report, so those calls return undef.

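Since transcript variation objects represent the overlap between a
variation feature and a transcript, you can filter at this level in
your loop to process only those that have a consequence type of
missense_variant. Your loop prints one line per transcript variation,
which is also why a variant that overlaps several transcripts appears
on several lines.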
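
As a rough sketch of that filtering, not tested against a live
database: it assumes consequence_type on a transcript variation
returns an array reference of SO term strings, and the names
filter_by_consequence and missense_rows are only illustrative.

```perl
use strict;
use warnings;

# Keep only the TranscriptVariation objects whose consequence list
# contains the requested SO term. consequence_type is assumed to return
# an array reference of SO term strings.
sub filter_by_consequence {
    my ($tvs, $so_term) = @_;
    return grep {
        my $terms = $_->consequence_type || [];
        scalar grep { $_ eq $so_term } @$terms;
    } @{ $tvs || [] };
}

# Build one tab-separated row per missense TranscriptVariation of each
# VariationFeature: amino acid change, protein position, MAF.
sub missense_rows {
    my ($vfs) = @_;
    my @rows;
    foreach my $vf (@$vfs) {
        my $maf = $vf->minor_allele_frequency;
        $maf = 'undef' unless defined $maf;    # no frequency stored for this variant
        my @missense = filter_by_consequence(
            $vf->get_all_TranscriptVariations, 'missense_variant');
        foreach my $tv (@missense) {
            push @rows, join("\t",
                $tv->pep_allele_string,
                $tv->translation_start . '-' . $tv->translation_end,
                $maf);
        }
    }
    return @rows;
}

# Usage, with @vfs fetched as in your script:
# print "$_\n" for missense_rows(\@vfs);
```

A variant that is missense in more than one transcript will still give
one row per transcript, so identical rows can remain unless you also
de-duplicate them.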

Hope this is clear!

Regards

Will McLaren
Ensembl Variation

On 13 March 2013 13:36, Jens Christian Nielsen <jcfnielsen at gmail.com> wrote:
> Hi Ensemblers,
>
> I have made a script that extracts missense mutations and their frequencies
> for different human proteins through the Perl API. The core of the script
> and its output are shown below. My question concerns the output, which
> contains an awful lot of "undefined" values, especially for the frequencies
> (MAF), but also in some cases for the two other parameters (AA change and
> position). Am I doing something wrong to get all those undefs, or is the DB
> simply missing some information? Also, why does the script produce
> multiples of the same output (many lines are the same)? I am not asking
> it to return anything more than once.
>
> /Jens
>
> my @vfs = @{ $vfa->fetch_all_by_Slice_SO_terms($slice, ['missense_variant']) };
> foreach my $vf (@vfs) {
>     my $transcript_variations = $vf->get_all_TranscriptVariations;
>     if (defined $transcript_variations) {
>         foreach my $tv (@{$transcript_variations}) {
>             # the AA change
>             if (defined $tv->pep_allele_string) {
>                 print $tv->pep_allele_string . "\t";
>             } else { print "undef \t"; }
>             # AA position in protein
>             if (defined $tv->translation_start) {
>                 print $tv->translation_start, '-', $tv->translation_end, "\t";
>             } else { print "undef \t"; }
>             # the Minor Allele Frequency
>             if (defined $vf->minor_allele_frequency) {
>                 print $vf->minor_allele_frequency . "\n";
>             } else { print "undef \n"; }
>         }
>     }
> }
>>>> perl missens_freq.pl
> aa change position Minor allele freq
> -----------------------------------------
> R/Q     64-64   undef
> R/Q     64-64   undef
> R/Q     64-64   undef
> R/W     64-64   undef
> R/W     64-64   undef
> R/W     64-64   undef
> P/T     46-46   undef
> P/T     46-46   undef
> P/T     46-46   undef
> G/E     40-40   undef
> G/E     40-40   undef
> G/E     40-40   undef
> S/F     32-32   undef
> S/F     32-32   undef
> S/F     32-32   undef
> S/L     25-25   undef
> S/L     25-25   undef
> S/L     25-25   undef
> K/R     17-17   0.0027
> K/R     17-17   0.0027
> K/R     17-17   0.0027
> I/T     6-6     0.0856
> I/T     6-6     0.0856
> I/T     6-6     0.0856
> S/F     3-3     undef
> S/F     3-3     undef
> S/F     3-3     undef



