[ensembl-dev] why would a snp have multiple consequences in the same transcript
Andreas Kahari
ak at ebi.ac.uk
Fri Nov 19 15:54:49 GMT 2010
On Fri, Nov 19, 2010 at 03:22:21PM +0000, Andrea Edwards wrote:
> Hi
>
> I think I have asked this question before, but I can't find the
> answer in my archive of answers, so I'm really, really sorry about
> this.
>
> Why would a SNP have multiple consequences in a single transcript?
> This code returns an array of consequences:
>
> my @tvs = @{$vf->get_all_TranscriptVariations};
> foreach my $tv (@tvs) {
>     my @consequences = @{$tv->consequence_type};
>     # @consequences may hold more than one entry for this transcript
> }
>
> where $vf is a variation feature and $tv is a transcript variation.
>
> I thought perhaps this could be a convention issue with the API, as
> you generally return array references from functions.
>
> I understand a SNP could have different consequences in the same
> gene as it might have a different impact on each splice variant, but
> how can it have multiple consequences in a single transcript?
>
> I've looked at the consequence types and they do appear to be
> mutually exclusive. At best, something could be both synonymous and
> splice site (or non-synonymous and splice site) if it occurs in the
> first/last few bases of an exon.
>
> I apologise for the duplicate question.
>
> Thanks in advance for your help
I don't know much about the theory behind the variation data, but just
looking at the latest released variation database for human (picking
out and counting the transcript variation features that have more than
one consequence, i.e. a consequence_type with a comma in it), we have:
mysql> select count(1), consequence_type from transcript_variation where consequence_type like "%,%" group by consequence_type;
+----------+--------------------------------------------------+
| count(1) | consequence_type                                 |
+----------+--------------------------------------------------+
|      128 | STOP_GAINED,FRAMESHIFT_CODING                    |
|      263 | STOP_GAINED,SPLICE_SITE                          |
|       47 | COMPLEX_INDEL,SPLICE_SITE                        |
|     1609 | FRAMESHIFT_CODING,SPLICE_SITE                    |
|     6187 | NON_SYNONYMOUS_CODING,SPLICE_SITE                |
|     3496 | SPLICE_SITE,SYNONYMOUS_CODING                    |
|     1263 | SPLICE_SITE,5PRIME_UTR                           |
|      349 | SPLICE_SITE,3PRIME_UTR                           |
|    31937 | ESSENTIAL_SPLICE_SITE,INTRONIC                   |
|    57738 | SPLICE_SITE,INTRONIC                             |
|      607 | STOP_GAINED,NMD_TRANSCRIPT                       |
|        1 | STOP_LOST,NMD_TRANSCRIPT                         |
|        5 | COMPLEX_INDEL,NMD_TRANSCRIPT                     |
|     2076 | FRAMESHIFT_CODING,NMD_TRANSCRIPT                 |
|        4 | STOP_GAINED,FRAMESHIFT_CODING,NMD_TRANSCRIPT     |
|    15385 | NON_SYNONYMOUS_CODING,NMD_TRANSCRIPT             |
|       11 | STOP_GAINED,SPLICE_SITE,NMD_TRANSCRIPT           |
|        1 | COMPLEX_INDEL,SPLICE_SITE,NMD_TRANSCRIPT         |
|       71 | FRAMESHIFT_CODING,SPLICE_SITE,NMD_TRANSCRIPT     |
|      247 | NON_SYNONYMOUS_CODING,SPLICE_SITE,NMD_TRANSCRIPT |
|     7647 | SYNONYMOUS_CODING,NMD_TRANSCRIPT                 |
|      101 | SPLICE_SITE,SYNONYMOUS_CODING,NMD_TRANSCRIPT     |
|     5016 | 5PRIME_UTR,NMD_TRANSCRIPT                        |
|       33 | SPLICE_SITE,5PRIME_UTR,NMD_TRANSCRIPT            |
|    46181 | 3PRIME_UTR,NMD_TRANSCRIPT                        |
|      434 | SPLICE_SITE,3PRIME_UTR,NMD_TRANSCRIPT            |
|  2213159 | INTRONIC,NMD_TRANSCRIPT                          |
|     1948 | ESSENTIAL_SPLICE_SITE,INTRONIC,NMD_TRANSCRIPT    |
|     4024 | SPLICE_SITE,INTRONIC,NMD_TRANSCRIPT              |
+----------+--------------------------------------------------+
29 rows in set (0.00 sec)
So there are quite a lot of variations that have more than one type of
consequence in a single transcript.
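Because consequence_type holds a comma-separated list of terms, the combinations above can also be broken down per individual term on the client side. The following minimal Perl sketch is plain Perl, not part of the Ensembl API. It applies the same "contains a comma" test as the SQL query and then adds each row's count to every term that row contains. The only data it uses are the four STOP_GAINED rows copied from the query output. The helper has_multiple_consequences is a hypothetical name made up for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# (count, consequence_type) pairs copied from four rows of the query output.
my @rows = (
    [ 128, 'STOP_GAINED,FRAMESHIFT_CODING' ],
    [ 263, 'STOP_GAINED,SPLICE_SITE' ],
    [ 607, 'STOP_GAINED,NMD_TRANSCRIPT' ],
    [   4, 'STOP_GAINED,FRAMESHIFT_CODING,NMD_TRANSCRIPT' ],
);

# Hypothetical helper mirroring the SQL filter  consequence_type LIKE "%,%":
# true when the comma-separated value holds more than one term.
sub has_multiple_consequences {
    my ($consequence_type) = @_;
    my @terms = split /,/, $consequence_type;
    return @terms > 1;
}

# Add each row's count to every individual term it contains, so a row
# with three terms contributes to three totals.
my %per_term;
for my $row (@rows) {
    my ($count, $consequence_type) = @$row;
    next unless has_multiple_consequences($consequence_type);
    $per_term{$_} += $count for split /,/, $consequence_type;
}

printf "%-17s %5d\n", $_, $per_term{$_} for sort keys %per_term;
```

For these four rows, the totals are FRAMESHIFT_CODING 132, NMD_TRANSCRIPT 611, SPLICE_SITE 263 and STOP_GAINED 1002.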
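Let's look at one of the groups in that table: the four transcript
variations with the consequence
"STOP_GAINED,FRAMESHIFT_CODING,NMD_TRANSCRIPT". These come from three
distinct variations, since rs6474 falls in two transcripts: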
mysql> select tv.transcript_stable_id, vf.allele_string, vf.variation_name from transcript_variation tv join variation_feature vf using (variation_feature_id) where tv.consequence_type = 'STOP_GAINED,FRAMESHIFT_CODING,NMD_TRANSCRIPT';
+----------------------+---------------+----------------+
| transcript_stable_id | allele_string | variation_name |
+----------------------+---------------+----------------+
| ENST00000458701      | C/A/T/-/G     | rs41556120     |
| ENST00000426590      | C/A/T/-       | rs41559415     |
| ENST00000466779      | G/A/-         | rs6474         |
| ENST00000469053      | G/A/-         | rs6474         |
+----------------------+---------------+----------------+
4 rows in set (0.00 sec)
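So a SNP can clearly have more than one consequence in a single
transcript, because it does not necessarily have only one alternative
allele at the given location. rs41556120, for example, has the allele
string C/A/T/-/G, and each alternative allele (including the deletion,
"-") may have a different effect on the transcript. The first table also
shows that the consequence types are not all mutually exclusive:
SPLICE_SITE and NMD_TRANSCRIPT, for instance, occur together with
coding, UTR and intronic consequences.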
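To make the per-allele reasoning concrete, here is a minimal Perl sketch that classifies each alternative allele of a single-base coding site. It is plain Perl, not the Ensembl API, and is far simpler than the real consequence calculation. The reference codon TCA and the variant offset are hypothetical. They were chosen only for illustration and are not the actual codon of any variation above. The helper names translate_codon and allele_consequences are also made up. With the allele string C/A/T/-/G, this one site yields three different consequences in the same transcript.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Standard genetic code: the 64 codons enumerated with bases in the order
# T, C, A, G (first base varying slowest); '*' marks a stop codon.
my $AA    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %INDEX = (T => 0, C => 1, A => 2, G => 3);

# Hypothetical helper: translate one DNA codon to a one-letter amino acid.
sub translate_codon {
    my ($codon) = @_;
    my @i;
    for my $base (split //, uc $codon) {
        die "unexpected base '$base'\n" unless exists $INDEX{$base};
        push @i, $INDEX{$base};
    }
    die "a codon must have exactly 3 bases\n" unless @i == 3;
    return substr($AA, 16 * $i[0] + 4 * $i[1] + $i[2], 1);
}

# Hypothetical helper: classify each alternative allele of a single-base
# site at 0-based offset $pos within the reference codon $codon.
# $allele_string lists the reference allele first, e.g. "C/A/T/-/G".
# Simplified: coding terms only; alleles must be one base or '-'.
sub allele_consequences {
    my ($codon, $pos, $allele_string) = @_;
    my ($ref, @alts) = split m{/}, $allele_string;
    die "reference allele does not match the codon\n"
        unless uc(substr($codon, $pos, 1)) eq uc($ref);

    my $ref_aa = translate_codon($codon);
    my %result;
    for my $alt (@alts) {
        if ($alt eq '-') {
            # Deleting a single coding base shifts the reading frame.
            $result{$alt} = 'FRAMESHIFT_CODING';
            next;
        }
        die "only single-base alleles or '-' are handled\n"
            unless $alt =~ /\A[ACGT]\z/i;

        my $mutant = $codon;
        substr($mutant, $pos, 1) = $alt;
        my $alt_aa = translate_codon($mutant);
        $result{$alt} =
            $alt_aa eq $ref_aa ? 'SYNONYMOUS_CODING'
          : $alt_aa eq '*'     ? 'STOP_GAINED'
          : $ref_aa eq '*'     ? 'STOP_LOST'
          :                      'NON_SYNONYMOUS_CODING';
    }
    return \%result;
}

# Hypothetical site: reference codon TCA (Ser), variant at its middle base.
my $cons = allele_consequences('TCA', 1, 'C/A/T/-/G');
printf "%s => %s\n", $_, $cons->{$_} for sort keys %$cons;
# - => FRAMESHIFT_CODING
# A => STOP_GAINED
# G => STOP_GAINED
# T => NON_SYNONYMOUS_CODING
```

A real consequence calculation also has to deal with splice sites, UTRs, multi-base alleles and NMD transcripts. This sketch ignores all of them.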
Andreas
--
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD, United Kingdom