[ensembl-dev] why would a snp have multiple consequences in the same transcript

Andreas Kahari ak at ebi.ac.uk
Fri Nov 19 15:54:49 GMT 2010


On Fri, Nov 19, 2010 at 03:22:21PM +0000, Andrea Edwards wrote:
> Hi
> 
> I think I have asked this question before, but I can't find the
> answer in my archive of answers, so I'm really sorry about
> this.
> 
> Why would a SNP have multiple consequences in a single transcript?
> This code returns an array of consequences:
> 
> my @tvs = @{ $vf->get_all_TranscriptVariations };
> foreach my $tv (@tvs) {
>     # consequence_type returns an array reference, hence the @{ }
>     my @consequences = @{ $tv->consequence_type };
> }
>
> where $vf is a VariationFeature and $tv is a TranscriptVariation.
> 
> I thought perhaps this was just an API convention, as you
> generally return array references from functions.
> 
> I understand a SNP could have different consequences in the same
> gene as it might have a different impact on each splice variant, but
> how can it have multiple consequences in a single transcript?
> 
> I've looked at the consequence types and they do appear to be
> mutually exclusive. At best something could be both synonymous and
> splice site (or non-synonymous and splice site) if it occurs in the
> first/last few bases of an exon.
> 
> I apologise for the duplicate question.
> 
> Thanks in advance for your help

I don't know much about the theory behind the variation data, but
looking at the latest released human variation database, we have:

(picking and counting the transcript variation features that have more
than one consequence, i.e. a consequence_type with a comma in it)
mysql> select count(1), consequence_type from transcript_variation where consequence_type like "%,%" group by consequence_type;
+----------+--------------------------------------------------+
| count(1) | consequence_type                                 |
+----------+--------------------------------------------------+
|      128 | STOP_GAINED,FRAMESHIFT_CODING                    |
|      263 | STOP_GAINED,SPLICE_SITE                          |
|       47 | COMPLEX_INDEL,SPLICE_SITE                        |
|     1609 | FRAMESHIFT_CODING,SPLICE_SITE                    |
|     6187 | NON_SYNONYMOUS_CODING,SPLICE_SITE                |
|     3496 | SPLICE_SITE,SYNONYMOUS_CODING                    |
|     1263 | SPLICE_SITE,5PRIME_UTR                           |
|      349 | SPLICE_SITE,3PRIME_UTR                           |
|    31937 | ESSENTIAL_SPLICE_SITE,INTRONIC                   |
|    57738 | SPLICE_SITE,INTRONIC                             |
|      607 | STOP_GAINED,NMD_TRANSCRIPT                       |
|        1 | STOP_LOST,NMD_TRANSCRIPT                         |
|        5 | COMPLEX_INDEL,NMD_TRANSCRIPT                     |
|     2076 | FRAMESHIFT_CODING,NMD_TRANSCRIPT                 |
|        4 | STOP_GAINED,FRAMESHIFT_CODING,NMD_TRANSCRIPT     |
|    15385 | NON_SYNONYMOUS_CODING,NMD_TRANSCRIPT             |
|       11 | STOP_GAINED,SPLICE_SITE,NMD_TRANSCRIPT           |
|        1 | COMPLEX_INDEL,SPLICE_SITE,NMD_TRANSCRIPT         |
|       71 | FRAMESHIFT_CODING,SPLICE_SITE,NMD_TRANSCRIPT     |
|      247 | NON_SYNONYMOUS_CODING,SPLICE_SITE,NMD_TRANSCRIPT |
|     7647 | SYNONYMOUS_CODING,NMD_TRANSCRIPT                 |
|      101 | SPLICE_SITE,SYNONYMOUS_CODING,NMD_TRANSCRIPT     |
|     5016 | 5PRIME_UTR,NMD_TRANSCRIPT                        |
|       33 | SPLICE_SITE,5PRIME_UTR,NMD_TRANSCRIPT            |
|    46181 | 3PRIME_UTR,NMD_TRANSCRIPT                        |
|      434 | SPLICE_SITE,3PRIME_UTR,NMD_TRANSCRIPT            |
|  2213159 | INTRONIC,NMD_TRANSCRIPT                          |
|     1948 | ESSENTIAL_SPLICE_SITE,INTRONIC,NMD_TRANSCRIPT    |
|     4024 | SPLICE_SITE,INTRONIC,NMD_TRANSCRIPT              |
+----------+--------------------------------------------------+
29 rows in set (0.00 sec)

So there are quite a lot of variations that have more than one type of
consequence in a single transcript (each row counted above is one
variation/transcript pair).
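The consequence_type values are comma-separated lists of terms, so the LIKE "%,%" filter above simply picks rows that carry at least two terms. As a rough illustration, the following minimal Python sketch does the same filtering and tallies how often each individual term occurs, working on the (count, consequence_type) pairs the GROUP BY query returns. It is not part of the Ensembl API. The helper names are hypothetical, and the three input rows are copied from the output above.

```python
from __future__ import annotations

from collections import Counter


def split_consequences(consequence_type: str) -> list[str]:
    """Split a comma-separated consequence_type value into its terms."""
    return [term for term in consequence_type.split(",") if term]


def has_multiple_consequences(consequence_type: str) -> bool:
    """Same test as the SQL filter consequence_type LIKE '%,%'."""
    return "," in consequence_type


def tally_terms(rows: list[tuple[int, str]]) -> Counter:
    """Sum the row counts for every individual consequence term.

    `rows` holds (count, consequence_type) pairs, i.e. the output of
    the GROUP BY query.
    """
    totals = Counter()
    for count, consequence_type in rows:
        if has_multiple_consequences(consequence_type):
            for term in split_consequences(consequence_type):
                totals[term] += count
    return totals


# Three rows copied from the query output above.
rows = [
    (128, "STOP_GAINED,FRAMESHIFT_CODING"),
    (263, "STOP_GAINED,SPLICE_SITE"),
    (4, "STOP_GAINED,FRAMESHIFT_CODING,NMD_TRANSCRIPT"),
]
totals = tally_terms(rows)
print(totals["STOP_GAINED"])  # 395
```

Counting per term rather than per combination shows how one term such as STOP_GAINED is spread over several multi-consequence groups.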

Let's look at one group of these: the four transcript variations with
consequence "STOP_GAINED,FRAMESHIFT_CODING,NMD_TRANSCRIPT" (three
distinct variations, since rs6474 falls in two transcripts):

mysql> select tv.transcript_stable_id, vf.allele_string, vf.variation_name from transcript_variation tv join variation_feature vf using (variation_feature_id) where tv.consequence_type = 'STOP_GAINED,FRAMESHIFT_CODING,NMD_TRANSCRIPT';
+----------------------+---------------+----------------+
| transcript_stable_id | allele_string | variation_name |
+----------------------+---------------+----------------+
| ENST00000458701      | C/A/T/-/G     | rs41556120     |
| ENST00000426590      | C/A/T/-       | rs41559415     |
| ENST00000466779      | G/A/-         | rs6474         |
| ENST00000469053      | G/A/-         | rs6474         |
+----------------------+---------------+----------------+
4 rows in set (0.00 sec)
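Each allele_string lists the alleles observed at the site, separated by "/". The following minimal Python sketch splits them. It assumes, as is the usual convention in Ensembl's variation_feature table, that the first allele is the reference and that "-" denotes a deletion. parse_allele_string is a hypothetical helper, not an Ensembl API call.

```python
from __future__ import annotations


def parse_allele_string(allele_string: str) -> tuple[str, list[str]]:
    """Split an allele string such as 'C/A/T/-/G'.

    Assumption: the first allele is the reference and the rest are
    alternative alleles; '-' stands for a deleted base.
    """
    reference, *alternatives = allele_string.split("/")
    return reference, alternatives


# The allele strings from the query output above.
for name, allele_string in [
    ("rs41556120", "C/A/T/-/G"),
    ("rs41559415", "C/A/T/-"),
    ("rs6474", "G/A/-"),
]:
    reference, alternatives = parse_allele_string(allele_string)
    print(name, reference, alternatives, len(alternatives))
```

Every one of these sites has at least two alternative alleles, one of them a deletion.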


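So a SNP can have more than one consequence in a single transcript
because there is not necessarily only one alternative allele at the
given location. Each allele string above lists several alternative
alleles, including a deletion ("-"), and each of them can affect the
transcript differently: the deletion shifts the reading frame
(FRAMESHIFT_CODING), while a base substitution can create a stop codon
(STOP_GAINED). The consequence_type for the transcript combines them.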
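To make the "one consequence per allele, combined per transcript" idea concrete, here is a toy Python sketch. It is not Ensembl's actual consequence calculation. It assumes the following:

- The variant hits a single coding codon, and the alleles are already given on the transcript's strand.
- The first allele in the allele string is the reference.
- Every other allele is either one base or "-".
- Splice sites, UTRs and NMD are ignored.

The codons and positions in the usage lines are hypothetical. They do not describe the real rs6474 or rs41556120 sites.

```python
from __future__ import annotations

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def allele_consequence(codon: str, position: int, allele: str) -> str:
    """Toy classification of one alternative allele at `position` (0-2) of `codon`."""
    if allele == "-":
        # Deleting a single base shifts the reading frame.
        return "FRAMESHIFT_CODING"
    ref_aa = CODON_TABLE[codon]
    alt_codon = codon[:position] + allele + codon[position + 1:]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "SYNONYMOUS_CODING"
    if alt_aa == "*":
        return "STOP_GAINED"
    if ref_aa == "*":
        return "STOP_LOST"
    return "NON_SYNONYMOUS_CODING"


def transcript_consequences(codon: str, position: int, allele_string: str) -> list[str]:
    """Combine the consequences of all alternative alleles, in first-seen order."""
    reference, *alternatives = allele_string.split("/")
    if codon[position] != reference:
        raise ValueError("reference allele does not match the codon")
    consequences: list[str] = []
    for allele in alternatives:
        term = allele_consequence(codon, position, allele)
        if term not in consequences:
            consequences.append(term)
    return consequences


# Hypothetical site: third base of a TGG (Trp) codon with alleles G/A/-.
print(transcript_consequences("TGG", 2, "G/A/-"))
# ['STOP_GAINED', 'FRAMESHIFT_CODING']
```

With a five-allele string such as "C/A/T/-/G" at the third base of a hypothetical TAC (Tyr) codon, the same function returns STOP_GAINED, SYNONYMOUS_CODING and FRAMESHIFT_CODING. A single site can therefore pick up several coding consequences at once.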


Andreas

-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD, United Kingdom



