[ensembl-dev] Variant Consequence Predictor

Will McLaren wm2 at ebi.ac.uk
Fri Feb 25 11:02:16 GMT 2011


Hello,

Strands are handled internally by the API: if you submit a variant
on the forward strand (as all variants in a VCF file are expected to
be), its alleles are "flipped" (complemented) to the reverse strand
before comparison with a transcript that is on the reverse strand. As
long as you submit the correct strand for your variant, the script and
underlying API will deal with this correctly.
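
To make the idea of "flipping" concrete, here is a rough sketch in
Python. It is an illustration only and not the API's actual code (the
Ensembl API is written in Perl); the function names and the 1 / -1
strand convention used here are assumptions made for the example.

```python
# Illustrative sketch only; function names are hypothetical and this is
# not how the Ensembl API itself is implemented.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_allele_string(allele_string):
    """Flip a '/'-separated allele string such as 'C/T' to the other strand."""
    return "/".join(reverse_complement(a) for a in allele_string.split("/"))


def alleles_for_transcript(allele_string, transcript_strand):
    """Express forward-strand alleles on the transcript's strand.

    transcript_strand is assumed to be 1 (forward) or -1 (reverse).
    """
    if transcript_strand == 1:
        return allele_string
    return flip_allele_string(allele_string)


flipped = alleles_for_transcript("C/T", -1)
print(flipped)
```

So a forward-strand C/T is compared as G/A against a reverse-strand
transcript, and is left untouched for a forward-strand one.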

This looks like mixed-up submissions in dbSNP resulting in an odd
set of alleles coming out at the end: ss202899005 has C/G reported on
the forward strand, whereas all other submissions report C/T (or the
equivalent G/A on the reverse strand). Why dbSNP have then reported the
reference alleles as C/G (especially considering the ssID with the
longest flank, ss202899005, has C/T) is a question that can only be
answered by dbSNP, I'm afraid.
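
As a quick sanity check, the submissions quoted below can be put on a
common strand and compared. This is only a sketch: the strand and
allele values are copied by hand from the quoted dbSNP listing, and the
helper name is made up for the example.

```python
from collections import Counter

# Strand and alleles per submission, copied from the quoted dbSNP listing.
SUBMISSIONS = {
    "ss126705640": ("fwd", "C/T"),
    "ss161043141": ("rev", "A/G"),
    "ss168944051": ("rev", "A/G"),
    "ss202899005": ("fwd", "C/G"),
    "ss235617451": ("fwd", "C/T"),
    "ss242238239": ("fwd", "C/T"),
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def forward_alleles(strand, allele_string):
    """Return the set of alleles expressed on the forward strand."""
    alleles = allele_string.split("/")
    if strand == "rev":
        # Single-base alleles, so complementing each base is enough.
        alleles = [COMPLEMENT[a] for a in alleles]
    return frozenset(alleles)


normalised = {ss: forward_alleles(*rec) for ss, rec in SUBMISSIONS.items()}

# Find the most common allele set and list submissions that disagree.
majority, _count = Counter(normalised.values()).most_common(1)[0]
outliers = sorted(ss for ss, a in normalised.items() if a != majority)

print(sorted(majority), outliers)
```

Once the reverse-strand A/G submissions are complemented, every
submission except ss202899005 agrees on C/T.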

Will

On 25 February 2011 10:32, Stuart Meacham <sm766 at cam.ac.uk> wrote:
> On 25/02/11 10:26, Laura Clarke wrote:
>>
>> The dbSNP page for this site would suggest this is a strand issue
>>
>>
>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs79525962
>> ss126705640  1000GENOMES|SRP_031_6362630_chr11_76049066  byFreq  fwd/B  C/T
>>   ggttgagccgctgcaggctggccagattgg  aaaggtgtatgggggcaggtcccgcagggc
>>   04/18/09  03/08/10  131  Genomic  unknown
>> ss161043141  ILLUMINA|HumanOmni1-Quad_v1-0_B_SNP11-76049066-128_T_R_1587991382  rev/T  A/G
>>   gccctgcgggacctgcccccatacaccttt  ccaatctggccagcctgcagcggctcaacc
>>   08/04/09  10/05/09  131  Genomic  unknown
>> ss168944051  ILLUMINA|Human1M-Duov3_B_GA010171-0_T_R_1533432755  rev/T  A/G
>>   gccctgcgggacctgcccccatacaccttt  ccaatctggccagcctgcagcggctcaacc
>>   10/01/09  10/01/09  132  Genomic  unknown
>> ss202899005  BUSHMAN|BUSHMAN-chr11-76049065  fwd/  C/G
>>   ggttgagccgctgcaggctggccagattgg  aaaggtgtatgggggcaggtcccgcagggc
>>   02/16/10  03/08/10  132  Genomic  unknown
>> ss235617451  1000GENOMES|pilot_1_CEU_5222080_chr11_76049066  fwd/  C/T
>>   ggttgagccgctgcaggctggccagattgg  aaaggtgtatgggggcaggtcccgcagggc
>>   05/01/10  05/01/10  132  Genomic  unknown
>> ss242238239  1000GENOMES|pilot_1_CHB+JPT_4123316_chr11_76049066  fwd/  C/T
>>   ggttgagccgctgcaggctggccagattgg  aaaggtgtatgggggcaggtcccgcagggc
>>   05/01/10  05/01
>>
>> The C/T ss IDs are all on the forward strand, but the A/G ones are on
>> the reverse strand
>>
>> The transcript itself is on the reverse strand
>>
>>
>> http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000137507;r=11:76368568-76381791;t=ENST00000260061;v=rs79525962;vdb=variation;vf=1164142
>>
>> which means for the transcript the correct consequence is
>> rs79525962 407 C T A/P NON_SYNONYMOUS_CODING
>
>
> Does this mean that, when passed a VCF, any consequence prediction for a
> transcript on the reverse strand is possibly incorrect? This would be
> misleading, as the script actually returns the transcript that the SNP
> and consequence are predicted to affect ...
>
> Stuart
>



