[ensembl-dev] Variant Consequence Predictor
Will McLaren
wm2 at ebi.ac.uk
Fri Feb 25 11:02:16 GMT 2011
Hello,
Strands are dealt with internally by the API. If you submit a variant
on the forward strand (as all variants in a VCF file are expected to
be), it will be "flipped" to the reverse strand before comparison
with a transcript that is on the reverse strand. Rest assured that, if
you submit the correct strand for your variant, the script and the
underlying API will handle this correctly.
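For anyone curious what the "flip" amounts to, here is a minimal Python sketch of the idea. It assumes alleles are given as '/'-separated strings on the forward strand and that the transcript strand uses the Ensembl convention of 1 (forward) or -1 (reverse). It is not the Ensembl API's own code (which is Perl), and the function names are hypothetical.

```python
# Minimal sketch of the strand "flip" described above.
# NOT the Ensembl API's code (that is Perl); function names are hypothetical.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (works for multi-base alleles too)."""
    return seq.translate(COMPLEMENT)[::-1]


def alleles_on_transcript_strand(alleles: str, transcript_strand: int) -> str:
    """Express '/'-separated forward-strand alleles on the transcript's strand.

    transcript_strand: 1 = forward, -1 = reverse (assumed Ensembl-style values).
    """
    if transcript_strand not in (1, -1):
        raise ValueError("transcript_strand must be 1 or -1")
    parts = alleles.split("/")
    if transcript_strand == -1:
        # Reverse-strand transcript: flip every allele before comparison.
        parts = [reverse_complement(a) for a in parts]
    return "/".join(parts)


# A VCF-style forward-strand C/T compared against a reverse-strand transcript:
print(alleles_on_transcript_strand("C/T", -1))  # G/A
print(alleles_on_transcript_strand("C/T", 1))   # C/T
```

Reverse-complementing, rather than just complementing, matters for multi-base alleles: "AG" on the forward strand reads "CT" on the reverse strand.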
This looks like some mixed-up submissions in dbSNP, resulting in an odd
set of alleles in the final record; ss202899005 has C/G reported on
the forward strand, whereas all other submissions report C/T (or the
equivalent G/A on the reverse strand). Why dbSNP have then reported the
reference alleles as C/G (especially considering the ssID with the longest
flank, ss202899005, has C/T) is a question that can only be answered
by dbSNP, I'm afraid.
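The same reverse-complement step can be used to sanity-check the submissions on the dbSNP page quoted below. This sketch assumes the orientation column is simplified to "fwd"/"rev" and uses a simple majority vote to find the odd one out; that vote is only an illustration, not how dbSNP assigns refSNP alleles. It also checks that the reverse submissions' flanks are the reverse complement of the forward ones, i.e. that they describe the same site.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (case is preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def forward_allele_set(alleles: str, orientation: str) -> tuple:
    """Normalise '/'-separated alleles to the forward strand, order-independent."""
    parts = alleles.split("/")
    if orientation == "rev":
        parts = [reverse_complement(a) for a in parts]
    return tuple(sorted(parts))


# (ssID, orientation, alleles) as listed on the dbSNP page quoted in this thread
SUBMISSIONS = [
    ("ss126705640", "fwd", "C/T"),
    ("ss161043141", "rev", "A/G"),
    ("ss168944051", "rev", "A/G"),
    ("ss202899005", "fwd", "C/G"),
    ("ss235617451", "fwd", "C/T"),
    ("ss242238239", "fwd", "C/T"),
]

normalised = {ss: forward_allele_set(a, o) for ss, o, a in SUBMISSIONS}
consensus, _ = Counter(normalised.values()).most_common(1)[0]
outliers = sorted(ss for ss, s in normalised.items() if s != consensus)

print("/".join(consensus))  # C/T
print(outliers)             # ['ss202899005']

# Flanks from the same listing: forward 5'/3' and reverse 5'/3'.
FWD_5 = "ggttgagccgctgcaggctggccagattgg"
FWD_3 = "aaaggtgtatgggggcaggtcccgcagggc"
REV_5 = "gccctgcgggacctgcccccatacaccttt"
REV_3 = "ccaatctggccagcctgcagcggctcaacc"
print(reverse_complement(FWD_3) == REV_5 and reverse_complement(FWD_5) == REV_3)  # True
```

Once the A/G submissions are flipped, five of the six agree on C/T, leaving ss202899005 as the only C/G.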
Will
On 25 February 2011 10:32, Stuart Meacham <sm766 at cam.ac.uk> wrote:
> On 25/02/11 10:26, Laura Clarke wrote:
>>
>> The dbSNP page for this site suggests this is a strand issue.
>>
>>
>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs79525962
>> ss126705640 1000GENOMES|SRP_031_6362630_chr11_76049066 byFreq
>>     fwd/B C/T ggttgagccgctgcaggctggccagattgg aaaggtgtatgggggcaggtcccgcagggc
>>     04/18/09 03/08/10 131 Genomic unknown
>> ss161043141 ILLUMINA|HumanOmni1-Quad_v1-0_B_SNP11-76049066-128_T_R_1587991382
>>     rev/T A/G gccctgcgggacctgcccccatacaccttt ccaatctggccagcctgcagcggctcaacc
>>     08/04/09 10/05/09 131 Genomic unknown
>> ss168944051 ILLUMINA|Human1M-Duov3_B_GA010171-0_T_R_1533432755
>>     rev/T A/G gccctgcgggacctgcccccatacaccttt ccaatctggccagcctgcagcggctcaacc
>>     10/01/09 10/01/09 132 Genomic unknown
>> ss202899005 BUSHMAN|BUSHMAN-chr11-76049065
>>     fwd/ C/G ggttgagccgctgcaggctggccagattgg aaaggtgtatgggggcaggtcccgcagggc
>>     02/16/10 03/08/10 132 Genomic unknown
>> ss235617451 1000GENOMES|pilot_1_CEU_5222080_chr11_76049066
>>     fwd/ C/T ggttgagccgctgcaggctggccagattgg aaaggtgtatgggggcaggtcccgcagggc
>>     05/01/10 05/01/10 132 Genomic unknown
>> ss242238239 1000GENOMES|pilot_1_CHB+JPT_4123316_chr11_76049066
>>     fwd/ C/T ggttgagccgctgcaggctggccagattgg aaaggtgtatgggggcaggtcccgcagggc
>>     05/01/10 05/01
>>
>> The C/T ssIDs are all on the forward strand, but the A/G ones are on the reverse strand.
>>
>> The transcript itself is on the reverse strand,
>>
>>
>> http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000137507;r=11:76368568-76381791;t=ENST00000260061;v=rs79525962;vdb=variation;vf=1164142
>>
>> which means for the transcript the correct consequence is
>> rs79525962 407 C T A/P NON_SYNONYMOUS_CODING
>
>
> Does this mean that, when passed a VCF, any consequence prediction for a
> transcript on the reverse strand is possibly incorrect?! This would be
> misleading, as the script actually returns the transcript that the SNP
> and consequence are predicted to affect...
>
> Stuart
>