[ensembl-dev] SIFT-PolyPhen Scores

Fiona Cunningham fiona at ebi.ac.uk
Tue Jun 12 13:59:13 BST 2012


Dear Mark,

We store SIFT and PolyPhen scores as detailed here:
http://www.ensembl.org/info/docs/variation/predicted_data.html#nsSNP_data_format

We do not store a score for the reference amino acid.
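
For reference, the bi-directional score described in the quoted question below is just the reference score minus the mutant score. The following is a minimal sketch of that arithmetic only, not the published BSIFT implementation. The function name and both inputs are hypothetical, and because the reference score is not stored in Ensembl it would have to come from another source (for example, from running SIFT directly).

```python
def bidirectional_score(ref_score: float, mutant_score: float) -> float:
    """Return the reference score minus the mutant score.

    Both inputs are assumed to be SIFT-style scores in the range [0, 1].
    ref_score is a hypothetical input: Ensembl does not store it.
    """
    for name, value in (("ref_score", ref_score), ("mutant_score", mutant_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # The difference of two values in [0, 1] lies in [-1, 1].
    return ref_score - mutant_score
```

For example, a reference score of 0.5 and a mutant score of 0.25 give a difference of 0.25.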

Best wishes,
Fiona

-----------------------------------------------------------------------------------
Fiona Cunningham
Ensembl Coordinator, Ensembl Variation Project Leader.
EMBL-EBI, Genome Campus, Hinxton, UK
www.ensembl.org || www.lrg-sequence.org
fiona at ebi.ac.uk   || t: +44 1223 494612

On 07/06/2012 14:37, Mark Aquino wrote:
> Hi All,
>
> This question sort of falls in the realm of the VEP, but not quite.  Some of you may have seen a recent PLoS ONE paper in which a researcher at Genentech subtracted the mutant allele SIFT scores from the wild-type (reference) SIFT scores to obtain a "bi-directional" BSIFT score, with which protein activation could be predicted with more or less equivalent sensitivity and specificity to SIFT.  I was therefore interested in testing whether this would work with Condel, and in producing a BCondel tool to predict activation.  So my question, finally, is whether the Ensembl database stores the full PSSM matrices that SIFT generates, including the reference amino acid (so that SIFT reference allele scores can be retrieved as well as mutant scores), and the matrices (PSSM or otherwise) that are used to get PolyPhen-2 scores.
> On Jun 6, 2012, at 4:44 PM, William Spooner wrote:
>
>> Hi Allan,
>>
>> Please see comments inline…
>>
>> On 6 Jun 2012, at 09:33, Allan Kamau wrote:
>>
>>> Thank you Andy for your reply.
>>> I am interested in human genes and in my case a gene specified by
>>> entrezGene as I take entrezGene Ids then I query for transcripts from
>>> Ensembl using the ensembl API.
>>>
>>>
>>> If I understand correctly.
>>>
>>> 1 /
>>> A gene is either a protein structure or an RNA structure that does
>>> something in the cell. A gene is defined by its function and not
>>> strictly its structure/chemical composition.
>>
>> In Ensembl terms, a gene is a genomic locus from which RNA is transcribed. Function is not implied. I.e. genes are defined by genomic location _not_ function.
>>
>>
>>> This means that several
>>> independent protein structures or RNA sequence based structures
>>> performing very similar function may each be termed as the same gene.
>>> For example if gene "A" is observed to have some well defined
>>> phenotype characteristics, then all protein structures and/or RNA
>>> sequence based structures having these characteristics can each be
>>> termed as instances of gene "A".
>>>
>>> 2 /
>>> For humans, several transcripts may individually yield independent
>>> instances of a particular gene.
>>
>> No - several transcripts may yield independent instances of a particular protein, but if the transcripts are transcribed from different genomic locations (which is entirely possible), then they are from different genes.
>>
>>
>>>
>>> If the above understanding is correct, is it possible to have a
>>> protein structure and an RNA-based structure each having the same
>>> phenotype characteristics and therefore yielding instances of the
>>> same gene? I am asking this because a manual query using gene
>>> "ENSG00000133997" did match some transcripts based on exons (and
>>> produce a protein product) and some transcripts based on introns (not
>>> producing any protein product).
>>
>> The gene in Ensembl is the locus. A single locus can result in multiple transcripts (RNAs) with different functions.
>>
>> I hope that helps,
>>
>> Will
>>
>>
>>
>>>
>>>
>>> Allan.
>>>
>>>
>>>
>>> On 6/5/12, LAW Andy<andy.law at roslin.ed.ac.uk>  wrote:
>>>> Allan,
>>>>
>>>> This all depends on what you mean by "gene".
>>>>
>>>> I think the common understanding is that a "gene" is a piece of the genome
>>>> that does "stuff". There are (protein) coding genes and there are non-coding
>>>> (RNA) genes. They are bits of the genome that get transcribed into RNA (into
>>>> transcripts).
>>>>
>>>> There may be (and often are) more than one transcript from a given "gene".
>>>> These different transcripts arise from either variations in splicing
>>>> (transcript A may have exons 1, 2 and 3 whereas transcript B may have exons
>>>> 1, 3 and 4), from differential use of multiple transcription start sites
>>>> (often one start site is active in one set of cell types/tissues and a
>>>> different start site is active in a separate set of tissues) or from
>>>> different transcription end sites.
>>>>
>>>> A transcript is thus something that is transcribed from the "gene". One gene
>>>> may have many transcripts. A given transcript only comes from one gene (this
>>>> is a one-to-many (gene-to-transcript) relationship).
>>>>
>>>> In general, in Eukaryotes, a coding transcript will give rise to a single
>>>> protein product. I believe that there are examples where this is not
>>>> strictly true in that one transcript may have alternate translation start
>>>> sites, but I'm not confident enough of this to be able to give you an
>>>> example. In Prokaryotes, multiple protein products can be derived from a
>>>> single transcript.
>>>>
>>>> Hope that helps some.
>>>>
>>>>
>>>> On 5 Jun 2012, at 10:10am, Allan Kamau wrote:
>>>>
>>>>> I am still struggling to understand the transcript data in relation to
>>>>> a gene. To simplify my preceding question, I would like to ask:
>>>>> 1) Is a transcript strictly a pre-step to a variant of a gene?
>>>>> 2) Is there a one-to-one relationship between a transcript and the
>>>>> gene? A single transcript gets processed (in case of protein genes)
>>>>> into a single gene and for RNA based genes the transcript may undergo
>>>>> minimal processing to yield a single RNA based gene.
>>>>>
>>>>> Allan.
>>>>>
>>>>> On 6/4/12, Allan Kamau<kamauallan at gmail.com>  wrote:
>>>>>> I am a non-biologist and I would like to get better understanding of a
>>>>>> transcript in relation to a gene.
>>>>>> According to a search on Ensembl, the gene "ENSG00000133997" reports
>>>>>> that it has 11 transcripts and a listing of transcript_ids of these
>>>>>> transcripts is provided. Some of the entries in this transcript
>>>>>> listing have protein product ids and advises that "A protein coding
>>>>>> transcript is a spliced mRNA that leads to a protein product" and the
>>>>>> other entries are described as "No protein product" and have the text
>>>>>> "Retained intron: Noncoding transcript containing intronic sequence"
>>>>>> alongside them.
>>>>>>
>>>>>> My initial understanding was as follows (which I now think is wrong).
>>>>>> One transcript is one mature mRNA which is composed of one or more
>>>>>> exons observed to join together and have a polyA tail. And that this
>>>>>> single transcript would be translated into a protein based gene. There
>>>>>> could be several such transcripts (for different situations) all
>>>>>> producing perhaps different protein products but each of these protein
>>>>>> products is independently a variant of the same gene that
>>>>>> these transcripts are known to yield.
>>>>>>
>>>>>> Or one transcript could be composed of an intron (is it possible to
>>>>>> have multiple introns joined together) that would represent a single
>>>>>> non-protein based (or simply RNA based) gene. And that there could be
>>>>>> multiple such transcripts yielding the same gene (may have different
>>>>>> structure but performs the same function).
>>>>>> And that a gene may either be enzyme based or RNA based, which means
>>>>>> that for a given gene we may not have transcripts representing mRNA
>>>>>> and transcripts representing RNA gene products at the same time.
>>>>>>
>>>>>> Kindly advise.
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> List admin (including subscribe/unsubscribe):
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>
>>>> Later,
>>>>
>>>> Andy
>>>> --------
>>>> Yada, yada, yada...
>>>>
>>>> The University of Edinburgh is a charitable body, registered in Scotland,
>>>> with registration number SC005336
>>>> Disclaimer: This e-mail and any attachments are confidential and intended
>>>> solely for the use of the recipient(s) to whom they are addressed. If you
>>>> have received it in error, please destroy all copies and inform the sender.
>>>>
>>>
>>
>> ---
>> William Spooner           http://eaglegenomics.com
>> M:07779-663045 E:william.spooner at eaglegenomics.com
>>
>
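
As an aside on the gene/transcript discussion quoted above: the relationship it describes (one gene, defined as a locus, has many transcripts; each transcript comes from one gene; only some transcripts yield a protein) can be sketched as a small data model. This is an illustrative sketch only and not the Ensembl API. The class names, field names and the transcript and protein identifiers in the usage note are hypothetical placeholders; only the gene ID is taken from the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcript:
    stable_id: str                    # placeholder ID, not a real Ensembl ID
    biotype: str                      # e.g. "protein_coding", "retained_intron"
    protein_id: Optional[str] = None  # None when there is no protein product

    @property
    def is_coding(self) -> bool:
        return self.protein_id is not None


@dataclass
class Gene:
    stable_id: str
    transcripts: List[Transcript] = field(default_factory=list)

    def add_transcript(self, transcript: Transcript) -> None:
        # One gene holds many transcripts (one-to-many).
        self.transcripts.append(transcript)

    def coding_transcripts(self) -> List[Transcript]:
        return [t for t in self.transcripts if t.is_coding]
```

For instance, after creating Gene("ENSG00000133997") and adding one transcript with a placeholder protein_id and one without, coding_transcripts() returns only the first, which mirrors the mix of protein-coding and "No protein product" entries described in the thread.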