[ensembl-dev] from hgvs to vcf: Gene set information (genePred, RefFlat)
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Mon Dec 23 16:35:19 GMT 2013
Dear all,
I'm trying to convert the HGVS notation in my Ensembl annotations into
VCF format, so that I can assign each Ensembl-annotated consequence to
its VCF alternative allele. Please consider the following example:
chr1 154164465 . C A,G 0.04 SNP_AF
AC=1,1;AF=0.125,0.125;AN=8;BaseQRankSum=-0.769;DP=34;Dels=0.0;FS=0.0;HaplotypeScore=0.0;MLEAC=1,1;MLEAF=0.125,0.125;MQ=60.0;MQ0=0;MQRankSum=-0.329;QD=0.0;ReadPosRankSum=0.256;SDP=11;SFREQ=0.111;set=FilteredInAll;CSQ=TPM3|ENSG00000143549|tropomyosin_3|ENST00000515609.1:c.30G>T||2/3|ENSP00000426306.1:p.Gln10His|missense_variant|||||||||||Transcript|ENST00000515609||||3.270|24|deleterious(0.798)|deleterious(0)|possibly_damaging(0.896)|Coiled-coils_(Ncoils):ncoils|||||TCAGCTTGCTCTGCCCGATCCAGAGCATTCTCCTTGTCTAACTTCAGCAT[C/A&G]TGCATCTTTTTCTTGATGGCCTCCATCATGAGCAGTGGCTGTTGGTAGGC
GT:AD:DP:FREQ:GQ:PL 0/2:8,0,1:9:0,0.111:10:10,34,307,0,273,270
0/0:4,0,0:4:0,0:12:0,12,150,12,150,150
0/1:9,1,0:10:0.1,0:11:11,0,287,36,290,326
0/0:11,0,0:11:0,0:30:0,30,398,30,398,398
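For a single-nucleotide record like this one, the allele can in principle be matched without genomic coordinates: the HGVS reference and alternative bases, complemented when the transcript lies on the reverse strand, should equal the VCF REF and one of the ALT alleles. Below is a minimal Python sketch under stated assumptions. The CSQ field order is declared in the VCF header, which is not shown here, so the HGVSc field is located by pattern. The transcript strand is supplied by the caller (the C > A versus c.30G>T correspondence in the example implies ENST00000515609 is on the reverse strand). Only simple coding substitutions are handled. The helper names parse_info, hgvsc_substitutions and alt_index are hypothetical.

```python
import re

# A simple HGVS coding substitution such as "ENST00000515609.1:c.30G>T".
# Intronic offsets (c.30+1G>T), UTR positions (c.-5, c.*3) and indels are not matched.
HGVSC_SUB = re.compile(r"^([^:]+):c\.(\d+)([ACGT])>([ACGT])$")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_info(info):
    """Split a VCF INFO column into a dict; flag keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def hgvsc_substitutions(csq):
    """Yield (transcript, c_pos, ref, alt) for each CSQ entry with a substitution.

    The CSQ field order comes from the VCF header (not shown), so this
    sketch takes the first pipe-separated field that looks like HGVSc.
    """
    for entry in csq.split(","):
        for field in entry.split("|"):
            m = HGVSC_SUB.match(field)
            if m:
                tx, pos, ref, alt = m.groups()
                yield tx, int(pos), ref, alt
                break


def alt_index(vcf_ref, vcf_alts, hgvs_ref, hgvs_alt, strand):
    """Return the 0-based ALT index matching an HGVS substitution, or None.

    strand is +1 or -1 for the transcript. HGVS bases are written on the
    transcript strand, so they are complemented for reverse-strand transcripts.
    """
    if strand == -1:
        hgvs_ref = hgvs_ref.translate(COMPLEMENT)
        hgvs_alt = hgvs_alt.translate(COMPLEMENT)
    if hgvs_ref != vcf_ref or hgvs_alt not in vcf_alts:
        return None
    return vcf_alts.index(hgvs_alt)


# Usage on a trimmed copy of the record's INFO column (CSQ entry cut short).
info = ("AC=1,1;AF=0.125,0.125;AN=8;set=FilteredInAll;"
        "CSQ=TPM3|ENSG00000143549|tropomyosin_3|ENST00000515609.1:c.30G>T||2/3|"
        "ENSP00000426306.1:p.Gln10His|missense_variant")
alts = ["A", "G"]
for tx, pos, ref, alt in hgvsc_substitutions(parse_info(info)["CSQ"]):
    idx = alt_index("C", alts, ref, alt, strand=-1)
    print(tx, "c.%d%s>%s" % (pos, ref, alt), "-> ALT", alts[idx])
# prints: ENST00000515609.1 c.30G>T -> ALT A
```

This shortcut still depends on knowing the transcript strand, and it cannot distinguish alleles at different positions within one record, so it is no substitute for a full transcript-to-genome mapping.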
From the HGVS notation (*ENST00000515609.1:c.30G>T*) I can use existing
HGVS code libraries to obtain the corresponding VCF-formatted variant
(*chr1 154164465 C > A*), and thus select the alternative allele (*A*)
that belongs to the consequence. The problem is that I need to retrieve
the gene set information, including transcript versions
(ENST00000515609*.1*), since different versions of a transcript may give
different results when mapping transcript coordinates to genomic
coordinates. The only resource I've found in Ensembl is the gene set in
.gtf format
(ftp://ftp.ensembl.org/pub/release-74/gtf/homo_sapiens/Homo_sapiens.GRCh37.74.gtf.gz),
but transcript versions are not available in this file. Is there any
other file containing this information (genePred or RefFlat, for
example)? Any other hints?
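The transcript-to-genome step itself only needs, for each versioned transcript, its chromosome, strand and ordered CDS segments. Below is a minimal Python sketch, assuming a GTF whose CDS features follow the GTF2.2 convention (start codon included, stop codon excluded) and which may or may not carry a transcript_version attribute; without it, the IDs come out unversioned and a lookup of "ENST00000515609.1" fails, which is exactly the gap described above. The functions load_cds, c_to_genomic and hgvs_sub_to_vcf and the toy transcript TOY1 are hypothetical, and only simple coding substitutions without intronic offsets are handled.

```python
import re

ATTR = re.compile(r'(\S+) "([^"]*)"')  # key "value" pairs in GTF column 9
HGVSC_SUB = re.compile(r"^([^:]+):c\.(\d+)([ACGT])>([ACGT])$")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def load_cds(gtf_lines):
    """Collect CDS segments per transcript from GTF lines.

    Returns {transcript_id: (chrom, strand, [(start, end), ...])}, with the
    segments ordered 5'->3' along the transcript. The version is appended
    only if the GTF carries a transcript_version attribute.
    """
    models = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(ATTR.findall(cols[8]))
        tx = attrs["transcript_id"]
        if "transcript_version" in attrs:
            tx += "." + attrs["transcript_version"]
        chrom, strand = cols[0], cols[6]
        models.setdefault(tx, (chrom, strand, []))[2].append((int(cols[3]), int(cols[4])))
    for _, strand, segments in models.values():
        # Reverse-strand transcripts are read from the highest coordinate down.
        segments.sort(reverse=(strand == "-"))
    return models


def c_to_genomic(segments, strand, c_pos):
    """Map c.N (N >= 1, inside the CDS, no intronic offset) to a 1-based genomic position."""
    remaining = c_pos
    for start, end in segments:
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1 if strand == "+" else end - remaining + 1
        remaining -= length
    raise ValueError("c.%d lies beyond the annotated CDS" % c_pos)


def hgvs_sub_to_vcf(models, hgvs):
    """Convert 'TX:c.NREF>ALT' to (chrom, pos, ref, alt) on the forward strand."""
    m = HGVSC_SUB.match(hgvs)
    if not m:
        raise ValueError("only simple coding substitutions are handled: " + hgvs)
    tx, c_pos, ref, alt = m.group(1), int(m.group(2)), m.group(3), m.group(4)
    chrom, strand, segments = models[tx]  # KeyError if this versioned ID is unknown
    pos = c_to_genomic(segments, strand, c_pos)
    if strand == "-":
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return chrom, pos, ref, alt


# Usage with a hypothetical two-segment reverse-strand transcript "TOY1", version 1.
ATTRS = 'gene_id "G1"; transcript_id "TOY1"; transcript_version "1";'
toy_gtf = [
    "\t".join(["chrT", "toy", "CDS", "101", "110", ".", "-", "1", ATTRS]),
    "\t".join(["chrT", "toy", "CDS", "201", "220", ".", "-", "0", ATTRS]),
]
models = load_cds(toy_gtf)
print(hgvs_sub_to_vcf(models, "TOY1.1:c.25G>T"))  # ('chrT', 106, 'C', 'A')
```

The sketch does not check the derived REF base against the reference genome, and it does not handle transcripts with incomplete CDS annotation. A genePred or refFlat row stores the same kind of model (strand, cdsStart/cdsEnd and comma-separated exon start/end lists, with 0-based starts), so the same walk over segments would apply to those formats after adjusting for the coordinate convention.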
Thank you in advance,
--
Guillermo Marco Puche
------------------------------------------------------------------------
Bioinformatician, Computer Science Engineer.
Sistemas Genómicos S.L.
Phone: +34 902 364 669
Fax: +34 902 364 670
www.sistemasgenomicos.com
------------------------------------------------------------------------