[ensembl-dev] from hgvs to vcf: Gene set information (genePred, RefFlat)

mr6 at ebi.ac.uk
Mon Dec 23 17:48:49 GMT 2013


Hi Guillermo,

The GTF file contains both gene and transcript information for each
gene. A typical line looks like this:

MT      Mt_tRNA exon    3230    3304    .       +       .       gene_id "ENSG00000209082"; transcript_id "ENST00000386347"; exon_number "1"; gene_name "MT-TL1"; gene_biotype "Mt_tRNA"; transcript_name "MT-TL1-201"; exon_id "ENSE00002006242";

with both the gene ID and the transcript ID in the last column.
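To pull the gene and transcript IDs out of such a line programmatically, one option is to split the nine columns and parse the `key "value";` pairs in the last one. This is a minimal sketch, assuming tab-separated columns as the GTF format specifies (the alignment above may have been altered by the mail client) and quoted attribute values; `parse_gtf_line` is a hypothetical helper, not part of any Ensembl tool.

```python
import re

# One GTF attribute such as: gene_id "ENSG00000209082";
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_line(line):
    """Split a GTF line into its 9 columns and parse the attribute column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        # Simplification: if a key repeats, the last value wins.
        "attributes": dict(_ATTR_RE.findall(attrs)),
    }
```

Collecting `rec["attributes"]["transcript_id"]` over all lines gives the set of transcripts in the release; unquoted attribute values would need a looser pattern.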

We do not provide version numbers in the GTF files.
However, if you use the same Ensembl release for both the gene annotations
and the genomic sequence, this should ensure that the results are
consistent.
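Since the GTF IDs carry no version, one practical step is to strip the version suffix from the HGVS accession before looking the transcript up, and to check the version separately. A minimal sketch, assuming HGVS strings of the form `ENST00000515609.1:c.30G>T` as in the VCF example quoted below; both helper names are hypothetical.

```python
def split_hgvs_accession(hgvs):
    """Split 'ENST00000515609.1:c.30G>T' into ('ENST00000515609', 1, 'c.30G>T')."""
    accession, sep, change = hgvs.partition(":")
    if not sep:
        raise ValueError("no ':' in HGVS string: %r" % hgvs)
    stable_id, dot, version = accession.partition(".")
    # The version is None when the accession has no ".N" suffix.
    return stable_id, (int(version) if dot else None), change


def find_in_gtf(hgvs, gtf_transcript_ids):
    """Return the unversioned transcript ID if the GTF contains it, else None.

    gtf_transcript_ids: set of transcript_id values collected from the GTF,
    which have no version suffix.
    """
    stable_id, _version, _change = split_hgvs_accession(hgvs)
    return stable_id if stable_id in gtf_transcript_ids else None
```

The match only ignores the version; it is the use of one Ensembl release for annotation, GTF and genome that keeps the transcript model consistent.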

Alternatively, you can check which version of a gene or transcript was
used in which Ensembl release through our REST service:
http://beta.rest.ensembl.org/archive/id/ENST00000515609?content-type=application/json
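The same lookup can be scripted. A minimal sketch using only the Python standard library, assuming the current public host `https://rest.ensembl.org` (the message above used the 2013 beta host) and that the JSON response has `id`, `version` and `release` keys; check both against the REST documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: current public server; the original message used beta.rest.ensembl.org.
REST_SERVER = "https://rest.ensembl.org"


def archive_url(stable_id, server=REST_SERVER):
    """Build the archive lookup URL, passing the bare stable ID as in the example."""
    bare_id = stable_id.split(".")[0]
    return "%s/archive/id/%s?content-type=application/json" % (server, bare_id)


def fetch_archive_record(stable_id, server=REST_SERVER, timeout=30):
    """Query the archive endpoint and return the decoded JSON (needs network access)."""
    with urllib.request.urlopen(archive_url(stable_id, server), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def describe_record(record):
    """Format a record as 'ID.version (release N)'; the key names are assumed."""
    return "%s.%s (release %s)" % (record["id"], record["version"], record["release"])
```

For example, `describe_record(fetch_archive_record("ENST00000515609"))` would report which version of that transcript the archive lists, to compare with the `.1` in the HGVS string.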


Hope that helps,
Magali

> Dear all,
>
> I'm trying to translate the HGVS data I'm getting in my annotations with
> the Ensembl database to VCF format, so that I can assign a VCF
> alternative allele to an Ensembl-annotated consequence. Please consider
> the following example:
>
> chr1    154164465    .    C    A,G    0.04    SNP_AF
> AC=1,1;AF=0.125,0.125;AN=8;BaseQRankSum=-0.769;DP=34;Dels=0.0;FS=0.0;HaplotypeScore=0.0;MLEAC=1,1;MLEAF=0.125,0.125;MQ=60.0;MQ0=0;MQRankSum=-0.329;QD=0.0;ReadPosRankSum=0.256;SDP=11;SFREQ=0.111;set=FilteredInAll;CSQ=TPM3|ENSG00000143549|tropomyosin_3|ENST00000515609.1:c.30G>T||2/3|ENSP00000426306.1:p.Gln10His|missense_variant|||||||||||Transcript|ENST00000515609||||3.270|24|deleterious(0.798)|deleterious(0)|possibly_damaging(0.896)|Coiled-coils_(Ncoils):ncoils|||||TCAGCTTGCTCTGCCCGATCCAGAGCATTCTCCTTGTCTAACTTCAGCAT[C/A&G]TGCATCTTTTTCTTGATGGCCTCCATCATGAGCAGTGGCTGTTGGTAGGC
> GT:AD:DP:FREQ:GQ:PL    0/2:8,0,1:9:0,0.111:10:10,34,307,0,273,270
> 0/0:4,0,0:4:0,0:12:0,12,150,12,150,150
> 0/1:9,1,0:10:0.1,0:11:11,0,287,36,290,326
> 0/0:11,0,0:11:0,0:30:0,30,398,30,398,398
>
> From the HGVS data (*ENST00000515609.1:c.30G>T*) I can use existing
> HGVS code libraries to obtain the corresponding VCF-formatted
> variant (*chr1 154164465 C > A*), and thus select the
> alternative allele (*A*) that corresponds to the consequence. The problem
> I'm facing is that I need to retrieve the gene set information,
> including transcript versions (ENST00000515609*.1*), since different
> versions may yield different results when transforming
> transcript coordinates to genomic coordinates. The only resource I've
> found in Ensembl is the gene set in .gtf format
> (ftp://ftp.ensembl.org/pub/release-74/gtf/homo_sapiens/Homo_sapiens.GRCh37.74.gtf.gz),
> but no transcript information is available in this file. Is there any
> other file containing this information (genePred or RefFlat, for
> example)? Any other hints?
>
> Thank you in advance,
> --
> Guillermo Marco Puche
> Bioinformatician, Computer Science Engineer.
> Sistemas Genómicos S.L.
> www.sistemasgenomicos.com
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
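The question above hinges on mapping a coding (`c.`) position back to the genome. For a simple substitution inside the CDS, this amounts to walking the transcript's coding segments in 5' to 3' order and, on a reverse-strand transcript, complementing the bases, which is how a transcript-level G>T can appear as C>A in the VCF. This is a hand-rolled minimal sketch, not the method of any particular HGVS library; it assumes the caller supplies the coding segments of one transcript (e.g. from the GTF CDS lines of the same Ensembl release), and it handles only substitutions like `c.30G>T` (no intronic offsets, UTR positions or indels). All names are hypothetical.

```python
import re

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Only plain coding substitutions such as "c.30G>T".
_CSUB_RE = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def cds_to_genomic(c_pos, cds_segments, strand):
    """Map a 1-based CDS position to a 1-based genomic position.

    cds_segments: (start, end) genomic pairs, 1-based inclusive, listed in
    transcript order (5' to 3'), so the first pair holds c.1.
    """
    remaining = c_pos
    for start, end in cds_segments:
        length = end - start + 1
        if remaining <= length:
            # Plus strand counts up from start; minus strand counts down from end.
            return start + remaining - 1 if strand == "+" else end - remaining + 1
        remaining -= length
    raise ValueError("c.%d lies beyond the supplied CDS" % c_pos)


def hgvs_c_snv_to_vcf(change, cds_segments, strand):
    """Convert e.g. 'c.30G>T' into a forward-strand (POS, REF, ALT) tuple."""
    m = _CSUB_RE.match(change)
    if not m:
        raise ValueError("unsupported HGVS change: %r" % change)
    c_pos, ref, alt = int(m.group(1)), m.group(2), m.group(3)
    pos = cds_to_genomic(c_pos, cds_segments, strand)
    if strand == "-":
        # HGVS c. bases are on the transcript strand; VCF alleles are forward-strand.
        ref, alt = _COMPLEMENT[ref], _COMPLEMENT[alt]
    return pos, ref, alt


def alt_allele_index(vcf_alt_column, alt):
    """1-based index of alt in the VCF ALT column, as used in GT codes."""
    return vcf_alt_column.split(",").index(alt) + 1
```

With toy reverse-strand segments `[(1000, 1019), (900, 949)]`, `hgvs_c_snv_to_vcf("c.30G>T", ...)` gives position 940 with C>A, and `alt_allele_index("A,G", "A")` picks the first ALT allele. Real coordinates must come from the transcript model of the release that produced the annotation, which is why the transcript version matters.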




