[ensembl-dev] VEP Cache Gene List

Fergal fergal at ebi.ac.uk
Fri Aug 5 09:12:06 BST 2016


Hi Michael,

These gene symbols do occur in the e75 human database: they are cross-references in the otherfeatures database and come from the import of RefSeq gene models we did for e75. The gene symbols come from the RefSeq GTF files that were loaded as part of this analysis.

With more recent imports (those on GRCh38), the RefSeq annotation may have changed, and the gene symbols may have been updated as a result.

Thanks,

Fergal.
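The check Michael describes below (listing the gene symbols VEP reports that are absent from a reference gene list) can be sketched roughly as follows. This is a minimal, hypothetical helper, not part of VEP: it assumes VEP's default tab-delimited output, run with `--symbol`, so that the last ("Extra") column holds semicolon-separated `KEY=VALUE` pairs including `SYMBOL=...`. The reference set stands in for something like a BioMart export of release 75 gene names; the function names are illustrative only.

```python
from typing import Iterable, List, Set


def vep_symbols(lines: Iterable[str]) -> Set[str]:
    """Collect gene symbols from VEP default (tab-delimited) output.

    Assumes VEP was run with --symbol, so the final 'Extra' column
    contains semicolon-separated KEY=VALUE pairs such as SYMBOL=...
    """
    symbols: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip '##' meta lines, the '#Uploaded_variation' header and blanks
        extra = line.rstrip("\n").split("\t")[-1]
        for pair in extra.split(";"):
            key, _, value = pair.partition("=")
            if key == "SYMBOL" and value:
                symbols.add(value)
    return symbols


def missing_symbols(vep_lines: Iterable[str], reference: Iterable[str]) -> List[str]:
    """Return, sorted, the VEP symbols that do not appear in the reference list."""
    return sorted(vep_symbols(vep_lines) - set(reference))
```

Because it accepts any iterable of lines, an open file handle can be passed directly, e.g. `with open("vep_output.txt") as fh: print(missing_symbols(fh, reference))` (the file name is hypothetical). As Fergal notes, symbols flagged this way may still be legitimate if they come from the RefSeq-derived cross-references rather than from the core Ensembl gene set.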


On 4 Aug 2016, at 03:32, Michael Milton <michael.milton at unimelb.edu.au> wrote:

> Yes, we are using GRCh37. So we looked at the Ensembl release 75 genes (just using BioMart), and there are still a number of gene symbols that VEP outputs that don't occur in the GRCh37 Ensembl database at all (as far as I can tell, anyway).
> 
> Namely:
> ARMCX5-GPRASP2
> C7orf10
> PCDHGB5
> PHOSPHO2-KLHL23
> THEG5
> UQCRHL
> ZSCAN26
> There are others (100-200, I believe), but this is just a small sample.
> 
> Any ideas where these symbols are derived from?
> From: dev-bounces at ensembl.org <dev-bounces at ensembl.org> on behalf of Will McLaren <wm2 at ebi.ac.uk>
> Sent: Wednesday, 3 August 2016 10:46:02 PM
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] VEP Cache Gene List
>  
> Hi Michael,
> 
> I assume from this that you are using the GRCh37 assembly? Ensembl (and therefore VEP) annotations on GRCh37 were frozen at Ensembl release 75, as we have moved our main annotation pipelines to GRCh38.
> 
> See the About box on the right-hand side of http://grch37.ensembl.org/index.html
> 
> Regards
> 
> Will McLaren
> Ensembl Variation
> 
> On 3 August 2016 at 06:41, Michael Milton <michael.milton at unimelb.edu.au> wrote:
> Hi, I'm using VEP with a downloaded cache, and it seems that VEP is annotating some non-HGNC, or at least outdated, gene symbols. For a downstream application I need a list of all the genes VEP could output. Is there a definitive list of genes that VEP could output? Otherwise, what source is used to build the VEP cache?
> 
> Some of the superseded gene symbols it's outputting, using the Ensembl 83 RefSeq cache, are:
> GLTPD1
> C1orf86
> C7orf63
> PCNXL3
> Thanks!
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
> 
