[ensembl-dev] VEP Cache Gene List

Michael Milton michael.milton at unimelb.edu.au
Mon Aug 8 05:23:12 BST 2016


Thanks for your reply. If those genes do occur in the database as cross references, where can I find a copy of them? Biomart doesn't seem to show genes with those symbols.

________________________________
From: dev-bounces at ensembl.org <dev-bounces at ensembl.org> on behalf of Fergal <fergal at ebi.ac.uk>
Sent: Friday, 5 August 2016 6:12:06 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] VEP Cache Gene List

Hi Michael,

These gene symbols do occur in the e75 human database. They are cross references in the otherfeatures database and come from the import of RefSeq gene models we did for e75. The gene symbols come from the RefSeq GTF files that were loaded as part of this analysis.

With more recent imports (ones on GRCh38) the RefSeq annotation may have changed and the gene symbol may be updated as a result.
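
Since the symbols are said to come from the RefSeq GTF files, one way to build a candidate gene list is to collect the symbol attribute from such a file directly. The following is only a rough sketch, not the procedure Ensembl uses: the function name `gene_symbols_from_gtf` is hypothetical, and the attribute key `gene_name` is an assumption that may need changing depending on how the particular GTF file labels its symbols.

```python
import re
from typing import Iterable, Set

# Matches GTF-style key/value attributes such as: gene_name "ZSCAN26";
_ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def gene_symbols_from_gtf(lines: Iterable[str], key: str = "gene_name") -> Set[str]:
    """Collect the distinct values of one attribute from GTF lines.

    `key` is an assumption: check which attribute your GTF file
    actually uses for the gene symbol before relying on the result.
    """
    symbols: Set[str] = set()
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has nine tab-separated columns; the ninth holds attributes.
        if len(fields) < 9:
            continue
        for name, value in _ATTR_RE.findall(fields[8]):
            if name == key and value:
                symbols.add(value)
    return symbols
```

Given an open file handle `fh` for the GTF file, `gene_symbols_from_gtf(fh)` returns a set. Subtracting a set of symbols taken from another source (for example a BioMart export) then shows which symbols appear only in the RefSeq import.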

Thanks,

Fergal.


On 4 Aug 2016, at 03:32, Michael Milton <michael.milton at unimelb.edu.au> wrote:

Yes, we are using GRCh37. We looked at the Ensembl release 75 genes (just using BioMart) and there are still a number of gene symbols that VEP is outputting that don't occur in the GRCh37 Ensembl database at all (as far as I can tell, anyway).

Namely:

  *   ARMCX5-GPRASP2
  *   C7orf10
  *   PCDHGB5
  *   PHOSPHO2-KLHL23
  *   THEG5
  *   UQCRHL
  *   ZSCAN26

There are others (100-200 I believe?) but this is just a small sample.

Any ideas where these symbols are derived from?

________________________________
From: dev-bounces at ensembl.org <dev-bounces at ensembl.org> on behalf of Will McLaren <wm2 at ebi.ac.uk>
Sent: Wednesday, 3 August 2016 10:46:02 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] VEP Cache Gene List

Hi Michael,

I assume from this that you are using the GRCh37 assembly? Ensembl and therefore VEP annotations on GRCh37 were frozen at Ensembl release 75 as we have moved our main annotation pipelines to GRCh38.

See the "About" box on the right-hand side of http://grch37.ensembl.org/index.html

Regards

Will McLaren
Ensembl Variation

On 3 August 2016 at 06:41, Michael Milton <michael.milton at unimelb.edu.au> wrote:
Hi, I'm using VEP with a downloaded cache, and it seems that VEP is annotating some non-HGNC gene symbols, or at least outdated symbols. I need a gene list including all possible genes that VEP could output for a downstream application. Is there a definitive list of genes that VEP could output? Otherwise, what source is used to build the VEP cache?

Some of the superseded gene symbols it's outputting, using the Ensembl 83 RefSeq cache, are:

  *   GLTPD1
  *   C1orf86
  *   C7orf63
  *   PCNXL3

Thanks!


_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/


