[ensembl-dev] Refseq cache file

Andrew Parton aparton at ebi.ac.uk
Fri Apr 17 17:41:23 BST 2020

Hi Duarte,

Unfortunately, there is no single GFF file that covers all transcripts in our GRCh37 cache files. We will also be making significant updates to these files very soon.

For release 100, scheduled for the end of April, our GRCh37 cache files will include a new set of RefSeq transcripts. They can be found here:
ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/all_assembly_versions/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz
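
If you want to see which transcripts a RefSeq GFF file like this one contains (for example, to compare it against VEP output), a minimal standard-library Python sketch can stream the gzipped file and tally distinct transcript accessions by RefSeq prefix (NM_, NR_, XM_, XR_ and so on). It assumes that the file has been downloaded locally and that accessions appear in the `transcript_id` attribute, as they do in recent NCBI GFF3 files. Older files may lay out their attributes differently. The function names are illustrative only.

```python
import gzip
from collections import Counter
from urllib.parse import unquote


def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            # GFF3 percent-encodes reserved characters such as ';' and '='
            attrs[key] = unquote(value)
    return attrs


def transcript_accessions(lines) -> set:
    """Collect distinct transcript_id values from GFF3 lines.

    Assumption: accessions are stored in the transcript_id attribute.
    Exon rows usually repeat the same transcript_id; the set removes duplicates.
    """
    accessions = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature row
        tid = parse_attributes(cols[8]).get("transcript_id")
        if tid:
            accessions.add(tid)
    return accessions


def prefix_counts(accessions) -> Counter:
    """Count accessions by RefSeq prefix, e.g. 'NM_', 'NR_', 'XM_'."""
    return Counter(acc.split("_", 1)[0] + "_" for acc in accessions if "_" in acc)


def summarize_gff(path: str) -> Counter:
    """Read a gzipped GFF3 file and return accession counts per prefix."""
    with gzip.open(path, "rt") as handle:
        return prefix_counts(transcript_accessions(handle))
```

For example, `summarize_gff("GCF_000001405.25_GRCh37.p13_genomic.gff.gz").most_common()` lists each prefix with its count. The local filename is an assumption. Running the same function on each of the GFF files listed in this thread gives a rough comparison of the transcript sets.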

For release 99, the GRCh37 RefSeq cache contains two different RefSeq versions:
- the last RefSeq annotation on GRCh37 (Annotation Release 105):
  ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/H_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/ref_GRCh37.p13_top_level.gff3.gz
- a GRCh38 annotation projected to GRCh37 (the Annotation Release 109 interim annotation):
  ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/H_sapiens/ARCHIVE/ANNOTATION_RELEASE.109/GRCh37.p13_interim_annotation/

If you would like to take a closer look at the exact data included in the RefSeq cache file, you can access our publicly available MySQL databases by following these instructions:
https://www.ensembl.org/info/data/mysql.html

The homo_sapiens_otherfeatures_99_37 database contains the transcript sets included in our cache files.
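
The following Python sketch shows one way to see which transcript sets the otherfeatures database holds. It counts transcripts per analysis by joining `transcript.analysis_id` to `analysis.logic_name` in the standard Ensembl schema. The connection values are assumptions based on the public-server conventions: host `ensembldb.ensembl.org`, user `anonymous` with no password, and port 3337 for GRCh37 databases. Please check them against the linked instructions. The helper name is hypothetical, and the code needs the third-party `pymysql` package.

```python
# Assumed connection details for the public Ensembl GRCh37 server; verify them
# against https://www.ensembl.org/info/data/mysql.html before use.
ENSEMBL_GRCH37 = {
    "host": "ensembldb.ensembl.org",
    "port": 3337,
    "user": "anonymous",
    "password": "",
    "database": "homo_sapiens_otherfeatures_99_37",
}

# Count transcripts per analysis; each logic_name corresponds to one
# transcript set loaded into the database.
TRANSCRIPTS_PER_ANALYSIS = """
SELECT a.logic_name, COUNT(*) AS n_transcripts
FROM transcript t
JOIN analysis a ON t.analysis_id = a.analysis_id
GROUP BY a.logic_name
ORDER BY n_transcripts DESC
"""


def transcripts_per_analysis(conn_params=None):
    """Hypothetical helper: return (logic_name, n_transcripts) rows."""
    # Imported lazily so the module loads even where pymysql is not installed.
    import pymysql  # third-party: pip install pymysql

    params = dict(ENSEMBL_GRCH37 if conn_params is None else conn_params)
    conn = pymysql.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(TRANSCRIPTS_PER_ANALYSIS)
            return cur.fetchall()
    finally:
        conn.close()
```

For example, `for logic_name, n in transcripts_per_analysis(): print(logic_name, n)` prints each transcript set with its size. The sketch does not assume any particular logic names for the RefSeq imports. It simply reports whatever the database contains.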

Kind Regards,

> On 16 Apr 2020, at 17:01, Duarte Molha <duartemolha at gmail.com> wrote:
> Dear Devs
> I was wondering if you could help me find the source of the VEP cache data.
> On this page https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html
> you list the RefSeq source of the transcripts used for this file:
> 2019-06-28
> (GCF_000001405.39_GRCh38.p13_genomic.gff)
> This is great, but I would also like to get the correct source for the hg19 version.
> You have simply listed it as:
> 2015-01
> and I have not been able to match this date to any of the GCF files.
> The latest I could find for GRCh37 is
> https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
> but this file dates to 2013/06/28.
> Can you please point me to where I can get the 2015-01 RefSeq GFF source file you used for the cache?
> Best regards
> Duarte
