[ensembl-dev] Refseq cache file

Duarte Molha duartemolha at gmail.com
Mon Apr 20 11:33:14 BST 2020


Many thanks

Good to know that going forward there will be a single source file.

Unfortunately, we are currently using version 98 and will not be
transitioning in the short term.

Could you let me know the RefSeq source files for the version 98 cache?
Are they the same as the version 99 files you listed?

Many thanks again

Duarte


On Fri, 17 Apr 2020 at 17:42, Andrew Parton <aparton at ebi.ac.uk> wrote:

> Hi Duarte,
>
> Unfortunately, we don’t have a single GFF file that covers all transcripts
> within our GRCh37 cache files. Additionally, we will be providing
> significant updates to these files very soon.
>
> For release 100, scheduled for the end of April, a new set of RefSeq
> transcripts is included in our GRCh37 cache files. The source file can be
> found here:
> ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/all_assembly_versions/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz
>
> As for release 99, the GRCh37 RefSeq cache contains two different RefSeq
> versions:
>
>    - the last annotation on GRCh37:
>      ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/H_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/ref_GRCh37.p13_top_level.gff3.gz
>    - a GRCh38 annotation projected to GRCh37:
>      ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/H_sapiens/ARCHIVE/ANNOTATION_RELEASE.109/GRCh37.p13_interim_annotation/
>
> If you would like a closer look at the exact data included in the RefSeq
> cache file, you can access our publicly available MySQL database by
> following these instructions:
> https://www.ensembl.org/info/data/mysql.html
> The homo_sapiens_otherfeatures_99_37 database contains the transcript sets
> included in our cache files.
>
> Kind Regards,
> Andrew
>
>
>
>
> On 16 Apr 2020, at 17:01, Duarte Molha <duartemolha at gmail.com> wrote:
>
> Dear Devs
>
> I was wondering if you could help me identify the source of the cache data
> for VEP.
>
> On this page
> https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html
>
> you list the RefSeq source of the transcripts used for this file:
>
> *2019-06-28 (GCF_000001405.39_GRCh38.p13_genomic.gff)*
>
> This is great, but I am also interested in getting the correct source for
> the hg19 (GRCh37) version.
>
> You have simply listed it as:
>
> *2015-01*
>
> And I have not been able to match this date to any of the GCF files.
>
> The latest I could find for GRCh37 is
>
> https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
>
> but this file dates from 2013/06/28.
>
> Can you please point me to where I can get the 2015-01 RefSeq GFF source
> file you used for the cache?
>
> Best regards
>
> Duarte
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/
>
>
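To check which of the RefSeq GFF files mentioned in this thread a given cache corresponds to, one option is to extract the versioned transcript accessions from each candidate GFF3 and compare them with the transcripts in the cache or in the homo_sapiens_otherfeatures database. The sketch below is a minimal illustration, not an Ensembl or NCBI tool: it assumes a locally downloaded, gzip-compressed NCBI GFF3 file whose transcript and exon rows carry a `transcript_id` attribute, and the function names are hypothetical.

```python
import gzip
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            # GFF3 percent-encodes reserved characters such as ';' and '='.
            attrs[key] = unquote(value)
    return attrs


def refseq_transcripts(lines):
    """Collect versioned transcript IDs (e.g. NM_000546.5) from GFF3 lines.

    Assumption: transcript-level rows carry a 'transcript_id' attribute.
    Exon/CDS rows may repeat the same ID, so a set removes duplicates.
    """
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip blank or malformed rows
        tid = parse_attributes(cols[8]).get("transcript_id")
        if tid:
            ids.add(tid)
    return ids


def load_gff(path):
    """Read the transcript IDs from a gzip-compressed GFF3 file."""
    with gzip.open(path, "rt") as handle:
        return refseq_transcripts(handle)
```

For example, `load_gff("GCF_000001405.25_GRCh37.p13_genomic.gff.gz") - load_gff("ref_GRCh37.p13_top_level.gff3.gz")` would list the transcript versions present only in the newer file, assuming both files have been downloaded locally from the FTP paths above.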