[ensembl-dev] duplicate entries in ftp file ensembl vertebrates

Rizwan Ishtiaq rizwan.ishtiaq at ebi.ac.uk
Tue Nov 1 11:47:21 GMT 2022


Hi team,

There appear to be duplicate entries in the Ensembl vertebrates files that 
we download from the FTP location 
ftp.ensembl.org:/pub/rapid-release/species/{{name}}/*/geneset/*/*genes.embl.gz

There are 61,891 duplicate accessions in total. Some examples of affected 
protein_id values are:

  * ENSCCRP00000037442
  * ENSCCRP00000031589
  * ENSCCRP00000015336
  * ENSCCRP00000036697
  * ENSCCRP00000080301
  * ENSCCRP00000039261
  * ENSCCRP00000020308
  * ENSCCRP00000014142
  * ENSCCRP00000025673
  * ENSCCRP00000005781

Could you please fix this and let me know? We are currently unable to load 
the data into UniParc.
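
For anyone who wants to reproduce the check, here is a minimal sketch in 
Python of how duplicate protein_id values could be counted in one of the 
gzipped EMBL files. It assumes each /protein_id qualifier sits on a single 
FT line; the function names and the local file path are hypothetical, and 
this is not necessarily how the 61,891 figure above was obtained.

```python
import gzip
import re
from collections import Counter

# Matches a /protein_id="..." qualifier on an EMBL feature-table (FT) line.
# Assumption: the qualifier is not wrapped across lines.
PROTEIN_ID_RE = re.compile(r'/protein_id="([^"]+)"')


def find_duplicate_protein_ids(lines):
    """Return {protein_id: count} for every id seen more than once."""
    counts = Counter()
    for line in lines:
        # Only feature-table lines can carry qualifiers.
        if not line.startswith("FT"):
            continue
        match = PROTEIN_ID_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return {pid: n for pid, n in counts.items() if n > 1}


def find_duplicates_in_file(path):
    """Scan one gzipped EMBL file; `path` is a hypothetical local copy."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return find_duplicate_protein_ids(handle)
```

Calling find_duplicates_in_file() on a downloaded *genes.embl.gz file 
returns a dictionary of the repeated identifiers and how often each occurs; 
an empty dictionary means no duplicates were found in that file.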

Regards,
Rizwan Ishtiaq
UniProt team