[ensembl-dev] duplicate entries in ftp file ensembl vertebrates
Rizwan Ishtiaq
rizwan.ishtiaq at ebi.ac.uk
Tue Nov 1 11:47:21 GMT 2022
Hi team,
There appear to be duplicate entries in the Ensembl vertebrates files that
we download from the FTP location
ftp.ensembl.org:/pub/rapid-release/species/{{name}}/*/geneset/*/*genes.embl.gz
There are 61,891 duplicate accessions in total. Some examples of the
affected protein_id values follow:
* ENSCCRP00000037442
* ENSCCRP00000031589
* ENSCCRP00000015336
* ENSCCRP00000036697
* ENSCCRP00000080301
* ENSCCRP00000039261
* ENSCCRP00000020308
* ENSCCRP00000014142
* ENSCCRP00000025673
* ENSCCRP00000005781
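
To help reproduce the problem, here is a minimal sketch of one way to count
repeated protein_id values in a downloaded file. It assumes that each
protein_id appears as a /protein_id="..." qualifier on an FT line of the
EMBL flat file; the function names and the local path are hypothetical, and
this is not necessarily the check that produced the figure above.

```python
import gzip
import re
from collections import Counter

# Assumed qualifier layout on a feature-table line, e.g.
# FT                   /protein_id="ENSCCRP00000037442"
PROTEIN_ID_RE = re.compile(r'/protein_id="([^"]+)"')


def duplicate_protein_ids(lines):
    """Return {protein_id: count} for every protein_id seen more than once."""
    counts = Counter()
    for line in lines:
        # Only feature-table lines (prefix "FT") can carry the qualifier.
        if line.startswith("FT"):
            match = PROTEIN_ID_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return {pid: n for pid, n in counts.items() if n > 1}


def duplicates_in_file(path):
    """Scan one gzip-compressed EMBL flat file at a (hypothetical) local path."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return duplicate_protein_ids(handle)
```

Calling duplicates_in_file() on a local copy of one *genes.embl.gz file
returns a dictionary of the repeated identifiers and how often each occurs.
Note that it counts repeats within a single file only; duplicates across
files would need the counts to be merged.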
Could you please fix this and let me know? We are currently unable to load
the data into UniParc.
Regards,
Rizwan Ishtiaq
UniProt team