[ensembl-dev] duplicate entries in ftp file ensembl vertebrates

Marc Chakiachvili mchakiachvili at ebi.ac.uk
Tue Nov 1 13:43:17 GMT 2022


Hi Rizwan, sorry to hear that,

Can you be more specific and tell me which file contains the
duplicates? Are they all in the same downloaded file, or spread across
multiple files?
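To tell whether the duplicates sit inside one file or span several, a
quick check along these lines could help. This is a minimal sketch,
assuming the protein IDs appear as /protein_id="..." qualifiers on FT
(feature table) lines, as in the usual EMBL flat-file layout. The script
and its function names are hypothetical and not part of any Ensembl
tooling.

```python
import gzip
import re
import sys
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set

# Assumption: each protein ID sits on an FT line as /protein_id="..."
PROTEIN_ID_RE = re.compile(r'^FT\s+/protein_id="([^"]+)"')


def protein_ids(lines: Iterable[str]) -> List[str]:
    """Return every /protein_id value found in EMBL feature-table lines."""
    ids = []
    for line in lines:
        match = PROTEIN_ID_RE.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def duplicates_within(ids: Iterable[str]) -> Dict[str, int]:
    """Map each ID that occurs more than once in one file to its count."""
    return {pid: n for pid, n in Counter(ids).items() if n > 1}


def duplicates_across(per_file: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each ID seen in more than one file to the files containing it."""
    seen = defaultdict(set)
    for path, ids in per_file.items():
        for pid in ids:
            seen[pid].add(path)
    return {pid: files for pid, files in seen.items() if len(files) > 1}


def main(paths: List[str]) -> None:
    per_file = {}
    for path in paths:
        # *.embl.gz files are gzip-compressed text, so open in text mode
        with gzip.open(path, "rt") as handle:
            per_file[path] = protein_ids(handle)
    for path, ids in per_file.items():
        dups = duplicates_within(ids)
        print(f"{path}: {len(dups)} protein_id values repeated within the file")
    shared = duplicates_across(per_file)
    print(f"{len(shared)} protein_id values shared across files")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it with the downloaded *genes.embl.gz files as arguments; non-zero
counts on the per-file lines point to duplicates within a file, while
the last line counts IDs that recur across files.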

Thanks for your input.

Kind regards, 

Marc 

On Tue, 2022-11-01 at 11:47 +0000, Rizwan Ishtiaq wrote:
> Hi team,
> It seems there are duplicate entries in the Ensembl vertebrates files
> that we download from the FTP location
> ftp.ensembl.org:/pub/rapid-release/species/{{name}}/*/geneset/*/*genes.embl.gz
> There are a total of 61,891 duplicate accessions. Some examples of
> problematic protein_id values follow:
>  * ENSCCRP00000037442
>  * ENSCCRP00000031589
>  * ENSCCRP00000015336
>  * ENSCCRP00000036697
>  * ENSCCRP00000080301
>  * ENSCCRP00000039261
>  * ENSCCRP00000020308
>  * ENSCCRP00000014142
>  * ENSCCRP00000025673
>  * ENSCCRP00000005781
> Could you kindly fix this and let me know? We are unable to load the
> data into UniParc.
> Regards,
> Rizwan Ishtiaq
> UniProt team

