[ensembl-dev] Error on using parse_ncbi_gff3.pl

Herzig, David david.herzig at roche.com
Wed Sep 21 15:36:49 BST 2016


Hi

ensembl-pipeline: master / b1ca6be3ad6dfc4e960bce7fce5733f745102710
ensembl-io: release/85 / 9eacea74ccaf8480aafc82c6b0ffc626a1537b29

thx.
David

On Wed, Sep 21, 2016 at 3:18 PM, Thibaut Hourlier <thibaut at ebi.ac.uk> wrote:

> Hi David,
> Unfortunately, NCBI does not always write its GFF files the same way for
> all species, so a fix for one species could introduce a bug for another.
> Could you please tell us which branch and last commit you are using for
> your ensembl-pipeline and ensembl-io repositories?
>
> Thanks
> Thibaut
>
> On 20 Sep 2016, at 13:26, Daniel Barrell <daniel.barrell at eaglegenomics.com>
> wrote:
>
> Odd. If my suspicions were correct, D. rerio should also have failed.
> I guess there must be something else going on here.
>
> Dan
>
> *Daniel Barrell*
> Platform Specialist
> *eagle*discover Best of Show Winner at Bio-IT World 2016
>
> *Eagle Genomics Ltd*
> T: +44 (0)1223 654481
> http://www.eaglegenomics.com
> Disclaimer: http://www.eaglegenomics.com/about/privacy-statement/
>
> https://youtu.be/rPdgFTo0FZM
>
> On 20 September 2016 at 12:14, Herzig, David <david.herzig at roche.com>
> wrote:
>
>> Hi Daniel
>>
>> Thx for the feedback.
>>
>> I was able to use it for:
>> - D. rerio
>> - M. musculus
>> - R. norvegicus
>>
>> regards,
>> David
>>
>>
>> On Tue, Sep 20, 2016 at 1:11 PM, Daniel Barrell <
>> daniel.barrell at eaglegenomics.com> wrote:
>>
>>> Hi David,
>>>
>>> Line 1184334 is the last line of the GFF3 file and contains '###'.
>>> There used to be code to ignore lines like these:
>>>
>>> + next if $line =~ /^#/;
>>>
>>> When the script moved to ensembl-io, I think it may have lost this check;
>>> however, I would expect ensembl-io to handle the '###'. Which species'
>>> files worked? I checked on NCBI, and other species (e.g. horse) would
>>> also fail in the same way.
>>>
>>> Dan
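A minimal sketch, assuming plain line-by-line Perl rather than ensembl-io (whose parser interface is not shown in this thread), of the kind of check Dan describes. In GFF3, lines beginning with '#' are comments or directives (e.g. '##gff-version 3'), and '###' is the directive signalling that all forward references seen so far have been resolved, so it carries no feature data. The helper name read_gff3_features, its hash keys and the sample record are hypothetical and are not taken from parse_ncbi_gff3.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not from parse_ncbi_gff3.pl): read GFF3 feature lines
# from a filehandle, skipping comments and directives such as '###'.
sub read_gff3_features {
    my ($fh) = @_;
    my @features;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/;       # comments and directives, including '###'
        next if $line =~ /^\s*$/;    # blank lines
        my @cols = split /\t/, $line;
        next unless @cols == 9;      # GFF3 feature lines have nine tab-separated columns
        push @features, {
            seqid => $cols[0],
            type  => $cols[2],
            start => $cols[3],
            end   => $cols[4],
            phase => $cols[7],
        };
    }
    return \@features;
}

# Usage with a tiny in-memory file ending in the '###' directive
# (hypothetical sample data).
my $gff = "##gff-version 3\n"
        . "seq1\tRefSeq\tregion\t1\t100\t.\t+\t.\tID=id0\n"
        . "###\n";
open my $fh, '<', \$gff or die "Cannot open in-memory GFF3: $!";
my $features = read_gff3_features($fh);
print scalar(@$features), "\n";    # prints 1
```

The nine-column check is a cheap guard: any line that is not a feature record is skipped instead of producing a partially filled entry.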
>>>
>>>
>>> On 20 September 2016 at 11:16, Herzig, David <david.herzig at roche.com>
>>> wrote:
>>>
>>>> Hi Ensembl Users
>>>>
>>>> I have set up the Ensembl environment for several species, and
>>>> everything is OK.
>>>> After that, I imported data from NCBI using the parse_ncbi_gff3.pl
>>>> script. It works fine for almost all species, but for Sus scrofa I
>>>> have the following issue:
>>>>
>>>> I downloaded the file from NCBI:
>>>> /ftp.ncbi.nlm.nih.gov/genomes/Sus_scrofa/GFF/ref_Sscrofa10.2_top_level.gff3
>>>>
>>>> I used the parse_ncbi_gff3.pl script to import it.
>>>>
>>>> The process starts successfully, but after a while I get the following
>>>> error message and the process stops:
>>>>
>>>> Can't call method "phase" on an undefined value at
>>>> /home/ensembl/release-85/ensembl-pipeline/scripts/refseq_import/parse_ncbi_gff3.pl line 882, <__ANONIO__> line 1184334.
>>>>
>>>> Any ideas?
>>>>
>>>> regards,
>>>> David
>>>>
>>>>
>>>> --
>>>>
>>>> David Herzig
>>>> Scientist, pRED Informatics
>>>> Roche Pharma Research and Early Development
>>>>
>>>> Roche Innovation Center Basel
>>>>
>>>> F. Hoffmann-La Roche Ltd
>>>> Grenzacherstrasse 124
>>>> 4070 Basel
>>>> Switzerland
>>>> Phone +41 61 687 31 70
>>>>
>>>> Learn more about pRED Informatics at http://go.roche.com/pREDi
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>>
>>>>
>>>
>>>
>>>
>>
>>

