[ensembl-dev] Blastminwise run error

wenkai jiang biology0046 at gmail.com
Mon Jun 13 12:58:59 BST 2011


I guess that the problem was caused by alternative splicing models.

I used all known peptides for this species to run PMATCH and BESTPMATCH. For
some seq_regions, I found that more than one peptide aligned to the same
region on the scaffold; these peptides are indeed alternative splicing
isoforms.
For example:
| 108014 | 96739 | 10794 | 11831 | 1 | 1 | 346 | 13102.m03974 | 8 | 64.5 | 0 | 100 | 1038M | NULL | NULL |
|  64369 | 96739 | 10794 | 11831 | 1 | 1 | 346 | 13102.m03975 | 8 | 69.5 | 0 | 100 | 1038M | NULL | NULL |

These two peptides both align to the region 10794-11831.
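
A minimal sketch of how such stacked hits could be listed. It assumes the rows
above follow a protein_align_feature-style layout in which fields 2-5 are the
seq_region id, start, end and strand, and field 8 is the hit name. The field
positions and the function name hits_by_region are my assumptions, not part
of the pipeline.

```python
from collections import defaultdict

# Rows copied from the example above, one feature per line, '|'-separated.
ROWS = [
    "108014 | 96739 | 10794 | 11831 | 1 | 1 | 346 | 13102.m03974 | 8 | 64.5 | 0 | 100 | 1038M | NULL | NULL",
    "64369 | 96739 | 10794 | 11831 | 1 | 1 | 346 | 13102.m03975 | 8 | 69.5 | 0 | 100 | 1038M | NULL | NULL",
]


def hits_by_region(rows):
    """Map (seq_region_id, start, end, strand) -> list of hit names.

    Field positions are an assumption about the table layout.
    """
    regions = defaultdict(list)
    for row in rows:
        fields = [f.strip() for f in row.strip().strip("|").split("|")]
        key = (int(fields[1]), int(fields[2]), int(fields[3]), int(fields[4]))
        regions[key].append(fields[7])
    return dict(regions)


# Keep only regions hit by more than one peptide.
shared = {k: v for k, v in hits_by_region(ROWS).items() if len(v) > 1}
for (region_id, start, end, strand), names in shared.items():
    print(region_id, start, end, strand, names)
```

This only groups hits whose coordinates are identical; isoforms that align to
overlapping but different extents would need a real interval comparison.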

When we run genewise, both peptides are blasted against the target region
and wise is then used to build transcript models. I guess that one of the
peptides might produce exons that overlap the exons built from the other
peptide.
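
To make the overlap concrete, here is a small sketch that checks the exon
coordinates from the exception quoted further down in this thread. It is not
the check the Ensembl API performs; it assumes inclusive start/end coordinates
and that all exons are on the same strand, and the helper names are
hypothetical.

```python
def overlap_length(a, b):
    """Number of bases shared by two inclusive (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def find_overlaps(exons, new_exon):
    """Return (exon, shared_bases) for every exon that overlaps new_exon."""
    found = []
    for exon in exons:
        shared = overlap_length(exon, new_exon)
        if shared > 0:
            found.append((exon, shared))
    return found


# Coordinates taken from the error message in this thread (strand -1).
transcript_exons = [(5108, 5252), (4984, 4988)]
this_exon = (4053, 4985)

print(find_overlaps(transcript_exons, this_exon))
```

With inclusive coordinates the pair 4984-4988 and 4053-4985 shares positions
4984 and 4985, so the clash is only a couple of bases at the exon boundary.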

My current question is:
when using PMATCH or BESTPMATCH, should I use only one of the AS isoforms
from a known gene locus, instead of using all protein isoforms?
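
If keeping one isoform per locus turns out to be reasonable, the selection
could be sketched as below. This is only an illustration of the idea, not a
recommendation: it assumes the tenth field of the rows above is the alignment
score, keeps the highest-scoring peptide for each identical region, and the
function name best_hit_per_region is hypothetical.

```python
def best_hit_per_region(hits):
    """Keep the highest-scoring hit name for each region.

    hits: iterable of (region_key, hit_name, score) tuples.
    On equal scores the first hit seen is kept.
    """
    best = {}
    for region, name, score in hits:
        if region not in best or score > best[region][1]:
            best[region] = (name, score)
    return {region: name for region, (name, _score) in best.items()}


# (seq_region_id, start, end), hit name, assumed score column.
HITS = [
    ((96739, 10794, 11831), "13102.m03974", 64.5),
    ((96739, 10794, 11831), "13102.m03975", 69.5),
]

for region, name in best_hit_per_region(HITS).items():
    print(region, name)
```

The surviving names could then be used to filter the peptide file before
rerunning PMATCH. Whether dropping the other isoforms loses real gene models
is exactly the open question above.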


2011/6/13 wenkai jiang <biology0046 at gmail.com>

> Hi,
>
> I ran wise on whole scaffolds, not on sequence slices of 1 Mb etc.
> I checked the seq_region_id in the exon table, but no exon related to
> the problem scaffold can be found in this table.
>
> for example, the problem scaffold is bambus_1325,
> -------------------- EXCEPTION --------------------
> MSG: Problems running BlastMiniGenewise for
> scaffold:june:bambus_1325:1:47095:1 [
>
> I then searched the exon table for 'bambus_1325': no exon can be found, and
> no transcript related to bambus_1325 can be found in the transcript table
> either.
>
> Indeed, I did run the analysis several times.
> The rulemanager tried to rerun the failed jobs 3 times, but they still
> failed, so I used job_submission to rerun the analysis.
>
> But as mentioned above, no exon is stored for the problematic region, so
> the error cannot be caused by database operations.
>
> Fortunately, I resolved almost all of these problem jobs by changing the
> wise parameters. For example, the default gap extension penalty is 2; when
> I increased it to 8, only 3 jobs remained problematic.
>
> All remaining failed jobs still have overlap problems, but the overlaps are
> tiny; in the example below the exons share only positions 4984-4985:
>
> > Transcript Exons:
> >   5108-5252 (-1)
> >   4984-4988 (-1)
> >
> > This Exon:
> >   4053-4985 (-1)
>
> I don't understand why these problems arise.
>
> What does 'dummy transcript' mean?
>
>
>
> 2011/6/13 Thibaut Hourlier <th3 at sanger.ac.uk>
>
>> Hi Wenkai,
>>
>> Have you checked whether the two exons come from the same analysis?
>> If yes, did you rerun this analysis several times?
>> Are you fetching these exons from the same database?
>>
>> What you can do is create a dummy transcript and link the
>> problematic exon to it.
>>
>> You will have to modify these tables in your database:
>> transcript
>> exon_transcript
>> transcript_supporting_feature
>> exon
>>
>> Cheers
>> Thibaut
>>
>> On Mon, 2011-06-13 at 01:55 +0800, wenkai jiang wrote:
>> > I have finished the PMATCH and BESTPMATCH runs and then ran
>> > targetedgenewise; for some jobs (about 50), I got errors:
>> >
>> > -------------------- EXCEPTION --------------------
>> > MSG: Running
>> > Bio::EnsEMBL::Analysis::Runnable::BlastMiniGenewise=HASH(0x2879ac0)
>> > failed error:
>> > -------------------- EXCEPTION --------------------
>> > MSG: Failed Bio::EnsEMBL::Analysis::Runnable::Genewise=HASH(0x2917c50)
>> > run
>> > -------------------- EXCEPTION --------------------
>> > MSG: Exon overlaps with other exon in same transcript.
>> > Transcript Exons:
>> >   5108-5252 (-1)
>> >   4984-4988 (-1)
>> >
>> > This Exon:
>> >   4053-4985 (-1)
>> >
>> > some settings for targetedgenewise are:
>> >
>> >              EXON_BASED_MASKING => 1,
>> >              GENE_BASED_MASKING => 0,
>> >              PRE_GENEWISE_MASK => 1,
>> >              POST_GENEWISE_MASK => 1,
>> >              REPEATMASKING => [],
>> >              SOFTMASKING => 0,
>> >
>> >              GENEWISE_PARAMETERS => {
>> >                  # pass parameters to genewise here,
>> >                  # i.e. -program => "/usr/local/ensembl/bin/genewiseXXX"
>> >                  # for more options which can be passed see
>> >                  # Runnable/Genewise.pm
>> >                  #-endbias => 1,
>> >                  #-matrix => 'BLOSUM80.bla',
>> >                  #-gap => 20,
>> >                  #-extension => 8,
>> >                  #-splice_model => 1
>> >              },
>> >
>> > any suggestions?
>>
>
>