[ensembl-dev] Ensembl annotation pipeline software available?

Ben Moore bmoore at ebi.ac.uk
Tue Apr 17 10:57:26 BST 2018


Hello,

You have received these e-mails because you are subscribed to the Ensembl developers list. Subscribe/unsubscribe information can be found at the bottom of each e-mail.

Best wishes

Ben

> On 17 Apr 2018, at 10:51, mawenlong <mawenlong_nwsuaf at 163.com> wrote:
> 
> Hi.
> Recently, I have kept receiving your emails.
> But I have had nothing to do with this work.
> So, is there some mistake?
> Best wishes for your research.
> 
> 
> At 2018-04-17 03:15:51, "David Mathog" <mathog at caltech.edu> wrote:
> >On 16-Apr-2018 06:46, Thibaut Hourlier wrote:
> >> • Mask the genome using RepeatMasker and Repbase. In some cases we
> >> would use RepeatModeler to create a repeat library
> >
> >NP_999661.1.fasta    (NCBI predicted protein 3908aa)
> >dystrophin_pep.fasta (evigene predicted protein, 3908aa,
> >    differs from preceding by 12 aa changes, no indels).
> >Scaffold162.fasta    (Sea urchin genome DNA for region, not masked)
> >
> >
> >> • Align species specific data with exonerate/genewise
> >
> >Tried NP_999661 and Scaffold162 on the genewise web
> >service with these results:
> >
> >https://www.ebi.ac.uk/Tools/services/web_genewise/toolresult.ebi?jobId=genewise-E20180416-173516-0354-73422181-p2m&analysis=alignment
> >
> >The prediction is on the wrong strand.  I don't see a way to set that.
> >
> >exonerate gives slightly different, and not so great, results with
> >these two peptide sequences:
> >
> >exonerate --model protein2genome --percent 20 -q dystrophin_pep.fasta \
> >   -Q protein -T dna -t Scaffold162.fasta >/tmp/evigenePep_vs_genome.txt
> >#maps 486-2794  402842-264692
> >#maps 2798-2845 392816-392724 with 141348bp intron and then
> >#maps 2846-3668 251325-193226
> >
> >So 485aa are missing at the start, 4aa in the middle, and 240aa at the
> >end, plus a few aa inside the alignments.  When run instead with
> >NP_999661.1.fasta the first and last alignments are still found but the
> >center one is not.  If the missing pieces of the protein query are
> >extracted and run separately, more alignments are found, but exonerate
> >doesn't map everything in one pass.  The "--percent 20" came from Maker.
> >If it is lowered from 20 to 2, or omitted, then NP_999661 maps like:
> >
> >    0 -   29  517984 - 517897
> >    0 -  455  513283 - 404660
> >  485 - 2794  402842 - 264691
> >2845 - 3668  251324 - 193225
> >3667 - 3835  186617 - 380919
> >3749 - 3908  184419 - 181360
> >
> >and dystrophin_pep like:
> >    0 -   29  517984 - 517897
> >    0 -  455  513283 - 404660
> >  485 - 2794  402842 - 264691
> >2797 - 3668  392816 - 193225
> >3667 - 3835  186617 - 380919
> >3749 - 3908  184419 - 181360
> >
> >Which has things in more or less the right order, but it seems pretty 
> >confused about the 181k-392k region.
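
The missing-residue tally described above can be checked mechanically. The following is a minimal sketch, not part of the original thread: it takes the query ranges from the "#maps" comments of the first exonerate run and lists the stretches of the 3908aa query that no alignment covers. The function name `uncovered` is hypothetical, and the code assumes the ranges are half-open (start inclusive, end exclusive); under a 1-based inclusive reading the counts shift by one, so they may not match the figures quoted in the message exactly.

```python
def uncovered(query_len, hits):
    """Return half-open (start, end) query ranges not covered by any hit.

    Assumption: each hit is a half-open (start, end) range on the query.
    """
    gaps = []
    pos = 0  # first query position not yet known to be covered
    for start, end in sorted(hits):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < query_len:
        gaps.append((pos, query_len))
    return gaps


# Query ranges copied from the "#maps" comments of the first exonerate run.
hits = [(486, 2794), (2798, 2845), (2846, 3668)]

for start, end in uncovered(3908, hits):
    print(f"{start}-{end}: {end - start} aa not aligned")
```

Feeding the query columns of the later tables into the same function would give a quick way to compare how much of the protein each run leaves unaligned.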
> >
> >> • Align protein from other species with genBlast
> >
> >Cheating here, using the same species, positive control...
> >
> >#downloaded genblastg, link "genblast" to it.
> >ln -s /home/mathog/src/genblast/alignscore.txt alignscore.txt
> >ln -s /home/mathog/bin/blastall blastall
> >genblast -p genblastg -q NP_999661.1.fasta -t Scaffold162.fasta \
> >   -o myoutput -gff -id 58 -cdna -pro
> >
> >and
> >
> >genblast -p genblastg -q NP_999661.1.fasta -t Scaffold162.fasta \
> >   -o myoutput -gff -id 58 -cdna -pro
> >
> >Both have the same 3725aa peptide for the best prediction.  See the 
> >attached dotplot to see which pieces are missing, which not surprisingly 
> >are somewhat correlated with the ends of the exonerate ranges. So far 
> >genblast has given the best non-NCBI mapping of this protein to the 
> >genome.  Unfortunately it is still missing chunks here and there.  
> >Perhaps one of genblast's many parameters compensates for bad sequence 
> >and will put them back in?
> >
> >Thanks,
> >
> >David Mathog
> >mathog at caltech.edu
> >Manager, Sequence Analysis Facility, Biology Division, Caltech
> 
> 
>  
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

Ben Moore
Ensembl Outreach Officer

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK

bmoore at ebi.ac.uk
+44 (0)1223 494265
