[ensembl-dev] Ensembl annotation pipeline software available?
Ben Moore
bmoore at ebi.ac.uk
Tue Apr 17 10:57:26 BST 2018
Hello,
You have received these e-mails because you are subscribed to the Ensembl developers list. Subscribe/unsubscribe information can be found at the bottom of each e-mail.
Best wishes
Ben
> On 17 Apr 2018, at 10:51, mawenlong <mawenlong_nwsuaf at 163.com> wrote:
>
> Hi.
> Recently, I have kept receiving your emails,
> but I have not been involved in this work.
> Is this a mistake?
> Best wishes for your research.
>
>
> At 2018-04-17 03:15:51, "David Mathog" <mathog at caltech.edu> wrote:
> >On 16-Apr-2018 06:46, Thibaut Hourlier wrote:
> >> • Mask the genome using RepeatMasker and repbase. In some cases we
> >> would use RepeatModeler to create a repeat library
> >
> >NP_999661.1.fasta (NCBI predicted protein 3908aa)
> >dystrophin_pep.fasta (evigene predicted protein, 3908aa,
> > differs from preceding by 12 aa changes, no indels).
> >Scaffold162.fasta (Sea urchin genome DNA for region, not masked)
> >
> >
> >> • Align species specific data with exonerate/genewise
> >
> >Tried NP_999661 and Scaffold162 on the genewise web
> >service with these results:
> >
> >https://www.ebi.ac.uk/Tools/services/web_genewise/toolresult.ebi?jobId=genewise-E20180416-173516-0354-73422181-p2m&analysis=alignment
> >
> >The prediction is on the wrong strand. I don't see a way to set that.
> >
> >exonerate gives slightly different, and not so great, results with
> >these two peptide sequences:
> >
> >exonerate --model protein2genome --percent 20 -q dystrophin_pep.fasta \
> > -Q protein -T dna -t Scaffold162.fasta >/tmp/evigenePep_vs_genome.txt
> >#maps 486-2794 402842-264692
> >#maps 2798-2845 392816-392724 with 141348bp intron and then
> >#maps 2846-3668 251325-193226
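A minimal sketch of the coverage arithmetic for the three #maps segments, written as a small shell/awk function. It assumes each line has the form `#maps QSTART-QEND TSTART-TEND ...`, that the segments are listed in query order, and that the query coordinates are 1-based and inclusive; `query_gaps` is a hypothetical helper, not part of exonerate. The exact gap sizes depend on that coordinate convention (exonerate itself reports 0-based "in-between" coordinates), so treat the counts as accurate only to within a residue.

```shell
# Hypothetical helper: report query stretches not covered by any "#maps" segment.
# Assumes 1-based, inclusive query coordinates and segments sorted by query start.
# Usage: query_gaps QUERY_LENGTH < maps.txt
query_gaps() {
    awk -v qlen="$1" '
        BEGIN { covered = 0 }              # highest query residue covered so far
        /^#maps/ {
            split($2, q, "-")              # "486-2794" -> q[1]=486, q[2]=2794
            s = q[1] + 0; e = q[2] + 0
            if (s > covered + 1)
                printf "gap %d-%d (%d aa)\n", covered + 1, s - 1, s - covered - 1
            if (e > covered) covered = e
        }
        END {
            qlen += 0
            if (qlen > covered)
                printf "gap %d-%d (%d aa)\n", covered + 1, qlen, qlen - covered
        }'
}
```

Fed the three #maps lines above with a query length of 3908, this reports gaps of 485, 3 and 240 residues; under this convention the segments joined by the intron (2798-2845 and 2846-3668) are treated as contiguous.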
> >
> >So 485aa are missing at the start, 4aa in the middle and 240aa at the end, plus
> >a few aa inside the alignments. When run instead with NP_999661.1.fasta,
> >the first and last alignments are still found but the center one is not.
> >If the missing pieces of the protein query are extracted and run
> >separately, more alignments are found, but exonerate doesn't map
> >everything in one pass. The "--percent 20" came from Maker. If it is
> >lowered from 20 to 2, or omitted, then NP_999661 maps like:
> >
> > query (aa)   target (bp)
> >   0 -   29   517984 - 517897
> >   0 -  455   513283 - 404660
> > 485 - 2794   402842 - 264691
> >2845 - 3668   251324 - 193225
> >3667 - 3835   186617 - 380919
> >3749 - 3908   184419 - 181360
> >
> >and dystrophin_pep like:
> > query (aa)   target (bp)
> >   0 -   29   517984 - 517897
> >   0 -  455   513283 - 404660
> > 485 - 2794   402842 - 264691
> >2797 - 3668   392816 - 193225
> >3667 - 3835   186617 - 380919
> >3749 - 3908   184419 - 181360
> >
> >That puts things in more or less the right order, but exonerate seems
> >pretty confused about the 181k-392k region.
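To make the "confused" segments easier to spot, here is a rough shell/awk sketch that labels each row of a table like the ones above with its target direction, and flags rows that run against the majority strand or overlap the previous row's query range. It assumes the rows are pasted without the quote markers, in the `QSTART - QEND TSTART - TEND` layout shown, and sorted by query start; `segment_check` is a hypothetical helper, not an exonerate feature.

```shell
# Hypothetical helper: flag segments whose target direction disagrees with the
# majority, or whose query range overlaps the previous segment's.
# Input rows: "QSTART - QEND TSTART - TEND", sorted by query start.
segment_check() {
    awk '
        NF >= 6 {
            n++
            qs[n] = $1 + 0; qe[n] = $3 + 0
            ts[n] = $4 + 0; te[n] = $6 + 0
            dir[n] = (te[n] < ts[n]) ? "-" : "+"   # target end < start => reverse strand
            count[dir[n]]++
        }
        END {
            major = (count["-"] + 0 >= count["+"] + 0) ? "-" : "+"
            for (i = 1; i <= n; i++) {
                note = ""
                if (dir[i] != major) note = note " opposite-strand"
                if (i > 1 && qs[i] < qe[i - 1]) note = note " query-overlap"
                printf "%d-%d %s %d-%d%s\n", qs[i], qe[i], dir[i], ts[i], te[i], note
            }
        }'
}
```

On the NP_999661 rows, the only segment it marks as opposite-strand is 3667 - 3835 (target 186617 - 380919), which lies inside the 181k-392k region; it also flags three rows whose query ranges overlap their predecessor.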
> >
> >> • Align protein from other species with genBlast
> >
> >Cheating here by using the same species as a positive control...
> >
> ># Downloaded genblastg and linked "genblast" to it.
> >ln -s /home/mathog/src/genblast/alignscore.txt alignscore.txt
> >ln -s /home/mathog/bin/blastall blastall
> >genblast -p genblastg -q NP_999661.1.fasta -t Scaffold162.fasta \
> > -o myoutput -gff -id 58 -cdna -pro
> >
> >and
> >
> >genblast -p genblastg -q NP_999661.1.fasta -t Scaffold162.fasta \
> > -o myoutput -gff -id 58 -cdna -pro
> >
> >Both give the same 3725aa peptide for the best prediction. See the
> >attached dotplot for which pieces are missing; not surprisingly, they
> >are somewhat correlated with the ends of the exonerate ranges. So far
> >genblast has given the best non-NCBI mapping of this protein to the
> >genome. Unfortunately it is still missing chunks here and there.
> >Perhaps one of genblast's many parameters compensates for bad sequence
> >and will put them back in?
> >
> >Thanks,
> >
> >David Mathog
> >mathog at caltech.edu
> >Manager, Sequence Analysis Facility, Biology Division, Caltech
>
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
Ben Moore
Ensembl Outreach Officer
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK
bmoore at ebi.ac.uk
+44 (0)1223 494265