[ensembl-dev] algorithm of FindSplitGenesOnTree
pengchy at gmail.com
Wed Mar 6 13:20:36 GMT 2013
Thank you for your reply. It is clear to me now.
I noticed that 236 of the 20322 genes in the horse genome were split genes
(Ensembl 2009 paper). Do you have a report of the results from the
Find*OnTree modules for the Ensembl species? Another question is how you
treat the split genes: are they corrected manually, or by some other method?
On 2013/3/6 4:31, Matthieu Muffato wrote:
> Dear Pengcheng Yang,
> The FindSplitGenesOnTree module, as well as the FindCoreRegionLength,
> FindPartialGenesOnTree and FindSingleGenesOnTree modules, is part of
> an ongoing project to identify partial / split genes. These modules
> are not yet used in our production pipelines.
> We are currently using
> (which contains more comments).
> Right after the multiple alignment step, we search every tree for
> pairs of genes from the same species that satisfy one of these two conditions:
> - The two sequences do not overlap at all, and the genes are close to
> each other (less than 1 Mb), with at most 1 gene in between
> - The two sequences slightly overlap, and the genes are consecutive
> in the genome and less than 500 kb apart
> All those pairs are grouped under "split_gene" nodes in the gene
> trees, and tagged as "contiguous_gene_split" homologies.
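The two pairing rules described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the Ensembl Compara code: the `Gene` record, its `rank` field (the gene's index in the ordered gene list of its chromosome) and the `MAX_SLIGHT_OVERLAP` threshold are assumptions, since the message does not say how "slightly overlap" is quantified. The alignment overlap is passed in as the number of alignment columns in which both sequences carry a residue.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record (not an Ensembl API object)."""
    species: str
    chrom: str
    start: int  # genomic start (bp)
    end: int    # genomic end (bp)
    rank: int   # hypothetical: index of the gene along its chromosome


# Distances from the message; MAX_SLIGHT_OVERLAP is an assumed value.
NO_OVERLAP_MAX_DIST = 1_000_000     # "less than 1 Mb"
SLIGHT_OVERLAP_MAX_DIST = 500_000   # "less than 500 kb"
MAX_SLIGHT_OVERLAP = 10             # hypothetical: max shared columns counted as "slight"


def genomic_distance(a: Gene, b: Gene) -> int:
    """Gap in bp between two genes on the same chromosome (0 if they overlap)."""
    first, second = (a, b) if a.start <= b.start else (b, a)
    return max(0, second.start - first.end)


def is_split_gene_pair(a: Gene, b: Gene, aln_overlap: int) -> bool:
    """Return True if a and b satisfy one of the two split-gene rules.

    aln_overlap: number of alignment columns where both sequences have a residue.
    """
    if a.species != b.species or a.chrom != b.chrom:
        return False
    dist = genomic_distance(a, b)
    rank_gap = abs(a.rank - b.rank)
    # Rule 1: no overlap, less than 1 Mb apart, at most one gene in between.
    if aln_overlap == 0 and dist < NO_OVERLAP_MAX_DIST and 1 <= rank_gap <= 2:
        return True
    # Rule 2: slight overlap, consecutive genes, less than 500 kb apart.
    if (0 < aln_overlap <= MAX_SLIGHT_OVERLAP
            and rank_gap == 1 and dist < SLIGHT_OVERLAP_MAX_DIST):
        return True
    return False


if __name__ == "__main__":
    a = Gene("horse", "1", 100_000, 120_000, rank=5)
    b = Gene("horse", "1", 150_000, 170_000, rank=6)
    print(is_split_gene_pair(a, b, aln_overlap=0))   # True (rule 1)
    print(is_split_gene_pair(a, b, aln_overlap=50))  # False: overlap too large
```

Keeping the alignment overlap as an input separates the two kinds of evidence: the tree and alignment supply the overlap, while the genome annotation supplies positions and gene order.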
> Hope this helps,
> Best regards,
> On 05/03/13 17:49, Pengcheng Yang wrote:
>> I want to know the algorithm of the FindSplitGenesOnTree class, so I
>> read the comments in the file
>> However, I am still unclear about the background of the algorithm.
>> My understanding is:
>> 1. for the genes in one family, do multiple alignment and construct tree
>> using TreeBeST
>> 2. find the gene IDs with the shortest (A) and longest (B) sequences.
>> 3. get the gene IDs (C) that are next to gene A in the same branch of the tree.
>> 4. check whether C and A overlap by more than x aa in the multiple
>> alignment. If not, they may be a split_gene pair.
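Step 4 depends on measuring how much two sequences overlap in the multiple alignment. A minimal sketch, assuming the aligned sequences are gapped strings of equal length with `-` as the gap character; the function names and the threshold `x` are hypothetical, as in the question. The overlap is taken to be the number of alignment columns in which both sequences carry a residue.

```python
def alignment_overlap(aln_a: str, aln_b: str, gap: str = "-") -> int:
    """Count alignment columns where both gapped sequences have a residue."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for ca, cb in zip(aln_a, aln_b) if ca != gap and cb != gap)


def may_be_split_pair(aln_a: str, aln_b: str, x: int) -> bool:
    """Step 4 of the question: a candidate pair if the overlap is at most x columns."""
    return alignment_overlap(aln_a, aln_b) <= x


if __name__ == "__main__":
    # Two fragments covering different halves of the alignment.
    print(alignment_overlap("MKV-----", "---LLRPQ"))  # 0
    # Fragments sharing two columns (positions 3 and 4).
    print(alignment_overlap("MKVLA---", "---AQRST"))  # 2
```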
>> Is that right? And where can I find documentation of the algorithm? I know one
>> way is to read the source code, but it would be quicker to understand if
>> there were documentation.
>> Thank you.
>> Pengcheng Yang
> Dev mailing list Dev at ensembl.org
> Ensembl Blog: http://www.ensembl.info/