Document Type: Research Article


Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.


Motif discovery is a challenging problem in bioinformatics and an essential step towards understanding gene regulation. Although numerous algorithms and tools have been proposed in the literature, the accuracy of motif finding remains low. In this paper, we tackle the motif discovery problem using ensemble methods. We first review and classify current ensemble motif discovery tools. We then propose our Cluster-based Ensemble Motif Discovery Tool (CEMD), which is based on k-medoids clustering of the results of state-of-the-art stand-alone motif-finding tools. We evaluate the performance of CEMD on benchmark datasets and compare the results with both stand-alone and similar ensemble tools. Experimental results indicate that CEMD achieves higher sensitivity than state-of-the-art stand-alone tools on human datasets. CEMD also achieves higher sensitivity when motifs are implanted in real promoter sequences. Compared with other ensemble motif discovery tools, CEMD outperforms MEME-ChIP on all evaluation measures and performs comparably to RSAT peak-motifs and MODSIDE.
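The abstract states that CEMD is based on k-medoids clustering, but it does not specify what is clustered or how distances are measured. The following Python sketch illustrates k-medoids clustering of candidate motifs only under assumptions that are not from the article: each motif is an equal-length consensus string, the distance is Hamming distance, the medoids are initialised greedily (farthest-first), and the alternating (Voronoi-iteration) variant is used rather than the PAM swap search. The names `hamming` and `k_medoids` and the example motifs are hypothetical. This is not CEMD's implementation.

```python
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Mismatches between two equal-length motif consensus strings (assumed representation)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length motifs")
    return sum(x != y for x, y in zip(a, b))


def k_medoids(items: Sequence[str], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating (Voronoi-iteration) k-medoids, not the PAM swap search.

    Returns (medoid indices, labels), where labels[i] is the cluster
    (0..k-1) that items[i] belongs to.
    """
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(items)")
    # Precompute the pairwise distance matrix.
    dist = [[hamming(a, b) for b in items] for a in items]

    def assign(medoids: List[int]) -> List[int]:
        # Each item joins its nearest medoid; ties go to the lower cluster index.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Greedy initialisation (an assumption of this sketch): start from the most
    # central item, then keep adding the item farthest from its nearest medoid.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max((i for i in range(n) if i not in medoids),
                           key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Made-up candidate motifs, standing in for predictions from several stand-alone tools.
motifs = ["TGACTCA", "TGACTCT", "TGAGTCA", "CACGTGA", "CACGTGC", "CACGTTA"]
medoids, labels = k_medoids(motifs, k=2)
print([motifs[m] for m in medoids])  # ['TGACTCA', 'CACGTGA']
print(labels)                        # [0, 0, 0, 1, 1, 1]
```

Because each cluster representative is a medoid rather than a mean, it is always one of the input motifs. This suits discrete sequence data, where averaging strings is not well defined.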