Abstract:
Motif discovery in unaligned DNA sequences is a challenging problem in computer science and molecular biology. Finding a cluster of numerous similar subsequences in a set of biopolymer sequences is evidence that the subsequences occur not by chance but because they share some biological function. Motifs can be used to determine evolutionary and functional relationships. Over the past few years, many motif discovery tools have been designed and made publicly available. In this paper, we present a motif discovery algorithm based on a Genetic Algorithm (GA). Our algorithm builds on a popular motif finding algorithm, “Finding Motifs by Genetic Algorithm” (FMGA), developed by Falcon F.M Liu, with a handful of modifications to obtain better results. In our approach, we try to find potential motifs in a group of promoter sequences of transcription start sites (TSS). The genetic operations, such as mutation and crossover, are performed using a position weight matrix to preserve the completely conserved positions. A rearrangement method is used to avoid getting trapped in a very stable local minimum. A preprocessing function is used to relate randomly generated initial motifs to the promoter sequences, and a discursion function is used to minimize the computational time. We evaluated our results based on a fitness score and the occurrence frequency of a candidate motif in a group of promoter sequences. Our approach gives better results than the original FMGA algorithm, which itself showed superior results in comparison with two other motif finding algorithms, namely Multiple EM for Motif Elicitation (MEME) and Gibbs Sampler.
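To illustrate how a position weight matrix can protect completely conserved positions during mutation, the following is a minimal Python sketch. It is not the thesis implementation: the function names (`position_frequency_matrix`, `conserved_positions`, `mutate`), the use of plain base frequencies as the matrix, and the single-point mutation rule are all assumptions made for illustration.

```python
import random
from collections import Counter

BASES = "ACGT"

def position_frequency_matrix(sites):
    """Column-wise base frequencies for equal-length aligned sites (assumed representation)."""
    length = len(sites[0])
    matrix = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        matrix.append({base: counts[base] / len(sites) for base in BASES})
    return matrix

def conserved_positions(matrix):
    """Indices of columns where a single base occurs in every site."""
    return [i for i, column in enumerate(matrix) if max(column.values()) == 1.0]

def mutate(motif, matrix, rng):
    """Change one base of the motif, never touching a completely conserved column."""
    locked = set(conserved_positions(matrix))
    free = [i for i in range(len(motif)) if i not in locked]
    if not free:
        return motif  # every position is conserved, so nothing may change
    i = rng.choice(free)
    new_base = rng.choice([base for base in BASES if base != motif[i]])
    return motif[:i] + new_base + motif[i + 1:]

# Hypothetical aligned sites, chosen only to demonstrate the idea.
sites = ["TATAAT", "TATGAT", "TACAAT"]
matrix = position_frequency_matrix(sites)
print(conserved_positions(matrix))  # [0, 1, 4, 5]
print(mutate("TATAAT", matrix, random.Random(0)))
```

In this sketch only positions 2 and 3 vary across the example sites, so they are the only ones a mutation is allowed to change. A crossover operator could use the same `conserved_positions` result to restrict where parents are cut.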
Description:
This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering at East West University, Dhaka, Bangladesh.