In this thesis, we explore the impact of problem representation on the ability for the genetic algorithms (GA) to evolve a binary prediction model to predict whether a physical therapist is paid above or below the median amount from Medicare. We explore three different problem representations, the vector GA (VGA), the binary GA (BGA), and the proportional GA (PGA). We find that all three representations can produce models with high accuracy and low loss that are better than Scikit-Learn’s logistic regression model and that all three representations select the same features; however, the PGA representation tends to create lower weights than the VGA and BGA. We also find that mutation rate creates more of a difference in accuracy when comparing the individual with the best fitness (lowest binary cross entropy loss) and the most accurate solution when the mutation rate is higher. We then explore potential of biases in the PGA mapping functions that may encourage the lower values. We find that the PGA has biases on the values they can encode depending on the mapping function; however, since we do not find a bias towards lower values for all tested mapping functions, it is more likely that it is more difficult for the PGA to encode more extreme values given crossover tends to have an averaging effect on the PGA chromosome.
Bachelor of Science (B.S.)
College of Engineering and Computer Science
Maldonado, Stephen V., "Genetic Algorighm Representation Selection Impact on Binary Classification Problems" (2022). Honors Undergraduate Theses. 1249.