Keywords

Physical activity, machine learning, sensor data.

Description

In this study, we developed a machine learning framework for detecting human physical activity, specifically distinguishing walking from running, using time-series sensor data collected from wearable devices. The dataset was obtained from Kaggle and consists of 88,588 samples of accelerometer and gyroscope readings recorded with iPhone 5c sensors. To improve the predictive capability of our models, we sorted the data by timestamp, scaled the features, and carried out extensive feature engineering, creating lag features and rolling statistics to capture the temporal dependencies and motion patterns in the data. We trained and evaluated several supervised machine learning models: Logistic Regression, Random Forest, Gaussian Naive Bayes, k-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost). Hyperparameters were tuned for each model, and the models were evaluated on accuracy, precision, recall, and F1-score. Adding the lag and rolling statistical features significantly improved model performance: Logistic Regression achieved perfect scores on all metrics after feature augmentation, and Random Forest, KNN, and XGBoost achieved near-perfect classification. The results highlight the importance of temporal feature engineering for human activity recognition with wearable sensor data. This work contributes to the growing fields of health monitoring, sports performance analysis, and real-time activity tracking using wearable devices.
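
The lag and rolling-statistic features described above could be built roughly as follows. This is a minimal sketch using pandas, not the study's actual code: the function name `add_temporal_features`, the column names `timestamp` and `acceleration_x`, the lag steps, and the window size are all assumptions chosen for illustration.

```python
import pandas as pd


def add_temporal_features(df, sensor_cols, time_col="timestamp",
                          lags=(1, 2), window=3):
    """Sort by time, then add lag and rolling mean/std columns.

    All names and default values here are illustrative assumptions,
    not settings reported by the study.
    """
    # Temporal features only make sense on chronologically ordered rows.
    out = df.sort_values(time_col).reset_index(drop=True)
    for col in sensor_cols:
        # Lag features: the sensor value from `lag` rows earlier.
        for lag in lags:
            out[f"{col}_lag{lag}"] = out[col].shift(lag)
        # Rolling statistics over the current row and the preceding ones.
        rolling = out[col].rolling(window=window)
        out[f"{col}_roll_mean"] = rolling.mean()
        out[f"{col}_roll_std"] = rolling.std()
    # The earliest rows have no full history, so they contain NaN; drop them.
    return out.dropna().reset_index(drop=True)


# Toy, deliberately unsorted data (not taken from the real dataset).
toy = pd.DataFrame({
    "timestamp": [3, 1, 2, 5, 4],
    "acceleration_x": [0.3, 0.1, 0.2, 0.5, 0.4],
})
features = add_temporal_features(toy, ["acceleration_x"])
```

With these settings, each remaining row carries its own reading, the two preceding readings, and the mean and standard deviation of a three-row window, which is one way a model without built-in memory can see short-term motion patterns.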

Abstract

This study focuses on detecting physical activity using wearable sensor data, specifically distinguishing between walking and running. A dataset comprising accelerometer and gyroscope readings is used to train and evaluate various machine learning models, including logistic regression, random forest, k-nearest neighbors, naïve Bayes, and XGBoost. Extensive preprocessing, such as creating lag features and rolling statistics, is performed to enhance temporal data representation. The models are evaluated using metrics like accuracy, precision, recall, and F1 score. Incorporating lag and rolling features significantly improves model performance, with logistic regression achieving perfect scores across all metrics. These findings demonstrate the effectiveness of enhanced feature engineering for time-series data in human activity recognition and highlight the potential of wearable sensors in monitoring physical activities.
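
For reference, the four evaluation metrics named above can be computed from a binary confusion matrix as in the sketch below. The function name `binary_metrics`, the choice of `1` as the positive class, and the toy labels are assumptions made for illustration; they are not results or code from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    `positive` marks which label counts as the positive class
    (an assumption here; for example, 1 could stand for running).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
```

A model with "perfect scores across all metrics" is one for which all four values equal 1.0, meaning it makes no false positives and no false negatives on the evaluation data.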

Course Name

STA 6366 Data Science 1

Instructor Name

Dr. Rui Xie

Rights

This work is licensed under a Creative Commons Attribution 4.0 International License.

College

College of Sciences
