Machine learning-based menstrual phase identification using wearable device data

Kilungeja, Grentina; Graham, Krystal; Liu, Xudong; Nasseri, Mona

doi:10.1038/s44294-025-00078-8


Article
Open access
Published: 13 May 2025


Grentina Kilungeja¹,
Krystal Graham²,
Xudong Liu¹ &
…
Mona Nasseri²

npj Women's Health volume 3, Article number: 29 (2025) Cite this article

Abstract

This study applies machine learning to identify menstrual cycle phases using physiological signals recorded from a wrist-worn device. These signals include skin temperature, electrodermal activity (EDA), interbeat interval (IBI), and heart rate (HR), and were collected without requiring participant input. Data from 65 cycles across 18 subjects were analyzed, and multiple classifiers including random forest (RF) models were trained to classify the phases. Using a leave-last-cycle-out approach, and features from non-overlapping fixed-size windows, the RF model achieved 87% accuracy and an area under the receiver operating characteristic curve (AUC-ROC) of 0.96 when classifying three phases (period, ovulation, and luteal). For daily phase tracking using a sliding window, the RF model achieved 68% accuracy and an AUC-ROC of 0.77 when classifying four phases (period, follicular, ovulation, luteal). While these results highlight the potential of wrist-based physiological signals to enable automated phase tracking, reduce the burden of self-reporting, and improve access to cycle tracking solutions, further validation is needed to enhance the results.

Introduction

The menstrual cycle is a fundamental biological process in the female reproductive system, involving intricate hormonal changes and structural transformations in the ovaries and uterus. The key hormones involved are follicle-stimulating hormone (FSH), luteinizing hormone (LH), estrogen, and progesterone. These hormones orchestrate the menstrual cycle, which is broadly divided into two main phases: the follicular phase, encompassing menstruation and ending with ovulation, and the luteal phase, which follows ovulation (Fig. 1). For classification purposes, a cycle was further divided into four distinct phases:

Menses: This marks the beginning of the cycle, characterized by menstrual bleeding and low levels of estrogen and progesterone.

Follicular: Following menses, this phase ends before the LH surge. FSH levels decline while the selected follicle matures in preparation for ovulation.

Ovulation: This phase encompasses the LH surge and the release of a mature egg from the ovary. While ovulation itself occurs shortly after the LH surge (as shown in Fig. 1), in this paper this phase was defined as the period spanning 2 days before to 3 days after the positive LH test.

Luteal: After ovulation, the ruptured follicle forms the corpus luteum, which produces progesterone to prepare the uterus for a potential pregnancy. If pregnancy does not occur, hormone levels decline, and a new cycle begins¹.

**Fig. 1: Hormonal and BBT changes throughout the normal menstrual cycle.**

These phases and hormone fluctuations are essential for reproductive health and fertility². The hormonal changes in the menstrual cycle are typically associated with physiological changes, and therefore physiological signals recorded from the body can provide valuable insights into these hormonal fluctuations during different phases of the cycle³. For example, Fig. 1 depicts how basal body temperature changes alongside hormone levels.

Hormonal interactions regulate the menstrual cycle, which demonstrates significant variability both within and between individuals. Accurate tracking and prediction of menstrual cycle phases continue to be an active research area. Optimal practices for studying menstrual cycle phases were reviewed in ref. 4, which emphasizes the importance of tracking daily symptoms, hormone levels, basal body temperature (BBT) measurements, and day-counting techniques relative to the onset of menses. While these guidelines are complex and rigorous, potentially limiting their feasibility in some research settings, they provide valuable methodologies for analyzing menstrual cycles. That review serves as a guide for researchers aiming to conduct robust and informative analyses of menstrual cycles⁴.

Earlier methods for tracking menstrual cycles primarily relied on BBT to confirm the occurrence of ovulation. This method involves tracking slight temperature changes after ovulation due to increased progesterone levels. While widely used, BBT monitoring requires consistent daily measurements and can be affected by external factors, leading to potential inaccuracies⁵,⁶. To improve this, some researchers have integrated BBT monitoring with urinary LH detection in commercial point-of-care (POC) devices. One study used OvuSense, a vaginal temperature sensor, and reported a 99% accuracy rate for detecting ovulation and an 89% accuracy rate for predicting it. These advancements have made BBT monitoring more reliable and user-friendly, enhancing ovulation tracking for natural family planning and fertility monitoring⁷.

In an effort to address some limitations of traditional BBT tracking, Luo et al. proposed using an in-ear wearable sensor that continuously measures temperature every five minutes during sleep. By applying a Hidden Markov Model to analyze data from 39 cycles from 22 women, the study achieved an accuracy of 76.92%, correctly identifying the occurrence of ovulation in 30 out of the 39 cycles⁸.

Other studies have explored using electrocardiogram (ECG) and heart rate variability (HRV) features to classify menstrual phases⁹,¹⁰. In a study by Champaty et al., 6-minute ECG signals were recorded from 14 women on the 1st, 13th, and 21st days of their menstrual cycles. They classified three phases (follicular, ovulation, luteal) using multilayer perceptron (MLP) and radial basis function (RBF) classifiers. Statistical analysis, classification and regression trees (CART), and bagging trees (BT) algorithms were used for feature selection. The RBF network achieved an accuracy of 95% when using HRV features for phase classification, and 90% when using ECG features¹¹.

Recent studies have integrated multi-parameter wearable sensors with machine learning techniques. One study used a web-based medical device, OvulaRing, to measure circadian core body temperature every five minutes throughout the menstrual cycle. Data from 158 women collected over 15 months provided information on 470 cycles. The authors found that 83.4% of cycles showed a biphasic temperature pattern indicative of ovulation, and prospective prediction of the fertile window reached an accuracy of 88.8%¹². Another study used a wristband device worn at night by 237 women with regular cycles for up to a year. Using features such as skin temperature, HR, and perfusion, the authors trained a random forest model that achieved a 90% accuracy rate in predicting the fertile window¹³.

In ref. 14, the Oura ring was used to monitor sleep quality metrics, HR, HRV, and skin temperature across the menstrual cycle. The study focused on statistical comparisons of physiological signals across four phases (menses, ovulation, mid-luteal, and late-luteal). The findings revealed significant differences in temperature and HR across phases, but limitations included the small sample size (26 volunteers) and the examination of only one cycle per woman. Nonetheless, this research provided valuable insights into physiological variations across the menstrual cycle, laying the groundwork for potential predictive models.

In one study, wrist temperature and HR data were collected from over 100 women with both regular and irregular menstrual cycles using an ear thermometer and Huawei Band 5. Machine learning models integrating this multi-modal data predicted the fertile window with an accuracy of 87.46% for regular cycles and 72.51% for irregular cycles¹⁵. In another study, transfer learning was used to retrain a deep residual neural network (ResNet) to classify three phases of the menstrual cycle (luteal, menstruation, and follicular) based on pulse signal data. Wrist pulse signals were collected from 120 female volunteers, with long-term data tracked for three months from one participant. A model was first trained using data from 100 volunteers and then fine-tuned with data collected during the first two months from one ambulatory volunteer. When tested on the third month of her data, the model reached an accuracy rate of 81.8%, highlighting the significance of a personalized approach¹⁶.

Existing literature emphasizes the progress made in wearable sensors and machine learning techniques for monitoring and tracking menstrual cycle phases. Different studies have shown varying levels of accuracy and identified certain limitations. These advancements hold great promise for applications in fertility awareness and women’s health monitoring. However, to improve menstrual cycle phase prediction, alternative techniques that can incorporate a broader range of physiological signals should be further explored.

This study aimed to develop and compare various classification models using a diverse set of physiological signals collected from a wristband device (HR, IBI, EDA, and temperature) to identify menstrual cycle phases. The study also introduced new feature extraction methods, examined the importance of individualized algorithms, and evaluated the models’ performance in real-world applications, highlighting their potential for personalized health monitoring.

Results

This section evaluates the performance of machine learning models in predicting menstrual cycle phases using data collected from 18 of 22 participants who wore E4 and EmbracePlus wristbands for 2 to 5 months. These devices recorded physiological signals, including HR, EDA, temperature, accelerometry (ACC), and IBI. Four participants were excluded: three due to the absence of a positive LH test (8 cycles) and one due to missing data (2 cycles). This left 65 ovulatory cycles for analysis.

The evaluation was conducted using datasets derived from different data labeling and feature engineering methods to ensure a robust assessment of the models under varying conditions. Two feature extraction approaches, fixed window and rolling window, were applied, and the results are presented below.

Model performance: fixed window technique

The performance metrics summarized in Table 1 were evaluated using a dataset classified according to the reference definitions of the four phases. Features for these phases were extracted using a fixed window technique, and the data was split using a leave-last-cycle-out approach as described in the ‘Methods’ section under ‘Data labeling’ and ‘Data partitioning’. In this methodology, data from the initial 47 cycles were combined to train the models, while the last cycles of the 18 ovulatory subjects (18 cycles in total) were combined and used for testing, with the goal of identifying four distinct phases (P, F, O, and L). The random forest classifier demonstrated the best performance, achieving an accuracy of 71%. Additionally, ROC curve analysis revealed consistent performance across phases, with an overall area under the curve (AUC) score of 89% (Fig. 2). The highest AUC scores were observed in predicting the ovulation phase.

Table 1 Performance metrics for four models on a dataset with four phases, using fixed window extraction technique and leave-last-cycle-out splitting

**Fig. 2: ROC Curves for four models on a dataset with four phases, using fixed window extraction technique and leave-last-cycle-out splitting.**

To evaluate generalizability, the leave-one-subject-out approach was employed, where data from all but one subject was used for training, and the remaining subject’s data was used for testing. Logistic regression performed better than other models in this setting, achieving an average accuracy of 63%. Detailed metrics for each subject are presented in Supplementary Table 1.

Table 2 presents the performance metrics of four models evaluated on a dataset classified based on the reference definitions of the three phases. Features were extracted using a fixed window technique, and the data was split using a leave-last-cycle-out approach. The goal of this methodology was to predict three distinct phases: P, O, and L. Among the evaluated models, the random forest model demonstrated the highest overall performance, achieving an accuracy, precision, recall, and F1 score of 87%. The ROC curves and AUC scores indicate that the random forest model achieved the best performance across all classes, with an overall AUC score of 96%. As illustrated in Fig. 3, all models performed well in predicting all phases, with the highest AUC scores observed in classifying the ovulation phase.

Table 2 Performance metrics for four models on a dataset with three phases using fixed window extraction technique and leave-last-cycle-out splitting

**Fig. 3: ROC Curves for four models on three-phase dataset using fixed window extraction technique and leave-last-cycle-out splitting.**

When the leave-one-subject-out approach was applied to the three-phase dataset, the random forest model maintained superior performance, achieving an average accuracy of 87% (Supplementary Table 2).

Model performance: rolling window technique

Performance metrics in Table 3 were evaluated for four models using a dataset classified based on the reference definitions of the four phases (P, F, O, L), with rolling window extraction and leave-last-cycle-out splitting. In the rolling window extraction technique, features were extracted using a sliding window, and the label of the last day in the window determined the segment label. Among the evaluated models, the random forest classifier demonstrated the best performance, achieving an accuracy of 68%. The ROC curves in Fig. 4 demonstrate that all models consistently achieved high AUC scores for predicting the L phase. The SVM model achieved the best overall AUC score of 80%.

Table 3 Performance metrics for four models on a dataset with four phases using rolling window extraction technique and leave-last-cycle-out splitting

**Fig. 4: ROC Curves for four models on a dataset with four phases using rolling window extraction technique and leave-last-cycle-out splitting.**

Evaluation using the leave-one-subject-out approach showed comparable performance among SVM, random forest, and logistic regression models, each achieving an average accuracy of 65%. However, the decision tree model lagged behind with an accuracy of 58% (Supplementary Table 3).

For the three-phase dataset, the leave-last-cycle-out approach yielded similar performance across models. Logistic regression achieved the highest AUC score of 81% (Fig. 5), highlighting its effectiveness in this context. Table 4 provides detailed metrics for these evaluations.

**Fig. 5: ROC Curve for four models on a dataset with three phases, using rolling window extraction technique and leave-last-cycle-out splitting.**

Table 4 Performance metrics for four models on a dataset with three phases, using rolling window extraction technique and leave-last-cycle-out splitting

Finally, applying the leave-one-subject-out approach to the three-phase dataset revealed that logistic regression achieved the highest average accuracy of 61%, followed closely by SVM with 60%. Supplementary Table 4 provides further details of these evaluations.

Discussion

This study contributes to women’s health by exploring how various machine learning models can accurately identify menstrual cycle phases using multimodal signals from wearable devices. Previous research has shown that while temperature remains the primary indicator of biphasic patterns in ovulatory cycles, EDA and HR also display notable non-uniform patterns that can assist in detecting menstrual phases¹⁷. Building on these findings, we incorporated these additional signals to enhance the reliability and robustness of machine learning model performance.

Additionally, to evaluate the models’ performance in different settings, several feature extraction and data splitting techniques were implemented. The leave-last-cycle-out data split uses data from all subjects and thus provides some individualized information, while the leave-one-subject-out method presents a more generalized model. Not all four phases show distinct patterns for every individual, and the late-follicular phase, depending on the cycle length, can be difficult to capture for many subjects. This was confirmed by the results, which showed that follicular phase detection had the lowest AUC among all phases. However, using a rolling window technique to extract features and assigning the follicular phase to the data between menses and ovulation improves the overall performance compared with the three-phase label assignment. This improvement is likely due to the limitations of the initial three-phase labeling technique. In the rolling window approach, every day is labeled, and three classes might not adequately capture the differences between phases. Additionally, a significant portion of the data between the P and L phases was labeled as O, which does not follow a distinct pattern, leading to suboptimal performance.

Among the machine learning techniques evaluated with fixed window feature extraction, the random forest model demonstrated the highest overall performance in predicting three distinct phases (P, O, and L). It achieved an accuracy, precision, recall, and F1 score of 87%, along with an AUC score of 96%, indicating strong performance across most classes in the leave-last-cycle-out approach. When using the leave-one-subject-out approach on the three-phase dataset, the random forest model still outperformed other models with an accuracy of 87%.

The rolling window technique, which better represents model performance in real-world use, showed that the random forest model achieved the best performance in identifying four phases (P, F, O, L), with an accuracy of 68% in the leave-last-cycle-out approach. The ROC curves for the four-phase dataset indicated that all models achieved their highest AUC scores for predicting the L phase, with the SVM model demonstrating the highest overall AUC score of 81%.

These findings highlight the robustness and effectiveness of these models in handling complex, temporal data, offering promising potential for real-time, personalized health monitoring. By incorporating advanced techniques such as deep learning and transfer learning methods, future research can further enhance the performance and generalizability of these models. This advancement could lead to the integration of accurate phase prediction models into wearable technology, providing women with valuable insights and improved management of their menstrual health.

However, while this study demonstrated the potential of wrist-worn physiological signals for menstrual cycle phase tracking, certain limitations should be considered. The relatively small sample size of 18 ovulatory participants and 65 cycles may limit the generalizability of the findings, as it might not capture the full range of physiological variations across a more extensive and diverse population. Additionally, factors related to the demographic composition of the participants, such as ethnicity and overall health, were not explicitly considered. These demographic factors could influence physiological responses and, consequently, the model’s performance.

Another potential limitation is the risk of bias introduced during data collection. Wrist-worn devices, while convenient, may suffer from inconsistent placement, variations in adherence to study protocols, or environmental factors affecting signal quality. These issues could introduce noise into the data, potentially impacting model accuracy. Future studies with larger, more diverse populations and additional validation steps are essential to enhance the robustness and generalizability of the proposed method.

Methods

This section provides a detailed description of the collected data and outlines the methodology used for data analysis and algorithm development.

Data collection

The study was approved by the University of North Florida Institutional Review Board (IRB #1800628). All participants provided written informed consent. This research complies with ethical principles, including those outlined in the Declaration of Helsinki.

To collect data continuously, Empatica E4 and EmbracePlus wristbands were used, which record various physiological signals. These signals, namely ACC, HR, IBI, blood volume pulse (BVP) derived from photoplethysmography (PPG), EDA, and body temperature, were recorded from female subjects at the University of North Florida. The recorded data encompasses 75 cycles from 22 subjects (Table 5). Of these, 43 menstrual cycles from 15 subjects over 2 to 4 months were recorded using the E4 wristband. This data was downloaded in CSV format via Empatica’s Web Portal, E4 Connect. The E4 records PPG, ACC, temperature, and EDA at sampling rates of 64 Hz, 32 Hz, 4 Hz, and 4 Hz, respectively. HR, BVP, and IBI are derived from the PPG data¹⁸. IBI, in particular, is crucial as it provides detailed information about beat-to-beat variations, enabling the assessment of HRV, a key indicator of autonomic nervous system activity and overall cardiovascular health.

Table 5 Demographic information of study participants and recorded cycle data, including number of menstrual cycles, mean, and standard deviation of cycle lengths

The remaining data, consisting of 32 cycles from 7 subjects who wore the device for 3 to 5 months, were recorded using EmbracePlus. This data was downloaded via Amazon Web Services (AWS) in Avro format, with sampling frequencies of 64, 64, 4, 1, and 1/60 Hz for PPG, accelerometry, EDA, temperature, and HR, respectively. Similar features were extracted from data collected by both devices and stored in the same format.

Throughout the study, each participant maintained a personalized calendar to record menstruation and positive LH test days. Ovulation occurrence was confirmed using urine test strips that detect LH. Sleep data was used due to its higher accuracy, reduced motion artifacts, and lower noise levels, ensuring reliable results. The sleep period was identified by selecting intervals with minimal changes in hand angle, derived from ACC data¹⁷. Key parameters, such as HR thresholds (< 90 bpm), sleep duration (> 1 hour), and extended movement, were also considered in the algorithm’s development¹⁷.

Data processing and feature extraction

A comprehensive set of features was extracted from the raw physiological data recorded during sleep, including temperature, HR, IBI, and EDA. These features spanned the time, frequency, and time-frequency domains. The extracted features represented daily values, with a single value extracted for each feature per day. The initial extracted features are presented in Table 6.

Table 6 Initial extracted features presenting daily values

Features such as mean, median, standard deviation, skewness, and kurtosis provided insights into the data distribution. Mean and median power of IBI were extracted from the high-frequency (0.15–0.4 Hz) and the low-frequency bands (0.04–0.15 Hz).
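As a rough illustration of the band-power features described above, the following sketch estimates the mean and median spectral power of an IBI series in the low- and high-frequency bands using Welch's method. It assumes the IBI series has already been resampled onto an even 4 Hz grid; the function name `ibi_band_powers`, the resampling rate, and the Welch segment length are illustrative choices and are not details reported in the paper.

```python
import numpy as np
from scipy.signal import welch

LF_BAND = (0.04, 0.15)  # low-frequency band in Hz (from the text)
HF_BAND = (0.15, 0.40)  # high-frequency band in Hz (from the text)


def ibi_band_powers(ibi_seconds, fs=4.0):
    """Mean and median PSD of an evenly sampled IBI series in the LF and HF bands.

    Assumption: `ibi_seconds` is already interpolated onto a uniform grid
    sampled at `fs` Hz; the paper does not state how this was done.
    """
    x = np.asarray(ibi_seconds, dtype=float)
    x = x - x.mean()  # remove the DC offset before spectral estimation
    freqs, psd = welch(x, fs=fs, nperseg=min(256, len(x)))

    def summarize(band):
        lo, hi = band
        in_band = psd[(freqs >= lo) & (freqs < hi)]
        return float(in_band.mean()), float(np.median(in_band))

    lf_mean, lf_median = summarize(LF_BAND)
    hf_mean, hf_median = summarize(HF_BAND)
    return {"lf_mean": lf_mean, "lf_median": lf_median,
            "hf_mean": hf_mean, "hf_median": hf_median}


# Synthetic example: a 0.25 Hz oscillation (inside the HF band) around 0.8 s.
t = np.arange(0, 300, 1 / 4.0)
demo_ibi = 0.8 + 0.05 * np.sin(2 * np.pi * 0.25 * t)
demo_powers = ibi_band_powers(demo_ibi)
```

In the synthetic example the oscillation lies in the high-frequency band, so the HF summaries dominate the LF ones.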

To extract features from the EDA signal, the signal was first decomposed into two components¹⁹: the tonic component, representing baseline EDA that indicates overall arousal or relaxation, and the phasic component, which captures short-term emotional responses to stimuli. The signal magnitude area of the phasic component, which is the sum of the absolute values of the signal, quantifies signal strength. Normalized power was calculated in frequency ranges between 0.1 and 0.5 Hz in steps of 0.1 Hz. The area under the curve is the area bounded by the EDA curve, while mean peak count and mean peak amplitude indicate arousal frequency and intensity. Peak width provides information about the duration of these responses. Mel-frequency cepstral coefficients (MFCCs) summarize the shape of the EDA power spectrum. Together, these parameters provide a broad characterization of the EDA signal²⁰.
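A few of the phasic EDA features listed above can be sketched as follows. This is a minimal illustration that assumes the phasic component has already been separated from the raw signal (the decomposition method of ref. 19 is not reproduced here); the function name `phasic_features`, the peak height threshold, and the half-height definition of peak width are assumptions made for the example.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths


def phasic_features(phasic, fs=4.0, min_height=0.01):
    """Summary features of a phasic EDA trace sampled at `fs` Hz.

    `min_height` is a hypothetical threshold for counting a response as a peak.
    """
    x = np.asarray(phasic, dtype=float)
    peaks, props = find_peaks(x, height=min_height)
    if len(peaks):
        # Widths are returned in samples; convert to seconds.
        widths_s = peak_widths(x, peaks, rel_height=0.5)[0] / fs
        mean_amp = float(props["peak_heights"].mean())
        mean_width = float(widths_s.mean())
    else:
        mean_amp, mean_width = 0.0, 0.0
    return {
        "sma": float(np.sum(np.abs(x))),  # signal magnitude area
        "auc": float(np.sum(x) / fs),     # rectangle-rule area under the curve
        "peak_count": int(len(peaks)),
        "mean_peak_amplitude": mean_amp,
        "mean_peak_width_s": mean_width,
    }


# Synthetic trace with two responses of amplitude 1.0 and 0.5.
t = np.arange(0, 40, 0.25)
demo_phasic = np.exp(-((t - 10) ** 2) / 2) + 0.5 * np.exp(-((t - 30) ** 2) / 2)
demo_eda = phasic_features(demo_phasic)
```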

Feature selection and engineering

To enhance the dataset’s informative value, a multifaceted approach was implemented, as outlined below.

Feature Selection: Building on the previous discussion, features from temperature, EDA, HR, and IBI were selected for their physiological relevance to menstrual phase identification. Temperature reflects hormonal variations associated with different phases of the menstrual cycle. EDA captures changes in sympathetic nervous system activity, which may vary with hormonal fluctuations. HR and IBI were included due to their roles in capturing overall cardiovascular and autonomic system activity, which are influenced by hormonal changes during the menstrual cycle¹⁷. The features were specifically chosen for their potential to capture physiological variations associated with menstrual phases.

To further refine the selection, scikit-learn’s SelectKBest with the F-statistic scoring function was employed to identify the features most strongly associated with the target variable. This method ensured feature selection was based solely on the statistical relationship between features and labels, independent of the subsequent model evaluation process. The selected features included temperature, HR, IBI mean and median, IBI powers in high and low-frequency bands, and all EDA phasic component features²¹.
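The univariate selection step can be illustrated with scikit-learn's `SelectKBest` and the ANOVA F-statistic (`f_classif`). The data below are synthetic, and the number of retained features (`k=2`) is an arbitrary value chosen for the example; the paper does not report the value of `k` it used.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n_samples = 120

# Synthetic labels for three classes and six candidate features.
y = rng.integers(0, 3, size=n_samples)
X = rng.normal(size=(n_samples, 6))
X[:, 0] += 2.0 * y  # feature 0 depends on the class
X[:, 3] -= 1.5 * y  # feature 3 depends on the class

# Keep the k features with the largest F-statistic against the labels.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
selected = sorted(np.flatnonzero(selector.get_support()).tolist())
X_reduced = selector.transform(X)
```

Because the scores depend only on the features and labels, this step can be run before any classifier is chosen.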

Feature Engineering: Two methods were used to extract secondary feature sets from 6-day data segments, where each point represented a daily value. In the Fixed Window Technique, features were computed from non-overlapping data segments, as shown in Fig. 6. Alternatively, the Rolling Window Technique extracted features using sliding windows with a 5-day overlap, revealing evolving patterns within individual subject datasets. This approach is illustrated in Fig. 7.

**Fig. 6: Representation of Fixed window technique on a 29 days menstrual cycle plot.**

**Fig. 7: Representation of Rolling window technique on a 29 days menstrual cycle plot.**

The secondary features extracted with both the fixed window and rolling window techniques capture key characteristics of the data, including the mean, median, standard deviation, slope, and the value of each data point within a window. Additional features were extracted to capture changes within windows: the maximum positive change (the largest positive difference between consecutive data points) and the maximum negative change (the most negative difference between consecutive data points).
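The two windowing schemes and the per-window features can be sketched as below. This is a simplified illustration: the helper names are hypothetical, the fixed windows are simply tiled from the first day (in the study they were aligned to labeled phases), the standard deviation uses NumPy's default population form, and the slope is taken from a least-squares line fit.

```python
import numpy as np


def fixed_windows(daily_values, size=6):
    """Non-overlapping windows of `size` days."""
    return [daily_values[i:i + size]
            for i in range(0, len(daily_values) - size + 1, size)]


def rolling_windows(daily_values, size=6):
    """Sliding windows with a one-day step (size - 1 days of overlap)."""
    return [daily_values[i:i + size]
            for i in range(0, len(daily_values) - size + 1)]


def window_features(window):
    """Secondary features computed from one window of daily values."""
    w = np.asarray(window, dtype=float)
    diffs = np.diff(w)  # day-to-day changes
    slope = np.polyfit(np.arange(len(w)), w, 1)[0]
    return {
        "mean": float(w.mean()),
        "median": float(np.median(w)),
        "std": float(w.std()),
        "slope": float(slope),
        "max_pos_change": float(max(diffs.max(), 0.0)),
        "max_neg_change": float(min(diffs.min(), 0.0)),
        "values": w.tolist(),
    }


# A 29-day cycle of one daily feature (synthetic values).
cycle = list(np.linspace(36.2, 36.8, 29))
n_fixed = len(fixed_windows(cycle))
n_rolling = len(rolling_windows(cycle))
demo_feats = window_features([1, 2, 4, 3, 3, 5])
```

For a 29-day cycle this tiling gives 4 fixed windows and 24 rolling windows, which shows why the rolling scheme yields roughly one sample per day.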

Data labeling

Two strategies were employed to identify the phases: cycles were classified into either three phases (P, O, L) or four phases (P, F, O, L) based on reference menstrual phases (as defined in the Introduction). Labels were assigned to each day or segment to provide an appropriate framework for classifier training. Here, P stands for period, F for follicular, O for ovulation, and L for luteal.

For the Fixed Window technique (Fig. 6), data was labeled in two distinct ways. In the first approach, the cycle was divided into three phases: the first 6 days were labeled as P, the day of a positive LH test, along with the two days before and three days after, was labeled as O, and the last six days of the cycle were labeled as L. Data outside these labeled phases were not included in the analysis. In the second approach, data was categorized into four phases: the first 6 days were labeled as P, the day of a positive LH test, along with the two days before and three days after, was labeled as O, the six days between P and O were labeled as F, and the last six days were labeled as L. For subjects with shorter cycle lengths where some phases were less than 6 days, one or two days from adjacent phases were appended to these shorter phases, and data outside of the labeled phases were not included in the analysis.

In the Rolling Window technique, two approaches were used to divide and label the dataset. In the first approach, the cycle was split into three phases: the first 8 days as P, the last 8 days as L, and the days in between as O. In the second approach, the cycle was divided into four phases: period days, as reported by the subject, were labeled as P; the two days before and three days after a positive LH test, including the day of the positive LH test, were labeled as O; the days between P and O were labeled as F; and the remaining days after the last O day were labeled as L (Fig. 7). Features were extracted using a 6-day window, with the label of the last day in each window determining the segment label.
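The four-phase rolling window labeling rule can be written out as a short sketch. The function names and the choice to give reported period days priority over the ovulation window if they ever overlap are assumptions made for the example; days are numbered from 1.

```python
def label_four_phases(cycle_length, period_days, lh_day):
    """Assign P, F, O, or L to each cycle day (1-indexed).

    period_days: set of self-reported bleeding days.
    lh_day: day of the positive LH test.
    """
    o_start, o_end = lh_day - 2, lh_day + 3  # 2 days before to 3 days after
    labels = []
    for day in range(1, cycle_length + 1):
        if day in period_days:
            labels.append("P")
        elif o_start <= day <= o_end:
            labels.append("O")
        elif day < o_start:
            labels.append("F")
        else:
            labels.append("L")
    return labels


def rolling_window_labels(day_labels, size=6):
    """Label each sliding window with the label of its last day."""
    return day_labels[size - 1:]


# Example: 29-day cycle, bleeding on days 1-5, positive LH test on day 14.
day_labels = label_four_phases(29, {1, 2, 3, 4, 5}, 14)
counts = {phase: day_labels.count(phase) for phase in "PFOL"}
window_labels = rolling_window_labels(day_labels)
```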

Data partitioning

Two techniques were employed to partition the data into training and test sets. The first technique, leave-last-cycle-out, involved using data from the last cycle of each subject as the test set, with the remaining data from earlier cycles reserved for training the model. The second technique, leave-one-subject-out, utilized data from all subjects except one for training, while the data from the excluded subject was used for testing. These methods were designed to provide a comprehensive assessment of model performance across different feature extraction and dataset partitioning strategies.
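Both partitioning schemes can be expressed in a few lines. The sketch below assumes each sample carries a subject identifier and a cycle number; the helper `leave_last_cycle_out` is hypothetical, while `LeaveOneGroupOut` is scikit-learn's standard splitter for holding out one group at a time.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut


def leave_last_cycle_out(subject_ids, cycle_ids):
    """Return (train_idx, test_idx); each subject's last cycle is the test set."""
    subject_ids = np.asarray(subject_ids)
    cycle_ids = np.asarray(cycle_ids)
    test_mask = np.zeros(len(subject_ids), dtype=bool)
    for subject in np.unique(subject_ids):
        rows = subject_ids == subject
        last_cycle = cycle_ids[rows].max()
        test_mask |= rows & (cycle_ids == last_cycle)
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Toy data: subject "a" has three cycles, subject "b" has two.
subjects = np.array(["a", "a", "a", "b", "b"])
cycles = np.array([1, 2, 3, 1, 2])
train_idx, test_idx = leave_last_cycle_out(subjects, cycles)

# Leave-one-subject-out: one split per subject.
X_dummy = np.zeros((len(subjects), 1))
loso_splits = list(LeaveOneGroupOut().split(X_dummy, groups=subjects))
```

In the first scheme every test subject also appears in training, whereas in the second the held-out subject is entirely unseen.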

Training and evaluation of the models

Four machine learning models were trained and evaluated: decision tree, random forest, logistic regression, and support vector machine (SVM)²²,²³. The dataset was partitioned using the two approaches mentioned in the previous section, “Data partitioning”.

To enhance model performance and mitigate overfitting, hyperparameter tuning was conducted using GridSearchCV and randomized search. GridSearchCV systematically evaluates all combinations within a predefined hyperparameter grid, employing cross-validation to identify the optimal configuration based on metrics like accuracy or F1 score. Randomized search, on the other hand, samples a subset of the hyperparameter space, making it particularly efficient for large search spaces or when computational resources are constrained. Additionally, two data partitioning techniques, leave-last-cycle-out and leave-one-subject-out, were employed. These approaches ensured that models were tested on unseen data, providing a rigorous evaluation of their generalizability. Below is a detailed overview of each model and the rationale for its selection:

A decision tree is a non-parametric supervised learning method used for classification and regression. It effectively captures complex interactions between features, making it valuable for understanding the relationship between menstrual cycle phases and physiological features. The decision tree model was fine-tuned with a maximum depth of 5, a minimum samples split of 2, and a minimum samples leaf of 2 to prevent overfitting²⁴,²⁵.

Random forest is an ensemble learning method that constructs multiple decision trees during training and outputs the mode of the individual trees’ predicted classes (for classification) or their mean prediction (for regression). Aggregating many trees reduces overfitting and improves predictive performance, and random forests can handle a large number of features, making them suitable for this study, where multiple physiological features are used for phase prediction. The random forest model used an ensemble of 100 decision trees and was parameterized with a maximum depth of 5 and ‘sqrt’ as the criterion for determining the maximum features.

Logistic regression is a statistical model that uses a logistic function to model a binary dependent variable. It is effective in handling large datasets and can be extended to multiclass classification using techniques like one-vs-rest. The logistic regression model was parameterized with a regularization parameter of 0.1, an ‘l2’ penalty, and a maximum of 100 iterations.
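To make the one-vs-rest extension concrete, the sketch below scores a sample with one binary logistic model per phase and picks the phase whose model returns the highest probability. The weights, biases, and two-feature input are invented for illustration; they are not fitted values from the study.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def one_vs_rest_predict(x, models):
    """Score `x` with one binary logistic model per class.

    `models` maps a class label to (weights, bias). Each model estimates
    the probability of "this class" versus "all other classes".
    """
    scores = {}
    for label, (weights, bias) in models.items():
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        scores[label] = sigmoid(z)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical per-class weights for a two-feature input.
models = {
    "period": ([2.0, 0.0], -1.0),
    "ovulation": ([-1.0, 1.0], 0.0),
    "luteal": ([0.0, 1.0], 0.0),
}
best, scores = one_vs_rest_predict([1.0, 0.0], models)
```

Note that the per-class scores come from independent binary models, so they need not sum to one.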

A support vector machine (SVM) is a classification technique that works by finding the hyperplane that best separates the classes in a dataset. SVM is well-suited for this study because of its ability to handle high-dimensional data and its robustness in identifying complex decision boundaries between different classes. The SVM model was configured with a regularization parameter of 0.1, a radial basis function (‘linear’) as the kernel, and a degree of 3. These parameters were chosen to strike a balance between capturing complex decision boundaries and maintaining computational efficiency^24,25.
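The model settings and the two tuning strategies described in this section could be expressed with scikit-learn roughly as follows. This is a sketch under several assumptions, not the study's code: the data are synthetic, the "regularization parameter" is mapped to scikit-learn's `C` for both logistic regression and the SVM, the SVM kernel is set to `"linear"` (the text mentions both a radial basis function and ‘linear’), the search grids are invented, and plain 3-fold cross-validation stands in for the study's partitioning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for windowed features (assumption).
X, y = make_classification(
    n_samples=240, n_features=12, n_informative=6,
    n_classes=3, random_state=0,
)

models = {
    "decision_tree": DecisionTreeClassifier(
        max_depth=5, min_samples_split=2, min_samples_leaf=2, random_state=0
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=100, max_depth=5, max_features="sqrt", random_state=0
    ),
    # L2 is scikit-learn's default penalty; C=0.1 is an assumed mapping.
    "logistic_regression": LogisticRegression(C=0.1, max_iter=100),
    # Kernel choice is an assumption; `degree` only affects the "poly" kernel.
    "svm": SVC(C=0.1, kernel="linear", degree=3),
}
fitted = {name: model.fit(X, y) for name, model in models.items()}

# Exhaustive search: every combination in the (hypothetical) grid is tried.
grid = GridSearchCV(
    RandomForestClassifier(max_features="sqrt", random_state=0),
    param_grid={"max_depth": [3, 5, 7], "n_estimators": [50, 100]},
    cv=3,
    scoring="accuracy",
)
grid.fit(X, y)

# Randomized search: only n_iter sampled combinations are tried.
random_search = RandomizedSearchCV(
    RandomForestClassifier(max_features="sqrt", random_state=0),
    param_distributions={
        "max_depth": [3, 5, 7, 9],
        "n_estimators": [25, 50, 100, 200],
    },
    n_iter=4,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
random_search.fit(X, y)
```

After fitting, `grid.best_params_` and `random_search.best_params_` hold the selected configurations. The grid search evaluates all 6 combinations, whereas the randomized search evaluates 4 of the 16 possible ones.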

Evaluation metrics

The effectiveness of the phase identification models was assessed using the following performance metrics: accuracy, precision, recall, and F1-score. Accuracy measures the proportion of correctly predicted phases out of all predictions. Precision evaluates the accuracy of the positive predictions, recall assesses the model’s ability to identify all actual positives, and the F1-score is the harmonic mean of precision and recall, balancing the two metrics. Additionally, ROC curves were plotted for each class, with the micro-average calculated by aggregating the ROC curves of all classes into a single curve. The ROC curve provides insights into the model’s performance across various thresholds. The AUC was computed for each ROC curve, providing a quantitative measure of the models’ discriminative power, with higher AUC values indicating better performance.

Data availability

The minimal dataset necessary to replicate the findings of this study is available from the corresponding author (M.N.) upon reasonable request.

Code availability

Sample code or portions of the code used for data analysis in this study are available from the corresponding author (M.N.) upon reasonable request.

References

1. Bull, J. R. et al. Real-world menstrual cycle characteristics of more than 600,000 menstrual cycles. npj Digit. Med. 2, 83 (2019).
2. Thiyagarajan, D. K., Basit, H. & Jeanmonod, R. Physiology, menstrual cycle. Physiol., Menstrual Cycle. https://www.ncbi.nlm.nih.gov/books/NBK500020/ (2023).
3. Subathra, P. & Malarvizhi, S. A Comparative analysis of regression algorithms for prediction of emotional states using peripheral physiological signals. 2023 International Conference on Recent Advances in Electrical, Electronics, Ubiquitous Communication, and Computational Intelligence (RAEEUCCI) 1–6 https://doi.org/10.1109/RAEEUCCI57140.2023.10134253 (2023).
4. Schmalenberger, K. M. et al. How to study the menstrual cycle: Practical tools and recommendations. Psycho Neuroendocrinol. 123, 104895 (2021).
5. Bortot, P., Masarotto, G. & Scarpa, B. Sequential predictions of menstrual cycle lengths. Biostatistics 11, 741–755 (2010).
6. Bauman, J. E. Basal body temperature: Unreliable method of ovulation detection. Science Direct 36, 729–733 (1981).
7. Su, W., Yi, C., Wei, Y., Chang, C. & Cheng, M. Detection of ovulation, a review of currently available methods. Bioeng. Transl. Med. 2, 238–246 (2017).
8. Luo, L. et al. Detection and prediction of ovulation from body temperature measured by an In-Ear wearable thermometer. IEEE Trans. Biomed. Eng. 67, 512–522 (2020).
9. Rawal, K., Sethi, G., Saini, B. S. & Saini, I. Effect of heart rate variations in the menstrual cycle using linear methods. 2018 International Conference on Intelligent Circuits and Systems (ICICS) 1–5 https://doi.org/10.1109/ICICS.2018.00013 (2018).
10. Yuda, E. & Hayano, J. Changes in heart rate dynamics with menstrual cycles. Intell. Comput. Optim. ICO 2019. Adv. Intell. Syst. Comput. 1072, 138–147 (2020).
11. Champaty, B., Bhandari, S., Pal, K. & Tibarewala, D. N. Artificial intelligence based classification of menstrual phases in amenorrheic young females from ECG signals. 2013 Annual IEEE India Conference (INDICON) 1–6 (2013).
12. Regidor, P. A. et al. Identification and prediction of the fertile window with a new web-based medical device using a vaginal biosensor for measuring the circadian and circamensual core body temperature. Gynecol. Endocrinol. 34, 256–260 (2018).
13. Goodale, B. M. et al. Wearable sensors reveal menses-driven changes in physiology and enable prediction of the fertile window: observational study. J Med Internet Res. 21, e13404 (2019).
14. Alzueta, E. et al. Tracking sleep, temperature, heart rate, and daily symptoms across the menstrual cycle with the Oura ring in healthy Women. Int. J. Women’s. Health 14, 491–503 (2019).
15. Yu, J. L., Su, Y. F. & Zhang, C. Tracking of menstrual cycles and prediction of the fertile window via measurements of basal body temperature and heart rate as well as machine-learning algorithms. Reprod. Biol. Endocrinol. 20 https://doi.org/10.1186/s12958-022-00993-4 (2022).
16. Wang, X., Yu, J., Guo, J. & Huang, Z. ResNet based classification of female menstrual circle from pulse signal. 2019 IEEE 5th International Conference on Computer and Communications (ICCC) 640–644 https://doi.org/10.1109/ICCC47050.2019.9064306 (2019).
17. Sides, K. et al. Analyzing physiological signals recorded with a wearable sensor across the menstrual cycle using circular statistics. Frontiers in Network Physiology 3 https://doi.org/10.3389/fnetp.2023.1227228 (2023).
18. Empatica. Decoding wearable sensor signals - what to expect from your E4 Data. Empatica (2021).
19. Djawad, Y. A., Saharuddin, J., Ridwansyah, H. & Thayeb, M. Proficiency test analysis of a simple electro-dermal activity measurement technique for measuring an emotional task. AIP Conf. Proc. 2155, 020050 (2019).
20. Shukla, J., Barreda-Angeles, M., Oliver, J., Nandi, G. C. & Puig, D. Feature extraction and selection for emotion recognition from electrodermal activity. IEEE Trans. Affect. Comput. 12, 857–869 (2019).
21. Pilnenskiy, N. & Smetannikov, I. Modern implementations of feature selection algorithms and their perspectives. 2019 FRUCT Conference 250–256 https://doi.org/10.23919/FRUCT48121.2019.8981498 (2019).
22. Bobade, P. & Vani, M. Stress detection with machine learning and deep learning using multimodal physiological data. 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA) 51–57 https://doi.org/10.1109/ICIRCA48905.2020.9183244 (2020).
23. Chakrapani, P. & Chitradevi, D. Simulation of machine learning techniques to predict academic performance. 2022 International Conference on Electronic Systems and Intelligent Computing (ICESIC) 329–334 https://doi.org/10.1109/ICESIC53714.2022.9783487 (2022).
24. Rajeswari, K. et al. Comparative analysis of various machine learning algorithms for stock price prediction. 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON) 1–6 https://doi.org/10.1109/SMARTGENCON56628.2022.10084008 (2022).
25. Angayarkanni, G. & Hemalatha, S. Evaluating the performance of supervised machine learning algorithms for predicting multiple diseases: A Comparative Study. 2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS) 1–7 https://doi.org/10.1109/ICACCS57279.2023.10113100 (2023).

Acknowledgements

This study was supported by the National Science Foundation under grant CBET-2138378. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Science Foundation. The publication was supported by a University of North Florida Faculty Publishing Grant.

Author information

Authors and Affiliations

School of Computing, University of North Florida, Jacksonville, FL, USA
Grentina Kilungeja & Xudong Liu
School of Engineering, University of North Florida, Jacksonville, FL, USA
Krystal Graham & Mona Nasseri

Contributions

G.K. and M.N. designed the study and analyzed the data. M.N. collected, annotated, and managed the data for this project. G.K. drafted the manuscript. All authors contributed technical and methodological input to the study and provided substantial edits to the manuscript.

Corresponding author

Correspondence to Mona Nasseri.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Kilungeja, G., Graham, K., Liu, X. et al. Machine learning-based menstrual phase identification using wearable device data. npj Womens Health 3, 29 (2025). https://doi.org/10.1038/s44294-025-00078-8

Received: 21 September 2024
Accepted: 25 April 2025
Published: 13 May 2025
DOI: https://doi.org/10.1038/s44294-025-00078-8