A SEMI-SUPERVISED HYBRID MODEL FOR ANOMALY DETECTION: LEVERAGING AUTOENCODER FEATURE EXTRACTION AND RANDOM FOREST CLASSIFICATION

Authors

  • Sakshi Jethwa, The Maharaja Sayajirao University of Baroda
  • Hetal Bhavsar, The Maharaja Sayajirao University of Baroda
  • Kshitij Gupte, The Maharaja Sayajirao University of Baroda
  • Anjali Jivani, The Maharaja Sayajirao University of Baroda

Keywords:

Anomaly Detection, Outlier Detection, Autoencoder, Random Forest, False Negatives

Abstract

Anomalies, also known as outliers, are data points that deviate significantly from the norm. Detecting them is crucial in real-world applications such as fraud detection, fault diagnosis, cybersecurity, and system health monitoring, where even a single undetected anomaly can lead to critical failures or losses. However, real-world datasets often suffer from extreme class imbalance and overlapping distributions, in which anomalies are camouflaged within dense clusters of normal data. Popular anomaly detection algorithms such as Isolation Forest, Local Outlier Factor (LOF), and clustering-based methods often struggle in these scenarios and produce many false negatives. To address this, we propose a hybrid model that combines an Autoencoder for reconstruction-based anomaly scoring with a Random Forest classifier trained specifically on ambiguous regions, where the reconstruction errors of normal and anomalous instances overlap. This two-stage approach improves decision boundaries by leveraging both unsupervised reconstruction and supervised classification. Experiments on a labeled benchmark dataset demonstrate that our method significantly outperforms traditional models in terms of F1-score and detection accuracy, making it more effective at detecting subtle, disguised anomalies in imbalanced datasets.
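
The two-stage idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: scikit-learn's MLPRegressor with a bottleneck layer stands in for the Autoencoder, the ambiguous region is taken to be the band between the smallest anomaly error and the largest normal error seen in training, the Random Forest receives the original features plus the reconstruction error, and the data is synthetic. The names fit_hybrid and predict_hybrid are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPRegressor


def reconstruction_error(autoencoder, X):
    """Mean squared reconstruction error of each row."""
    return np.mean((autoencoder.predict(X) - X) ** 2, axis=1)


def fit_hybrid(X, y, random_state=0):
    """Fit both stages. Labels: 0 = normal, 1 = anomaly (both must be present)."""
    # Stage 1 (assumption): a bottleneck MLP trained to reproduce normal rows
    # acts as a simple stand-in for the autoencoder.
    autoencoder = MLPRegressor(hidden_layer_sizes=(4, 2, 4), max_iter=2000,
                               random_state=random_state)
    autoencoder.fit(X[y == 0], X[y == 0])
    err = reconstruction_error(autoencoder, X)

    # Ambiguous band (assumption): errors between the smallest anomaly error
    # and the largest normal error, i.e. where the two classes overlap.
    low, high = err[y == 1].min(), err[y == 0].max()

    forest = None
    if low <= high:
        # Stage 2: the forest is trained only on rows inside the band.
        band = (err >= low) & (err <= high)
        forest = RandomForestClassifier(n_estimators=100,
                                        class_weight="balanced",
                                        random_state=random_state)
        forest.fit(np.column_stack([X[band], err[band]]), y[band])
    return autoencoder, forest, low, high


def predict_hybrid(model, X):
    """Route each row by its reconstruction error, then classify."""
    autoencoder, forest, low, high = model
    err = reconstruction_error(autoencoder, X)
    if forest is None:
        # No overlap was seen in training: one threshold separates the classes.
        return (err > (low + high) / 2).astype(int)
    pred = (err > high).astype(int)  # above the band: anomaly, below: normal
    band = (err >= low) & (err <= high)
    if band.any():
        pred[band] = forest.predict(np.column_stack([X[band], err[band]]))
    return pred


# Synthetic data (assumption): normal rows lie close to a 2-D subspace of a
# 6-D space; anomalies drift off that subspace only slightly, so their
# reconstruction errors overlap with those of normal rows.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 6))
normal = rng.normal(size=(500, 2)) @ W + 0.1 * rng.normal(size=(500, 6))
anomalous = rng.normal(size=(25, 2)) @ W + 0.6 * rng.normal(size=(25, 6))
X = np.vstack([normal, anomalous])
y = np.r_[np.zeros(500, dtype=int), np.ones(25, dtype=int)]

model = fit_hybrid(X, y)
pred = predict_hybrid(model, X)
```

Predictions on the training rows are optimistic; a fair comparison against Isolation Forest or LOF would score F1 on a held-out split, which this sketch omits for brevity.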

Published

2026-04-08

Section

Articles