A lightweight, real-time AIOps anomaly detection system for logs, using an Isolation Forest model. This service is designed to be deployed on Kubernetes and can stream logs from multiple sources, detect anomalies based on learned patterns, and send alerts through various channels.
- Real-time Anomaly Detection: Uses an Isolation Forest model to score incoming logs for anomalies.
- Multi-Source Log Ingestion: Accepts logs from different services and sources in various formats (JSON, Key-Value).
- Extensible Alerting: Sends alerts via Slack, PagerDuty, or a generic webhook.
- Model Retraining: Supports online model retraining through a dedicated API endpoint.
- Feedback Loop: Allows users to submit feedback on predictions to improve the model over time.
- Containerized & Deployable: Ready for deployment on Kubernetes with an included Helm chart.
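As a rough illustration of multi-format ingestion, the sketch below parses a raw line as JSON when it looks like a JSON object, and otherwise as space-separated key=value pairs. The function name parse_log_line and the exact key-value syntax are assumptions. The service's actual parser and field names are not documented here.

```python
import json
from typing import Any


def parse_log_line(line: str) -> dict[str, Any]:
    """Parse one raw log line as JSON, falling back to space-separated key=value pairs.

    Hypothetical helper: the service's real parser and field names may differ.
    """
    line = line.strip()
    if line.startswith("{"):
        return json.loads(line)
    record: dict[str, Any] = {}
    for token in line.split():
        if "=" not in token:
            continue  # skip free-text tokens that are not key=value pairs
        key, _, value = token.partition("=")
        # Convert numeric values so they can later be compared against rule thresholds.
        try:
            record[key] = float(value) if "." in value else int(value)
        except ValueError:
            record[key] = value  # keep non-numeric values as strings
    return record
```

For example, parse_log_line("service=web_server response_time=1600") yields {"service": "web_server", "response_time": 1600}.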
- Python 3.11+
- venv for virtual environment management
- curl for testing the API
Create and activate a virtual environment to isolate project dependencies.
Install all the required Python packages.
The application uses a .env file for configuration. Copy the example file and fill in your specific values, especially for alerting.
Edit the .env file with your details:
Launch the FastAPI service using Uvicorn. The --reload flag will automatically restart the server on code changes.
The service will be available at http://localhost:8000.
To generate test logs and send them to the service, run the log simulator in a separate terminal.
To run the full suite of unit tests, use pytest:
You can use the provided test-curl.sh script or run the commands individually.
Health Check:
Stream Logs:
The service provides a flexible, two-layered system for identifying anomalies and triggering alerts. This is configured in your .env file (or values.yaml for Helm deployments).
An individual log is flagged as an "anomaly" if it meets either of the following criteria:
Simple rules are fast, deterministic checks of whether a numeric feature in a log exceeds a defined threshold. They are useful for setting hard limits on critical metrics. Rules can be defined per service, with a __default__ fallback for services that have no rules of their own.
Example (env.example):
- A log from web_server with response_time of 1600 is an anomaly.
- A log from database with query_time_ms of 4000 is an anomaly.
- A log from any other service with cpu_usage_percent of 98 is an anomaly.
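The per-service rules with a __default__ fallback can be sketched as a lookup table plus a threshold comparison. The RULES dictionary and violates_simple_rule are hypothetical. The thresholds were chosen only so that the three example logs above are flagged, and the sketch assumes __default__ applies only to services without rules of their own.

```python
from typing import Any

# Hypothetical rule table: per-service {feature: threshold}, with a "__default__" fallback.
# Thresholds are illustrative values consistent with the examples above, not project defaults.
RULES: dict[str, dict[str, float]] = {
    "web_server": {"response_time": 1500},
    "database": {"query_time_ms": 3000},
    "__default__": {"cpu_usage_percent": 95},
}


def violates_simple_rule(log: dict[str, Any], rules: dict[str, dict[str, float]] = RULES) -> bool:
    """Return True if any numeric feature in the log exceeds its configured threshold."""
    # Use the service's own rules if present, otherwise fall back to "__default__".
    service_rules = rules.get(log.get("service", ""), rules.get("__default__", {}))
    for feature, threshold in service_rules.items():
        value = log.get(feature)
        if isinstance(value, (int, float)) and value > threshold:
            return True
    return False
```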
If a log does not violate any simple rule, it is passed to the Isolation Forest machine learning model. The model calculates an anomaly score (lower scores indicate more anomalous logs), and if the score is below ANOMALY_THRESHOLD, the log is flagged as an anomaly. This lets the system catch subtler or more complex patterns that simple rules would miss.
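The two-layer decision can be outlined as below, assuming the rule check and the model score are supplied as functions. detect and its parameters are hypothetical names. With scikit-learn, score could wrap IsolationForest.decision_function applied to the log's feature vector, where lower values are more abnormal; whether the service uses that method or another scoring function is an assumption.

```python
from typing import Any, Callable

Log = dict[str, Any]


def detect(
    log: Log,
    violates_rule: Callable[[Log], bool],
    score: Callable[[Log], float],
    anomaly_threshold: float,
) -> tuple[bool, str]:
    """Two-layer check: deterministic rules first, then the model score.

    Hypothetical sketch: `score` stands in for the Isolation Forest. Lower scores
    mean more anomalous, so a log is flagged when its score is below the threshold.
    """
    if violates_rule(log):
        return True, "rule"  # hard limit hit; the model is not consulted
    if score(log) < anomaly_threshold:
        return True, "model"
    return False, "normal"
```

Checking rules first keeps hard limits deterministic and avoids a model call for logs that are already known to be anomalous.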
To reduce noise, the system does not send an alert for every individual anomaly. Instead, it sends an alert only when anomalies for a service occur frequently enough within a rolling time window.
The anomaly-rate setting defines the conditions for sending a high_anomaly_rate alert. It tracks the number of detected anomalies (from both simple rules and the ML model) for each service over a rolling time window.
Example (env.example):
- Scenario: If 5 anomalies (of any kind) are detected for web_server within a 60-second period, a single high_anomaly_rate alert is sent. No individual alerts are sent for those 5 anomalies.
- If only 4 anomalies occur within that period, no alert is sent. They are simply stored for later analysis.
This approach ensures you are only notified about sustained or high-frequency problems, not single, transient spikes.
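The rate-based alerting can be approximated with a per-service deque of anomaly timestamps. This is a sketch under assumptions: AnomalyRateTracker and record are hypothetical names, and the defaults mirror the 5-anomalies-in-60-seconds scenario above rather than any documented default.

```python
from collections import defaultdict, deque


class AnomalyRateTracker:
    """Per-service rolling-window counter that fires one high_anomaly_rate alert per burst.

    Hypothetical sketch: names and defaults are illustrative, not the service's API.
    """

    def __init__(self, max_anomalies: int = 5, window_seconds: float = 60.0) -> None:
        self.max_anomalies = max_anomalies
        self.window_seconds = window_seconds
        self._events: dict[str, deque[float]] = defaultdict(deque)

    def record(self, service: str, timestamp: float) -> bool:
        """Record one anomaly for a service; return True when an alert should be sent."""
        events = self._events[service]
        events.append(timestamp)
        # Drop anomalies that have aged out of the rolling window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.max_anomalies:
            events.clear()  # reset so the same burst triggers only one alert
            return True
        return False
```

Clearing the window after an alert means a sustained burst re-alerts once per five additional anomalies; a per-service cooldown timer would be an alternative if you want fewer repeat alerts.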
- A running Kubernetes cluster
- kubectl configured to connect to your cluster
- helm version 3+
Before deploying, you must configure the Helm chart's values, especially the environment variables such as your alerting webhooks.
Edit helm/aioops-mcp-iforest/values.yaml and update the env section:
You can also enable the log generator job for testing purposes:
Install the Helm chart to deploy the application and the log generator to your cluster.
Check the status of your pods. You should see pods for the main service and, if enabled, the log generator.
To access the service running in your cluster from your local machine, use kubectl port-forward.
You can now use the curl commands above to interact with the service at http://localhost:8000.
To remove the deployment from your cluster, use helm uninstall.
This service is designed to function as a powerful, centralized anomaly detection engine within a larger AI or AIOps ecosystem.
Instead of running anomaly detection models on individual agents, those agents can be configured to stream their collected logs to this single, robust service. This offers several advantages:
- Centralized Model Management: A single, more powerful model can be trained and managed, rather than maintaining separate models on each agent.
- Consistent Anomaly Scoring: Ensures that all logs across the entire system are evaluated using the same criteria.
- Simplified Agent Logic: Agents can focus on log collection and forwarding, offloading the complex task of anomaly detection.
To use it in this capacity, deploy this service and configure your fleet of AI agents to send their log data to the /api/v1/stream/multi-source endpoint.
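On the agent side, forwarding can be as simple as buffering logs and posting them in batches. Only the /api/v1/stream/multi-source path comes from this README; LogForwarder, the localhost URL, the batch size, and the {"logs": [...]} payload shape are assumptions, so check the service's API schema before relying on them.

```python
import json
import urllib.request
from typing import Any

# Endpoint path from this README; host and payload shape are assumptions.
STREAM_URL = "http://localhost:8000/api/v1/stream/multi-source"


class LogForwarder:
    """Hypothetical agent-side buffer that ships logs to the central service in batches."""

    def __init__(self, url: str = STREAM_URL, batch_size: int = 50) -> None:
        self.url = url
        self.batch_size = batch_size
        self.buffer: list[dict[str, Any]] = []

    def add(self, log: dict[str, Any]) -> bool:
        """Buffer a log; return True when the buffer is full and should be flushed."""
        self.buffer.append(log)
        return len(self.buffer) >= self.batch_size

    def build_request(self) -> urllib.request.Request:
        """Build a POST request carrying the buffered logs as a JSON body."""
        body = json.dumps({"logs": self.buffer}).encode("utf-8")
        return urllib.request.Request(
            self.url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )

    def flush(self) -> int:
        """Send buffered logs and clear the buffer; return the HTTP status, or 0 if empty."""
        if not self.buffer:
            return 0
        with urllib.request.urlopen(self.build_request(), timeout=5) as resp:
            status = resp.status
        self.buffer.clear()
        return status
```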
Contributions, issues, and feature ideas are welcome!
See CONTRIBUTING.md for guidelines on how to get started.
This project is licensed under the MIT License.
Created and maintained by Kishore Korathaluri.
Built as a personal side project to explore AIOps and log anomaly detection.