✨ What's New?
Thank you all for the feedback on Chaos Genius 0.1.3. This upgrade focused on three things: covering edge cases so that DeepDrills and Anomaly Detection work on as wide a variety of datasets as possible, adding Task Status monitoring so that users can detect when any analytics run is failing, and fixing bugs.
Key highlights:
- Detailed status tracking for analytics for faster detection & debugging (cc: @bouke-nederstigt, @gxu-kangaroo, @davidhayter-karhoo, @mvaerle)
- Configurations for edge cases such as older datasets, smaller datasets, and KPI definitions without dimensions (cc: @davidhayter-karhoo)
- DeepDrills handling for missing data, NULL/NaN values (cc: @davidhayter-karhoo)
- New Anomaly Detection model - EWMA
- Error handling & analytics - config for enabling Sentry (error handling) and PostHog (analytics) (cc: @coindcx-gh)
- Improved Alerting logic
- Bug Fixes
  - Data Sources not showing on installation (cc: @omriAl, @nsankar)
  - Other Bug Fixes
We're happy to inform you that we've reached a Community Size of 50 with teams from 10 different time zones in such a short period of time! We look forward to working closely with all of you to support your use cases before we open up to the Public.
🧮 New Model(s)
We added a new model for Anomaly Detection: EWMA. The Exponentially Weighted Moving Average (EWMA) is a statistic that averages the data while giving progressively less weight to observations the further back in time they are. EWMA is better suited to cases where the data is largely static and can then undergo a sudden state change.
- feat(anomaly): add EWMA Model (#428)
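To make the weighting concrete, here is a minimal sketch of the standard EWMA recursion, s_t = alpha * x_t + (1 - alpha) * s_(t-1), together with a naive anomaly check. It is an illustration only and not the Chaos Genius implementation: the function names, the `alpha` smoothing factor, and the fixed `tolerance` threshold are all assumptions made for this example.

```python
def ewma(values, alpha=0.3):
    """Return the exponentially weighted moving average of `values`.

    `alpha` (hypothetical parameter) controls how quickly old data is
    forgotten: a higher alpha puts more weight on recent points.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            # Seed the average with the first observation.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def flag_anomalies(values, alpha=0.3, tolerance=10.0):
    """Return indices whose value differs from the previous EWMA by more
    than `tolerance` (a fixed, illustrative threshold)."""
    smoothed = ewma(values, alpha)
    return [
        i
        for i in range(1, len(values))
        if abs(values[i] - smoothed[i - 1]) > tolerance
    ]
```

For a mostly flat series such as `[10, 10, 10, 10, 50, 10]`, the jump to 50 stands out against the running average, which is the "largely static, then sudden change" situation described above.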
🎉 New Features
Task and status observability on your Analytics
Analytics can fail for a variety of reasons, e.g. database access/authorization errors, network errors, or incomplete data. While we are covering as many edge cases as possible, adding a Task Status is our first step towards faster incident detection. We are adding more features to it, including exact errors & diagnoses when analytics fail. On a local installation, the task status should be available at http://127.0.0.1:8080/api/status/
- Store task & subtask status and create a view for it for streamlined troubleshooting (#459)
- Observable tasks deepdrills (#446)
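If you would rather poll the status endpoint from a script than open it in a browser, something along these lines may help. It is a sketch under assumptions: the release notes only give the URL for a local installation, so the response format is unknown here and the code simply returns parsed JSON when possible and raw text otherwise. The helper names `status_url` and `fetch_status` are hypothetical.

```python
import json
from urllib.request import urlopen

# URL taken from the release notes for a local installation.
DEFAULT_BASE_URL = "http://127.0.0.1:8080"


def status_url(base_url=DEFAULT_BASE_URL):
    """Build the task status URL for a given installation."""
    return base_url.rstrip("/") + "/api/status/"


def fetch_status(base_url=DEFAULT_BASE_URL, timeout=10):
    """Fetch the task status page.

    The response format is an assumption: JSON is parsed if possible,
    otherwise the raw response text is returned unchanged.
    """
    with urlopen(status_url(base_url), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_status()` against a running local instance would return whatever the endpoint serves; inspect the result before relying on any particular field.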
Error handling and user analytics to give better support (Sentry, PostHog)
To help identify errors sooner, you can now configure your Sentry account by updating the corresponding parameter in docker-compose.yml. We can also provide you with our Sentry token so that we can closely monitor any issues you might be facing.
We've also added PostHog, an open-source analytics tool, to capture user activity and help us better inform the product roadmap as we open up our repos for public access. There is an option to anonymize the data before sharing, and PostHog can also be disabled entirely.
More dataset configurations/missing data support
In previous versions, analytics failed when there was no data for the past 5 days. We call this window the 'slack length'. We've made this value configurable (MAX_ANOMALY_SLACK_DAYS) in docker-compose.yml and updated the default to 14 days. This parameter helps us perform anomaly detection on the latest data for the most accurate results.
Previous versions also required users to select dimensions. This is now optional: you need to specify dimensions only if you need sub-dimensional insights.
- Make slack configurable for DeepDrills and Anomaly (#434)
- Remove the mandatory option for the dimension (#445)
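As a rough illustration of how a slack length could be applied, the sketch below reads MAX_ANOMALY_SLACK_DAYS from the environment (falling back to the new default of 14) and checks whether the most recent data point is recent enough. The function names and the exact comparison are assumptions for this example; the actual check inside Chaos Genius may differ.

```python
import os
from datetime import date


def slack_days(default=14):
    """Read the slack length from the environment, defaulting to 14 days."""
    return int(os.environ.get("MAX_ANOMALY_SLACK_DAYS", default))


def within_slack(last_data_date, today, max_slack_days):
    """Return True if the newest data point is no older than the slack length.

    Treating the boundary as inclusive is an assumption of this sketch.
    """
    return (today - last_data_date).days <= max_slack_days
```

With the old value of 5 days, a KPI whose newest row is 10 days old would be rejected; with the new default of 14 days, the same KPI would still be analyzed.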
Robust DeepDrills for missing data & errors
Our first implementation of DeepDrills required complete datasets with the last 60 days of data to run successfully. We've enhanced DeepDrills to be more granular in order to work with incomplete data sets & handle missing data.
- Handle DeepDrills analytics failures gracefully with partial analytics in case of subtask errors (#458)
- Account for NaN & NULL values in DeepDrill analysis (#437)
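The general idea of tolerating missing values can be sketched as follows: skip NULL (None) and NaN entries when aggregating, and return no result instead of failing when nothing usable is left. This is a simplified illustration, not the DeepDrills code; `is_missing` and `safe_mean` are hypothetical helpers.

```python
import math


def is_missing(value):
    """Treat None (NULL) and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def safe_mean(values):
    """Mean of the non-missing values, or None if there are none."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        # Empty or fully missing input: report "no result" instead of raising.
        return None
    return sum(present) / len(present)
```

Returning None for an empty input lets a caller skip that part of the analysis while still producing the rest.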
Improved alerting logic
We've enhanced our alert logic to instantly trigger alerts once an anomaly is detected. We've also made a few improvements in the alert format. We'll continue to build out the alerting functionality in our future releases.
Improved analytics indexing
We have optimized our indexes to provide faster drill-downs for large KPIs & dimensions.
- Add the analytics data index (#461)
🐛 Bug Fixes
- Handle KPI queries with trailing semicolon for KPI validation & analytics (#429)
- Validate the duplicate column in the result dataset of a query defined KPI (#441)
- Snowflake connector mentions setting up with a hostname, where the hostname is actually not required (#438) (cc: @joshuataylor)
- Metric columns with NaNs in the first 10 or more rows fail KPI validation (#444)
- Validation for the dimension column in the add KPI screen (#450)
- DeepDrills fails for KPI with no dimensions defined (#468)
- Handle empty data in comparison data frame for mean aggregation in DeepDrills (#494)
We have 15+ contributors spread across 10 different time zones around the world who have made commits to our GitHub repo to make Chaos Genius better than it was when they found it.
We are thankful to each one of you, and we're very excited about what the future holds for Chaos Genius in the open-source ecosystem.
Chaos Genius is an open-source business observability platform democratizing access to AI-powered Anomaly Detection for businesses around the world. Check out and access our GitHub repository here. Give it a spin!