The false positive alerts that customers were seeing in the Mendix Cloud V4 EU region are now resolved.
We have resolved the abnormally high load on our Metrics and Alerting service, which was the primary cause of this incident. We are continuing to investigate extra safeguards against false positive alerts, so that no false positive alerts are sent if high load recurs. We will provide an RCA in due course.
Posted Jul 23, 2020 - 12:22 CEST
We have increased the resources of our Metrics and Alerting service. This has improved the situation somewhat, but has not completely resolved it. In the past 24 hours, we have noticed significantly reduced error rates in our services, and significantly fewer false alerts.
We are working on two separate strands to fully resolve the issue:
- Eliminating the source(s) of abnormally high load on our Metrics and Alerting service, to restore healthy operation of the service
- Adding extra safeguards against false positive alerts. There are already multiple safeguards against false positive alerts in place, but these appear to be insufficient when the underlying service is under extremely heavy load.
We will provide further updates when these actions have been taken. We will also provide a full postmortem/RCA after the incident has been fully resolved.
Posted Jul 07, 2020 - 15:21 CEST
We have noticed that customers in the Mendix Cloud V4 EU region are experiencing numerous false-positive alerts, specifically from the "Runtime heartbeat" alert.
We have identified that this is due to high load on the service backing our Metrics and Alerting services.
We are currently working with our external hosting partner to provision extra resources for this service, and will provide an update here once this is complete.
Posted Jul 06, 2020 - 11:18 CEST
This incident affected: Mendix Cloud V4 (Mendix Cloud V4 EU).