Mendix Cloud v3 EU - Physical server outage
Incident Report for Mendix Technology B.V.
Postmortem

After closely analysing the chain of events and the components involved, we have reached the following conclusions.

  • The shared data storage system handled a disk failure in an unexpected manner, which destabilised the network connections.
  • After eliminating other possible causes, we reset individual network connections and finally one of the core network switches, which restored network stability.
  • Mendix is working with suppliers to resolve any possible underlying issue to prevent this incident from occurring again.

Please contact customer support should you require more information.

Posted Oct 25, 2019 - 17:18 CEST

Resolved
This incident has been resolved.
Posted Oct 24, 2019 - 17:42 CEST
Monitoring
The reboot of the networking infrastructure has alleviated the stability issue and the environment is performing as expected. We are monitoring the situation and will update the status when necessary.
Posted Oct 23, 2019 - 20:33 CEST
Update
The engineers are still working on restarting the networking infrastructure. We will post the next update in an hour.
Posted Oct 23, 2019 - 19:19 CEST
Update
The next step in the process of resolving this issue will be to restart all other components in the networking infrastructure. We will post the next update in an hour.
Posted Oct 23, 2019 - 18:16 CEST
Update
In cooperation with the supplier, engineers are currently working to clear, reset and restore the connections between the storage subsystem and the servers. We are analysing the results of this action and will post the next update in an hour.
Posted Oct 23, 2019 - 16:53 CEST
Investigating
The engineers working on the case have addressed the likely root cause, but this did not resolve the stability issue that is causing the partial failure. We are continuing the investigation and will post the next update in 1 hour.
Posted Oct 23, 2019 - 15:27 CEST
Identified
The engineers working on this issue have isolated a likely root cause and are working towards a recovery scenario. We currently expect to see results of this at around 15:00 CEST. We will post an update on or before that time.
Posted Oct 23, 2019 - 13:43 CEST
Update
Unfortunately, we have no news to report. We are continuing to investigate this issue with the supplier and will post another update within 1 hour.
Posted Oct 23, 2019 - 12:38 CEST
Update
A number of applications running on Mendix Cloud v3 are experiencing slowdowns due to the partial storage failure.

We are continuing to investigate this issue with the supplier and will post another update within 1 hour.
Posted Oct 23, 2019 - 11:30 CEST
Investigating
We are experiencing a partial failure on the storage subsystem in Mendix Cloud v3 EU. We're currently investigating the issue and will post an update within 1 hour of this posting.
Posted Oct 23, 2019 - 10:28 CEST