Cloud v3 infrastructure issue
Incident Report for Mendix Technology B.V.
Resolved
This incident is now fully resolved. Please contact Mendix Support if you are still experiencing any issues.
Posted Jun 10, 2021 - 13:32 CEST
Monitoring
Since our last update, we have implemented a fix for the issue. All applications on the Cloud v3 infrastructure should now be working as expected.
We are actively monitoring the systems that were impacted by this issue.
Posted Jun 10, 2021 - 09:57 CEST
Update
We have identified a solution to fix the immediate problem. We will execute a migration on the storage service, moving customers from the unhealthy host to a new healthy host. This will not cause downtime for affected customer applications.

We are performing this maintenance immediately. Once this maintenance is completed, customers should notice that their applications are once again accessible.

If your application remains inaccessible after this process, the issue is likely in the application itself. First try restarting or redeploying the application; if that does not help, contact Mendix Support for further resolution.
Posted Jun 08, 2021 - 20:40 CEST
Update
We're continuing to work with the highest priority on this issue, together with our hardware providers. We don't have any specific ETA at this point in time.

We've considered a few alternatives for affected customers, but none of them were feasible, for reasons of data integrity.

We will continue to share updates here as and when the situation changes.
Posted Jun 08, 2021 - 17:48 CEST
Update
We're continuing to investigate this issue with priority along with our hardware providers.
Posted Jun 08, 2021 - 16:44 CEST
Identified
After further investigation, we have determined that the root cause is a connectivity problem between applications and their storage devices.

When applications read and write files, these operations may be slow, or time out completely. Applications can still interact with their database without being affected.

The control plane actions mentioned in our previous update (start, restart, stop, restore backup, deploy model) also involve disk reads and writes, which is why they are affected as well.

We're continuing to work, along with our hardware providers, to fix the issue.
Posted Jun 08, 2021 - 14:04 CEST
Update
We have also observed that there is a risk of extended downtime for your application if any control plane action is taken: starting, restarting, or stopping the app, deploying a new model, or restoring a backup.

Mendix can manually initiate these actions on your behalf, but because of the issues with the underlying infrastructure, this downtime can be up to 1 hour.

We therefore strongly recommend that customers don't perform any of these actions unless absolutely necessary. If you need further guidance, please reach out directly to Mendix Support, referencing this message.
Posted Jun 08, 2021 - 13:21 CEST
Update
We've observed that a number of apps are intermittently unreachable in one of our Cloud v3 datacenters. So far, we have identified that the issue is with one specific component of our base infrastructure.

We're continuing to investigate this issue with priority.
Posted Jun 08, 2021 - 11:43 CEST
Investigating
We are currently investigating an issue on our Cloud v3 infrastructure.
From our initial findings, we have noticed that a few applications running on the impaired infrastructure are not reachable.
We will be sharing more updates shortly.
Posted Jun 08, 2021 - 10:48 CEST