Downtime detected on Paydock platform

Incident Report for Paydock

Postmortem

After an extensive review of this incident, we can confirm the following points:

Our monitoring did alert us to high memory usage on our database cluster, which caused a failover.
In reality, the database failover completed with no downtime for our end users.

In our investigations since then, we haven't seen any evidence of failed transactions in the affected period. However, we are still making enhancements to prevent this kind of high database memory usage in future. These enhancements are set to be released in the next scheduled production release (5 July 2023).

We apologise for any concern caused by this incident notification. We always strive to be transparent and proactive in notifying you when there is an issue. However, this can sometimes lead to notifications being sent when there is no real-world impact on the platform.

Posted Jun 26, 2023 - 00:23 UTC

Resolved

We are pleased to announce the closure of this incident. The immediate issue has been resolved, and services are operating normally. However, please note that the root cause investigation is still underway. We are actively collaborating with our database vendor to identify and address the underlying cause of the incident. A separate post-incident review will be issued once the investigation is complete, providing a comprehensive analysis and any necessary preventive measures. We appreciate your patience and understanding throughout this process. As always, we remain committed to maintaining the reliability and stability of our services.
Paydock Platform - Paydock API

Posted Jun 22, 2023 - 05:51 UTC

Monitoring

We are pleased to report that there are currently no ongoing issues with our Production API endpoint (api.paydock.com). The API is functioning normally without any disruptions.

However, we are actively monitoring the situation and awaiting a response from our database vendor. Our objective is to identify the root cause of the failure that occurred previously. We are committed to working closely with our database vendor to gain insights and address any underlying issues.

Rest assured, we remain vigilant in monitoring the situation to ensure the continued stability and reliability of our services.

Posted Jun 22, 2023 - 04:18 UTC

Identified

We've identified the cause of the API outage we experienced earlier:

One of our back-end database instances failed unexpectedly, at which point a failover process was initiated automatically. The failover caused performance degradation while it was in progress.

All database instances have now recovered, and performance has been restored to normal levels. However, we are still working with our database vendor to identify the root cause of the failure.

More updates will be provided as we learn more.

Posted Jun 22, 2023 - 04:16 UTC

Investigating

We received an alert from our monitoring indicating downtime on the Paydock platform.

Upon checking this alert, we found that the platform had already recovered. However, we are still looking into what happened and will post an update on this channel as soon as possible.

Posted Jun 22, 2023 - 01:08 UTC