Downtime detected on Paydock platform
Incident Report for Paydock
Postmortem

After an extensive review of this incident, we can confirm the following points:

  1. Our monitoring alerted us to high memory usage on our database cluster, which caused a failover to occur.
  2. The database failover completed with no downtime for our end users.

In our investigations since then, we haven’t seen any evidence of failed transactions in the affected period. However, we are still making enhancements to prevent this kind of high database memory usage in future. These enhancements are set to be released in the next scheduled production release (05/07/23).

We apologise for any concern caused by this incident notification. We always strive to be transparent and proactive with our notifications when there is an issue; however, this can sometimes lead to notifications being sent when there is no real-world impact on the platform.

Posted Jun 26, 2023 - 00:23 UTC

Resolved
We are pleased to announce the closure of this incident. The immediate issue has been resolved, and services are operating normally. However, please note that the root cause investigation is still underway. We are actively collaborating with our database vendor to identify and address the underlying cause of the incident. A separate post-incident review will be issued once the investigation is complete, providing a comprehensive analysis and any necessary preventive measures. We appreciate your patience and understanding throughout this process. As always, we remain committed to maintaining the reliability and stability of our services.
Paydock Platform - Paydock API
Posted Jun 22, 2023 - 05:51 UTC
Monitoring
We are pleased to report that there are currently no ongoing issues with our Production API endpoint (api.paydock.com). The API is functioning normally without any disruptions.

However, we are actively monitoring the situation and awaiting a response from our database vendor. Our objective is to identify the root cause of the failure that occurred previously. We are committed to working closely with our database vendor to gain insights and address any underlying issues.

Rest assured, we remain vigilant in monitoring the situation to ensure the continued stability and reliability of our services.
Posted Jun 22, 2023 - 04:18 UTC
Identified
We've identified the cause of the API outage we experienced earlier:

One of our back-end database instances failed unexpectedly, at which point a failover process was initiated automatically. The failover caused performance degradation while it was in progress.

All database instances have now recovered, and performance has been restored to normal levels. However, we are still working with our database vendor to identify the root cause of the failure.

More updates will be provided as we learn more.
Posted Jun 22, 2023 - 04:16 UTC
Investigating
We received an alert from our monitoring indicating downtime on the Paydock platform.

Upon checking this alert, we found that the issue had already recovered. However, we are still looking into what happened and will post an update on this channel as soon as possible.
Posted Jun 22, 2023 - 01:08 UTC