Fewer failures
Reduce the likelihood of failures on digital platforms through automated learning of typical error patterns.
Optimized failure detection
More effective remediation
Real-time performance monitoring
Provide a new tool for monitoring the health of digital platforms, giving a better understanding of their performance in real time.
Project Context
Digital platforms play an increasingly important role in companies’ core strategies, supporting the management of internal processes and the integration of customers, partners, and suppliers in an increasingly global and complex market. However, downtime, poor system responsiveness, data inconsistency, and security issues have caused, and may continue to cause, enormous losses for companies. It is therefore essential to prevent problems from arising and to act promptly when they do. One way to address this is to implement self-healing solutions, which detect anomalies when they occur, or even before they occur, and initiate corrective actions automatically. Such solutions require Artificial Intelligence (AI) algorithms. Despite the enormous potential of AI for self-healing, however, several challenges remain, particularly in the design and training of AI models, in ineffective or even incorrect decisions made by those models, and in the potential lack of transparency and explainability of AI-based self-healing solutions.
To support companies across various sectors in identifying, correcting, and preventing issues on their digital platforms, the Digi Self-Healer project proposes the research and development of an innovative AI-based self-healing solution that acts not only reactively but also preventively. The solution must be able to integrate with digital platforms and identify issues either through previously defined manual rules or by analyzing patterns in past problems. It is also expected to automatically suggest possible fixes for the issues it identifies and to predict potential problems through preventive analysis based on the AI mechanisms to be implemented.
The ecosystem will include a web-based tool for monitoring the status of digital platforms, in which users can define manual rules for detecting errors on the platforms, choose between manual and automatic self-healing rules, view alerts on platform health, and consult graphs and dashboards with real-time performance information, among other features.
Functionality

Web tool for monitoring the status of digital platforms
Web interface that entities can use to assess the status of their digital platforms in real time.

Manual rules configuration for failure detection
Users can define manual rules for detecting failures in their organizations' digital platforms, such as thresholds for the minimum and maximum number of calls made to a given service, the maximum expected response time for a request made to a given API, firewall configurations, exclusion rules, etc.
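As an illustration of the threshold-type rules listed above, the sketch below models a manual rule for one service and checks that service's current metrics against it. The `ManualRule` schema, its field names, and the `check_rule` helper are hypothetical and not part of the Digi Self-Healer design; firewall configurations and exclusion rules are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManualRule:
    """A user-defined threshold rule for one service (hypothetical schema)."""
    service: str
    min_calls_per_min: Optional[int] = None   # None means "no lower bound"
    max_calls_per_min: Optional[int] = None   # None means "no upper bound"
    max_response_ms: Optional[float] = None   # maximum expected API response time


def check_rule(rule: ManualRule, calls_per_min: int, response_ms: float) -> List[str]:
    """Return a list of violation messages; an empty list means the rule passed."""
    violations = []
    if rule.min_calls_per_min is not None and calls_per_min < rule.min_calls_per_min:
        violations.append(
            f"{rule.service}: {calls_per_min} calls/min below minimum {rule.min_calls_per_min}")
    if rule.max_calls_per_min is not None and calls_per_min > rule.max_calls_per_min:
        violations.append(
            f"{rule.service}: {calls_per_min} calls/min above maximum {rule.max_calls_per_min}")
    if rule.max_response_ms is not None and response_ms > rule.max_response_ms:
        violations.append(
            f"{rule.service}: response time {response_ms} ms exceeds {rule.max_response_ms} ms")
    return violations


rule = ManualRule("orders-api", min_calls_per_min=10, max_calls_per_min=500, max_response_ms=300.0)
for message in check_rule(rule, calls_per_min=620, response_ms=450.0):
    print(message)
```

Leaving unused bounds as `None` lets the same rule type express "minimum only", "maximum only", or both, which matches the mix of thresholds the feature describes.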

Application of manual or automatic self-healing rules
This feature lets users choose, for each characteristic of the digital platforms under analysis, whether failures are identified with manual rules they have defined themselves or with automatic rules learned internally by the self-healing solution.
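A minimal sketch of how the manual/automatic choice could work for one monitored characteristic, assuming that an automatic rule is simply an expected range learned from historical values as the mean plus or minus k standard deviations. That statistical technique, the `learn_range` and `is_failure` names, the metric keys, and the default `k = 3` are illustrative assumptions; the project's learned rules may rely on other AI models.

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def learn_range(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Learn an expected range as mean +/- k standard deviations of past values."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two values
    return mu - k * sigma, mu + k * sigma


def is_failure(value: float, mode: str,
               manual_range: Optional[Tuple[float, float]],
               history: Sequence[float], k: float = 3.0) -> bool:
    """Flag a value as a failure using the rule type chosen for this characteristic."""
    if mode == "manual":
        if manual_range is None:
            raise ValueError("manual mode needs a user-defined range")
        low, high = manual_range
    elif mode == "automatic":
        low, high = learn_range(history, k)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return not (low <= value <= high)


# Per-characteristic choice of rule type (hypothetical keys).
modes = {"orders-api/calls_per_min": "automatic", "orders-api/response_ms": "manual"}
calls_history = [100, 102, 98, 101, 99]
print(is_failure(130, modes["orders-api/calls_per_min"], None, calls_history))  # → True
print(is_failure(250.0, modes["orders-api/response_ms"], (0.0, 300.0), []))     # → False
```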

Alarms and notifications
The monitoring tool will provide alarms and notifications to its users on three aspects: failures detected on digital platforms that require user action; failures detected on digital platforms that were automatically resolved by the self-healing solution; and potential failures predicted to occur on digital platforms (preventive action).
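The three notification categories can be expressed as a small classification step. The `AlertKind` names and the boolean inputs to `classify_alert` are hypothetical, chosen only to illustrate how a detection event might be routed to one of the three categories.

```python
from enum import Enum
from typing import Optional


class AlertKind(Enum):
    """The three notification categories (hypothetical names)."""
    ACTION_REQUIRED = "failure detected, user action required"
    AUTO_RESOLVED = "failure detected and resolved automatically"
    PREDICTED = "potential failure predicted (preventive action)"


def classify_alert(detected: bool, auto_fixed: bool, predicted: bool) -> Optional[AlertKind]:
    """Map a detection event to a notification category, or None if there is nothing to report."""
    if detected:
        # The failure has already occurred: did the self-healing solution fix it?
        return AlertKind.AUTO_RESOLVED if auto_fixed else AlertKind.ACTION_REQUIRED
    if predicted:
        # No failure yet, but one is forecast: raise a preventive alert.
        return AlertKind.PREDICTED
    return None


print(classify_alert(detected=True, auto_fixed=False, predicted=False).value)
# → failure detected, user action required
```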

Interaction with the system's intelligence center
Users of the monitoring tool will also be able to query the self-healing solution's intelligence center (Generative AI mechanisms) about the operating status of the integrated digital platforms or about how to resolve potential problems, for example whether the number of requests arriving at a given service is within the expected range, or which configurations would make a given service more responsive.

Correction History
The digital platform status monitoring tool includes a correction history table that records the corrections applied to the platforms under analysis, including the type of correction, the parameter it was applied to, and when it was applied, among other details.
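One possible shape for a correction history entry, assuming an in-memory store. The field names, the example correction types, and the extra `automatic` flag (distinguishing user-applied from self-applied corrections) are assumptions rather than the tool's documented schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class CorrectionRecord:
    """One row of the correction history table (hypothetical schema)."""
    platform: str
    correction_type: str  # illustrative values: "restart", "scale_out", "config_change"
    parameter: str        # the monitored parameter the correction was applied to
    applied_at: datetime  # when the correction was applied
    automatic: bool       # assumption: True if applied by the self-healing solution


class CorrectionHistory:
    """In-memory history; a real tool would presumably persist this in a database."""

    def __init__(self) -> None:
        self._records: List[CorrectionRecord] = []

    def add(self, record: CorrectionRecord) -> None:
        self._records.append(record)

    def for_parameter(self, parameter: str) -> List[CorrectionRecord]:
        """Return the corrections applied to one parameter, most recent first."""
        matches = [r for r in self._records if r.parameter == parameter]
        return sorted(matches, key=lambda r: r.applied_at, reverse=True)


history = CorrectionHistory()
history.add(CorrectionRecord("shop", "restart", "orders-api/response_ms",
                             datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc), automatic=True))
history.add(CorrectionRecord("shop", "scale_out", "orders-api/response_ms",
                             datetime(2024, 5, 2, 14, 30, tzinfo=timezone.utc), automatic=False))
print([r.correction_type for r in history.for_parameter("orders-api/response_ms")])
# → ['scale_out', 'restart']
```

Making records frozen reflects the fact that a history entry describes something that already happened and should not be edited after the fact.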

List of suggested corrections to be applied
This feature provides a list of next best actions for errors detected on digital platforms. The suggestions are the corrections that the self-healing solution considers most appropriate for each identified error.
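The ranking method behind the suggestions is not specified here. As one simple, swapped-in technique, the sketch ranks candidate corrections for an error type by their historical success rate with Laplace smoothing, so an untried correction scores 0.5 rather than 0 or 1. `SuggestionRanker`, its methods, and the error and correction names are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence


class SuggestionRanker:
    """Ranks candidate corrections per error type by smoothed historical success rate."""

    def __init__(self) -> None:
        # (error_type, correction) -> [successes, attempts]
        self._stats = defaultdict(lambda: [0, 0])

    def record(self, error_type: str, correction: str, success: bool) -> None:
        """Record the outcome of applying a correction to an error of this type."""
        stats = self._stats[(error_type, correction)]
        stats[0] += int(success)
        stats[1] += 1

    def suggest(self, error_type: str, candidates: Sequence[str], top_n: int = 3) -> List[str]:
        """Return up to top_n candidate corrections, best first."""
        def score(correction: str) -> float:
            # .get avoids inserting empty entries for corrections never tried.
            successes, attempts = self._stats.get((error_type, correction), (0, 0))
            # Laplace smoothing: (successes + 1) / (attempts + 2)
            return (successes + 1) / (attempts + 2)
        return sorted(candidates, key=score, reverse=True)[:top_n]


ranker = SuggestionRanker()
for ok in (True, True, False):
    ranker.record("high_latency", "scale_out", ok)
ranker.record("high_latency", "restart", False)
print(ranker.suggest("high_latency", ["restart", "scale_out", "clear_cache"]))
# → ['scale_out', 'clear_cache', 'restart']
```

Smoothing keeps a single lucky or unlucky outcome from dominating the ranking, and gives untried corrections a neutral chance of being suggested.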

Graphs and Dashboards
The web tool provides graphs and dashboards showing how the monitored parameters of the digital platforms integrated with the self-healing solution evolve over time. This information may include the number of requests per minute reaching a given service on the platform, the volume of data processed per minute, the number of accesses per hour, and other statistics still to be investigated. The number of faults detected per day and the number of corrections applied per day, among others, will also be tracked.
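Most of these dashboard statistics (requests per minute, accesses per hour, faults or corrections per day) reduce to counting timestamped events in fixed time buckets. A minimal sketch, assuming events arrive as `datetime` values; `bucket_counts` and the truncation helpers are hypothetical names.

```python
from collections import Counter
from datetime import date, datetime
from typing import Callable, Dict, Hashable, Iterable


def bucket_counts(events: Iterable[datetime],
                  truncate: Callable[[datetime], Hashable]) -> Dict[Hashable, int]:
    """Count events per time bucket, returned in chronological bucket order."""
    return dict(sorted(Counter(truncate(ts) for ts in events).items()))


def to_minute(ts: datetime) -> datetime:
    """Truncate to the start of the minute (e.g. requests per minute)."""
    return ts.replace(second=0, microsecond=0)


def to_hour(ts: datetime) -> datetime:
    """Truncate to the start of the hour (e.g. accesses per hour)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def to_day(ts: datetime) -> date:
    """Truncate to the calendar day (e.g. faults or corrections per day)."""
    return ts.date()


requests = [datetime(2024, 5, 1, 10, 0, 5), datetime(2024, 5, 1, 10, 0, 40),
            datetime(2024, 5, 1, 10, 1, 2)]
print(bucket_counts(requests, to_minute))
```

Passing the truncation function as a parameter lets one counting routine serve every granularity the dashboards need; a live dashboard would typically update these counts incrementally rather than recomputing them from the full event list.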
Deliverables