Resiliency is the ability of your application and infrastructure to handle and recover from failure with minimal impact on user experience. Imagine your application encountering a sudden surge in traffic or a server failure: resiliency is what enables it to stay up and running, serving users without breaking a sweat.
To achieve resiliency, you can implement strategies like redundancy, graceful degradation, and automatic failover.
- Redundancy means having backup systems in place, so if one component fails, another can take over automatically.
- Graceful degradation is designing the app to continue functioning even if some parts are not fully operational.
- Automatic failover is the ability of your application to automatically switch from one server or system to another when the first one fails.
The ultimate goal when building resilient applications is to fail gracefully, with minimal impact on user experience.
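As a rough illustration, the three strategies above can be combined in a few lines of Python. This is a minimal sketch, not a production pattern: the names `fetch_with_failover`, `primary`, `replica`, and the `fallback` value are all hypothetical, and the "servers" are plain functions standing in for real network calls.

```python
from typing import Callable, Sequence


def fetch_with_failover(backends: Sequence[Callable[[], str]], fallback: str) -> str:
    """Try each redundant backend in order; degrade to a fallback if all fail."""
    for backend in backends:
        try:
            # The first backend that responds successfully wins.
            return backend()
        except Exception:
            # Automatic failover: move on to the next redundant backend.
            continue
    # Graceful degradation: every backend failed, so serve a reduced result.
    return fallback


def primary() -> str:
    # Hypothetical primary server, simulated here as always failing.
    raise ConnectionError("primary is down")


def replica() -> str:
    # Hypothetical redundant server that is still healthy.
    return "recommendations from replica"


# The primary fails, so the call fails over to the replica.
print(fetch_with_failover([primary, replica], fallback="popular items"))
# Every backend fails, so the caller gets the degraded fallback instead of an error.
print(fetch_with_failover([primary, primary], fallback="popular items"))
```

In both calls the user still receives a response. In the second it is a simpler one, which is the point of degrading gracefully rather than surfacing the failure.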
Explore related concepts
Reliability
Reliability is the ability of an application to consistently 'do what it says on the can'. It's the ability of the application to perform as expected, even when conditions are not optimal. It involves minimizing the occurrence of failures and ensuring that the system can recover quickly when failures happen.
Incident
An incident is an unexpected disruption in the normal operation of an application. Incidents can range from complete service outages to performance degradation, and they often require immediate attention to restore normal functionality.
Recovery
Recovery is the process of restoring an application to a stable and functional state after a failure or incident.