Going through a course for my AWS Solutions Architect Associate cert. Learned about planning for fault tolerance with EC2 instances.

Apparently the trick is to think of the worst-case failure. For example, say you have three availability zones (AZs), and you MUST have three instances running at all times.

The worst case is that the AZ with the most EC2 instances fails. So come up with a setup that still has three instances running despite that failure, i.e. the instances outside the largest AZ add up to at least three. That leads to setups like:

  • 3 in AZ-0, 2 in AZ-1, 1 in AZ-2
  • 2 in AZ-0, 2 in AZ-1, 2 in AZ-2

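In either of these cases, even if the worst-case scenario happens, you still have at least three instances running (3 left in the first setup, 4 in the second), so you’re still meeting your availability requirement.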
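
The worst-case check is simple enough to script. Here's a minimal sketch in Python, assuming a single AZ failure at a time. The function name `survives_worst_az_failure` and the extra (1, 1, 1) layout are made up for illustration and are not from the course:

```python
def survives_worst_az_failure(instances_per_az, required):
    """Return True if at least `required` instances remain after the fullest AZ fails."""
    # Worst case: the AZ holding the most instances goes down.
    remaining = sum(instances_per_az) - max(instances_per_az)
    return remaining >= required


# The two layouts from the post, plus a hypothetical one that falls short.
layouts = [(3, 2, 1), (2, 2, 2), (1, 1, 1)]
for layout in layouts:
    print(layout, survives_worst_az_failure(layout, required=3))
```

The first two layouts pass, while (1, 1, 1) fails because only two instances are left once any AZ goes down.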