Sunday, January 18, 2026

DNS Failover on OCI Using DNS Steering Policies

I recently put together a very small demo to showcase DNS-based failover on OCI using DNS Steering Policies. The idea was to build a minimal setup, just enough to show how DNS failover behaves when a server actually goes down.

The goal of the demo was straightforward: I wanted DNS to resolve to a primary instance as long as it was healthy, and automatically switch to a secondary instance once the primary stopped responding. No load balancers, no application logic, no frameworks — just DNS, health checks, and very basic HTTP endpoints.

This post briefly explains the steps. All implementation details and scripts are in my GitHub repository.

Demo setup
I used two OCI Compute instances running Ubuntu. On each instance, I started a minimal HTTP service on port 80. Each instance serves a simple HTML page that prints the OCI region, the instance display name, and the public IP address. This makes it easy to see which instance DNS is pointing to at any given time.

The page is generated dynamically using instance metadata, and the HTTP service is started with Python’s built-in web server. The exact script I used is included in my GitHub repository.
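As a rough illustration, something like the following can generate and serve the page. The IMDSv2 endpoint and the Authorization: Bearer Oracle header are the standard OCI metadata interface, but the file paths and the external IP lookup are assumptions of this sketch, not the exact repository script.

```
#!/bin/bash
# Minimal sketch: build an identity page from OCI instance metadata (IMDSv2).
META="http://169.254.169.254/opc/v2/instance"
HDR="Authorization: Bearer Oracle"

REGION=$(curl -s -H "$HDR" "$META/canonicalRegionName")
NAME=$(curl -s -H "$HDR" "$META/displayName")
# The public IP is not in the instance metadata document, so ask an
# external service instead (assumption; the real script may differ).
PUBLIC_IP=$(curl -s https://ifconfig.me)

mkdir -p /srv/demo
cat > /srv/demo/index.html <<EOF
<html><body>
<h1>Region: $REGION</h1>
<p>Instance: $NAME</p>
<p>Public IP: $PUBLIC_IP</p>
</body></html>
EOF

# Serve it on port 80 with Python's built-in web server.
cd /srv/demo && sudo python3 -m http.server 80
```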

DNS and health checks on OCI
On the OCI side, I assumed a public DNS zone already existed.

I then created an HTTP health check that monitors port 80 on both instances.
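For reference, such a monitor can be created with the OCI CLI roughly like this; the OCIDs and target IPs are placeholders, and the exact script is in the repository.

```
# Sketch: one HTTP monitor probing port 80 on both instances every 30 seconds.
oci health-checks http-monitor create \
  --compartment-id "$COMPARTMENT_OCID" \
  --display-name "dns-failover-demo-hc" \
  --targets '["203.0.113.10", "203.0.113.20"]' \
  --protocol HTTP \
  --port 80 \
  --interval-in-seconds 30
```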
Using that health check, I created a DNS steering policy with the FAILOVER template. The primary instance was given the highest priority, and the secondary instance a much lower one. I also kept the TTL low (30 seconds) so that failover could be observed quickly.
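In CLI terms the policy looks roughly like the sketch below. The addresses, names, and priority values are placeholders, and the rules JSON is my reading of the FAILOVER pattern (drop unhealthy answers, rank the primary pool first, return a single answer); the exact definitions are in the repository.

```
# Sketch: FAILOVER rules — filter unhealthy answers, prefer the primary pool,
# and hand back exactly one record.
cat > rules.json <<'EOF'
[
  {"ruleType": "FILTER",
   "defaultAnswerData": [{"answerCondition": "answer.isDisabled != true", "shouldKeep": true}]},
  {"ruleType": "HEALTH"},
  {"ruleType": "PRIORITY",
   "defaultAnswerData": [
     {"answerCondition": "answer.pool == 'primary'",   "value": 1},
     {"answerCondition": "answer.pool == 'secondary'", "value": 99}]},
  {"ruleType": "LIMIT", "defaultCount": 1}
]
EOF

oci dns steering-policy create \
  --compartment-id "$COMPARTMENT_OCID" \
  --display-name "dns-failover-demo" \
  --template FAILOVER \
  --ttl 30 \
  --health-check-monitor-id "$MONITOR_OCID" \
  --answers '[
    {"name": "primary",   "rtype": "A", "rdata": "203.0.113.10", "pool": "primary"},
    {"name": "secondary", "rtype": "A", "rdata": "203.0.113.20", "pool": "secondary"}
  ]' \
  --rules file://rules.json
```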
Finally, I attached the steering policy to the domain.
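The attachment itself is a single call; the zone OCID and domain name are placeholders for this sketch.

```
# Sketch: bind the policy to a domain in an existing public zone.
oci dns steering-policy-attachment create \
  --steering-policy-id "$POLICY_OCID" \
  --zone-id "$ZONE_OCID" \
  --domain-name "failover.example.com"
```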

Testing failover
To test the behavior, I queried DNS and accessed the service from a third host using standard tools like dig, nslookup, and curl.
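Assuming the placeholder domain from the sketches above, the observation loop is as simple as:

```
# Re-resolve every 5 seconds and watch the answer flip once the primary fails.
watch -n 5 'dig +short failover.example.com'

# Confirm which instance is actually serving the page.
curl -s http://failover.example.com/
```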

As expected, DNS initially resolved to the primary instance. When I stopped the HTTP service on the primary, the health check failed, and DNS started returning the secondary instance instead.

Because this is DNS-based failover, the switch is not instant. TTL still applies, which is an important point to understand when using this mechanism in real environments. In my tests, failover and failback took around 88 seconds.
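When querying through a caching resolver, a full dig answer makes the TTL visible; the second column counts down the seconds until the resolver re-queries (placeholder domain again):

```
dig failover.example.com A +noall +answer
# failover.example.com.   30   IN   A   203.0.113.10
```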

Cleanup
One thing I ran into while testing was that DNS steering policy attachments are easy to create, but not as obvious to remove from the Console. Besides, the Console experience is not the best.

To avoid leaving resources behind, I added CLI scripts to my GitHub repository to create, list, and delete steering policy attachments, steering policies, and HTTP health checks. This makes it easy to run the demo multiple times without cluttering the compartment.
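Attachments have to be removed before the policy they reference, so the scripts tear down in that order; a sketch with placeholder OCIDs:

```
# Sketch: delete in dependency order — attachment, policy, then monitor.
oci dns steering-policy-attachment list --compartment-id "$COMPARTMENT_OCID" --output table
oci dns steering-policy-attachment delete --steering-policy-attachment-id "$ATTACHMENT_OCID" --force
oci dns steering-policy delete --steering-policy-id "$POLICY_OCID" --force
oci health-checks http-monitor delete --monitor-id "$MONITOR_OCID" --force
```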

Closing thoughts
This demo is intentionally basic: just enough to observe cleanly how DNS failover happens.

If you want to try it yourself, all scripts and commands are documented in my GitHub repository.

While DNS-based failover is simple and effective, it’s not the right solution for every scenario. Session and data consistency issues must be evaluated at the application level. Also, DNS caching, resolvers ignoring low TTLs, and client-side behavior all introduce uncertainty that you can’t fully control. If an application needs lower failover times, a global load balancer or application-level failover mechanisms are usually a better fit. However, it’s still a viable approach for improving your RTO and disaster recovery capabilities during a regional outage.
