Sunday, January 18, 2026

DNS Failover on OCI Using DNS Steering Policies

I recently put together a small demo to showcase DNS-based failover on OCI using DNS Steering Policies. The idea was to build just enough of a setup to show how DNS failover behaves when a server actually goes down.

The goal of the demo was straightforward: I wanted DNS to resolve to a primary instance as long as it was healthy, and automatically switch to a secondary instance once the primary stopped responding. No load balancers, no application logic, no frameworks — just DNS, health checks, and very basic HTTP endpoints.

This post briefly explains the steps. All implementation details and scripts are in my GitHub repository.

Demo setup
I used two OCI Compute instances running Ubuntu. On each instance, I started a minimal HTTP service on port 80. Each instance serves a simple HTML page that prints the OCI region, the instance display name, and the public IP address. This makes it easy to see which instance DNS is pointing to at any given time.

The page is generated dynamically using instance metadata, and the HTTP service is started with Python’s built-in web server. The exact script I used is included in my GitHub repository.

DNS and health checks on OCI
On the OCI side, I assumed a public DNS zone already existed.

I then created an HTTP health check that monitors port 80 on both instances.
Using that health check, I created a DNS steering policy with the FAILOVER template. The primary instance was given the highest priority, and the secondary instance a much lower one. I also kept the TTL low (30 seconds) so that failover could be observed quickly.
Finally, I attached the steering policy to the domain.
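The repository has the full scripts; as a rough sketch, the health check and the policy attachment look something like this with the OCI CLI. The IDs ($COMP_ID, $ZONE_ID, $POLICY_ID), IPs, names, and app.example.com are all placeholders of mine, and the exact flags may vary by CLI version:

```shell
# Hedged sketch: all IDs, IPs and names are placeholders, not real values.
# Guarded so this is a no-op on machines without the OCI CLI installed.
if command -v oci >/dev/null 2>&1; then
  # HTTP health check probing port 80 on both instances
  oci health-checks http-monitor create \
    --compartment-id "$COMP_ID" \
    --display-name "failover-demo-hc" \
    --targets '["203.0.113.10", "203.0.113.20"]' \
    --protocol HTTP --port 80 --path "/" \
    --interval-in-seconds 30

  # Attach an existing FAILOVER steering policy to the demo domain
  oci dns steering-policy-attachment create \
    --steering-policy-id "$POLICY_ID" \
    --zone-id "$ZONE_ID" \
    --domain-name "app.example.com" \
    --display-name "failover-demo-attachment"
fi
```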

Testing failover
To test the behavior, I queried DNS and accessed the service from a third host using standard tools like dig, nslookup, and curl.

As expected, DNS initially resolved to the primary instance. When I stopped the HTTP service on the primary, the health check failed, and DNS started returning the secondary instance instead.

Because this is DNS-based failover, the switch is not instant. TTL still applies, which is an important point to understand when using this mechanism in real environments. According to my tests, failover and failback take around 88 seconds.
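To watch the switch happen, a small polling loop is convenient. This is a generic sketch (app.example.com is a placeholder for the steered record); the change-detection helper is separated out so it stays readable:

```shell
#!/bin/sh
# Placeholder record name; replace with the domain the policy is attached to.
FQDN="${FQDN:-app.example.com}"

# report_change PREV CUR: print a line only when the DNS answer flips
report_change() {
  if [ -n "$1" ] && [ "$1" != "$2" ]; then
    echo "answer changed: $1 -> $2"
  fi
}

# Poll every 5 seconds; run as: sh watch_dns.sh watch
if [ "${1:-}" = "watch" ]; then
  last=""
  while true; do
    ip="$(dig +short "$FQDN" | head -n 1)"
    report_change "$last" "$ip"
    last="$ip"
    sleep 5
  done
fi
```

With a 30-second TTL, you should see the answer flip within a couple of polling intervals after the health check marks the primary as down.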

Cleanup
One thing I ran into while testing was that DNS steering policy attachments are easy to create, but not as obvious to remove from the Console. Besides, the Console experience is not the best.

To avoid leaving resources behind, I added CLI scripts to my GitHub repository to create, list and delete steering policy attachments, steering policies and HTTP health checks. This makes it easy to run the demo multiple times without cluttering the compartment.
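The delete path, for example, boils down to something like the following hedged sketch. $COMP_ID and $ZONE_ID are placeholders, jq is used to pull out OCIDs, and the exact list filters may vary by CLI version:

```shell
# Hedged sketch: IDs are placeholders; no-op without the OCI CLI installed.
if command -v oci >/dev/null 2>&1; then
  # List attachments in the zone, then delete each one by OCID
  for att in $(oci dns steering-policy-attachment list \
                 --compartment-id "$COMP_ID" --zone-id "$ZONE_ID" \
                 | jq -r '.data[].id'); do
    oci dns steering-policy-attachment delete \
      --steering-policy-attachment-id "$att" --force
  done
fi
```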

Closing thoughts
This demo is intentionally basic: a clean way to observe how failover happens.

If you want to try it yourself, all scripts and commands are documented in my GitHub repository.

While DNS-based failover is simple and effective, it’s not the right solution for every scenario. Session and data consistency issues must be evaluated at the application level. Also, DNS caching, resolvers ignoring low TTLs, and client-side behavior all introduce uncertainty that you can’t fully control. If an application needs lower failover times, a global load balancer or application-level failover mechanisms are usually a better fit. However, it's still a viable approach to improve your RTO and disaster recovery capabilities during a regional outage.

Friday, January 16, 2026

IIS, NFS, and the Drive Letter Trap: Using OCI File Storage on Windows for High Availability (HA)

Recently, I came across an interesting problem while working on a customer deployment on Oracle Cloud Infrastructure (OCI).
This post is a technical field note for myself and for people who might find themselves in a similar situation. It is definitely not a reference architecture or a best-practice guide. It documents:

  • A real-life issue encountered when running IIS on Windows with OCI File Storage Service (FSS)
  • The thought process behind the initial design
  • What failed, and why it failed
  • The minimal change required to make it work
It's a practical issue that can easily show up when moving Windows workloads from on-premises to OCI, so I decided to write about it.

1 The use case
The customer has an ASP.NET application running on IIS, previously deployed on-premises. They are in the process of moving the application to Oracle Cloud Infrastructure, with the following design:

  • Two IIS nodes (Windows Server 2022 Standard)
  • High availability deployment
  • All application servers in a private subnet
  • A public load balancer in front
So far, this is a fairly standard 2/3-tier web deployment.

The complication
The application stores some data on disk, and that data is meaningful only when combined with database records (for example: uploads, generated files, artifacts, etc.).

Once you go multi-node, the obvious question appears: How do both web servers see the same files? A shared storage layer is required.

2 First approach: shared storage with OCI FSS
The natural choice here was OCI File Storage Service (FSS):

  • Managed NFS service
  • Simple to mount on multiple instances
  • Works well for shared file access across compute nodes
  • Many features for resilience (backup/restore, snapshots, cross-region replication, etc.)
The plan was straightforward:
  • Create an FSS export and mount target in a private subnet
  • Mount the file system on both Windows IIS instances
  • Point the application to a shared directory

Where things start to break
The FSS export was mounted successfully on both Windows servers:
  • The NFS client was installed
  • The mount was visible at the OS level
  • Files could be created manually from the command line and Explorer, and they were visible on the other node.
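For reference, the mount itself looked roughly like this from an elevated Command Prompt. The mount target IP 10.0.1.5 and export name fss-export are hypothetical placeholders for my environment; -o anon maps access to the anonymous NFS user, which was fine for this test:

```
:: Windows NFS client mount; IP and export name are placeholders
mount -o anon \\10.0.1.5\fss-export X:
```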

However, the application had a fixed directory structure and expected a specific path under wwwroot. So when I tried to add a Virtual Directory, another surprise appeared: IIS doesn’t even “see” the NFS-mounted drive. How could the application use it?

The symbolic link idea
The first workaround that came to mind was simple: Create a symbolic link under C:\inetpub\wwwroot that points to the NFS mount.

This approach often works with local disks and SMB shares, so it looked reasonable. However, once the application was tested, file operations failed with the following error:

Error: Could not find a part of the path 'C:\inetpub\wwwroot\FSSApp\data\test.txt'

At this point:
  • The path did exist
  • The symbolic link was valid
  • The same path worked from an interactive PowerShell session
  • The failure happened only when accessed from IIS
So the question became: What exactly is going on here?

3 What’s actually going on between IIS and NFS
Initially, this looked like an identity or authentication problem, but further testing showed that the real issue was more fundamental.

Drive letters and Windows services
On Windows, drive letters are session-scoped.

This means a drive letter is associated with the user session that created it. It is visible to that user in Explorer and PowerShell, but it is not automatically visible to Windows services. You will experience the same error when you launch a Command Prompt as Administrator.

IIS worker processes run as services, execute in Session 0, and do not inherit drive mappings from interactive logons.

So when the NFS export was mounted using a drive letter (for example X:), the mount worked for the logged-in user, and the symbolic link resolved correctly in PowerShell. However, IIS could not resolve the same path. From IIS’ point of view, the target simply did not exist.

That’s why the application failed with "Could not find a part of the path": misleading, but understandable. The path exists, but not in the IIS execution context.

Why the symbolic link didn’t work (at first)
The symbolic link itself was not the problem. From IIS’ point of view, it resolved to a drive letter that didn't exist, because the letter was assigned in an interactive user session.

Why UNC paths work
UNC paths are global, session-independent, resolvable by Windows services, and accessible from IIS worker processes.

When the symbolic link was recreated to point directly to the UNC path, the application pool identity remained unchanged and pass-through authentication was still used, yet file uploads worked immediately.
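To make the difference concrete, here is roughly what the two variants looked like (cmd syntax; the IP and share name are hypothetical placeholders):

```
:: Fails from IIS: X: only exists in the interactive session
mklink /D C:\inetpub\wwwroot\FSSApp X:\FSSApp

:: Works from IIS: the UNC path is global and session-independent
mklink /D C:\inetpub\wwwroot\FSSApp \\10.0.1.5\fss-export\FSSApp
```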

Root Cause
So the real root cause was not IIS vs. NFS, and not strictly authentication. IIS cannot access drive-letter-based network mounts, because drive letters are session-scoped and the assigned letter stayed in the user session.

Where domains still matter
Active Directory and domain membership are not required for this specific fix. However, they become relevant when you need non-anonymous NFS access.

Even though IIS doesn't see a session-scoped mapped drive, the simplest and most practical solution was to use a symbolic link with a UNC path and avoid drive letters. This way IIS can directly access globally resolvable paths.

4 The Working Setup
We install:

  • IIS Web Server
  • ASP.NET (my test application uses Web Forms)
  • Required IIS dependencies
  • NFS Client (Services for NFS)

Then the steps are:

  • Mount FSS to the X: drive (Command Prompt, not PowerShell)
  • Observe that IIS can't see the FSS mount
  • Create a symbolic link (the workaround)
  • Create a test application: a web form that uploads a file to a data folder mapped to FSS

Results:

  • The upload fails when the symbolic link is created using the session-scoped drive X:
  • The upload is successful when the symbolic link is created using the UNC path to FSS

Friday, January 2, 2026

Deploy QRadar Console (SIEM) on OCI using IBM Cloud Marketplace Image

In my previous blog post I explained how to deploy QRadar Console on OCI using an ISO file, along with the limitations of that approach—especially the 2 TiB boot volume limitation caused by legacy BIOS and IDE-based images.

While looking for a solution to the storage problem, I revisited the IBM QRadar 7.5 Installation Guide. In Section 19, IBM describes QRadar marketplace deployments on major cloud providers, including AWS, Azure, Google Cloud, and Oracle Cloud Infrastructure.

After going through that section, it became clear that IBM Marketplace images are the intended and supported approach for cloud deployments—and importantly, they solve the storage design problem.

This blog post walks through how I deployed QRadar on OCI using the IBM Marketplace image, and why this approach works significantly better.

Unlike ISO-based installations:

  • The Marketplace image is cloud-optimized
  • QRadar is installed as software, not a pre-built appliance
  • Storage layout supports large secondary disks
  • The installation flow is exactly as documented by IBM
  • No legacy BIOS / IDE limitations
Most importantly, /store is placed on a secondary block volume, not the boot disk. This completely avoids the 2 TiB boot volume limitation discussed in my previous post.

So here are the high-level steps:
  1. Download QRadar image from IBM Cloud Marketplace
  2. Upload the image to OCI Object Storage
  3. Create a custom image in OCI
  4. Provision the OCI instance with proper networking and storage
  5. SSH into the instance and install the QRadar Console

1 I downloaded the QRadar Console image from the IBM Cloud Marketplace, as referenced in the IBM QRadar 7.5 Installation Guide. The downloaded file name is ORACLE-CLOUD-741-console-20220811114721, which is similar to what is mentioned in the guide. This image is specifically prepared for Oracle Cloud and follows IBM’s supported deployment model.

2 Next, I uploaded the image to an OCI Object Storage bucket. You can use the OCI web interface or create a pre-authenticated request with object writes, following the steps here.

3 Then using the OCI Console:

  • Navigate to Compute → Custom Images
  • Create a new custom image
  • Select Object Storage as the source
  • Choose the uploaded QRadar image file
  • Select OCI as the image type
Note that we don't need to worry about the launch mode (Paravirtualized, Emulated, etc.); OCI validates the image and prepares it for instance creation.

4 While creating the VM instance from the custom image, there are a few important considerations.

Networking

  • Although the guide says "Assign a public IPv4 address" during provisioning, I did not. I reserved a public IP for practical reasons and assigned it after provisioning. This kind of workaround works fine.
  • HTTPS access on port 443 was enabled using a Network Security Group (NSG)
  • Another benefit of this approach is the ability to provide SSH access to the VM through SSH key authentication, not a password.
Storage
  • The guide doesn't mention anything about the boot volume, and if left untouched the VM is provisioned with a 122 GB boot volume. I find this very small for a QRadar deployment, so I allocated 2 TiB and added some post-provisioning steps to make this space available.
  • I created a secondary disk by attaching a block volume. The installation guide recommends using the Paravirtualized attachment type; there is no need to set a device path, and obviously the Read/Write access type. I was able to use 12 TB without any problem.
Important: Storage size cannot be increased after installation. Make sure you allocate enough space for log retention from day one.
Important: I tested provisioning the instance with the default boot volume size (122 GB) and resizing it after deployment. I was able to successfully extend the boot volume up to 2 TiB without breaking the boot process or affecting QRadar functionality.

5 After the instance was running, I connected using the SSH keys provided during provisioning. Note that the user is cloud-user, not opc.
ssh -i ~/.ssh/server.key cloud-user@$public_ip

Fixing Storage on the Boot Volume
When checked, the boot volume capacity is 2 TiB, but the partition table doesn't know this and needs to be updated. Here are the steps:

  • Expand the Partition: Use fdisk to extend sda3 to fill the 2 TB disk.
  • Resize the PV: Run pvresize so LVM recognizes the partition is now larger.
  • Extend the Logical Volumes: Decide which folders need more space (e.g., /var or /opt) and grow those specific LVMs.

Expand the Partition
Enter the fdisk interactive menu: fdisk /dev/sda Follow these keystrokes carefully:
  1. p: Print the table (one last check of that Start sector).
  2. d: Delete a partition.
  3. 3: Select partition 3. (Don't worry, the data is still on the bits of the disk).
  4. n: New partition.
  5. p: Primary.
  6. 3: Partition number 3.
  7. First sector: TYPE THE START SECTOR YOU WROTE DOWN. (It usually defaults to the right spot, but double-check).
  8. Last sector: Press Enter to accept the default (the end of the 2 TB disk).
  9. Signature?: If it asks "Do you want to remove the signature?", type N (No). This is critical.
  10. t: Change type.
  11. 3: Select partition 3.
  12. 8e: (or 31 for LVM on some versions). Type L to list codes if unsure, but usually, it's Linux LVM.
  13. w: Write changes and exit.
Since sda3 is currently mounted (it holds your OS!), the kernel might keep using the old table until a reboot. Force an update with partprobe /dev/sda. (If partprobe gives an error that the disk is busy, you may need to reboot, but it usually works on modern RHEL.)

Or simply just reboot!
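As a side note, if the cloud-utils-growpart package happens to be available (an assumption on my part; I used the fdisk steps above), the whole partition grow collapses into one command:

```shell
# Hedged alternative to the interactive fdisk steps; needs root, a real
# /dev/sda, and the growpart tool from cloud-utils-growpart. No-op otherwise.
if [ "$(id -u)" -eq 0 ] && [ -b /dev/sda ] \
   && command -v growpart >/dev/null 2>&1; then
  growpart /dev/sda 3   # grow partition 3 to the end of the disk
fi
```

After that, pvresize proceeds exactly as in the next section.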

Resize the PV
Now that the partition is 2 TB, tell LVM to use that new space:

[root@qradar-20260102-1919 ~]# pvresize /dev/sda3
  Physical volume "/dev/sda3" changed
  1 physical volume(s) resized or updated / 0 physical volume(s) not resized

vgs should now display a large amount of free space in the VFree column:

[root@qradar-20260102-1919 ~]# vgs
  VG   #PV #LV #SN Attr   VSize VFree
  rhel   1   9   0 wz--n- 1.95t 1.83t

Extend the Logical Volumes
QRadar is extremely "log-heavy." If /var/log or /storetmp fills up, the services will crash or stop collecting events. So based on my current LVM layout and best judgment, this is what I've come up with:

Mount        Old     New      Why?
/ (root)     20 GB   100 GB   Gives the OS breathing room for updates and temporary files.
/opt         14 GB   200 GB   QRadar binaries and many extensions/apps live here.
/var/log     18 GB   500 GB   Critical. This is where QRadar stores active logs.
/storetmp    15 GB   500 GB   Used for temporary data processing and backups.
/var         8 GB    50 GB    General system variable data.
Free Space   0 GB    ~500 GB  Keep this unallocated. LVM lets you grow any folder instantly later if it gets full.

And here are the commands to distribute space:


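As a sketch, the growth plan above maps to a series of lvextend calls. The rhel volume group comes from the vgs output earlier, but the LV names below are my assumptions; confirm yours with lvs before running anything:

```shell
# Hedged sketch: LV names are assumptions; -r (--resizefs) grows the
# filesystem together with the LV. Guarded so it is a no-op elsewhere.
if [ "$(id -u)" -eq 0 ] && [ -e /dev/rhel/root ]; then
  lvextend -r -L 100G /dev/rhel/root      # /
  lvextend -r -L 200G /dev/rhel/opt       # /opt
  lvextend -r -L 500G /dev/rhel/var_log   # /var/log
  lvextend -r -L 500G /dev/rhel/storetmp  # /storetmp
  lvextend -r -L 50G  /dev/rhel/var       # /var
fi
```

Leaving the remaining ~500 GB unallocated means any of these can be grown again later with another lvextend, online.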
QRadar Software Installation
Then I started the installation as documented in the installation guide:

sudo /root/setup_console

At some point you might see a hardware warning message; proceed with Y.

The script formats the attached secondary block volume (sdb) and organizes the storage with volume groups and folders. It also installs the required packages, installs the software (the "All-In-One" Console and many supporting components), and configures everything. When the script completes, it asks you to set the admin password. You can set/change the admin password anytime using:

sudo /opt/qradar/support/changePasswd.sh -a

Backup/Restore and Disaster Recovery
I configured volume group backups within the same region and successfully tested restores.

For guaranteed restore times, I also used cross-region volume group replication, which creates consistent snapshots of both the boot and block volumes in another region. After activating the volume group, I created a new instance, reconfigured networking, and confirmed full functionality.

This provides a good business continuity plan.

For lower RPOs (e.g., under 30 minutes), tools like RackWare can be evaluated for continuous replication.

Final Thoughts
If you are planning to run QRadar on OCI for production, this is the recommended and supported approach. The ISO-based method can still be useful for labs, short-term testing, or if you can live with 2 TiB of storage, but for long-term SIEM workloads, IBM Marketplace images are the right choice. Just plan your storage requirements ahead, including log retention and software updates, and allocate enough storage to both the boot volume and the additional block volume.

Thursday, January 1, 2026

Deploy QRadar Console (SIEM) on OCI using an ISO file

Recently, I faced a challenge while helping a customer deploy their SIEM solution on Oracle Cloud Infrastructure (OCI). The customer provided an ISO installer, which is not a natively supported format for OCI custom images.

OCI currently supports importing custom images in VMDK, QCOW2, and OCI image formats. As of December 2025, the maximum supported image size is 400 GB (this limit may change in the future). You can find additional details about importing custom Linux images in the OCI documentation .

Given these constraints, I followed the approach below to successfully deploy IBM QRadar Console on OCI.
  1. Install VirtualBox on a compute VM running on OCI
  2. Install Oracle VirtualBox Extension Pack and configure OCI integration
  3. Create a VirtualBox VM using the QRadar ISO
  4. Import the VM disk (VMDK) as a custom image in OCI
  5. Launch a VM with the desired compute, storage, and networking
  6. Extend storage using LVM
  7. Configure QRadar networking

1 Installing VirtualBox on a compute VM in OCI is straightforward. I downloaded VirtualBox for my platform. During installation, I was prompted to install the latest Microsoft Visual C++ redistributable package, which is required.

2 Next, I installed the VirtualBox Extension Pack. Once enabled, I configured a Cloud Profile using OCI API Keys. This allows VirtualBox to push custom images directly to OCI.

This step is critical. Uploading the VMDK manually to OCI Object Storage and creating a custom image using generic parameters did not work. VirtualBox Cloud Integration automatically selects image launch parameters that allow the VM to boot successfully in OCI.

3 The QRadar installation is heavily automated using the Anaconda installer. It performs hundreds of steps and multiple reboots. Below are only the important and non-obvious steps.

a Uncheck “Proceed with Unattended Installation.”

If unattended installation is enabled, the VM fails to start with errors similar to the following:
VERR_ALREADY_EXISTS - Error setting name '/ks.cfg'
PIIX3 cannot attach drive to the Primary Master
Power up failed (VERR_ALREADY_EXISTS)
This happens because QRadar already includes its own kickstart configuration.

b QRadar is resource-intensive. If the minimum requirements are not met, the installer will not show the “All-in-One Console” option. Minimum requirements:
  • CPU: 4 vCPUs
  • Memory: 16 GB
c Configure Disk Size and Format:
  • Minimum disk size: 256 GB
  • VirtualBox does not pre-allocate disk space unless specified
  • This disk becomes the OCI boot volume later, so keep it minimal and resize cautiously
  • Select VMDK as the disk format (useful if manual copy is required)
d I started the VM in GUI mode and followed the on-screen instructions. Most steps are automated. At one point, the installer prompted for user input. I selected FLATTEN and continued.
e Later, when prompted again, I typed HALT to stop the VM.
f While the VM was stopped, I removed the optical drive (ISO) and restarted the VM.
g The scripted installation resumed; it takes time to complete.
h When eventually prompted for login, I logged in as root.
i I scrolled through the license agreement (Space key) and typed yes to accept.
j Installation continues with Software Install
k It will not display the All-In-One Console option if the VM has less than the recommended minimums.
l The installation continues with normal setup (not HA), date/time and time zone settings, and IPv4 networking with enp0s3 as the NIC. QRadar uses static IP configuration. I used a generic OCI VCN configuration (10.0.0.0/16); this can be changed anytime later. I re-configured it using a VNC connection after provisioning the VM on OCI.
m Later I set the admin password (used for the QRadar web console) and the root password (used for SSH).

n The installation is complete and the QRadar web console is running on localhost:443.

4 Initially, I tried uploading the VMDK manually to Object Storage and creating a custom image using PARAVIRTUALIZED mode. Unfortunately, the VM failed to boot with dracut errors. So I let VirtualBox decide on the required parameters by using the Export to OCI feature.

a Select the VM in VirtualBox

b I preferred to create the image and later provision the instance manually, but VirtualBox can also do this for you.
c Select the object storage bucket
d VirtualBox will first upload the VMDK file to Object Storage and then create the custom image. It might take some time.
e Once completed, you can see the file in Object Storage and the custom image.
Important The resulting image uses:
  • BIOS firmware
  • IDE boot disk
  • E1000 network adapter
The IDE and BIOS combination limits the boot volume size to ~2 TiB. Any resize operation beyond this limit corrupts the boot sequence. The data is still present, but the VM will not boot.

5 I launched a VM instance using the custom image. During my tests, when I allocated more than 2 TiB for the boot volume, the instance failed to boot. As explained here, I connected to the instance serial console with VNC Viewer, entered the root password, and was ready to complete the configuration.

6 As you can see, the 2 TiB of storage is there but not allocated, so I fixed this first.

a First, format the disk and add a new partition to allocate all available storage.

b Now initialize the physical volume and extend the volume group by adding the new volume, then reboot.

c Now I can see that /store has the increased available space.
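Steps a through c can be sketched as follows. The partition /dev/sda3 is a placeholder for whatever fdisk created on your image, and the volume group and store LV names are discovered at runtime rather than hardcoded, since they vary; verify with lsblk, vgs, and lvs before running anything:

```shell
# Hedged sketch of steps a-c; device and VG/LV names are assumptions,
# check yours first. Guarded so this is a no-op on other machines.
if [ "$(id -u)" -eq 0 ] && [ -b /dev/sda3 ] \
   && command -v vgs >/dev/null 2>&1; then
  VG="$(vgs --noheadings -o vg_name | awk 'NR==1{print $1}')"
  pvcreate /dev/sda3                         # initialize the partition as a PV
  vgextend "$VG" /dev/sda3                   # add it to the existing VG
  lvextend -r -l +100%FREE "/dev/$VG/store"  # grow /store and its filesystem
fi
```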

7 The final stage is to re-configure QRadar networking with qchange_netsetup. Update the private IP with the one assigned from the subnet.

The script got stuck, so after waiting for a while, I restarted the server. The next time I ran qchange_netsetup, I saw an error message about pending changes, so I deployed them using
/opt/qradar/upgrade/util/setup/upgrades/do_deploy.pl

After configuring the security list and/or network security group (NSG) in my VCN, I am able to SSH into the server and log in to the QRadar console using my browser.

Important Notes

  • I tried importing the VMDK myself using the most generic parameters; unfortunately, the VM didn't boot. Once I imported the custom image using VirtualBox Cloud Extensions, I saw it uses very specific and older technology: E1000 for the NIC attachment, BIOS firmware, and IDE for the boot volume. The latter two in combination limit the boot volume size to 2 TiB. I tried resizing the boot volume both offline and online, but the VM didn't boot after the resize.
  • QRadar has a specific storage arrangement and requires most of the space for the /store partition. Since I couldn't extend the boot volume, I decided to attach a block volume as a data disk. This approach worked fine, but how do I enable QRadar to use the new disk? I added the new block volume to the volume group and extended /store, but unfortunately after a restart the volume group failed because of the SCSI attachment. I guess volume group initialization is part of boot, whereas attaching network storage and mounting it can only happen after a successful boot. So this approach also didn't work. By the way, I was able to recover the VM by editing fstab and removing the additional block volume later.

Final Thoughts
This approach works only if you are comfortable with the 2 TiB boot volume limitation. For a SIEM solution that continuously collects logs, this is a serious constraint. One possible workaround is to move data elsewhere and free up some space. I would recommend deploying the OCI image you can obtain from the IBM Marketplace; your on-premises license will work on this image as well. The Marketplace deployment uses a secondary disk which can be larger than 2 TiB, but it still cannot be resized after installation.
See my blog post that covers Marketplace-based deployment approach in detail.
