OCI-C orchestration error with load balancers


There are days when you just don't get it anymore. The last few days were like that for me. One of my current projects is to provide a fully redundant infrastructure in the Oracle Cloud for some critical applications.

So I created a Database as a Service instance with Data Guard to another region, provisioned a VM (IaaS) for the web server and Tomcat, and made some firewall rules. Also check out my colleague Freek (you can find him here: @dhoogfr) and his journey down the Oracle Cloud rabbit hole. The entrance is here: https://www.exitas.be/blog/oracle-cloud-rabbit-hole-part-1/

OK, there we go. As this setup is still in preparation, I turn my instances off after work and boot them up the next morning. Yes, we are cost-aware 😉 Anyhow, the infrastructure was ready except for one part: the DR solution for the IaaS-based web server. As I could not find any solutions other than "take a snapshot and replicate it" or "use rsync to sync some files", I decided to open an SR to ask for official recommendations.

The engineer was friendly and recommended implementing a load balancer. But … load balancers only work within a single region. At least according to the documentation, which contains the following text:

The Public Load Balancer is ideal for load balancing incoming public internet traffic across multiple compute instances in different Availability Domains (ADs) within a region. The Private Load Balancer is ideal for private traffic within your Virtual Cloud Network (VCN) or for traffic originating from your on-premises network that is connected to your VCN via VPN or the FastConnect Service.

So I think that covers it: the load balancer only works inside the region. Anyhow, I usually tend to follow support's advice, so I created a load balancer, and after a while it had a healthy status.

So there we go. Basically we also need some listeners and server pools to use it, but OK. As you can see, you can't assign an IP address; the creation process does that automagically. Anyhow, before we can do anything, the VM needs to be booted up.

After a while this was the result …

The reason, which is a little too long to fit on the screen, is that it cannot allocate its static IP address in the subnet I defined. The Database as a Service instances take .2 and .4, so I assigned .3 to this VM.

The error in the orchestration is not very clear either …

Transient error… OK then. The next thing I did was shut down everything in the subnet so every address was free, and then try to boot up the VM. But first I tried to ping the .3 address from one of the DB nodes. Guess what … no reply. So after shutting down the database and the other nodes, I tried to boot only this VM. The result … surprise! Exactly the same.

Long story short … after a long service request, it turned out to be the load balancer, which assigns itself an IP address at creation time and keeps it. Even disabling the load balancer does not result in a successful boot. So in this case, there are two possibilities:

  1. Change the IP address of the IaaS guest
  2. Delete the load balancer

Oracle Support gave me this answer, which I'd like to share because I did not find it directly in the documentation:

The longer explanation is that load balancer compute resources are owned by a different compute account than the load balancer itself, which is owned by the customer. When a customer creates a load balancer, the compute instances backing the load balancer will be attached to customer's ip network (thus instance and network belong to different account). Since the load balancer gets its IP assignment from the customer's IP network, the customer has to make sure to choose a static IP not used by any node on the network. A common practice is to choose static IP from the upper range since ip network assign IPs from the lower range.

A quick discussion with our own network admins revealed that you can basically assign any address to the load balancer, and that network admins mostly have their own way of working. So I asked support how high the "upper range" starts, and I was a bit surprised by the answer:

There is no definitive answer for this as of now, we recommend to use after initial 20-30 IP’s.

A man's gotta do what a man's gotta do, so I will change the IP addresses of my internal machines. But my personal opinion is that if there is a part of the subnet you can't use, it should not be possible to use it. Even when a machine is down, its orchestration JSON is known. So in my view, you can perfectly well determine which addresses are statically assigned and which are not.
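To illustrate that last point: once the static assignments are written down somewhere, spotting a clash is trivial, whether or not the instance is running. Below is a minimal Python sketch of such a check. It assumes a hypothetical, simplified mapping of instance name to static IP; this is NOT the real OCI-C orchestration JSON schema, so in practice you would extract the values from your own orchestration files. The subnet `10.0.0.0/24` and the load balancer address are made-up illustrative values that mirror the story above.

```python
import ipaddress
from collections import defaultdict

# Hypothetical, simplified view of the static IPs declared in orchestrations
# (instance name -> IP). Not the real OCI-C orchestration schema.
STATIC_ASSIGNMENTS = {
    "db-node-1": "10.0.0.2",
    "db-node-2": "10.0.0.4",
    "webserver": "10.0.0.3",
}


def find_conflicts(assignments, subnet, in_use):
    """Return {ip: [owners]} for every address claimed by more than one owner.

    assignments: owner name -> static IP declared in an orchestration
    subnet:      CIDR of the IP network the addresses must belong to
    in_use:      owner name -> IP already taken by something else
                 (e.g. a load balancer), regardless of instance state
    """
    network = ipaddress.ip_network(subnet)
    owners = defaultdict(list)
    for owner, ip in list(assignments.items()) + list(in_use.items()):
        addr = ipaddress.ip_address(ip)
        if addr not in network:
            raise ValueError(f"{owner}: {addr} is outside {network}")
        owners[addr].append(owner)
    # Keep only addresses claimed more than once.
    return {str(ip): names for ip, names in owners.items() if len(names) > 1}


# Illustrative: the load balancer grabbed the address the web server wants.
print(find_conflicts(STATIC_ASSIGNMENTS, "10.0.0.0/24",
                     {"load-balancer": "10.0.0.3"}))
# prints {'10.0.0.3': ['webserver', 'load-balancer']}
```

Because the check only looks at declared assignments, a stopped VM still "owns" its address, which is exactly the behavior I would have expected from the platform.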

But for now … if you use Compute Classic with IaaS guests that have static IPs in a private subnet, make sure those addresses don't start lower than .20.
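If you want to check this before booting anything, a small Python sketch along these lines flags static addresses that fall inside the low part of the subnet. The `first_static_host=20` default reflects my rule of thumb from above, not an official Oracle value (support only said "after initial 20-30 IPs"), and the instance names and subnet are hypothetical.

```python
import ipaddress


def below_reserved_range(assignments, subnet, first_static_host=20):
    """Return the sorted names whose static IP lies below the threshold.

    assignments:       owner name -> static IP (hypothetical input format)
    subnet:            CIDR of the private subnet
    first_static_host: offset of the lowest host you consider safe for
                       static IPs (20 is a rule of thumb, not official)
    """
    network = ipaddress.ip_network(subnet)
    threshold = network.network_address + first_static_host
    if threshold not in network:
        raise ValueError(f"offset {first_static_host} is outside {network}")
    return sorted(
        name for name, ip in assignments.items()
        if ipaddress.ip_address(ip) < threshold
    )


# Illustrative values: the original .3 web server versus a relocated one.
print(below_reserved_range(
    {"webserver": "10.0.0.3", "webserver-new": "10.0.0.25"}, "10.0.0.0/24"))
# prints ['webserver']
```

Raising `first_static_host` to 30 gives you the more conservative end of the range support mentioned.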

As always, questions or remarks? Find me on Twitter @vanpupi
