LAB-time: upgrading Grid Infrastructure (GI) from 12.1 to 18c

First of all, I'd like to start by telling you that this blog post is intended only for testing, playing, and learning. It is not intended for production use, and currently 18c is only available for Cloud and Engineered Systems environments. For on-premises availability, check Mike Dietrich's ( @MikeDietrichDE ) blog here: https://mikedietrichde.com/2018/03/20/when-will-oracle-database-18c-be-available-on-prem/ . There is one sentence of his that I like and will quote:

Anyway, for all the details and announcements, please see always the single-source-of-truth MOS Note: 742060.1 – Release Schedule of Current Database Releases.

That being said, this blog post describes the upgrade path I decided to take to upgrade my 12.1 Grid Infrastructure (GI) to 18c.
My homelab is running a 4-node Flex Cluster with 2 server pools and one policy-managed database. That database I will keep on 12.1. I know I should upgrade it to 12.2 as well, but hey … can't do it all at once.

The main problem we will face is the Exadata features that are currently enforced; you hit the same error when you try to start an 18c database on-premises.

You will see that, apart from this, the upgrade goes pretty smoothly. I'd like to thank Mahmoud Hatem (@Hatem__Mahmoud) for his research into why this happens. You can read his discoveries here.

Installation

Software staging

As of 12.2, setting up GI has become a lot easier than it used to be in earlier days: just unzip the software and run a setup script. How hard can it be?

Create the new directories
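A minimal sketch of what that looks like in my lab, assuming the new 18c home lives under /u01/app/18.0.0/grid (the path used later in this post) and that the GI owner and install group are called grid and oinstall (adapt these to your own environment):

    # as root
    mkdir -p /u01/app/18.0.0/grid
    chown grid:oinstall /u01/app/18.0.0/grid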

And this has to be done on all the nodes.

Unzipping the software has to be done as the owner of the Grid Infrastructure, on the first node only:
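Something along these lines, where the staging path and the zip file name are assumptions on my part (use whatever your downloaded 18c grid home zip is called):

    # as the GI owner, on node 1
    cd /u01/app/18.0.0/grid
    unzip -q /u01/stage/LINUX.X64_180000_grid_home.zip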

Prechecks

As with each installation, prerequisites should be met. That’s what we need our good friend cluvfy for.
Call me paranoid, but I usually first want the peace of mind that my currently running cluster is OK.

And then I usually gather some evidence.
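A sketch of the kind of checks and evidence I capture, all with standard clusterware tools (redirect the output to files if you want to keep it):

    crsctl check cluster -all
    crsctl query crs activeversion
    crsctl query crs softwareversion
    olsnodes -n -s -t
    crsctl stat res -t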

That way, I can always refer back to "what was the output again?".

Cluvfy

As the Grid Infrastructure owner, run cluvfy in pre-install mode.

This checks the environment for potential issues that would hold you back from upgrading. You see I will be brave and attempt a rolling upgrade 🙂 For the rest, it is similar to the cluvfy command you're used to.
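For reference, the invocation looks roughly like this; the source home path and the -dest_version value reflect my lab and are assumptions you should adapt:

    # as the GI owner, from the freshly unzipped 18c home
    cd /u01/app/18.0.0/grid
    ./runcluvfy.sh stage -pre crsinst -upgrade -rolling \
        -src_crshome /u01/app/12.1.0/grid \
        -dest_crshome /u01/app/18.0.0/grid \
        -dest_version 18.0.0.0.0 \
        -fixup -verbose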

When I ran it, something caught my attention:

and at the bottom

I do ignore the swap warning; I know about it, and you should not ignore it for production, but for a sandbox playground … you get the picture.
I HAVE to run the fixup scripts, though.

Afterwards I have to patch my 12.1 cluster. The only patch a base-install 12.1 cluster needs is patch 21255373. It's a fully rolling patch which is applied using opatchauto, and I did not have any issues in my environment, so I won't cover it in detail here.
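Just for completeness, a sketch of the opatchauto call; the staging directory is an assumption, and you run this as root on each node in turn:

    # as root, per node (rolling)
    /u01/app/12.1.0/grid/OPatch/opatchauto apply /u01/stage/21255373 -oh /u01/app/12.1.0/grid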

After this, I patched the system with the required patch (rolling, of course) and reran the precheck:

So for this cluster, we’re good to go (I would not continue in prod, but now I’m repeating myself).

Setup

As stated before, installing a cluster is now as simple as running a configuration script. At least, kind of. Instead of runInstaller we now use gridSetup.sh.
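So, as the GI owner on the first node (I actually started it with a response file via -responseFile, more on that under ISSUE 3, but a plain start looks like this):

    cd /u01/app/18.0.0/grid
    ./gridSetup.sh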

And the GUI pops up.

We go for the upgrade.

Make sure all your nodes are listed and test the SSH equivalence; it should be working already, but better safe than sorry.

As it is just for testing and playing, I won't register it in Cloud Control now.

I started the installer with a response file in which I had already filled in my Oracle home; that's why I can't change it here, I think. I like the idea of keeping the Oracle base and Oracle home separated, but whether that is OK or not is another discussion.

I usually run my root scripts as part of the installer. I know you can do it manually, but the only script it runs, and it will ask you before starting, is rootupgrade.sh. So we know what it is doing, and if it fails there is the retry button, because since 12c rootupgrade.sh is restartable. So there is no harm in doing it this way.

I like this idea! If you have a big number of nodes you can separate them into batches. This also saved my a** a little, as between the batches you get a pop-up asking whether it is OK to continue with the next batch. I used that time in between to correct the missing underscore parameter in the ASM spfile, to make sure that during the installation at least two ASM instances are always available. Yes, that is something I would definitely do in production as well, but here it is just to get it running. Also, we know that the on-premises release is planned for July, so no more fiddling around by then, but for now it does help.

The very well known moment of truth.

it’s my lab … /u01 and swap are pretty small, so this is safe to ignore. He will complain with a dialog box that you choose to ignore this message and you can confirm that you’re sue about it.

This is something I would definitely recommend: always save your response files! You never know what you might need them for. For re-running your configuration assistants, for example 😉

And there the fun starts! After a while it pauses, and do not click anything yet!

During my installation I had node1 and node2 in one batch and node3 and node4 in the other batch. What happened during the rootupgrade.sh was that indeed the asm instance did not come up properly due to the error

This wasn’t too much of a problem as my database was still able to connect to asm through the other surviving asm instances. The moment I saw that I hit this error I started the instance using a pfile containing the spfile entry and the underscore parameter. When all was done, I recreated the spfile containing the _exadata_features_on parameter. The proxy instances did pick up their pfile in $GRID_HOME/dbs and started up without any issue.

If you have only 2 nodes, it can be an option to put each node in a separate batch. It seems a bit overkill at first, but it gives you a pause to make sure you always have an ASM instance available to connect to, so the assistants don't fail. When both your ASM instances and their proxies are back online, click "Execute now" and the installer continues.

Then It’s time for the configuration assistants. 

If for some reason or another you lose your session and want to rerun the config assistants, you can do so by running gridSetup.sh with the -executeConfigTools flag.
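A sketch of that rerun, assuming you kept the response file saved earlier (the file name is a placeholder):

    cd /u01/app/18.0.0/grid
    ./gridSetup.sh -executeConfigTools -responseFile /home/grid/grid_18c_upgrade.rsp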

This actually went pretty smoothly once I found out how to get around the GIMR issue; check ISSUE 4 further down in this blog post. Afterwards … all was done and I had a running 18c cluster.

The next steps were to:

  • enable the GHCHKPT volume and its ACFS file system.
  • enable and start RHP (Rapid Home Provisioning).

In my case they were not enabled by default. You can choose: either you do it in the brand-new fancy asmca and click around (in the settings box you can enter the root password, which makes life a little easier), or you use the command line. It's up to you.
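From the command line it boils down to something like this sketch; the disk group name and the volume device path are assumptions for my lab, so check asmcmd volinfo for your own values:

    # as the grid user: enable the GHCHKPT volume in its disk group
    asmcmd volenable -G GRID GHCHKPT
    asmcmd volinfo -G GRID GHCHKPT

    # then enable and start the ACFS file system on its volume device
    srvctl enable filesystem -device /dev/asm/ghchkpt-123
    srvctl start filesystem -device /dev/asm/ghchkpt-123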

For RHP, you must do it using the CLI, as the grid user:
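For me that came down to the standard srvctl verbs for the RHP server resource:

    srvctl enable rhpserver
    srvctl start rhpserver
    srvctl status rhpserver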

After doing all that this was the result:

One happy DBA 🙂

The summary of the issues and their workarounds comes next.

 

Issues and their workarounds

ISSUE 1

During one of the upgrade attempts, my installation kept complaining that I wasn't on the first node. Afterwards I found out it had to do with DNS: I installed my old cluster using short names and wanted the new nodes to use their fully qualified domain names. In the logs the installer then sees that the names don't match exactly and tells you that you're not on the first node. The logfile you're looking for is cluutil2.log.

So, for our own peace of mind, do the installation from the master node. It's easy to retrieve which node is the master node:
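The output of my run isn't reproduced here, but one way to ask (an assumption on my part; this queries the Cluster Health Monitor master, which matched in my case) is:

    oclumon manage -get master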

So you see that in my particular case labvmr01n01 is the master node, and I will thus perform the upgrade from that node.

ISSUE 2

ASM instances

If you’re just like me too stubborn to check some things upfront sometimes. Ok I admit, this was on first attempt, but I would highly recommend to make sure your asm proxy instances are running. I needed them to make sure the upgrade succeeded.  Also, make a not on where your spfile is located in asm:

WARNING: this is an example! In the rest of my journey the spfile location might differ. I also created a copy on the filesystem, "just in case".

If you find yourself in the same trouble as me, you will end up with an ASM instance that refuses to start and teases you with the Exadata feature error mentioned earlier.

We can get around this. If you're stuck and don't have a copy of your spfile, retrieve the parameters from the ASM alert log and reconstruct the file yourself with some creativity: at every startup the alert log lists the non-default parameter settings that were used.

This is the moment where the underscore parameter comes into play for the first time. Construct a pfile containing the spfile entry and the underscore parameter; once the instance is up, you can include the underscore parameter in the spfile itself and you're good to go again (but only until the proxy instances pop up).
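A sketch of that workaround under my assumptions (the pfile location, the instance name +ASM1 and the spfile path are placeholders; the parameter name is the one mentioned above):

    # hand-made pfile, e.g. /tmp/init+ASM1.ora
    SPFILE='<your_asm_spfile_path>'
    "_exadata_features_on"=TRUE

Then, as the grid user on that node, start the instance with it and write the parameter into the spfile:

    sqlplus / as sysasm
    SQL> startup pfile='/tmp/init+ASM1.ora';
    SQL> alter system set "_exadata_features_on"=TRUE scope=spfile;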

Then start the ASM instance that way and get it online (shut it down again afterwards, because at this stage your upgrade assistant may be hanging, and then you can simply retry the operation).

Oh, something nice to know: don't try to be smarter than Oracle and set the parameter upfront; 12.1 doesn't recognise it and will not start due to invalid parameters. At this point it's in the spfile of the 18c version, so all good now.

proxy instances

The proxy instances are a little different. The easiest workaround I found to get them started, and to keep things consistent during the process, is to give them a pfile in /u01/app/18.0.0/grid/dbs. If you do this upfront, you only need to add the files on node 1, as during gridSetup.sh the home is copied over to the other nodes.

So in the end, I have these 4 files on all 4 nodes, just in case an instance is not on its normal node, which can happen in a Flex Cluster.

The content of these files is the same for every file
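A minimal sketch of such a pfile, under the assumption that the proxy instances are named +APX1 through +APX4 (the usual naming in a Flex Cluster):

    # e.g. /u01/app/18.0.0/grid/dbs/init+APX1.ora
    instance_type='ASMPROXY'
    "_exadata_features_on"=TRUE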

ISSUE 3

This one is completely my own fault, for running the 12.1 installer manually during a rebuild instead of using my scripts: I ended up with different OS groups, and of course those groups MUST match between the old and the new home. So what I did was copy my 12.1 response file for 18c, and then I started gridSetup.sh with the -responseFile option. That way you can convince the installer to use some other variables.
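A sketch of the relevant part of such a response file; the variable names are the usual grid-install response-file parameters, and the group names are just the ones from my lab, so adjust them to your own setup:

    # hypothetical excerpt from the 18c grid response file
    oracle.install.asm.OSDBA=asmdba
    oracle.install.asm.OSOPER=asmoper
    oracle.install.asm.OSASM=asmadmin

    # and then start the installer with it:
    ./gridSetup.sh -responseFile /home/grid/grid_18c_upgrade.rsp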

ISSUE 4

The GIMR (Grid Infrastructure Management Repository). This puzzled me the first time I tried to upgrade. I admit it was a bit late already, but it looked like the pfile kept coming back or being regenerated. After some digging and reading of scripts it was actually pretty simple: the assistant for the GIMR first starts it up using its own pfile, which it has backed up in the old $GRID_HOME/dbs, and tries to drop it in order to recreate it.

In my, and I repeat: my particular case, I had already screwed up my GIMR earlier, and as I have limited resources I had already deleted it. I know that is not healthy and I would strongly advise against doing so, especially on production or real-use clusters. That's why the /u01/app/18.0.0/grid/crs/sbs/dropdb.sbs script failed. If you follow that script carefully, you can remove the GIMR manually and edit the script so it returns 0; the installer then accepts the retry. If you decide to do this, make sure you know what you're doing and understand what is happening, because if you leave behind things Oracle doesn't expect, the rest might fail as well, and we don't want that. For creating the GIMR, the following command is used internally:

And that explains why I thought the pfile was being rebuilt each time. The workaround is very easy: edit the template and add the underscore parameter.
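The addition itself is just one line in init.ora style (note that underscore parameters need to be quoted there):

    "_exadata_features_on"=TRUE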

 

So folks, this was all. It was a lengthy post this time, but …

As always, questions, remarks? find me on twitter @vanpupi

 
