Oracle’s SCN Flaw – could it happen in DB2?

I read an article on a flaw in Oracle that was recently discovered. It had to do with the SCN number that is constantly increasing in any database:

http://www.infoworld.com/d/security/fundamental-oracle-flaw-revealed-184163-0

Thanks to my friend Fitz for bringing it to my attention – and you really should read all 6 pages of that article – it is fascinating for a database geek.

Essentially it indicates that there is not only a hard limit to the SCN(281,474,976,710,656), but also a “soft” limit that prevents databases from even approaching that hard limit. The SCN in Oracle is roughly equivalent to the LSN(Log Sequence Number) in DB2. (side note, Oracle also has an LSN, but it’s something else) This number is incremented with every single change on the system in both Oracle and DB2. It cannot be reduced. There are some bugs and other things that can drastically increase the SCN on Oracle that make this a particular issue.

Upon reading this I immediately wondered if the same thing could be an issue in DB2. My overall conclusion would be “not at this time”. I say that because it’s amazing the rate at which storage increases – we always have limits now that we think are just astronomical and can never be reached, and then some future group pays the price – the Y2K issues being the most obvious of these, but I also remember the Commodore64 – so named because it had a whopping 64K of memory, and was so much more powerful than the Commodore Vic 20 I still have in my basement.

DB2’s limit for the LSN is about 16 exbibites. Don’t know what an exbibite is? I didn’t either, until I looked it up. Here is Oracle’s hard limit side-by with DB2’s limit:

DB2’s LSN maximum (bytes)

18,446,744,073,709,551,616

Oracle’s SCN maximum (bytes)

281,474,976,710,656

Is the size difference the only reason I think this won’t be an issue for db2? No. The two key factors in the Oracle issue are:

  1. the “soft” limit that increases in terms of time since a particular point in time in 1988
  2. other bugs or actions which can increase the SCN – malicious or unintentional

Some of Oracle’s patching attempts are focused on each of these.

In 11g there’s a patched bug that caused the SCN to increase drastically when an online backup is done, and causes it to continue increasing drastically. That plus the fact that whenever two Oracle databases connect, they apparently sync SCN’s, always going with the higher one, of course, seem to be the most concerning factors here. There are apparently hidden configuration parameters and other methods that privileged users can use to increase the value.

The soft limit also sounds a bit odd. Originally, the soft limit was defined as the number of seconds since 00:00:00 01/01/1988 multiplied by 16,384. One of the patches apparently allows an administrator to increase that limit, multiplying by a larger number, but the concern there is what happens when a system with that patch and a SCN over the original soft limit connects with a database without the patch?

DB2’s LSN, in addition to increasing with normal activity, can also be increased by other databases in a DPF environment. It’s certainly used in a lot of places – Replication, as I recall, and in HADR – but I can’t currently imagine the situation that either of those two would actually increment the number on another system (obviously HADR pairs should be on the same #). It is not to my knowledge changed by a simple connect with another DB2 system. I was also not able to find any APARs noting increases in LSN as bugs nor any parameters that appear to artificially increase the LSN. I could not find anything on a soft limit like Oracle has.

DB2 has well-defined behavior when it does reach or get close to these limits:

http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.ha.doc/doc/c0057412.html

It essentially warns you multiple times ahead of time and then if it does reach the limit, puts the database into read-only mode. Info World’s article on this made it sound as though the database behavior on reaching the soft limit was an unknown bad thing that could include data corruption.

This isn’t an issue or even close to one for any of my clients – they just aren’t the huge ones who would be approaching any of the limits. But it is interesting all the same. It is theoretically possible for you to reach this limit in DB2.

 

Ember Crooks
Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and work from home

Articles: 544

6 Comments

  1. Hi Ember –

    I manage commerce DB2 at {Company Redacted}. I like your blog and occasionally head over here when I find the topic that pops up in my RSS reader interesting.

    Believe it or not, this has happened to me once in a previous life. It was a DB that managed a service bus architecture (webmethods, I believe). The transactions were very short-lived and the application always disconnected after the work. This frequent deactivation of DB2 caused log files to be truncated and advanced the LSN number. Eventually it went over and it got reset to FFFFFFFs and DB was unaccessible/read-only. I think we had to export and reload the data to fix the issue.

    Going forward, we manually activated the DB every time after a recycle, so that the disconnects didn’t deactivate the DB and in-turn didn’t cause the log truncations.

    Fun memories.

    Cheers –
    Jipy

    • Good to know it can happen, and a good reminder of one of the many reasons to activate your database if you don’t have an application that has a continuous connection like WebSphere Commerce does. I support some ESB databases myself, but nothing with the kind of volume you saw. At least in my experience they’re usually smaller databases since most of the data is transitory – so at least possible to export and reload the data, though no outage like that is good.

      I notice that it was just with db2 9.7 that the number was increased to the value I listed above, so they’re apparently aware of the need to increase it.

      • One other comment – one of the most concerning things with the Oracle flaw is that once you reach the limit for one database, you reach it for all interconnected databases. Bad enough for one databases, but for multiple databases in your organization?

  2. Hi, Ember,

    Data Replication affects the LSN of target databases by the simple fact that it’s replaying transactions from the source database. However, it’s just issuing SQL (insert, update, delete) the same way any other app does. It doesn’t touch or even know about the target’s LSN directly (unless you’re also capturing from the target :).

    In other words, Data Replication would likely only push the target towards an LSN limit if the source workload was pushing both source database towards the limit.

    thanks
    david

    • Thanks for the details. I haven’t actually worked with Replication since version 7, and while I remember that’s where I learned how to find the LSN using db2flsn, I was thinking that it was only for some investigative purposes on one side, not that it was synchronized. So even my statement that it’s “used” by replication is an overstatement – it’s used by replication the same way as any other app “uses” it.

  3. Even though IBM are increasing the size of the RBA (Relative Byte Address) field for DB2 z/OS 11, several companies have been hit by this issue for their DB2 z/OS running in single system mode in the recent past. So they would have been happy if IBM had come up with this improvement one or two versions earlier. While the RBA can be reset to zero, some downtime is unavoidable to get this done. DB2 z/OS also has a soft limit after which a warning message is issued. And IBM provides good and active support for customers who realize that the hard limit is coming closer.

    The whole thing is however NOT an issue for DB2 z/OS in Data Sharing mode as these systems use a different implementation (LRSN – Logical Record Sequence Number).

    So it looks like yes, it can happen to some flavours of DB2, but it is extremely unlikely to happen to recent versions of DB2 LUW.

    And one might assume that vendors sometimes copy the wrong ideas and concepts from their competitors. One of the many shades of “copy right”…

Leave a Reply to Ember CrooksCancel Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.