I recently ran into an interesting issue: after an IPS (Intrusion Prevention System) upgrade, client applications suddenly could no longer connect to some database instances. Not all of them, just a subset.
What are IPSes?
IPSes are network devices that can both monitor network traffic and act on it. If they detect traffic that they do not “like”, they can sever the connection. For TCP traffic, an IPS can do this, for example, by sending a reset packet (technically, a packet with the RST flag bit set) to the sender (the client), telling the client that the server did not expect this traffic. It also sends a final packet (a packet with the FIN flag bit set) to the receiver (the server) end of the connection, telling the server that no more traffic is to be expected on this particular TCP flow. This allows the server to close the TCP socket and release the resources it was using.
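To make the client side of such a reset more concrete, here is a minimal sketch in Python. It is not how an IPS works internally: a small local server stands in for the IPS, waits for the first payload, and then aborts the connection with an RST. The function names (`resetting_server`, `send_and_observe`, `demo`) are made up for this illustration, and the `SO_LINGER` packing assumes a POSIX system such as Linux.

```python
import socket
import struct
import threading


def resetting_server(listener: socket.socket) -> None:
    """Accept one connection, read the first payload, then abort with RST."""
    conn, _ = listener.accept()
    conn.recv(1024)  # block until the client actually sends something
    # SO_LINGER with l_onoff=1 and l_linger=0 makes close() send RST, not FIN.
    # Assumption: POSIX layout of struct linger (two C ints).
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def send_and_observe(host: str, port: int, payload: bytes) -> str:
    """Connect, send a payload, and report how the connection ended."""
    with socket.create_connection((host, port), timeout=3.0) as sock:
        sock.sendall(payload)
        try:
            data = sock.recv(1024)
        except ConnectionResetError:
            return "reset"  # the peer (or a device in the path) sent RST
        return "closed" if data == b"" else "data"


def demo() -> str:
    """Run the stand-in server on loopback and send it one payload."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=resetting_server, args=(listener,), daemon=True)
    worker.start()
    result = send_and_observe("127.0.0.1", port, b"hello")
    worker.join(timeout=3.0)
    listener.close()
    return result
```

The point of the sketch is that the TCP connect itself succeeds, and the reset only shows up once data has been sent. On a typical Linux loopback, `demo()` should report a reset.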
An important thing to notice is that if an IPS sits in the network path between the client and the server, the popular connectivity check of running telnet from the client to the server (or any similar method that simply opens a socket but does NOT send actual data through it) is insufficient. Why? Because a telnet connection attempt will succeed as long as the applicable firewalls are open, but until some actual traffic is sent, the IPS has nothing to act on, nothing to apply its rules against. The IPS rule set cannot be validated without sending real traffic of the type whose connectivity one is trying to verify.
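For reference, a telnet-style check boils down to something like the following Python sketch. The function name `tcp_connect_only` is my own, not part of any tool. Note what it does and does not prove: it completes the TCP handshake and sends no application data, so it exercises the firewalls but gives the IPS nothing to inspect.

```python
import socket


def tcp_connect_only(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake completes; no application data is sent."""
    try:
        # create_connection performs the handshake; the with-block closes
        # the socket again immediately without writing a single byte.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A call such as `tcp_connect_only("db2host.example.com", 50000)` (a hypothetical host and port) returning `True` says only that the port is reachable. To exercise the IPS rules, the test has to go one step further and make a real database connection with the same client software the application uses.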
The Heart of the Problem
This is what was happening in this case – the team that validated the change used telnet, mistakenly believing that this was enough; since nothing had changed on the firewalls, the telnet tests were all positive. Unfortunately, once the applications tried connecting to the DB2 servers, the new IPS decided that some of the connection attempts did not look right according to some built-in rule set and terminated those flows. This was confirmed by the IPS logs, which showed messages explaining that the IPS had terminated IBM DB2 TCP client to server connections that did “not comply with DRDA in terms of message length”.
The cause of this behavior of the IPS was relatively simple to find. I knew that only a subset of connection attempts was failing, and that if connections to one database on an instance were terminated, connections to all databases on that instance were terminated as well. All other things being equal, the cause was therefore most likely within the instance configuration settings. And one of the commonly changed instance configuration settings that specifically affects connection attempts to DB2 instances is AUTHENTICATION. This parameter was set to DATA_ENCRYPT on the affected instances (because, well… “reasons”), as opposed to SERVER_ENCRYPT on the instances that were not affected.
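If there are many instances to compare, the check can be scripted. The sketch below is only an illustration: it assumes you have already captured the text output of `db2 get dbm cfg` for each instance, and that the relevant line has the form `... (AUTHENTICATION) = VALUE`. The helper names and the instance names in the sample data are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Matches the "(AUTHENTICATION) = VALUE" part of a dbm cfg line.
# Assumption: this is how the setting appears in the captured output.
AUTH_LINE = re.compile(r"\(AUTHENTICATION\)\s*=\s*(\S+)")


def authentication_setting(dbm_cfg_text: str) -> Optional[str]:
    """Extract the AUTHENTICATION value from captured dbm cfg output."""
    match = AUTH_LINE.search(dbm_cfg_text)
    return match.group(1) if match else None


def group_by_authentication(cfg_by_instance: Dict[str, str]) -> Dict[str, List[str]]:
    """Group instance names by their AUTHENTICATION value."""
    groups: Dict[str, List[str]] = {}
    for instance, text in cfg_by_instance.items():
        value = authentication_setting(text) or "UNKNOWN"
        groups.setdefault(value, []).append(instance)
    return groups


# Hypothetical sample data: instance names and output excerpts are illustrative.
SAMPLE = {
    "db2inst1": " Database manager authentication        (AUTHENTICATION) = DATA_ENCRYPT\n",
    "db2inst2": " Database manager authentication        (AUTHENTICATION) = SERVER_ENCRYPT\n",
}

if __name__ == "__main__":
    for value, instances in group_by_authentication(SAMPLE).items():
        print(value, instances)
```

Lining the resulting groups up against the list of instances with failing connections makes a correlation like the one described above easy to spot.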
So, if an IPS gives you DB2 blues, complaining about “non-compliant IBM DB2 TCP” and client to server connections that do not “comply with DRDA in terms of message length”, the DB2 instance AUTHENTICATION parameter being set to DATA_ENCRYPT might be the problem.