By special request, I’m going to share some details on how to do a one-time offline cleanup of guest users.
PLEASE NOTE: These are suggestions only – it is critical to test this thoroughly before doing it in production. This is one of the changes I test the most, and it is still the one that fails the most.
First off, check out the entry on pruning guest users. It has some good information on how to reduce guest user creation, as well as the ongoing pruning methodology you should implement the second you finish the one-time cleanup so you don’t end up in the same situation again.
When do I use this process?
Generally this process is for clients who have been live for at least a year without pruning guest users. Even then it may not be needed – it depends on the rate at which guest users are created. This process is a last resort, so I’ll usually try dbclean first: delete 10,000 or so records, then extrapolate from the deletion rate for those 10,000 to the total number of records I have to delete, to estimate how long dbclean would take running constantly. I’ve come up with numbers over 120 hours before, and it’s usually somewhere above 40 hours that I resort to this process. That may vary depending on the windows you have for deletions – if you can just leave dbclean running during normal processing (if your hardware is over-sized enough, this will work), then you may never need this process. Obviously, if you have proper pruning in place starting at go-live and your guest user creation is being done properly, you should never have to use this process.
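To make the math concrete (illustrative numbers only): if dbclean deletes those 10,000 test records in about 67 minutes, that’s roughly 150 deletes per minute. At that rate, 1,000,000 guest users would take 1,000,000 / 150 ≈ 6,667 minutes, or about 111 hours of dbclean running constantly – well past the 40-hour mark where I start considering this process.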
General Concept
It is expensive to delete. I’ve seen guest user deletions on older hardware run at just 100 per MINUTE, and if I haven’t been able to test, I assume a rate of 150 per minute. Why so slow? The MEMBER table is directly or indirectly connected to nearly every table in the database with foreign keys, so a delete here checks just about every table in the database (800 or more) for possible cascading deletes. Some sites may even create guest users at a greater rate than they can be deleted.
So usually you’ll be deleting 80% or more of the MEMBER and USERS tables. It is much faster to export the data you want to keep and then do a LOAD with the REPLACE option. The deletes still cascade to all the dependent tables – just through the integrity checks rather than row-by-row delete processing.
The Export
So this can be a bit tricky. There are some users you won’t want to delete, so you have to craft the select statement for your export just right. Here are the things you know you don’t want to delete (a sample export sketch follows the list):
- Users/Members with Orders (delete any abandoned shopping carts first, if you aren’t already using dbclean to purge them)
- Users/Members where the USERS_ID/MEMBER_ID is less than 0 – these are used for internal commerce purposes
- Users/Members with a REGISTERTYPE that is not G
- Members that do not have a corresponding User
- Other administrative users
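As a starting point, here is a sketch of what the exports might look like. This is a hedged example, not a drop-in statement: the WCSUSER schema, the file names, and the exact keep-criteria are my assumptions, and your criteria will almost certainly need refinement during testing. It also assumes abandoned carts have already been purged.

-- keep users with orders, negative IDs, or a non-guest REGISTERTYPE
-- (administrative types like 'A'/'S' are covered by the <> 'G' test)
export to users_keep.ixf of ixf
select u.* from wcsuser.users u
 where u.users_id < 0
    or u.registertype <> 'G'
    or exists (select 1 from wcsuser.orders o
                where o.member_id = u.users_id);

-- keep members matching a kept user, plus members with no user row
-- (organizations, member groups, and the like)
export to member_keep.ixf of ixf
select m.* from wcsuser.member m
 where m.member_id < 0
    or not exists (select 1 from wcsuser.users u
                    where u.users_id = m.member_id)
    or exists (select 1 from wcsuser.users u
                where u.users_id = m.member_id
                  and (u.registertype <> 'G'
                       or exists (select 1 from wcsuser.orders o
                                   where o.member_id = u.users_id)));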
The LOAD
Once you have exported the files, you’re loading them right back into the same database. You’ll want to do the work on the database server and not on the app server.
My favorite method is to use the NONRECOVERABLE option. This gives you time after completion to take a backup and re-set HADR if needed. Remember that LOADs are not replicated by HADR without some extra work, or without a backup and restore afterwards.
I’d also recommend specifying a messages file so you can delve deep into any failures.
If you’re using multiple files, you’ll also want to use the REPLACE option for the first file loaded, and then the INSERT option for subsequent files.
There’s nothing special other than that for the LOAD command.
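Put together, a sketch of the LOAD commands – again using the hypothetical file, schema, and messages-file names from the export sketch above:

load from member_keep.ixf of ixf messages member_load.msg replace into wcsuser.member nonrecoverable
load from users_keep.ixf of ixf messages users_load.msg replace into wcsuser.users nonrecoverable

And if an export were split across multiple files, only the first would use REPLACE:

load from users_keep2.ixf of ixf messages users_load2.msg insert into wcsuser.users nonrecoverable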
The Set Integrity
While the primary key is enforced during the load, the foreign keys are not. After a load, a set integrity must be done. Simple, you say? Well, the problem is that due to the sheer volume of RI in a Commerce database, most of the tables will be placed into check pending, so you really have to have a script that finds tables in a check pending state and runs a set integrity statement on them. There will be hundreds. As part of this, I choose to have any exceptions written out to exception tables – this is especially critical while you are testing the process.
The general syntax I like to use is:
create table <exception_schema>.<table> like <schema>.<table>
set integrity for <schema>.<table> immediate checked for exception in <schema>.<table> use <exception_schema>.<table>
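With hundreds of tables involved, I wouldn’t type those by hand. A minimal sketch for finding the check-pending tables and generating the statements – assuming a hypothetical EXCP schema for the exception tables, and omitting error handling:

-- list the tables currently in check pending (STATUS = 'C')
select substr(tabschema,1,12) as tabschema, substr(tabname,1,30) as tabname
  from syscat.tables where status = 'C';

-- generate the exception-table DDL for each of them
select 'create table excp.' || rtrim(tabname) || ' like '
       || rtrim(tabschema) || '.' || rtrim(tabname) || ';'
  from syscat.tables where status = 'C';

-- generate the matching set integrity statements
select 'set integrity for ' || rtrim(tabschema) || '.' || rtrim(tabname)
       || ' immediate checked for exception in '
       || rtrim(tabschema) || '.' || rtrim(tabname)
       || ' use excp.' || rtrim(tabname) || ';'
  from syscat.tables where status = 'C';

Run the generated statements (db2 -tvf works well), then re-check SYSCAT.TABLES and repeat – children only come out of check pending after their parents do.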
This process will take at least as long as the LOAD if not longer. You’ll see it take longer on larger tables.
You may also find some circular foreign keys in this process – tables that reference each other. If they’re in check pending and you try to issue a set integrity on one of them, you’ll get error SQL3608N, which indicates that a table cannot have its constraints checked while its parent table is in a check pending state. You’ll get this error for both (or all) of the tables involved. In that case, you simply have to specify both (or all) tables in the same statement – multiple tables can be done in one set integrity statement.
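For example, with two hypothetical tables TABA and TABB that reference each other, a single statement covering both would look like:

set integrity for wcsuser.taba, wcsuser.tabb immediate checked
  for exception in wcsuser.taba use excp.taba,
                in wcsuser.tabb use excp.tabb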
Testing
Testing is absolutely critical for this. I don’t think I’ve done one without at least 3 iterations of tests. The only way to truly test is on a full copy of your production data, so I take over a development or QA environment, restore a full copy of the production data into it, do the full process on that environment, and ask someone to do a regression test when I’m done.
While going through the testing, pay special attention to the exceptions that are kicked out during the set integrity. If you have missed one or more users, you’ll find that your set integrity kicks out rows in a table where you do not want any rows deleted – such as CATENTRY or one of the PROMO tables. For every table with exceptions, use the Commerce Info Center to determine whether it is a table you expect deletes to cascade to (such as ADDRESS) or a table where rows should not be deleted solely because of the deletion of users. You will likely end up altering your export SQL, or splitting it into multiple statements, to make sure you’re catching all the users you want to keep.
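To spot problem tables quickly, one approach (a sketch, reusing the hypothetical EXCP schema from above) is to generate a row count for every exception table and look for non-zero counts where you don’t expect them:

select 'select ''' || rtrim(tabname) || ''' as tabname, count(*) as exception_rows '
       || 'from excp.' || rtrim(tabname) || ';'
  from syscat.tables where tabschema = 'EXCP';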
Production Execution
When you actually perform this in production, your site will have to be unavailable – stop WebSphere Commerce on your app servers. The reason is that your MEMBER and USERS tables will be partially or wholly unavailable during the process, and as other tables go into a check pending state, they will also be unusable. This will make results for the site unreliable at best.
You should have an accurate timing estimate from the testing you’ve done on a different environment – remember to allow time for reviewing the exception tables, and a bit of leeway – this one stinks to have to back out of and do again.
You’ll need your DBA to do the work, your Commerce admin to stop/start Commerce, and whoever you have for regression testing to be available.
If you’re using HADR, you will need to immediately take a backup and restore it over to your standby database server. If you don’t, it will look like you’re able to fail over, but when you go to do so, the loaded tables will be marked unavailable and your site will not function.
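A minimal sketch of that re-sync, with hypothetical database and path names (and a made-up backup timestamp) – the exact steps can vary by version and configuration:

-- on the primary
backup database wcsprod to /backups;

-- on the standby
stop hadr on database wcsprod;
restore database wcsprod from /backups taken at 20130101120000 replace history file;
start hadr on database wcsprod as standby;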
Backout Plan
Personally, I try to go into every change with at least one if not two or three backout plans. Before you start the work, take a full database backup. Also export all data from the USERS and MEMBER tables (preferably in DEL format so you can parse through it if needed), and save the files off for potential backouts.
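Those backout exports might look something like this (hypothetical file names, same assumed schema):

export to member_full.del of del select * from wcsuser.member;
export to users_full.del of del select * from wcsuser.users;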
If you decide to stop the change before the first load has completed (say, because it is running too long), you may be able to simply cancel the load. Generally, your quickest backout option is to restore the whole database. You can instead load the data from the full exports you took ahead of time, but that still requires the set integrity on most tables, which is very time consuming. If you end up having to back out after you’ve started the set integrity steps, you MUST use the full restore method, because data has likely already been deleted from other tables.
So let me know if you try it – I’d welcome comments on other gotchas.
Ember, thanks for the great tip and insight.
Is there any new update to the methodology given that there are newer versions of WC and DB2?
– Sean
Not really, though I haven’t had to do it in a few years. I’ve got a client coming up who might need it – we’ll see. In many cases, as long as your guest user creation rate isn’t too high, you can simply let dbclean run 24/7 for a few weeks and remove the users that way. A few deadlocks or timeouts might result, but sometimes it’s not bad enough to actually have to use another method like this. I also recommend deleting a smaller number with dbclean first to get good timing estimates. I generally assume I can delete 150 guest users per minute, though sometimes it is faster as hardware improves. Also, if you have one of those unusual WCS sites that doesn’t take orders, you can modify the dbclean delete syntax to not check for orders, which makes it run much faster.