Advent of Code 2021 in Db2 LUW SQL – Day 01!

Posted by

Check out my general blog entry on Advent of Code 2021 for background and resources. In short, I’m working through the Advent of Code 2021 programming challenge using Db2 LUW SQL. This is the detailed entry for day 1.

AOC Day 1
My day 1 solutions are available in my GitHub repo.

Part 1 Problem Statement

AOC Day 1

As the submarine drops below the surface of the ocean, it automatically performs a sonar sweep of the nearby sea floor. On a small screen, the sonar sweep report (your puzzle input) appears: each line is a measurement of the sea floor depth as the sweep looks further and further away from the submarine.

For example, suppose you had the following report:

199
200
208
210
200
207
240
269
260
263

This report indicates that, scanning outward from the submarine, the sonar sweep found depths of 199, 200, 208, 210, and so on.

The first order of business is to figure out how quickly the depth increases, just so you know what you’re dealing with – you never know if the keys will get carried into deeper water by an ocean current or a fish or something.

To do this, count the number of times a depth measurement increases from the previous measurement. (There is no measurement before the first measurement.) In the example above, the changes are as follows:

199 (N/A - no previous measurement)
200 (increased)
208 (increased)
210 (increased)
200 (decreased)
207 (increased)
240 (increased)
269 (increased)
260 (decreased)
263 (increased)

In this example, there are 7 measurements that are larger than the previous measurement.

How many measurements are larger than the previous measurement?

General Commentary on the Problem

My initial reaction is that this seems like exactly the kind of problem that is meant for SQL. I’ve used the LAG function before, and it seems tailor made to this kind of question. The initial challenge that comes to mind, though, is that Db2 does not maintain the order of data in a table. While Db2 LUW likely to return data in the order in which it was inserted, it is not guaranteed, and can return data in any order unless you tell it the order you would like it in. This means I’ll need to add a sequence number as I insert the data into a table so that I can maintain the correct order.

My Solution

My day 1 solutions are available in my GitHub repo.

First, I need to get the data into a table in the database. To do that, I create a table with this syntax:

create table aoc_day01 (
    seq int generated by default as identity primary key
    , depth int );

I am keeping it simple here since I know I’m going to be scanning the whole table, and the input data provided is only 2,000 rows anyway.

I import my input data into the table with this syntax:

call admin_cmd ('import from /database/day01_input.csv of del 
                      insert into aoc_day01 (depth)');

This is the code I used from a Jupyter Notebook. That is why I’m using admin_cmd.

Once I’ve done a quick verification that the data was inserted the way I intended, this is the SQL I came up with to generate the answer to part 1:

with depth_diff as (select depth - lag(depth,1) over(order by seq asc) as diff
    from aoc_day01)
select count(*) as count 
    from depth_diff where diff > 0

I’m using the lag function here over data ordered by my sequence number to calculate the difference between each row and the preceding row. I like using a CTE for that, as it’s easy to test the SQL for the CTE separately to understand what output I get there before running the count of rows with numbers greater than zero. This same methodology would work with different data sets easily.

Part 2 Problem Statement

AOC Day 1 – you can only see part 2 after you’ve completed part 1

Considering every single measurement isn’t as useful as you expected: there’s just too much noise in the data.

Instead, consider sums of a three-measurement sliding window. Again considering the above example:

199  A      
200  A B    
208  A B C  
210    B C D
200  E   C D
207  E F   D
240  E F G  
269    F G H
260      G H
263        H

Start by comparing the first and second three-measurement windows. The measurements in the first window are marked A (199, 200, 208); their sum is 199 + 200 + 208 = 607. The second window is marked B (200, 208, 210); its sum is 618. The sum of measurements in the second window is larger than the sum of the first, so this first comparison increased.

Your goal now is to count the number of times the sum of measurements in this sliding window increases from the previous sum. So, compare A with B, then compare B with C, then C with D, and so on. Stop when there aren’t enough measurements left to create a new three-measurement sum.

In the above example, the sum of each three-measurement window is as follows:

A: 607 (N/A - no previous sum)
B: 618 (increased)
C: 618 (no change)
D: 617 (decreased)
E: 647 (increased)
F: 716 (increased)
G: 769 (increased)
H: 792 (increased)

In this example, there are 5 sums that are larger than the previous sum.

Consider sums of a three-measurement sliding window. How many sums are larger than the previous sum?

General Commentary on the Problem

So a rolling sum over each group of 3, then find the differences, then look for positive differences. As with many AOC second parts, had I coded the first part slightly differently I could have essentially run the same code with different hard numbers. It occurs to me that I could easily write a stored proc or function to do this work that would take the number of days as an input. However, I’m trying to work in SQL alone wherever possible since the procedural languages are rather different between RDBMSes, while SQL itself has more minor differences from one platform to another.

My Solution

I use the same table and data as I used in part 1. The SQL I use to get get the correct answer is:

with depth3 as (select seq
    , case when count(depth) over(order by seq asc ROWS 2 PRECEDING) = 3
        then sum(depth) over(order by seq asc ROWS 2 PRECEDING) 
        else null
        end as depth3
    from aoc_day01
), depth3_diff as (select depth3 - (lag(depth3,1) over(order by seq asc)) as diff
        from depth3
)
select count(*) as count 
    from depth3_diff where diff > 0

I’m using two CTEs here so that the final select statement is essentially the same as part 1.

In the first CTE (depth3), I’m using a case statement to eliminate the first couple of results that don’t have 3 rows to sum up, and that I don’t want to consider for further processing. I’ve coded this so that it is only dependent on the number of rows and not on the sequence number itself. This allows me to be anywhere in the sequence, and to delete and re-load the table as many times as I want without depending on that sequence number starting at 1.

In the second CTE (depth3_diff), I’m just calculating the difference between rows, much like the CTE in the answer to part 1.

If I needed to instead calculate the rolling values over 5 rows, I would just replace anywhere I’ve got 3 with 5 and also the places where I have 2 with 4. If I wanted to code this as a stored procedure, I could have this as a variable.

Summary

Today’s problem was fairly easy, and something SQL was made to do quickly and easily. I didn’t have to wait for anything to run here. I’m looking forward to the challenge of future days!

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, and is now pivoting to focus on learning MySQL. Ember shares both posts about her core skill set and her journey learning MySQL. Ember lives in Denver and work from home

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.