Advent of Code 2021 in Db2 LUW SQL – Day 04!

Check out my general blog entry on Advent of Code 2021 for background and resources. In short, I’m working through the Advent of Code 2021 programming challenge using Db2 LUW SQL. This is the detailed entry for day 4. It’s posting on February 1. I mentioned in my intro post that I have no intention of keeping up daily, because with a full-time job, I just can’t pull off the time it takes me.

AOC Day 4
My day 4 solutions are available in my GitHub repo.

Part 1 Problem Statement

AOC Day 4

— Day 4: Giant Squid —
You’re already almost 1.5km (almost a mile) below the surface of the ocean, already so deep that you can’t see any sunlight. What you can see, however, is a giant squid that has attached itself to the outside of your submarine.

Maybe it wants to play bingo?

Bingo is played on a set of boards each consisting of a 5×5 grid of numbers. Numbers are chosen at random, and the chosen number is marked on all boards on which it appears. (Numbers may not appear on all boards.) If all numbers in any row or any column of a board are marked, that board wins. (Diagonals don’t count.)

The submarine has a bingo subsystem to help passengers (currently, you and the giant squid) pass the time. It automatically generates a random order in which to draw numbers and a random set of boards (your puzzle input). For example:

7,4,9,5,11,17,23,2,0,14,21,24,10,16,13,6,15,25,12,22,18,20,8,19,3,26,1

22 13 17 11  0
 8  2 23  4 24
21  9 14 16  7
 6 10  3 18  5
 1 12 20 15 19

 3 15  0  2 22
 9 18 13 17  5
19  8  7 25 23
20 11 10 24  4
14 21 16 12  6

14 21 17 24  4
10 16 15  9 19
18  8 23 26 20
22 11 13  6  5
 2  0 12  3  7

After the first five numbers are drawn (7, 4, 9, 5, and 11), there are no winners, but the boards are marked as follows (shown here adjacent to each other to save space):

After the next six numbers are drawn (17, 23, 2, 0, 14, and 21), there are still no winners:

Finally, 24 is drawn:

At this point, the third board wins because it has at least one complete row or column of marked numbers (in this case, the entire top row is marked: 14 21 17 24 4).

The score of the winning board can now be calculated. Start by finding the sum of all unmarked numbers on that board; in this case, the sum is 188. Then, multiply that sum by the number that was just called when the board won, 24, to get the final score, 188 * 24 = 4512.

To guarantee victory against the giant squid, figure out which board will win first. What will your final score be if you choose that board?

General Commentary on the Problem

There are several challenges here. First, the data is not in a nice easy comma separated and normalized format. The numbers selected are ok in the first line, but I want each number in its own row, not in its own column. And the game boards are visual representations, so I need to break that data down into a normalized and relational format.

Data Prep for Today’s Problem

There are three major steps for data prep in today’s problem: creating the tables, loading data into them, and running the game to update the game boards.

Creating Today’s Tables

The table structure for the selected numbers is pretty straight forward. I simply want each number as a row in a table with a sequence number to tell me what order it was selected in – remembering that I cannot count on the order a table is returned in being the order it was inserted in. Here’s the table I created for that:

create table if not exists aoc_day04_picker (
    seq int generated by default as identity primary key
    , num_selected int not null
    , processed smallint not null default 0
)

Note that I had a processed column here to begin with that I didn’t end up using in either of my final solutions. It turned out I didn’t need to process each number one by one and track what I had done, but I instead chose to update selection order in a different way.

Now the table for the bingo games is a bit more complicated. Here I need to identify each value and where it lies (row_num and col_num) in which game number(game_num). I’m also adding a column to record the order in which that value was picked. I default that to 0.

create table if not exists aoc_day04_boards (
    seq int generated by default as identity primary key
    , game_num int not null
    , row_num smallint not null
    , col_num smallint not null
    , value int not null
    , picked smallint not null default 0
);
create unique index ix_aoc_day04_boards_position on aoc_day04_boards (game_num, row_num, col_num) include (picked) allow reverse scans;
create index ix_aoc_day04_boards_value on aoc_day04_boards ( value ) allow reverse scans;

Note that I also added a couple of indexes to help out with the way I see the table being accessed. Probably not really needed at this scale.

Loading Today’s Tables

Now that I have those structures, I need to break the input file down to insert it into the tables. I used python code for this:

line_num=0
game_num=-1
horiz_pos=0
vert_pos=-1
with open("data/day04_input.txt") as fin, open("data/day04_input.csv", 'w') as fout, open("data/day04_input_bp.csv", 'w') as pfout:
    o=csv.writer(fout)
    op=csv.writer(pfout)
    for line in fin:
        #display(line_num)
        if line_num == 0 :
            #display(line)
            line = line[:-1]
            #display(line)
            #display(type(line))
            picked_vals=list(line.split(","))
            #display(picked_vals)
            #op.writerows([picked_vals])
            for picked_val in picked_vals:
                display
                op.writerow([picked_val])
            line_num = line_num + 1
            continue
        if line == '\n' :
            game_num = game_num + 1
            vert_pos=-1
        #display("game_num: "+str(game_num))
        horiz_pos=0
        for val in line.split():
                #display([game_num, vert_pos, horiz_pos, val])
                o.writerow([game_num, vert_pos, horiz_pos, val])
                horiz_pos = horiz_pos + 1
        vert_pos = vert_pos + 1
        line_num = line_num + 1

I left in the display lines I used for troubleshooting this while I was working through this. I might have been able to do this in pure SQL using a stored procedure and more of an ELT approach, but I chose the ETL approach instead.

With that code splitting out the data into proper CSV files, I can then insert it into the tables:

call admin_cmd ('import from /database/day04_input_bp.csv of del 
    insert into aoc_day04_picker (num_selected)');
call admin_cmd ('import from /database/day04_input.csv of del 
    insert into aoc_day04_boards (game_num, row_num, col_num, value)');

Updating the Game Boards

Finally, I need to update the table that holds the boards by looking at the numbers that were selected and the order in which they were selected. This is essentially running all the games through picking all of the numbers so I can record the order in which the boards win. I do this with:

update aoc_day04_boards as b 
    set picked=(select seq from aoc_day04_picker p where b.value=p.num_selected)

Query to Answer Part 1

You can see in the notebook several interim steps I took to identify winning rows and columns and then narrow them down to get the first board to win. The final query that generates the correct answer for part 1 is:

with rowsum as (select game_num, row_num, count(picked) as count, max(picked) as max_seq
from aoc_day04_boards 
where picked !=0 
group by game_num, row_num
having count(picked)=5),
colsum as (select game_num, col_num, count(picked) as count, max(picked) as max_seq
from aoc_day04_boards 
where picked !=0 
group by game_num, col_num
having count(picked)=5),
allwin as (select game_num, max_seq from rowsum where max_seq=(select min(max_seq) from rowsum) union all select game_num, max_seq from colsum where max_seq=(select min(max_seq) from colsum) ),
finalwin as (select * from allwin where max_seq = (select min(max_seq) from allwin))
select sum(value) * (select value from aoc_day04_boards where game_num=(select game_num from finalwin) and picked=(select max_seq from finalwin))
from aoc_day04_boards
where game_num = (select game_num from finalwin)
    and picked > (select max_seq from finalwin);

Now, I’m a bit embarrassed with the number of CTEs I ended up using here and in the solution to part2, but it works. CTEs are a very natural way to me to break up a query and work through it part by part. Were I implementing this in anything other than a sandbox, I’d probably look for efficiencies and try to clean it up a bit more.

Part 2 Problem Statement

AOC Day 4 – you can only see part 2 after you’ve completed part 1

— Part Two —
On the other hand, it might be wise to try a different strategy: let the giant squid win.

You aren’t sure how many bingo boards a giant squid could play at once, so rather than waste time counting its arms, the safe thing to do is to figure out which board will win last and choose that one. That way, no matter which boards it picks, it will win for sure.

In the above example, the second board is the last to win, which happens after 13 is eventually called and its middle column is completely marked. If you were to keep playing until this point, the second board would have a sum of unmarked numbers equal to 148 for a final score of 148 * 13 = 1924.

Figure out which board will win last. Once it wins, what would its final score be?

General Commentary on the Problem

This wasn’t too hard after having the solution to the last one – I already had all the data I needed because of the way I structured things for that. However to get the answer, I ended up adding (I’m sure you’re shocked) another CTE.

My Solution


with rowsum as (select game_num, row_num, count(picked) as count, max(picked) as max_seq
from aoc_day04_boards 
where picked !=0 
group by game_num, row_num
having count(picked)=5),
colsum as (select game_num, col_num, count(picked) as count, max(picked) as max_seq
from aoc_day04_boards 
where picked !=0 
group by game_num, col_num
having count(picked)=5),
allwin as (select game_num, max_seq from rowsum as a where max_seq=(select min(b.max_seq) 
                 from rowsum as b 
                 where a.game_num=b.game_num  
                 group by game_num) 
           union 
               select game_num, max_seq from colsum as c where max_seq=(select min(d.max_seq) 
                 from colsum as d 
                 where c.game_num=d.game_num  
                 group by game_num) ),
eachwin as (select * from allwin as aw where max_seq = (select min(max_seq) from allwin as aw2 where aw.game_num=aw2.game_num)),
finalwin as (select * from eachwin where max_seq=(select max(max_seq) from eachwin))
select sum(value) * (select value from aoc_day04_boards where game_num=(select min(game_num) from finalwin) and picked=(select max_seq from finalwin))
from aoc_day04_boards
where game_num = (select game_num from finalwin)
    and picked > (select max_seq from finalwin)

Once again, I’m sure there’s a way to optimize and simplify this, but this worked to get the correct answer.

Summary

While I enjoyed this challenge, the holidays got in the way, and I ended up doing it a bit piecemeal over several weeks. Here’s hoping I make it to day 5 before Valentine’s Day!

My blog entries about Advent of Code 2021

Ember Crooks
Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and work from home

Articles: 545

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.