Update on Ember’s Data Science Journey – January 10, 2019

Well, I knew it was going to be slow, and that is part of why my plan is a 20-year plan instead of just a short-term plan. The fall was particularly difficult due to a couple of issues at work and two conferences that I absolutely loved, but that took time away from the already 60-hour work weeks that I needed to be putting in.

In October, the first test of the MITx/edX class on probability that I was taking fell on the same week as the first IBM Gold Consultant summit at the IBM lab in Markham, Ontario. I have been pushing IBM for this event for several years, as they’ve offered one for Db2 for z/OS for years. I’ve dreamed of visiting the Toronto lab for at least a decade, and it was really a dream come true to be in a room with the top 20 consultants in the world for Db2 on LUW, along with developers I’d met before and others I’d only heard about. The ability to really interact with the people who actually write IBM Db2 was amazing. An actual visit to one of the labs that held the machines they code Db2 on put huge grins on all of our faces, too.

Participating in that 8 hours a day and then trying to keep up with the work I actually get paid for around the networking in the evenings left nearly no room for studying, and my grades started to show it. I realized that I was taking away what little time I had with my family for the class, and that the spike at work was temporary and would be ending around the end of November. So I dropped the class.

I was hoping they would offer the same class in January and I could pick it up again, but alas, they don’t offer it again until May. I was actually really enjoying and learning from the class and found it delightfully challenging. I found topics in the Gold summit that tied in and there were things I understood better from the little classwork that I did.

Since the starting class of that MicroMasters isn’t available until May, I have found a different MicroMasters through UC San Diego to try out until then. It starts with a class that draws on some things I already know – Python for Data Science. I started using Jupyter Notebook for some of my work on database performance and health checking nearly two years ago, so I have some foundation in Python. I’ve found the first week of the class to be much easier. Whether that is better or worse I’m not sure yet. But it feels more like something I can accomplish. It is also self-paced. I thought I would benefit from a bit more rigidity in the schedule, but working it in with a full-time job and a family, I’m finding the ability to work ahead a benefit in the first week or two of the course, anyway.

The MITx probability class had more points for “classwork” – answering questions with multiple opportunities while working through the material. And also more points for “homework” – problem sets to be completed each week, also with immediate feedback. The UC San Diego one doesn’t have these, and instead depends on projects and tests exclusively for the grades, so we’ll see how that works. After having started the other probability class, I’m very interested to see how the probability and statistics class for this MicroMasters goes.

In any case, I’m back in the saddle and studying again after taking December to rest and recuperate. I’m also finding time to blog again, which was very challenging in September, October, and November.

I’m still committed to my data science journey; I just have to remember to give myself a break when I need it. I’m also still committed to blogging about and presenting on Db2 topics. A colleague and I are putting together a day-long introduction to Db2 for the IDUG North American Technical Conference in Charlotte, NC in June.

Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and works from home.



  1. Hi Ember,
    I really enjoy reading your blog and I feel inspired by you. My areas of interest are strikingly similar to yours. I have been a Db2 LUW DBA for the last 8 years, but recently I have grown interested in data science, in which I am investing 2 hours daily. Sometimes it is hard to commit 2 hours daily because of work pressure and other commitments, but I am really committed to the data science thing, and hopefully I can write awesome data stories in the future.
    One part of this post that caught my attention is the Jupyter Notebook you have been using for database performance and health checking. Can you shed some light on what exactly you use Python with Jupyter Notebook for in database performance work? Thanks in advance.
    -Suvradeep Sensarma

    • I can write a blog series on this at some point. If you have access to IDUG archives, I gave a presentation on it at IDUG EMEA in 2018. Much of a health check consists of running SQL. I have a simplified version of my health check available on GitHub. In addition to what is there, I do some graphing of data, using histograms, bar graphs, and time lines. I also am constantly adding to and improving my health check process. I take the raw data generated as an appendix to a formal health check document, and pull out the figures and data that seems relevant to the circumstances of the particular database I’m working with.

      There are other examples in that GitHub Repo of what I’ve been trying and doing with Jupyter Notebook. Feel free to use anything there.

      I also have topic-specific Jupyter notebooks I use to do things like check up on data pruning and other items that generally require running a bunch of SQL and benefit from graphical representations.
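To give a rough feel for the "run some SQL, then graph it" pattern described in the reply above, here is a minimal sketch. It is not taken from the GitHub repo mentioned there. It assumes an in-memory SQLite database as a stand-in for a Db2 connection so that it runs anywhere, and the `table_stats` table, its columns, and the row counts are all made up for illustration. A real notebook would more likely query the Db2 catalog and use a plotting library rather than text bars.

```python
import sqlite3
from collections import Counter

# Stand-in for a real Db2 connection: an in-memory SQLite database.
# The table name, column names, and row counts are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_stats (tabname TEXT, row_count INTEGER)")
conn.executemany(
    "INSERT INTO table_stats VALUES (?, ?)",
    [("ORDERS", 1200000), ("CUSTOMERS", 85000), ("ITEMS", 430),
     ("AUDIT_LOG", 9500000), ("REGIONS", 12), ("PRICES", 72000)],
)


def size_bucket(row_count):
    """Label a non-negative row count by order of magnitude, e.g. 85000 -> '10^4'."""
    return f"10^{len(str(row_count)) - 1}"


def histogram(connection):
    """Run the check query and count how many tables fall in each size bucket."""
    rows = connection.execute("SELECT tabname, row_count FROM table_stats")
    return Counter(size_bucket(count) for _name, count in rows)


buckets = histogram(conn)

# Print a text histogram, smallest bucket first.
for label in sorted(buckets, key=lambda b: int(b.split("^")[1])):
    print(f"{label:>5} | {'#' * buckets[label]}")
```

The same shape carries over to other checks: one query per question, a small function that turns rows into counts, and a chart of the result.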
