Update on Ember’s Data Science Journey – February 18, 2019


I’ve been moving along well this year on my data science journey. I’ve been thoroughly enjoying an edX class: Python for Data Science from UC San Diego. I started using Jupyter Notebooks (Python Kernel) over a year and a half ago, so I am enjoying the course on three levels:

  1. It seems easy because I have a background in Python, basic programming, and Linux systems.
  2. I’m learning more about the Python libraries I use for Db2 health checks, and more about DataFrames, which builds on my self-taught knowledge.
  3. It has me actually doing some (very simple) data science practice projects!

I’m finding it far less challenging than the MITx probability class that I tried last fall. I’m not 100% sure this is a good thing yet, but I’m hoping later courses in the UC San Diego MicroMasters will build my skills in areas where I’m more lacking. I’m also finding the self-paced format fits my life so much better. I can work ahead several weeks when I have more time, and I can ignore the course entirely for a week when I am really busy. I had worried that the lack of deadlines would have me slacking too much, but I’m keeping up just fine so far.

My favorite week in the class so far was the one that talked about data visualization. This is an area I want to delve into more and more, and I learned several important things on how to honestly and compellingly display data. One of my priorities after completing the other courses in the MicroMasters will be to find a dedicated course in data visualization, because I can see how much more I have to learn in this area.

I’ve also enjoyed the Data Science generalities I’ve learned. The basic descriptions of Machine Learning have made such a difference in my understanding of these topics.

I’m finding it much easier to devote a reasonable amount of time and do work I’m proud of with this course. I plan to take Probability and Statistics in Data Science using Python when I have completed this course.

I’ve started perusing job postings to see whether my perfect unicorn of a job really exists: something that uses my advanced Db2 skills and lets me learn professional Data Science (or pays for a graduate degree in Data Science). I’m still not sure it does. I’m not waiting for some perfect unicorn job to start applying what I learn, though. I plan to start applying what I’m learning in my data science training to the performance data I have for Db2, and to continue enhancing my health check process with Jupyter Notebooks.

Lead Database Administrator
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 18 years developing top-level expertise in Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and works from home.

3 comments

  1. In this post you state “…Python libraries I use for Db2 health checks”. In your pdf, 20180202DB2Night202.pdf from your Db2 Nightshow appearance on page you have a setup environment slide that appears to have several Python libraries. Are these the Python libraries you use for Db2 health checks or are there more?

    1. I’m constantly adding and updating. My current list includes these:
      import ibm_db
      import ibm_db_sa
      import sqlalchemy
      %load_ext sql
      import matplotlib
      import numpy as np
      import matplotlib.pyplot as plt
      from matplotlib import cm
      import matplotlib.dates as mdates
      from datetime import datetime
      import pandas as pd
      from IPython.display import display, HTML, Markdown
      import nbextensions
      %matplotlib inline

      Most of those are for various visualizations I use. The nbextensions and associated configurator are things I just added last week, but they can add an interactive table of contents and really make navigating a long workbook easier. George Baklarz also has some extensions that are nice, but I haven’t kept up with them much.

      I also have a GitHub repo (https://github.com/ecrooks/db2_and_jupyter_notebooks) where I have a simplified version of the health check (DB2_HealthCheck_Malta.ipynb). I’m doing a presentation specifically on the health check at IDUG NA, and will put an updated version in the GitHub repo for that with some changes and nice improvements.

    2. Oops, sorry, the presentation on Health Checks using Jupyter Notebook was not approved for IDUG NA this year. I’ll get an updated version of my stripped-down health check out there sometime in the next few months anyway.
