When I first started learning about data science, the amount of material out there was daunting. It is hard to figure out what you should learn and how you should learn it. I thought I’d share what has worked for me and what hasn’t.
Driving Principles and Learning Styles
Everyone’s learning style is a little different. I learn best by doing and by writing, and somewhat by reading. If I watch videos, I take notes to make that time actually worthwhile. Videos alone often don’t work well for me.
I also came into this quest wanting to avoid getting locked in to any one vendor. My career so far is very much built on IBM’s Db2, and I love Db2 with all my heart. Coming up on 20 years of experience with it, I am something of an expert, at least in some specific areas. However, being tied to just one vendor is also difficult. The market for Db2 jobs is not as wide as it used to be, and having this as my expertise narrows my path a bit. I’d like more options.
This is of note because it would be really easy for me to dive into IBM’s certifications and tools for data science and data analytics and go through the training out there on them, but I don’t want to learn one vendor’s tools to start. I want to start broadly and pick up freely available tools, perhaps specializing in tools as opportunities present themselves, but not before then. I also believe that IBM has decimated their education staff and is not keeping up on maintaining the materials they do have out there.
I have known from the start that I’ll have to work on some skills more than others. For better or worse, my bachelor’s degree is a business-based computer degree, not a computer-science-based one. This means I may be a bit lacking in math and in the more complicated aspects of programming, though I’ve scripted my whole career. I consider myself competent in SQL and an expert in relational databases. I have dabbled in a number of different programming languages throughout my career. I also had about the best intro-to-programming class ever, one that focused more on the practice of programming and on analytical skills than on any particular programming language, and that has really served me well.
Resources I’ve Used
When I first started looking for resources, there was a lot to consider. I wasn’t sure I was really going to go for a graduate degree. Perhaps some certificates would be enough along with my career in a data-related field.
The first course I took was an introduction to Data Science on https://cognitiveclass.ai/. This was a natural place for me to start, since it is offered by IBM, and I had discussed offering classes on this platform back when it was called Big Data University. The class was really excellent and gave me a good overview of what data science is and isn’t, along with some of the important skills and roles within data science.
I ultimately decided not to move forward with more classes on this platform because of the tie to IBM, though. I think they offer courses that include using non-IBM tools, but the IBM connection is still there.
I considered several options on edX. Most of these courses are NOT self-paced, and there’s an advantage to that for me. I tend to go overboard learning something when I’m excited and then burn out on it. I was thinking about a MicroMasters as a good option. I started a course from MITx on Probability and Statistics. It was a fantastic class, and I learned so much. It was really challenging, though, and I took it at a time when the stress from work was just about killing me. I ended up dropping the class, but I still learned a great deal in the time I was in it. That knowledge is serving me well as I move on to other courses.
I then completed Python for Data Science from UC San Diego. This was a good course, introducing data science concepts while working in Python. After the rigor of the MITx class, though, it ultimately swung a bit too far towards easy. I also realized that if I wanted to get into data science, I probably wanted to have an actual master’s degree. I spent a lot of time reading job descriptions, and this seemed to be a fairly common item on either the must-have or nice-to-have lists.
I was also realizing around this time that it would be hard in my current job to have the time and mental energy to pursue a master’s degree. I worked quite a few hours, participated in a pager rotation where I was also an escalation point, and was doing management work that just sapped my mental and emotional energy. I found an opportunity to work in an environment where personal time was respected, where I knew a lot of the people, and where tuition reimbursement was provided. Working 7.5-hour days and getting comp time for time worked outside of that really makes it easier to focus on education in my personal time.
The new job has also required me to learn a lot. At first I was worried that this might make learning outside of work more difficult, but it has really spurred me to do new things that may also be more relevant in an eventual career change. I’m a year into the new job, and it has given me back my enthusiasm for my work.
Throughout my journey, there have been topics I’ve needed to brush up on. It has been a long time since the statistics classes I took in undergraduate school, and remembering the details there has required some study.
The book Naked Statistics was really useful for remembering and regaining some of the basics of statistics. I bought it through Audible and mostly listened to it while doing household chores.
I’ve found Khan Academy to be an amazing resource for specific topics. It was a revelation when I was trying to understand Markov chains. It helped me shore up my knowledge of confidence intervals. As my kids have been schooling from home, it has been an invaluable resource for them, too. Khan Academy is fully free, but I’ve chosen to make a recurring donation because I have seen and continue to see so much benefit from their content.
While videos often aren’t my first choice, the practice problems and presentation style make these a bit better for me.
(Note, the book link above is an Amazon Affiliate link. I might get a few pennies if you use it. Feel free to search for it from your favorite bookseller.)
I eventually decided to pursue a master’s degree, and I’ve decided to go with Georgia Tech. Even though the degree is in “Analytics” and not “Data Science”, the topics are what I’m looking for, the cost is so much more reasonable than many programs, and the program has been established online for more than 2 years, which makes it one of the older programs out there, as odd as that may sound. There are many new data science programs out there, so investigating their background is important. As I went into my first course, one of the requirements was using the R language. R is used in a few courses, with Python in others. I had NO experience whatsoever in R, so I thought I’d go through some tutorials online to get ready.
I was looking for alternatives to DataCamp because of the controversies there. I stumbled on dataquest.io. I immediately loved the interactive style. On the left-hand side of the screen is explanatory text, and on the right-hand side is a coding interface where you code for just about every topic covered. This fits my interactive learning style very well. No videos, and that’s just fine for me. At the time, most of the lessons were free. They’ve since started charging for the later lessons in a learning path, but at least you can get a good idea of what you’re getting into by doing the first lesson on something. I’ve paid for an annual membership, as I have found that they have the perfect set of lessons to prepare me for my next class: the combination of linear algebra and Python is exactly what I needed to brush up on. I dearly want to go through some of the lessons on graphical presentation, too, but need to stay focused for now on what I need for the next class on my path.
(A note: the link above to Dataquest is a referral link, and you’ll get a discount for using it. I would also get free access if 4 people end up signing up. I would never recommend something for money that I wouldn’t recommend otherwise, and I strive to let my readers know if I ever get anything out of recommending something. Feel free to navigate to the site yourself to avoid the referral link.)
Another source I’ve found for specific topics is the StatQuest YouTube channel. This has been great for digging deep into specific statistical topics to better understand them. I particularly enjoyed the videos on Support Vector Machines. The presentation is a bit corny, but that’s kinda fun, too.
As a native English speaker, I often play the videos I’ve been using at 1.5x speed. I tend to alternate between that and pausing, so that I can take notes or look something up.
There are a ton of resources out there. I thought I’d share a few of my favorites and why I like them to help others. I’d love to hear what you’ve used and liked in the comments below.
There are two affiliate links on this page, listed below. I do not let money influence what I recommend on this blog, and would recommend these things whether there was financial benefit for me or not. Feel free to search for these resources and get them through other means.
- Naked Statistics is a link to Amazon. I truly don’t quite know what I get from them, but I suspect it’s not much.
- dataquest.io is a referral link. If 4 people sign up, I get free access to the platform – there’s no benefit beyond that.
Hi Ember – I picked up your blog from the Coursera Data Engineer material. This is just a “thank you” for sharing your experience and taking the trouble to put this very useful summary together – it is much appreciated.