Type “learn to become a data scientist” into Google and you will get results like these: DataCamp, Udacity, Udemy, Coursera, DataQuest, etc. These Massive Open Online Courses, MOOCs for short, are invaluable when learning a new skill set. They give students access to a once-guarded elite within the academic walls. They sharpen skill sets and add value to any resume.
However, I have noticed that past a certain point they become a crutch: we trade the rigors of self-improvement for comfort. This is not a criticism aimed at others; I see it in my own progress in advanced analytics. We strive to learn the most advanced methods, the coolest visualizations, the highest-performing algorithms. Yet we seldom strive for originality in our work because we are afraid it might be too mundane.
To add to one’s proverbial toolkit in data science, MOOCs and other educational resources that guide the user through a well-formatted analysis are crucial. There are few ways to learn the fundamentals other than structured educational materials. But this can lead to the “MOOC trap.” A typical case study of the MOOC trap is the burgeoning data scientist who has just completed an intensive 6-month online program. She has dedicated 300 hours of study, through both sandbox exercises and a few semi-structured projects. She feels she has a basic understanding of a host of skills, but is hesitant to try her analytical toolset on a problem of her own choosing. Instead, she signs up for another 6-month MOOC, a mere regurgitation of the material she just covered. Enamored with the ads promising a polished GitHub portfolio, she forks over another $200 a month.
She felt the excitement of looking for a question, venturing across the internet for a dataset, and the struggle of staring at the mess that real-world data provides. But she retreated to the comfort of the MOOC. I feel the same in my own work. There are so many datasets that we as a community have access to: structured and unstructured, clean and, well, terribly scattered and messy. We are trained, through college and grad school, online courses, and structured tutorials, to create something advanced and analytically perfect. We are pressured to post it to GitHub and to display our certificates of accomplishment stamped by an official organization.
The problem with the MOOC trap is that it no longer trains us for the real world; it trains us to become great followers of directions. We fear that an original analysis will not be cool enough or advanced enough, and, well, that we might have to grind just to produce an exploratory analysis of things we had already assumed. But this is the challenge: to create something original, because originality gives us ownership. Completing even basic analytics on an original dataset that we went out and found adds to the data science community. It builds on the foundations of what science is and hones the fundamental skills so sorely needed in the workforce.
While MOOCs offer a structured and nicely formatted addition to our repositories and portfolios of glistening analytical work, they can also leave us in a comfortable position where growth decays. Educational training and online courses can take us only so far; beyond that point lies a series of diminishing returns. Each nascent data scientist will reach a different inflection point, but the feeling is the same: you have a burning question, but feel your skill set is unpracticed. When that happens, forgo the MOOC and find the data in the world. Produce the basic analysis, ask your peers to review it, and struggle a little more. Only then will we grow as data scientists.