I once had an interview with a data science consulting firm that I completely bombed. It wasn’t the SQL test, and it wasn’t the questions on machine learning algorithms. What stumped me was the interviewer asking, “What is your T?” Dumbstruck, I answered that I didn’t know the terminology. Politely, the interviewer explained that she wanted to know what my core competencies were: in what area I was known as the “go-to” guy, and where I was competent to perform on my own. This brings me to what I believe is the most important story in data science: the story of you.
The T-model was defined by Harlan Harris and his colleagues in “Analyzing the Analyzers,” published by O’Reilly Media in 2013. In the report, billed as “an introspective survey of data scientists and their work,” the authors argue that there are many categories of individuals calling themselves data scientists. Which category you fit into depends on the contents of your T. I created the following T as a visual example.
The vertical of the T is your core competency, your comparative advantage in the field of advanced analytics. This is the area where researchers distinguish themselves and are frequently called upon to lead a team. You might remember people from graduate school who seemed to effortlessly explain the underpinnings of a complex derivation on the board, pulling proofs out of thin air as if a celestial power were whispering answers in their ear. You don’t need analytic superpowers to fill in the vertical of your T, but you should be quite competent in that area.
The horizontal bar holds the areas that you feel comfortable operating in but are not primarily known for. These can be skills that you use for a project every now and then, or that form part of your analytical process. You might need a little extra time with your buddy Google or a sojourn on Stack Overflow, but you know that, given that time, you will have no trouble performing.
The programs to the right of the T are the programming languages, software, and technologies that go hand-in-hand with your competencies. For example, someone who lists machine learning as their core competency will most likely list R or Python alongside it as the tool for those analyses. With Python come the libraries or modules they depend on to do the work. These could include pandas for data manipulation, NumPy for quick calculations, matplotlib and seaborn for visualization, Beautiful Soup for scraping data from the web, NLTK for natural language processing, and scikit-learn for machine learning algorithms.
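If it helps to see the three parts together, here is one way you might jot down a T in Python. This is only a rough sketch: the `TProfile` class, its field names, and the `pitch` method are my own hypothetical names for illustration, not anything from the Harris report, and the example skills are simply borrowed from the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TProfile:
    """Hypothetical container for the three parts of a T."""
    vertical: str                                        # core competency
    horizontal: List[str] = field(default_factory=list)  # comfortable, but not known for
    tools: Dict[str, List[str]] = field(default_factory=dict)  # competency -> tools

    def pitch(self) -> str:
        """Return a plain-text summary, one line per competency."""
        core_tools = ", ".join(self.tools.get(self.vertical, [])) or "no tools listed"
        lines = [f"Core: {self.vertical} [{core_tools}]"]
        for area in self.horizontal:
            area_tools = ", ".join(self.tools.get(area, [])) or "no tools listed"
            lines.append(f"Also: {area} [{area_tools}]")
        return "\n".join(lines)


# Example values are illustrative assumptions, not a recommended T.
my_t = TProfile(
    vertical="machine learning",
    horizontal=["data visualization", "web scraping"],
    tools={
        "machine learning": ["Python", "scikit-learn", "pandas"],
        "data visualization": ["matplotlib", "seaborn"],
        "web scraping": ["Beautiful Soup"],
    },
)
print(my_t.pitch())
```

Keeping the tools keyed by competency mirrors the idea that each technology sits next to the skill it supports, rather than in one long list of buzzwords.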
The purpose of defining your T is not to include every buzzword technology and programming language that is hot, but to include the resources that get you 90% of the way to your goal. The final 10% could be something that you seldom use and for which you need to call on additional expertise. Creating a firm narrative of your core strengths helps nascent and advanced data scientists alike explain what they bring to the team. It creates a mutual understanding between you and those hiring you about what you bring to the table. But more importantly, it provides a visual elevator pitch for communicating what data science is to you.