Global Lead Developer for Data Science at General Assembly

Matt sat down with Translating Nerd in a conference room at General Assembly, the data science and programming school in Washington, DC. Matt teaches a recurring 12-week, full-time data science program that takes data novices and transforms them into employment-ready data scientists. He discusses the data science pipeline, machine learning procedures, and the sticking points that students need to overcome.



Matt is currently a global lead instructor for General Assembly’s Data Science Immersive program in ten cities across the U.S. and most enjoys bridging the gap between theoretical statistics and real-world insights. Matt is a recovering politico, having worked as a data scientist for a political consulting firm through the 2016 election. Prior to his work in politics, he earned his Master’s degree in statistics from The Ohio State University. Matt is passionate about putting the revolutionary power of machine learning into the hands of as many people as possible. When he isn’t teaching, he’s thinking about how to be a better teacher, falling asleep to Netflix, and/or cuddling with his pug.

How to contact Matt?

Twitter: @matthewbrems

Should I call my mom?: Machine Learning using NLP for non-nerds

My mom really likes to write. She writes emails that would make folks in college English departments hand her valedictorian status. Now I know this makes me seem like a bad son, but there are times when I would like an indicator on her emails that tells me how urgently I need to respond. When there is great news, I can respond over the weekend by giving her a phone call. But if something is upsetting, I need a red flag to pop up and let me know that I need to call right away, or else my father is going to start texting me saying, “Dude, call your mom.” (To be clear, I think everyone who asks this question should just call their mom. But that does not make for a convincing data science blog.)


Well, natural language processing (NLP) offers a solution. In fact, NLP is at the forefront of industry data science for its use on unstructured (text) data. Once-arduous tasks, such as combing through piles of PDFs and reading long emails, can now be automated. We have spam detectors that filter out suspicious emails based on text cues, predicting with astonishing accuracy which emails are spam and which are notes from grandma. But what the data science community lacks is the “Mom Alert.”


To understand the complexities that NLP has to offer, let’s break down the “Mom Alert.” First, we need a corpus (collection) of past emails that my mom has sent me. These need to be labeled by hand as “upset” mom and “happy” mom. Once I have created labels for my mom’s historical emails, I can take those emails and break them down into a format that the computer algorithm can understand.

But first, I need to separate these emails into two sets of data. The first set, the training emails, is what the model learns from, labels and all. The second set, the testing emails, has its labels held back from the model so that I can ask it to predict them and check how it does. This split is important in the NLP process because it helps me build a model that generalizes, that is, one that better predicts future emails.
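As a rough illustration of this split, here is a minimal Python sketch using only the standard library. The emails, the `train_test_split` helper, and the 25 percent test fraction are all made up for the example; in practice the testing emails usually keep their labels too, hidden from the model, so that predictions can be checked afterward.

```python
import random

def train_test_split(emails, test_fraction=0.25, seed=42):
    """Shuffle labeled emails and split them into training and testing lists."""
    shuffled = list(emails)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical hand-labeled emails: (text, label) with 1 = upset, 0 = happy
emails = [
    ("Your cousin got engaged, call when you can!", 0),
    ("Why have you not called me back?", 1),
    ("The garden looks wonderful this year.", 0),
    ("I am very disappointed about the holidays.", 1),
    ("We loved the photos you sent.", 0),
    ("You forgot your father's birthday again.", 1),
    ("Dinner on Sunday was lovely.", 0),
    ("I am worried and upset, please call.", 1),
]

train_emails, test_emails = train_test_split(emails)
print(len(train_emails), "training emails,", len(test_emails), "testing emails")
```

Shuffling before splitting matters here: if the emails were sorted by date or by mood, an unshuffled split could leave one set with only happy emails.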


Training and Testing Data

It is important to note that I will want to pre-process my data. First, I make all the words lowercase, because I don’t want my model to treat “Call” and “call” as different words. I will then remove words that occur frequently in the English language but offer little semantic value. These are called “stop words”: “and”, “but”, “the”, “a”, and “an” add little to the meaning of a sentence and can all go. I will also remove punctuation, because for this simple model it adds little information. Finally, I will take any word that is plural and bring it down to its singular form, so “shoes” becomes “shoe” and “cars” becomes “car”. The point of all this pre-processing is to reduce the number of distinct words that my model needs to handle.
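The pre-processing steps could be sketched in Python roughly as follows. This is only an illustration: the stop-word list is a tiny made-up sample, and the `singularize` helper is a deliberately naive stand-in for the stemming or lemmatization tools a real project would use.

```python
import string

# A tiny illustrative stop-word list; real lists are much longer
STOP_WORDS = {"and", "but", "the", "a", "an", "are", "is", "to"}

def singularize(word):
    """Very naive plural stripping: 'shoes' -> 'shoe', 'cars' -> 'car'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def preprocess(text):
    """Lowercase, strip ASCII punctuation, drop stop words, and singularize."""
    lowered = text.lower()
    # string.punctuation covers ASCII marks only, not curly quotes
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [singularize(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The shoes and the cars are a mess!")
print(tokens)  # ['shoe', 'car', 'mess']
```

Eight words go in and three come out, which is exactly the kind of shrinkage the pre-processing is meant to achieve.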

Now that I have separated the emails into training and testing sets and pre-processed the words, I need to put those emails into a format that my computer algorithm can understand: numbers. This is called the “vectorization” process, where we create a document-term matrix. The matrix simply records, for each document (one of mom’s emails), how many times each word occurred. This matrix is then used to compare documents with one another. The reason we pre-processed these words is that the vectorization process would otherwise result in a massive, extremely clumsy matrix.


Document-Term Matrix (aka Vectorization)

In the document-term matrix pictured above, each email gets its own row. The matrix then takes each unique word across ALL of the training emails and gives it a column. These columns are called features. Features can make a document-term matrix incredibly wide. Think of every unique word in thousands of emails. That is one wide matrix, which is the reason we did pre-processing in the first place! We want to reduce the number of columns, or features, which is a simple form of dimensionality reduction. In other words, we are reducing the quantity of numbers that our algorithm needs to digest.
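To make the rows-and-columns idea concrete, here is a small Python sketch that builds a document-term matrix by hand. The three token lists are hypothetical pre-processed emails, and `build_document_term_matrix` is a name invented for this example; libraries offer ready-made vectorizers that do the same job.

```python
from collections import Counter

def build_document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    vocabulary = sorted({token for doc in documents for token in doc})
    matrix = []
    for doc in documents:
        counts = Counter(doc)                              # word -> count for this email
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

# Hypothetical pre-processed emails (lists of tokens)
docs = [
    ["call", "me", "call", "now"],
    ["lovely", "dinner"],
    ["call", "dinner", "late"],
]

vocab, dtm = build_document_term_matrix(docs)
print(vocab)          # column headers (features)
for row in dtm:       # one row per email
    print(row)
```

Even with three short emails the matrix already has six columns, and most entries are zero. With thousands of real emails the number of columns grows with every new unique word.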


Wrong Matrix

Now that I have my emails represented as a matrix, I can use an algorithm to take those numerical representations and convert them to a prediction. Recall that a previous post introduced Bayes’ Theorem (see Translating Nerd’s post on Bayes). Well, we could use Bayes’ Theorem to create a predicted probability that my mom is upset. We will call upset = 1 and happy = 0.

Side Note: I know this seems pessimistic, but my outcome variable is going to be the probability that she is upset, and a probability always falls between zero and one, which is why our algorithm’s output is constrained to that range. Full disclosure: my mom is wonderful and I prefer her happy.



Now, there are other algorithms that can be used, such as logistic regression, support vector machines, and even neural networks, but let’s keep this simple. In fact, some of the earliest email spam detectors used Naïve Bayes because it works so darn well with large numbers of features (words, in our case). But what does “naïve” mean, you may ask? The model makes the naïve assumption that, once we know whether the email is upset or happy, the words in it occur independently of each other. We know this cannot be true, because words take on meaning when coupled with each other; that is how we create sentences. Of course, every algorithm has drawbacks, but Naïve Bayes proves to be quite accurate with large numbers of features (i.e., words).
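For readers who want to see the idea in code, here is a minimal Python sketch of a multinomial Naïve Bayes classifier. It is not the exact method behind any particular spam filter. The training emails, function names, and the choice of add-one (Laplace) smoothing are assumptions made for the illustration.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """Count words per class from (tokens, label) pairs; 1 = upset, 0 = happy."""
    class_counts = Counter(label for _, label in labeled_docs)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in labeled_docs:
        word_counts[label].update(tokens)
    vocabulary = {t for tokens, _ in labeled_docs for t in tokens}
    return class_counts, word_counts, vocabulary

def predict_upset_probability(tokens, model):
    """Return P(upset | tokens) using add-one (Laplace) smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)    # log prior
        for t in tokens:
            if t in vocabulary:                                # skip unseen words
                # The "naive" step: each word contributes independently
                score += math.log((word_counts[label][t] + 1)
                                  / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Turn the log scores into a normalized probability for the upset class
    max_score = max(log_scores.values())
    exp_scores = {k: math.exp(v - max_score) for k, v in log_scores.items()}
    return exp_scores.get(1, 0.0) / sum(exp_scores.values())

# Hypothetical pre-processed training emails
training = [
    (["disappointed", "call", "now"], 1),
    (["worried", "upset", "call"], 1),
    (["lovely", "dinner", "photo"], 0),
    (["wonderful", "garden", "photo"], 0),
]
model = train_naive_bayes(training)
p_upset_email = predict_upset_probability(["upset", "call"], model)
p_happy_email = predict_upset_probability(["lovely", "photo"], model)
print(round(p_upset_email, 3), round(p_happy_email, 3))
```

The sketch works with logarithms because multiplying many small word probabilities together quickly underflows; adding logs gives the same ranking without that problem.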

Once we have fit our Naïve Bayes model to the document-term matrix, we can make a prediction on each email in the test set. The test set acts as a check on what was learned from the training set, which allows us to make changes to our model and get as close as possible to a generalized model for detecting an upset email from mom. A key tenet of machine learning is to create a model that doesn’t just fit our training data, because we need it to be flexible enough to generalize to new data. Fitting the training data too closely is called overfitting a model and should be avoided. Of course, there is a trade-off between overfitting and underfitting a model that needs to be balanced, but again, I digress (see image below).
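One simple way to spot overfitting, assuming the held-out test emails have known labels to check against, is to compare accuracy on the training set with accuracy on the test set. The labels and predictions below are invented for the illustration.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

# Hypothetical labels and predictions (1 = upset, 0 = happy)
train_true, train_pred = [1, 0, 1, 0, 1, 0], [1, 0, 1, 0, 1, 0]
test_true, test_pred = [1, 0, 1, 0], [1, 1, 0, 0]

train_acc = accuracy(train_true, train_pred)
test_acc = accuracy(test_true, test_pred)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")

# A large gap between the two is a common warning sign of overfitting
gap = train_acc - test_acc
print(f"gap: {gap:.2f}")
```

Here the made-up model is perfect on the emails it has seen and no better than a coin flip on the ones it has not, which is the classic overfitting pattern.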


Machine learning basics for future posts

Once we have tuned our Naïve Bayes algorithm to both fit the training emails and generalize well to the emails in the test set, we are ready to try it out on a new email from mom. When mom sends us a new email, our algorithm will output a predicted probability. Let’s say that any email that has a predicted probability of 50 percent or more (0.5) will be called upset (1) and any email with a predicted probability under 50 percent will be called happy (0).


For a simple illustration, suppose four new, unlabeled emails have been run through our Naïve Bayes algorithm. Comparing each predicted probability against our 0.5 threshold yields classifications of happy, upset, upset, and happy. It looks like I will have some calls to make this evening!
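The thresholding step is only a few lines of Python. The four probabilities below are hypothetical values chosen to reproduce the happy, upset, upset, happy pattern; they are not taken from a real model.

```python
def classify(probability_upset, threshold=0.5):
    """Map a predicted probability to 'upset' (1) or 'happy' (0)."""
    return "upset" if probability_upset >= threshold else "happy"

# Hypothetical predicted probabilities for four new emails
predicted = [0.12, 0.81, 0.67, 0.30]
labels = [classify(p) for p in predicted]
print(labels)  # ['happy', 'upset', 'upset', 'happy']

calls_to_make = labels.count("upset")
print("Calls to make this evening:", calls_to_make)
```

The 0.5 cutoff is a choice, not a law. Lowering it would flag more emails as upset, which might suit a son who would rather make an unnecessary call than miss an important one.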









World Bank to Front End Developer: Coffee with Andres Meneses

A couple of months ago, Andres Meneses and I sat down at a busy Adams Morgan cafe in the heart of Washington, DC to discuss his success as a front-end developer. What makes his story remarkable is how he navigated from a successful long-term position at The World Bank to a coding boot camp to gain a foothold in the web development world. His is one of the many success stories popping up from mid-career individuals making the jump into the world of data science and technology development.



Bio: Andrés Meneses is the proud owner of the happiest dog in Washington, DC and a passionate pro-butter advocate. He is also a web developer, committed first and foremost to optimizing user experience. He involves users from the outset of all projects because, as he likes to put it, “There is nothing worse than working hard on a digital product that no one ever uses!” By leveraging his combined expertise in product and project management and digital communications, Andrés approaches his work by thinking broadly. Why and what does this organization have to share, and how will that information engage, inform, surprise, and help the intended audience? And how will the information best deliver tangible outcomes? Throughout his time working in all types of organizations, most notably his more than 10 years at the World Bank as a technical project manager, he never stopped learning, and a few years ago he made the leap into full-time web development. Just like his dog, he has never been happier.





Interview with General Assembly Data Scientist: Dr. Farshad Nasiri

Dr. Farshad Nasiri is the local instructor lead for the Data Science Immersive (DSI) program at General Assembly in Washington, DC. He received his B.S. from Sharif University of Technology in Tehran and his Ph.D. in mechanical engineering from George Washington University, where he applied machine learning tools to predict air bubble generation on ship hulls. Prior to joining General Assembly, he worked as a computational fluid dynamics engineer and a graduate research assistant. As the DSI instructor, he delivers lectures on the full spectrum of data science-related subjects.


Farshad is interested in high-performance computing and the implementation of machine learning algorithms in low-level, highly scalable programming languages such as Fortran. He is also interested in data science in medicine, specifically preventive care through data collected by wearable devices.
