Machine Learning Post-GDPR: My conversation with a Spotify chatbot

I have a dilemma. I love Spotify and find their recommendation engine and machine learning algorithms spot on (no pun intended). Their suggestions are fairly accurate to my listening preferences. As well they should be: I let them take my data.

Screen Shot 2018-05-25 at 10.43.19 PM.png

Spotify has been harvesting my personal likes and dislikes for the past 6 years. They have been collecting metadata on what music I listen to, how long I listen to it, what songs I skip and what songs I add to my playlists. I feel like I have a good relationship with Spotify. Then came the emails. Service Term Agreements, Privacy Changes, Consent Forms. Something about GDPR.

Which brings me to the topic of discussion: what is GDPR, and how will services that rely on our metadata to power algorithms and machine learning predictions function in a post-GDPR world? First, let’s backtrack for a second. Over the past week, most people’s inboxes have looked like this…

Screen Shot 2018-05-25 at 10.06.06 PM.png

Or you feel as though your Gmail account has won an all-expenses-paid “phishing” trip.

Screen Shot 2018-05-25 at 10.02.50 PM

Nope, your email hasn’t been hacked, but something monumental has occurred in the data world that has been years in the making. Two years ago, the European Parliament passed a law adding strict, user-friendly protections to an already stringent set of EU data regulations. GDPR stands for the General Data Protection Regulation, and it became enforceable today in the member countries of the European Union. There are a number of things that GDPR seeks to accomplish, but its overarching aim is to streamline how the user of a good or service gives consent to an entity that uses their private data.

Now I hear you saying, “I thought that GDPR was just an EU regulation, how does it affect me as an American? I don’t like this European way of things and this personal rights stuff.”

Well, you are correct that it is an EU regulation, and it goes into effect May 25, 2018 (today or in the past, if you are reading this now). But any organization that has European customers will need to abide by it. Since most organizations have European customers, everyone is jumping in and updating their privacy policies, begging for your consent in your inbox.

“Who cares? Aren’t companies going to use my data anyways without me knowing?” Well, not anymore. Not only does GDPR allow individuals and groups to bring complaints against companies that are in breach of GDPR, but these companies can be fined up to 4 percent of their annual global revenue. Yes, revenue, not profit, and global revenue at that. Already today, Facebook has been hit with complaints seeking fines of up to 3.9 billion euros, and Google with complaints seeking up to 3.7 billion euros. Source
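To put the 4 percent figure in perspective, here is a minimal sketch of the ceiling for the most serious infringements, which is the greater of 20 million euros or 4 percent of worldwide annual revenue. The function name and the example company's revenue are made up for illustration, and an actual fine is set case by case by a regulator, so treat this as an upper bound only.

```python
def max_gdpr_fine(annual_global_revenue_eur: float) -> float:
    """Ceiling for the top tier of GDPR fines, in euros.

    The cap is the greater of EUR 20 million or 4% of worldwide
    annual revenue. Real fines are decided case by case and can be
    far lower; this only computes the upper bound.
    """
    flat_cap = 20_000_000.0
    revenue_cap = 0.04 * annual_global_revenue_eur
    return max(flat_cap, revenue_cap)


# Hypothetical company with EUR 40 billion in global revenue (not a real figure).
big_company_cap = max_gdpr_fine(40_000_000_000)

# Hypothetical small company with EUR 10 million in revenue:
# 4% would be EUR 400,000, so the flat EUR 20 million cap applies instead.
small_company_cap = max_gdpr_fine(10_000_000)

print(f"Large company ceiling: EUR {big_company_cap:,.0f}")
print(f"Small company ceiling: EUR {small_company_cap:,.0f}")
```

Because the cap is tied to revenue rather than profit, it scales with the size of the company no matter how thin its margins are.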

In his book Data and Goliath: The Hidden Battles to Collect and Control Your World, Bruce Schneier speaks to the deluge of data that has congested society’s machines. This data exhaust, as Schneier refers to it, contains a plethora of extractable data known as metadata. Metadata, while not linked to the specific conversations you have on your phone or the text contents of your emails, is collected by organizations we use in our daily life. Facebook, Amazon, Google, Yahoo (yes, some of us still use Yahoo email) and Apple all rely on gigantic server farms to maintain petabytes of saved metadata.

Screen Shot 2018-05-25 at 9.52.37 PM

Schneier describes the information age as being built on data exhaust, and he is correct. The modern algorithms that power recommendation engines for Amazon, the phone towers that collect cell phone signals to show your location, the GPS inside your iPhone and your Netflix account all rely on metadata and the effective storage of, well, a boatload of data, for lack of a better term.

Over the course of the past couple of decades, we have been consenting to privacy agreements that allow these companies, and companies like them, to use our data for purposes that we are not aware of. That is the reason for GDPR. This is where consent comes into play.

According to the New York Times, GDPR rests on two main principles. “The first is that companies need your consent to collect data. The second is that you should be required to share only data that is necessary to make their services work.” Source

Broken down, here is Translating Nerd’s list of important highlights of GDPR:

1. You must give your consent for a company to use your data
2. Companies must explain how your data will be used
3. You can request your data be deleted
4. If there is a data breach at a company, they have 72 hours to notify a public data regulatory agency
5. You can request access to the personal data that is kept
6. You can object to any processing of your personal data in any situation that arises
7. Simply put, consent must be easy to give and easy to take away

So after receiving Spotify’s updated privacy policy terms in my inbox, I decided to contact a representative of my favorite music streaming company (her name is Joyce). Namely, I was curious whether, if I declined to give my consent for Spotify to use my data, their recommendation engine would be left without data from my personal account to provide those terrific suggestions that all seem to begin with Nine Inch Nails and regress slowly to somewhere near Adele. Regression to the mean, perhaps? That is a conversation for another post, or a comment on society’s love of Adele.

The conversation I had went like this:



As with most chatbots (sorry, Joyce), I was referred to a standard account terms page. So I needed to do some more digging. I found that, in line with GDPR, I could deactivate certain features, such as keeping data from Facebook on me or using third-party sites to help target ads to me.

Screen Shot 2018-05-25 at 10.53.09 PM

Screen Shot 2018-05-25 at 10.53.13 PM

But this isn’t enough. What I am curious about is whether a machine learning algorithm like Spotify’s recommendation engine can work without access to my data. And then I saw it, all the way at the bottom of their user terms in this link: a clear and definitive answer that no, to delete my personal data I would need to close my account, which means no Spotify and no recommendations for Adele.

Screen Shot 2018-05-25 at 10.54.14 PM

Simply put, algorithms run on data. No data, no prediction. No prediction, no recommendation engine to suggest Adele. My intuition proved correct: data powers the machine learning algorithms we build. This should be obvious. Just as you need fuel to power a car, you need data from a user to run an algorithm. What GDPR ensures is that the data you are giving is used SOLELY for the service you requested. But it remains unclear how this is differentiated in the user agreement.
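To make "no data, no prediction" concrete, here is a toy sketch of a co-occurrence recommender. It is an illustration I made up for this post, not Spotify's actual system: the recommend function, the listener histories and the scoring rule are all hypothetical. It suggests artists that show up alongside yours in other people's listening histories.

```python
from collections import Counter


def recommend(user_history, other_histories, top_n=3):
    """Suggest artists that co-occur with the user's artists.

    Toy scoring rule (an assumption for illustration): each unheard
    artist earns points equal to the number of artists its listener
    shares with the user.
    """
    seen = set(user_history)
    scores = Counter()
    for history in other_histories:
        overlap = seen & set(history)
        if not overlap:
            continue  # this listener has nothing in common with the user
        for artist in set(history) - seen:
            scores[artist] += len(overlap)
    return [artist for artist, _ in scores.most_common(top_n)]


# Hypothetical listening histories from three other users.
others = [
    ["Nine Inch Nails", "Tool", "Adele"],
    ["Nine Inch Nails", "Tool"],
    ["Adele", "Sam Smith"],
]

print(recommend(["Nine Inch Nails"], others))  # ['Tool', 'Adele']
print(recommend([], others))                   # []
```

With a listening history, the sketch has something to match against. With an empty history, there is no overlap with anyone, so the list comes back empty. A real service could fall back on general popularity, but that would no longer be a personal recommendation.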

And this is exactly where GDPR places us as a society. How much data are we willing to give to receive a service? What if the service is enhanced with more personal data? We rely on the algorithms from recommendation engines to tell us what to buy from Amazon, to inform us what Netflix shows we would like, and to help match us on dating websites like OKCupid. But receiving the full power of the predictive algorithms that drive these products, and feed our cravings for smarter services, will require the full consent of the user.

