Technology, machine learning and algorithms
Technology, machine learning and algorithms
It's cold outside. Let's speak about AI winter (Ep. 93)
In the last episode of 2019 I speak with Filip Piekniewski about some of the most worth noting findings in AI and machine learning in 2019. As a matter of fact, the entire field of AI has been inflated by hype and claims that are hard to believe. A lot of the promises made a few years ago have revealed quite hard to achieve, if not impossible. Let's stay grounded and realistic on the potential of this amazing field of research, not to bring disillusion in the near future.
Join us to our Discord channel to discuss your favorite episode and propose new ones. I would like to thank all of you for supporting and inspiring us. I wish you a wonderful 2020!Francesco and the team of Data Science at Home
The dark side of AI: bias in the machine (Ep. 92)
This is the fourth and last episode of mini series "The dark side of AI". I am your host Francesco and I’m with Chiara Tonini from London. The title of today’s episode is Bias in the machine
C: Francesco, today we are starting with an infuriating discussion. Are you ready to be angry?
F: yeah sure is this about brexit? No, I don’t talk about that. In 1986 the New York City’s Rockefeller University conducted a study on breast and uterine cancers and their link to obesity. Like in all clinical trials up to that point, the subjects of the study were all men. So Francesco, do you see a problem with this approach?
F: No problem at all, as long as those men had a perfectly healthy uterus.In medicine, up to the end of the 20th century, medical studies and clinical trials were conducted on men, medicine dosage and therapy calculated on men (white men). The female body has historically been considered an exception, or variation, from a male body.
F: Like Eve coming from Adam’s rib. I thought we were past that...When the female body has been under analysis, the focus was on the difference between it and the male body, the so-called “bikini approach”: the reproductive organs are different, therefore we study those, and those only. For a long time medicine assumed this was the only difference.
Oh good ...This has led to a hugely harmful fallout across society. Because women had reproductive organs, they should reproduce, and all else about them was deemed uninteresting. Still today, they consider a woman without children somehow to have betrayed her biological destiny. This somehow does not apply to a man without children, who also has reproductive organs.
F: so this is an example of a very specific type of bias in medicine, regarding clinical trials and medical studies, that is not only harmful for the purposes of these studies, but has ripple effects in all of societyOnly in the 2010 a serious conversation has started about the damage caused by not including women in clinical trials. There are many many examples (which we list in the references for this episode).
Give me oneResearchers consider cardiovascular disease a male disease - they even call it “the widower”. They conduct studies on male samples. But it turns out, the symptoms of a heart attack, especially the ones leading up to one, are different in women. This led to doctors not recognising or dismissing the early symptoms in women.
F: I was reading that women are also subject to chronic pain much more than men: for example migraines, and pain related to endometriosis. But there is extensive evidence now of doctors dismissing women’s pain, as either imaginary, or “inevitable”, like it is a normal state of being and does not need a cure at all.
The failure of the medical community as a whole to recognise this obvious bias up to the 21st century is an example of how insidious the problem of bias is.
There are 3 fundamental types of bias:
One: Stochastic drift: you train your model on a dataset, and you validate the model on a split of the training set. When you apply your model out in the world, you systematically add bias in the predictions due to the training data being too specific
Two: The bias in the model, introduced by your choice of the parameters of your model.
Three: The bias in your training sample: people put training samples together, and people have culture, experience, and prejudice. As we will see today, this is the most dangerous and subtle bias. Today we’ll talk about this bias.
Bias is a warping of our understanding of reality. We see reality through the lens of our experience and our culture. The origin of bias can date back to traditions going back centuries, and is so ingrained in our way of thinking, that we don’t even see it anymore.
F: And let me add, wh
The dark side of AI: metadata and the death of privacy (Ep. 91)
Get in touch with us
Join the discussion about data science, machine learning and artificial intelligence on our Discord server
We always hear the word “metadata”, usually in a sentence that goes like this
Your Honor, I swear, we were not collecting users data, just metadata.
Usually the guy saying this sentence is Zuckerberg, but could be anybody from Amazon or Google. “Just” metadata, so no problem. This is one of the biggest lies about the reality of data collection.
F: Ok the first question is, what the hell is metadata?
Metadata is data about data.
F: Ok… still not clear.Imagine you make a phone call to your mum. How often do you call your mum, Francesco?F: Every day of course! (coughing)
Good boy! Ok, so let’s talk about today’s phone call. Let’s call “data” the stuff that you and your mum actually said. What did you talk about?
F: She was giving me the recipe for her famous lasagna.
So your mum’s lasagna is the DATA. What is the metadata of this phone call? The lasagna has data of its own attached to it: the date and time when the conversation happened, the duration of the call, the unique hardware identifiers of your phone and your mum’s phone, the identifiers of the two sim cards, the location of the cell towers that pinged the call, the GPS coordinates of the phones themselves.
F: yeah well, this lasagna comes with a lot of data :)
And this is assuming that this data is not linked to any other data like your Facebook account or your web browsing history. More of that later.
F: Whoa Whoa Whoa, ok. Let’s put a pin in that. Going back to the “basic” metadata that you describe. I think we understand the concept of data about data. I am sure you did your research and you would love to paint me a dystopian nightmare, as always. Tell us why is this a big deal?
Metadata is a very big deal. In fact, metadata is far more “useful” than the actual data, where by “useful” I mean that it allows a third party to learn about you and your whole life. What I am saying is, the fact that you talk with your mum every day for 15 minutes is telling me more about you than the content of the actual conversations. In a way, the content does not matter. Only the metadata matters.
F: Ok, can you explain this point a bit more?
Imagine this scenario: you work in an office in Brussels, and you go by car. Every day, you use your time in the car while you go home to call your mum. So every day around 6pm, a cell tower along the path from your office to your home pings a call from your phone to your mum’s phone. Someone who is looking at your metadata, knows exactly where you are while you call your mum. Every day you will talk about something different, and it doesn't really matter. Your location will come through loud and clear. A lot of additional information can be deduced from this too: for example, you are moving along a motorway, therefore you have a car. The metadata of a call to mum now becomes information on where you are at 6pm, and the way you travel.
F: I see. So metadata about the phone call is, in fact, real data about me.
Exactly. YOU are what is interesting, not your mum’s lasagna.
F: you say so because you haven’t tried my mum’s lasagna. But I totally get your point.
Now, imagine that one day, instead of going straight home, you decide to go somewhere else. Maybe you are secretly looking for another job. Your metadata is recording the fact that after work you visit the offices of a rival company. Maybe you are a journalist and you visit your anonymous source. Your metadata records wherever you go, and one of these places is your secret meeting with your source. Anyone’s metadata can be combined with yours. There will be someone who was with you at the time and place of your se
The dark side of AI: recommend and manipulate (Ep. 90)
In 2017 a research group at the University of Washington did a study on the Black Lives Matter movement on Twitter. They constructed what they call a “shared audience graph” to analyse the different groups of audiences participating in the debate, and found an alignment of the groups with the political left and political right, as well as clear alignments with groups participating in other debates, like environmental issues, abortion issues and so on. In simple terms, someone who is pro-environment, pro-abortion, left-leaning, is also supportive of the Black Lives Matter movement, and viceversa.
F: Ok, this seems to make sense, right? But… I suspect there is more to this story?
So far, yes…. What they did not expect to find, though, was a pervasive network of Russian accounts participating in the debate, which turned out to be orchestrated by the Internet Research Agency, the not-so-secret Russian secret service agency of internet black ops. The same connected with the US election and Brexit referendum, allegedly.
F: Are we talking about actual spies? Where are you going with this?
Basically, the Russian accounts (part of them human and part of them bots) were infiltrating all aspects of the debate, both on the left and on the right side, and always taking the most extreme stances on any particular aspect of the debate. The aim was to radicalise the conversation, to make it more and more extreme, in a tactic of divide-and-conquer: turn the population against itself in an online civil war, push for policies that normally would be considered too extreme (for instance, give tanks to the police to control riots, force a curfew, try to ban Muslims from your country). Chaos and unrest have repercussions on international trade and relations, and can align to foreign interests.
F: It seems like a pretty indirect and convoluted way of influencing a foreign power…
You might think so, but you are forgetting social media. This sort of operation is directly exploiting a core feature of internet social media platforms. And that feature, I am afraid, is recommender systems.
F: Whoa. Let’s take a step back. Let’s recap the general features of recommender systems, so we are on the same page.
The main purpose of recommender systems is to recommend people the same items similar people show an interest in.Let’s think about books and readers. The general idea is to find a way to predict the best book to the best reader. Amazon is doing it, Netflix is doing it, probably the bookstore down the road does that too, just on a smaller scale.Some of the most common methods to implement recommender systems, use concepts such as cosine/correlation similarity, matrix factorization, neural autoencoders and sequence predictors.
The major issue of recommender systems is in their validation. Even though validation occurs in a way that is similar to many machine learning methods, one should recommend a set of items first (in production) and measure the efficacy of such a recommendation. But, recommending is already altering the entire scenario, a bit in the flavour of the Heisenberg principle of uncertainty.
F: In the attention economy, the business model is to monetise the time the user spends on a platform, by showing them ads. Recommender systems are crucial for this purpose.Chiara, you are saying that these algorithms have effects that are problematic?
As you say, recommender systems exist because the business model of social media platforms is to monetise attention. The most effective way to keep users’ attention is to show them stuff they could show an interest in.In order to do that, one must segment the audience to find the best content for each user. But then, for each user, how do you keep them engaged, and make them consume more content?
F: You’re going to say the word “filter bubble” very soon.
Spot on. To keep
The dark side of AI: social media and the optimization of addiction (Ep. 89)
Chamath Palihapitiya, former Vice President of User Growth at Facebook, was giving a talk at Stanford University, when he said this: “I feel tremendous guilt. The short-term, dopamine-driven feedback loops that we have created are destroying how society works ”.
He was referring to how social media platforms leverage our neurological build-up in the same way slot machines and cocaine do, to keep us using their products as much as possible. They turn us into addicts.
F: how many times do you check your Facebook in a day?
I am not a fan of Facebook. I do not have it on my phone. Still, I check it in the morning on my laptop, and maybe twice more per day. I have a trick though: I do not scroll down. I only check the top bar to see if someone has invited me to an event, or contacted me directly. But from time to time, this resolution of mine slips, and I catch myself scrolling down, without even realising it!
F: is it the first thing you check when you wake up?
No because usually I have a message from you!! :) But yes, while I have my coffee I do a sweep on Facebook and twitter and maybe Instagram, plus the news.
F: Check how much time you spend on Facebook
And then sum it up to your email, twitter, reddit, youtube, instagram, etc. (all viable channels for ads to reach you)
We have an answer. More on that later. Clearly in this episode there is some form of addiction we would like to talk about. So let’s start from the beginning: how does addiction work?
Dopamine is a hormone produced by our body, and in the brain it works as a neurotransmitter, a chemical that neurons use to transmit signals to each other. One of the main functions of dopamine is to shape the “reward-motivated behaviour”: this is the way our brain learns through association, positive reinforcement, incentives, and positively-valenced emotions, in particular, pleasure. In other words, it makes our brain desire more of the things that make us feel good. These things can be for example good food, sex, and crucially, good social interactions, like hugging your friends or your baby, or having a laugh together. Because we are evolved to be social animals with complex social structures, successful social interactions are an evolutionary advantage, and therefore they trigger dopamine release in our brain, which makes us feel good, and reinforces the association between the action and the reward. This feeling motivates us to repeat the behaviour.
F: now that you mention reinforcement, I recall that this mechanism is so powerful and effective that in fact we have been inspired by nature and replicated it in-silico with reinforcement learning. The idea is to motivate (and eventually create an addictive pattern) an agent to follow what is called the optimal policy by giving it positive rewards or punishing it when things don’t go the way we planned.
In our brain, every time an action produces a reward, the connection between action and reward becomes stronger. Through reinforcement, a baby learns to distinguish a cat from a dog, or that fire hurts (that was me).
F: and so this means that all the social interactions people get from social media platforms are in fact doing the same, right?
Yes, but with a difference: smartphones in our pockets keep us connected to an unlimited reserve of constant social interactions. This constant flux of notifications - the rewards - flood our brain with dopamine. The mechanism of reinforcement can spin out of control. The reward pathways in our brain can malfunction, and this leads to addiction.
F: you are saying that social media has LITERALLY the effect of a drug?
Yes. In fact, social media platforms are DESIGNED to exploit the rewards systems in our brain. They are designed to work like a drug. Have you been to a casino and played roulette or the slot machines?
Why is it fun to
More powerful deep learning with transformers (Ep. 84) (Rebroadcast)
Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture. Such architecture is built on top of another important concept already known to the community: self-attention.In this episode I explain what these mechanisms are, how they work and why they are so powerful.
Don't forget to subscribe to our Newsletter or join the discussion on our Discord server
Attention is all you need https://arxiv.org/abs/1706.03762
The illustrated transformer https://jalammar.github.io/illustrated-transformer
Self-attention for generative models http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf
Good Intro in ML