The Magic Ingredient - Data

These days, it would not be a surprise to hear companies like Google and Facebook compared to magicians. Can you guess their magic ingredient?


You guessed it right: data. Loads and loads of data.


You might be wondering what the big fuss about data is. How could it possibly help? Well, I will try to demystify it and help you understand why data is the basic building block of technology.


When you think of magic, the most common trick that comes to mind is the card trick where the magician guesses which card you chose. In a similar way, I will show you how data is useful for predicting something unknown. If you want to get fancier, you can call it predictive analytics.


Let's play a game.


The Game



You have three doors in front of you. Suppose I tell you that a Lamborghini is hidden behind one of them and that there is nothing behind the other two. Now it's your turn to guess which door to choose.


Suppose you choose door A. I already know which doors are empty and which one hides the Lamborghini. Being your friend, I will open one of the other two doors, and it will always be an empty one. For argument's sake, let's say I open door B and show you it's empty.


Now, I ask you a simple question. Do you want to switch from your initial choice?


If your gut told you that it doesn't matter whether you switch or not, as there are only two doors left and you would have a 50% chance either way, then you are among the majority. This problem is known as the Monty Hall problem. This paradox of whether to switch or not switch has even troubled many Ph.D. holders.


I chose this problem because you can reach a proper conclusion on whether to switch by applying Bayes' theorem. You probably studied Bayes' theorem in your 10th or 12th grade. If you want to refresh your understanding of it, you can read about it here before moving ahead.


Answer Revealed


Let us examine the choices you have in detail. You initially chose door A. Now you know that door B is empty (thanks to me 😊). You have to decide whether to stay with door A or switch to door C.


Let's apply Bayes' theorem in the context of this problem.


Note that P( door = A ) is the probability that the Lamborghini is behind door A, and similarly for door B and door C.


The probabilities that you need to calculate, based on the choices you have, are:


P( door = A | opens = B ), the probability of the Lamborghini being behind door A after you know door B is empty – the case of not switching


P( door = C | opens = B ), the probability of the Lamborghini being behind door C after you know door B is empty – the case of switching


Initially, before anything is revealed, each door is equally likely to hide the car, so the probability for any door is 1/3. It means P( door = A ) = P( door = B ) = P( door = C ) = 1/3.


If the car is actually behind door A, then I can open door B or door C. So, the probability of opening either is 50%.


This means, P ( opens = B | door = A ) = 1/2 .


If the car is actually behind door C, then I can only open door B. I cannot open A, the door you picked. I cannot open C because it has the car behind it.


This means, P ( opens = B | door = C ) = 1 .


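I can open door B in only the two cases above: when the car is behind door A or behind door C. (If the car were behind door B, I would never open it, so that case contributes 0.) The total probability of opening door B is therefore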


P ( opens = B ) = ( P ( door = A ) * P ( opens = B | door = A ) ) + ( P ( door = C ) * P ( opens = B | door = C ) )


P ( opens = B ) = ( 1/3 * 1/2 ) + ( 1/3 * 1 ) = 1/2


Now, let's calculate our final probabilities using Bayes' theorem:


P ( door = A | opens = B ) = [ P ( door = A ) * P ( opens = B | door = A ) ] / P ( opens = B )

= [ 1/3 * 1/2 ] / ( 1/2 )

= 1/3


Similarly,


P ( door = C | opens = B ) = [ P ( door = C ) * P ( opens = B | door = C ) ] / P ( opens = B )

= [ 1/3 * 1] / ( 1/2 )

= 2/3


Now you know that you have about a 66.67% chance of winning if you switch.

If that was not clear, let me make it simpler.
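If you would like to check the arithmetic above yourself, here is a minimal Python sketch of the same Bayes calculation. The function name monty_hall_posterior and its parameters are made up for this illustration; it simply assumes, as in the game above, that I never open the door you picked or the door with the car, and that I choose at random when I have two empty doors to pick from.

```python
from fractions import Fraction


def monty_hall_posterior(picked="A", opened="B", doors=("A", "B", "C")):
    """Return P(door = d | opens = opened) for every door d, as exact fractions."""
    # Prior: before anything is revealed, each door is equally likely.
    prior = {d: Fraction(1, len(doors)) for d in doors}

    def likelihood(car):
        # P(opens = opened | door = car).
        # The host may only open a door that is neither the pick nor the car.
        allowed = [d for d in doors if d != picked and d != car]
        return Fraction(1, len(allowed)) if opened in allowed else Fraction(0)

    # Numerators of Bayes' theorem: P(door = d) * P(opens = opened | door = d).
    joint = {d: prior[d] * likelihood(d) for d in doors}
    # Denominator: total probability of the host opening that door.
    evidence = sum(joint.values())
    return {d: joint[d] / evidence for d in doors}


posterior = monty_hall_posterior()
for door, p in posterior.items():
    print(f"P(door = {door} | opens = B) = {p}")
```

Using Fraction keeps the results exact, so the output can be compared directly with the 1/3 and 2/3 worked out by hand.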


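Consider the table below, which covers all the possibilities. You have chosen door A, and I then show you one of the remaining empty doors.

Car is behind | I open | If you stay | If you switch
Door A | Door B or C | You win | You lose
Door B | Door C | You lose | You win
Door C | Door B | You lose | You win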


As you can see, switching wins in two out of the three cases.


This means you have about a 66.67% chance of winning if you switch and about a 33.33% chance of winning if you do not switch.
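If you still do not trust the math, you can let data settle it by playing the game many times. The sketch below is only an illustration under the same rules as above; the helper names play and win_rate, the number of trials, and the seed are all arbitrary choices made for this example.

```python
import random


def play(switch, rng):
    """Play one round; return True if the final pick hides the car."""
    doors = ["A", "B", "C"]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens an empty door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither the pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the chance of winning over many simulated rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With enough rounds, the two printed rates should settle close to 1/3 for staying and 2/3 for switching.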


The Magic Ingredient – Data


One additional fact, that a particular door has no prize behind it, allows you to improve your chances of winning significantly.


In reality, the game is not restricted to three doors, and the prize is not always the same.


Let me give you an example. I am pretty sure everybody uses Facebook. If Facebook played the game, what do you think would be the real prize for them?


If you think about it, the prize is YOU. The more Facebook knows about you, the more easily it can target its ads at you. Now, how do you think Facebook gets closer to the real prize? In exactly the same way you got closer to your Lamborghini: by seeing what is behind multiple doors.


In this case, you open the doors for them. By providing data such as location, age, job status, education details, etc., you unlock more doors for Facebook, and it uses them to get closer to you. In fact, Facebook uses 98 personal data points to target ads at you.


Do you think that's a lot of doors? You are not even close. It is estimated that by 2020 the amount of data available will be around 44 zettabytes (1 zettabyte = 1000^7 bytes, i.e., 10^21 bytes). Please don't get me wrong; my intention is not to compare each door to a byte of data or anything like that. All I want you to imagine is the sheer number of doors that could exist with such vast amounts of data available.


There are a lot of secrets yet to be revealed by opening these doors. Many doors may be related to each other, and others may be entirely different. Exploring these doors to reveal far more essential truths is what data science is all about. Sounds magical, doesn't it? If you want to be a magician too, start practicing data science.


It is not just Facebook: almost all companies have started to realize the importance of data and how it can help them across business domains such as marketing, finance, and supply chain. (There is a reason jobs in data science and analytics are in very high demand 😊.) My job here was to make you realize its importance too. I hope you start building your knowledge of data science and technology.


In case you wish to connect, please reach out to us at:

pawank.agrawal19@iimranchi.ac.in

sreevatsa.b19@iimranchi.ac.in