Workflow of a data science project Part 8


Data science projects have a different workflow than machine learning projects. Let's take a look at the steps of a data science project. As our running example, let's say you want to optimize a sales funnel. Say you run an e-commerce or online shopping website that sells coffee mugs. For a user to buy a coffee mug from you, there's a sequence of steps they'll usually follow. First, they'll visit your website and take a look at the different coffee mugs on offer. Then they have to get to a product page, put the mug into their shopping cart, go to the shopping cart page, and finally check out. So, if you want to optimize the sales funnel to make sure that as many people as possible get through all of these steps, how can you use data science to help with this problem?

Let's look at the key steps of a data science project. The first step is to collect data. On a website like the one we saw, you may have a data set that stores when different users go to different web pages. In this simple example, I'm assuming that you can figure out which country the users are coming from, for example by looking at their computer's address, called an IP address, and working out the country from which it originates. In practice, you can usually get quite a bit more data about users than just what country they're from.

The second step is to analyze the data. Your data science team may have a lot of ideas about what is affecting the performance of your sales funnel. For example, they may think that overseas customers are scared off by the international shipping costs, which is why a lot of people go to the checkout page but don't actually check out. If that's true, then you might think about whether to put part of the shipping costs into the actual product costs. Or your data science team may think there are blips in the data whenever there's a holiday.
Maybe more people will shop around the holidays because they're buying gifts, or maybe fewer people will shop around the holidays because they're staying home rather than shopping from their work computers, as they sometimes do. There may also be time-of-day blips: in countries that observe a siesta, a rest period in the afternoon, there may be fewer shoppers online and so your sales may go down. The team may then suggest that you spend fewer advertising dollars during the siesta period because fewer people will go online to buy at that time. A good data science team may have many ideas, so they try many of them, or as we say, iterate many times, to get good insights.

Finally, the data science team will distill these insights down to a smaller number of hypotheses about what could be going well and what could be going poorly, as well as a smaller number of suggested actions, such as incorporating shipping costs into the product costs rather than having them as a separate line item. When you take some of these suggested actions and deploy the changes to your website, you start to get new data back, because users behave differently now that you advertise differently at siesta time or have a different checkout policy. Then your data science team can continue to collect data and reanalyze the new data periodically to see if they can come up with even better hypotheses or even better actions over time.

So the key steps of a data science project are to collect the data, to analyze the data, to suggest hypotheses and actions, and then to continue to get the data back and reanalyze it periodically.

Let's take this framework and apply it to a new problem: optimizing a manufacturing line. We'll take these three steps and use them on the next slide as well. Let's say you run a factory that's manufacturing thousands of coffee mugs a month for sale and you want to optimize the manufacturing line.
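
Before moving on to the factory, here is a minimal sketch of what the collect and analyze steps for the sales funnel might look like in code. It is only an illustration: the step names, the `EVENTS` log, and the helper functions `funnel_counts` and `step_conversion` are hypothetical, and the data is made up. The idea is to count how many users in each country reach each funnel step, then look at where they drop off.

```python
from collections import defaultdict

# Funnel steps in the order described above (names are hypothetical).
FUNNEL_STEPS = ["visit", "product_page", "cart", "checkout"]

# Made-up page-visit log: (user_id, country, step reached).
EVENTS = [
    (1, "US", "visit"), (1, "US", "product_page"), (1, "US", "cart"), (1, "US", "checkout"),
    (2, "US", "visit"), (2, "US", "product_page"),
    (3, "ES", "visit"), (3, "ES", "product_page"), (3, "ES", "cart"),
    (4, "ES", "visit"), (4, "ES", "product_page"), (4, "ES", "cart"),
    (5, "ES", "visit"),
]


def funnel_counts(events):
    """Count distinct users per country who reached each funnel step."""
    users = defaultdict(set)  # (country, step) -> set of user ids
    for user_id, country, step in events:
        users[(country, step)].add(user_id)
    countries = sorted({country for _, country, _ in events})
    return {
        country: [len(users[(country, step)]) for step in FUNNEL_STEPS]
        for country in countries
    }


def step_conversion(counts):
    """Fraction of users moving from each step to the next (None if nobody reached the earlier step)."""
    return [
        (after / before if before else None)
        for before, after in zip(counts, counts[1:])
    ]


if __name__ == "__main__":
    for country, counts in funnel_counts(EVENTS).items():
        print(country, counts, step_conversion(counts))
```

In this toy data, a low cart-to-checkout fraction for one country would be consistent with the shipping-cost hypothesis, but it would not prove it. That is why the team turns such findings into hypotheses and actions, then checks the new data after the change is deployed.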
These are the key steps in manufacturing coffee mugs. Step one is to mix the clay, making sure the appropriate amount of water is added. Step two is to take this clay and shape the mugs. Then you have to add the glaze, which is the coloring and a protective cover. Then you have to heat the mug, and we call that firing the mug in the kiln. Finally, you inspect the mug to make sure there aren't dents in it and it isn't cracked before you ship it to customers. A common problem in manufacturing is to optimize the yield of the manufacturing line, that is, to make sure that as few damaged coffee mugs get produced as possible, because those are coffee mugs you have to throw away, resulting in wasted time and material.

What's the first step of a data science project? I hope you remember from the last slide that the first step is to collect data. For example, you may save data about the different batches of clay that you've mixed, such as who supplied the clay, how long you mixed it, how much moisture was in the clay, and how much water you added. You might also collect data about the different batches of mugs you made. How much humidity was in that batch? What was the temperature in the kiln, and how long did you fire it?

Given all this data, you would then ask the data science team to analyze it, and they would, as before, iterate many times to get good insights. They may find, for example, that whenever the humidity is too low and the kiln temperature is too hot, there are cracks in the mugs. Or they may find that because it's warmer in the afternoon, you need to adjust the humidity and temperature depending on the time of day.

Based on the insights from your data science team, you get suggestions for hypotheses and actions on how to change the operations of the manufacturing line in order to improve its productivity.
When you deploy the changes, you then get new data back that the team can reanalyze periodically, so they can keep optimizing the performance of your manufacturing line.

To summarize, the key steps of a data science project are to collect the data, to analyze the data, and then to suggest hypotheses and actions. In this video and the last video, you saw some examples of machine learning projects and data science projects.
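
As a rough illustration of the manufacturing analysis described above, the sketch below groups batches by whether the humidity was low and whether the kiln was hot, then compares crack rates across the groups. Everything in it is an assumption made for the example: the `BATCHES` records, the field names, and the two thresholds are invented, not values from a real factory.

```python
# Made-up batch records; field names and numbers are hypothetical.
BATCHES = [
    {"humidity_pct": 30, "kiln_temp_c": 1250, "mugs": 100, "cracked": 12},
    {"humidity_pct": 32, "kiln_temp_c": 1240, "mugs": 100, "cracked": 8},
    {"humidity_pct": 55, "kiln_temp_c": 1250, "mugs": 100, "cracked": 3},
    {"humidity_pct": 50, "kiln_temp_c": 1180, "mugs": 100, "cracked": 2},
    {"humidity_pct": 31, "kiln_temp_c": 1170, "mugs": 100, "cracked": 3},
]

# Assumed cut-offs for "too dry" and "too hot"; a real team would choose these from the data.
LOW_HUMIDITY_PCT = 40
HOT_KILN_C = 1220


def condition(batch):
    """Label a batch by its humidity and kiln temperature bucket."""
    dry = batch["humidity_pct"] < LOW_HUMIDITY_PCT
    hot = batch["kiln_temp_c"] > HOT_KILN_C
    return ("dry" if dry else "humid", "hot" if hot else "cooler")


def crack_rate_by_condition(batches):
    """Return the fraction of cracked mugs for each (humidity, temperature) bucket."""
    totals = {}  # condition -> [cracked mugs, total mugs]
    for batch in batches:
        bucket = totals.setdefault(condition(batch), [0, 0])
        bucket[0] += batch["cracked"]
        bucket[1] += batch["mugs"]
    return {cond: cracked / mugs for cond, (cracked, mugs) in totals.items()}


if __name__ == "__main__":
    rates = crack_rate_by_condition(BATCHES)
    for cond, rate in sorted(rates.items(), key=lambda item: -item[1]):
        print(cond, f"{rate:.1%}")
```

With this invented data, the dry-and-hot bucket has the highest crack rate. A comparison like this only suggests a hypothesis. The suggested action, such as adjusting humidity or kiln temperature, still has to be deployed and the new batches reanalyzed to see whether yield actually improves.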
