Successful data science projects and functions may look very different, but our experience has shown that they tend to have a few things in common. More specifically, there are certain things they never do. These “thou shalt nots” form the seven deadly sins of data science, and in this part, we’ll examine the first three.
1. Never be hurt by rare events
When you are using models with real-life data, the expectation is that they will, at some point, break or behave unexpectedly. Rather than chasing a non-existent perfect algorithm, the key is to ensure that you don’t get hurt when the model eventually misbehaves.
Case: Amazon’s discriminatory candidate screening algorithm
A great example of a model behaving unexpectedly comes from Amazon. To reduce the time spent screening job applicants, Amazon trained an algorithm to trim down the pool of prospective candidates. After a while, it became evident that the model was discriminating against women.
But this wasn’t because the bias was coded into the model. The model was instead picking up correlates from a biased training dataset. In the end, Amazon was honest about the flaw, and the incident sparked a new burst of research in AI and ethics.
The solution to these types of problems is to build models that are actually focused on the outliers and even benefit from them. In fact, sometimes the most important information is ‘cleaned’ away at the beginning of a data exercise.
Case: E-commerce store looking to increase sales
A helpful example might be to consider an e-commerce business looking to increase sales. Although the most typical data science task would be to try to predict when someone is going to convert (or complete a purchase), the success rate for these types of models tends to be low. There’s an argument that data science models are much better suited to finding outliers amongst people already spending.
So, the solution? Try building a recommendation algorithm that heavily upsells to those outliers.
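As a minimal sketch of the first step, the snippet below flags outlier spenders using the standard IQR rule; the customer data, thresholds, and function names are illustrative assumptions, and a real system would feed these customers into a recommendation engine rather than just list them.

```python
# Minimal sketch: flag outlier spenders with the IQR rule so they can be
# targeted for upsell recommendations. Data and thresholds are illustrative.

def find_outlier_spenders(spend_by_customer):
    """Return customer ids whose total spend lies above Q3 + 1.5 * IQR."""
    spends = sorted(spend_by_customer.values())
    n = len(spends)

    def quantile(q):
        # Simple linear-interpolation quantile (no numpy dependency).
        pos = q * (n - 1)
        lo, hi = int(pos), min(int(pos) + 1, n - 1)
        frac = pos - lo
        return spends[lo] * (1 - frac) + spends[hi] * frac

    q1, q3 = quantile(0.25), quantile(0.75)
    cutoff = q3 + 1.5 * (q3 - q1)
    return [cid for cid, s in spend_by_customer.items() if s > cutoff]


spend = {"a": 40, "b": 55, "c": 48, "d": 52, "e": 45, "big": 900}
print(find_outlier_spenders(spend))  # only the heavy spender stands out
```

Note that this is the opposite of the usual cleaning step: instead of dropping the points beyond the cutoff, the model treats them as its most valuable signal.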
2. Never build a model you can’t prove is working
In an interesting podcast, the ex-CEO of Groupon describes how Groupon was killed by a thousand A/B tests. The offer and voucher platform was an avid user of A/B testing and made significant changes to its products based on the results of these tests. But instead of improving the company’s offering, the extensive testing ended up degrading the products.
The reason lies in the assumption that every test produces statistically significant results. In our experience, there is typically too much noise in the data to find meaningful improvement opportunities. Combined with the considerable effort it takes to run such tests, they often simply do not help the business.
So if you wish to carry out A/B testing, the key is to make sure that any changes made to the offering are based on truly meaningful test results.
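One common way to check whether a test result is meaningful is a two-proportion z-test on the conversion rates of the two variants. The sketch below is a stdlib-only illustration under assumed traffic numbers, not a prescription for any particular testing stack.

```python
# Minimal sketch: two-sided two-proportion z-test for an A/B result.
# Illustrates why a small lift on modest traffic is usually just noise.
import math

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Variant B converts 5.5% vs 5.0% for A, with 2,000 visitors each:
p = ab_test_p_value(conv_a=100, n_a=2000, conv_b=110, n_b=2000)
print(f"p = {p:.3f}")  # well above 0.05 -> the "lift" is not significant
```

In this hypothetical example the apparent 10% relative lift is indistinguishable from noise, which is exactly the kind of result that should not drive a product change.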
3. Never hire optimists or historians
Despite its name, data science is closer to engineering than to science in the traditional sense. Modern data science is mostly about tinkering and finding what works through practice, and it requires a “bottom-up” rather than a “top-down” approach. Consider a civil engineer: if they are overly optimistic and things go wrong, the bridge can collapse. The same analogy applies to overly optimistic data scientists.
In turn, the unfortunate problem with historians is that they are trained in the art of survivorship and hindsight. Historians avoid making forecasts and predictions and seek only to explain what has already happened.
The key is to find a middle ground: people who know the limitations of the technology they are using and take a realistic approach to doing data science.
If you are interested in adopting AI but you are unsure where to start or whether your business is ready for it, our Data Science Readiness Report can get you on the right path. Get in touch with us to find out more.