What do you do when you don't have a $200k data scientist?

We’re going to start the year off with a question: what do you do when you don’t have a $200k data scientist?

$200k is the top end of the salary range, but don’t let that fool you. LinkedIn considers data scientist or AI specialist the hottest job for 2020, with growth of 74% in the last 5 years. Wow! That is amazing, because it means that companies are evolving with their data and are making serious investments in their future. But what can you do when you can’t afford a data scientist?

If you think that just adding a data scientist, or two, or 10, will get you the innovation you are looking for, you are mistaken! Not because your data scientists can’t do the work, but because without a strategy or plan, you won’t make any progress. You will end up with data scientists working on projects that may or may not solve important problems. Don’t fall into the trap of thinking that because large companies have huge data science teams, you must quickly follow suit. I was at the AI Summit in NYC in December, where Target’s Chief Data & Analytics Officer gave an overview of his organization. Target has 1,000 data scientists, 50 of whom are PhDs. That’s the largest number I’ve ever heard. What I recall from his presentation is that Target focuses on the problems it is trying to solve. The team concentrates on only a handful of areas, and those are directly aligned with the company goals. That’s the key takeaway. It is not about how many data scientists you have; it is about whether your projects are aligned with your company goals.

So, back to the question I first asked:

What do you do when you don’t have a $200k data scientist?

Let’s start with this: where are you in your data journey? What types of analysis are you doing right now? Is your data in a great place? How much data wrangling are your analysts doing?

If you answered the above with “I don’t know where we are, we’re pulling data from csv/Excel files, and I don’t know what data wrangling means,” then you don’t need a data scientist right now. You need to focus on getting your foundations right before hiring data scientists or even buying any AI tool. If you’ve heard the saying “walk before you run,” that is exactly what I’m telling you to do here. You can’t jump from basically zero analysis of your business straight to AI modeling. If you hire data scientists prematurely, you will have people spending most of their time cleaning data instead of delivering results. You’ve heard the phrase before: data scientists spend 80% of their time finding and cleaning data. The goal is not for your future data scientists to do zero data discovery and preparation, but to have a data foundation sound enough to bring down that 80% of time spent on data preparation and cleanup.

What are the areas that you should be focusing on?

Using the Pareto Principle, also known as the 80/20 rule, let’s focus on the 20% of projects that will generate 80% of the improvements. When it comes to data, that 20% has to be around the structure of your data. Without going into the details of everything you need to do right away, ask yourself the following questions and we’ll take it from there.

What types of analytics are you doing today? Do you have a good understanding of the state of your business? If you are in sales, do you know how many sales you made in 2019? Do you know how many you made last week? How about year over year, month over month, or week over week? Insert here whatever metric or KPI is important to your business. The key point is: are you tracking it?
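To make the week-over-week question concrete, here is a minimal sketch in Python of how such a comparison could be computed. The sales records, the function names, and the choice of ISO calendar weeks are all assumptions made for illustration; your own data will live in whatever system or export you actually use.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales records: (order date, revenue). Illustrative values only.
sales = [
    (date(2019, 1, 7), 120.0),
    (date(2019, 1, 9), 80.0),
    (date(2019, 1, 15), 150.0),
    (date(2019, 1, 17), 100.0),
]


def revenue_by_week(records):
    """Sum revenue per ISO (year, week) pair."""
    totals = defaultdict(float)
    for day, amount in records:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        totals[(iso[0], iso[1])] += amount
    return dict(totals)


def percent_change(previous, current):
    """Percent change from previous to current; None when previous is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


weekly = revenue_by_week(sales)
weeks = sorted(weekly)
for earlier, later in zip(weeks, weeks[1:]):
    change = percent_change(weekly[earlier], weekly[later])
    print(f"{earlier} -> {later}: {change:+.1f}%")
```

The same two functions work for month over month or year over year if you group by month or year instead of ISO week. The point is less the code than the habit: pick the metric, pick the period, and track the change.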

Do you know who your clients or customers are? Do you know how many sales they made with you? Do you know the segment they belong to? Do you know which clients you need to go after this year to meet the targets you’ve set for 2020?

If you couldn’t answer most of those questions, let me ask you a follow-up question. Why don’t you know? If you cannot measure your performance, then how do you know if you are doing better, worse, or the same? You cannot improve what you don’t measure.

If you don’t have metrics, well then let’s start there.

Defining metrics

This may seem elementary for some of you, but if we cannot get the basics right, then it doesn’t matter what complex analysis or models we layer on top. A metric, in the simplest terms, is something you measure: a number or value that is important for your organization to keep track of, to understand, and maybe to take some action on. Not all metrics are KPIs (key performance indicators), but all KPIs are metrics. The distinction is that some metrics are supplemental data that go beyond what your performance is measured on.

A few examples of metrics include: number of orders, sales revenue, number of new clients, gross margin, number of units sold, capacity of plant, etc.

Defining metrics is important because if you measure the wrong thing, your projects and effort will be in vain. Let’s take the example of a small supermarket chain. The measure that the supermarket owner is focused on is the number of sales. There’s nothing technically wrong with keeping track of that number, but does it tell the supermarket owner the full health of the business? It doesn’t. I’ll explain why. Number of sales is the number of purchases people have made at the store. If that number is 100, can you tell me whether the store made money, lost money, or broke even? No, you can’t. The number of sales can probably only help in staffing the supermarket. In the case of a supermarket, the owner needs to be aware of the gross margin, inventory per SKU, basket size, and how frequently their buyers shop. Those are metrics that the supermarket owner can review to understand the current health of the business, to understand what to order next, and to understand what buyers are doing at the store, which can inform grocery circulars.
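As a rough illustration of the supermarket example, the sketch below computes gross margin and basket size from a couple of made-up transactions. The record layout (SKU, quantity, unit price, unit cost), the function names, and the numbers are all hypothetical; I am also assuming basket size means the average number of items per transaction, which is one common definition among several.

```python
# Hypothetical transactions. Each transaction is a list of line items:
# (sku, quantity, unit_price, unit_cost). Values are illustrative only.
transactions = [
    [("milk", 2, 3.00, 2.00), ("bread", 1, 2.50, 1.50)],
    [("eggs", 1, 4.00, 3.00)],
]


def total_revenue(txns):
    """Revenue across all line items."""
    return sum(qty * price for txn in txns for _sku, qty, price, _cost in txn)


def total_cost(txns):
    """Cost of goods sold across all line items."""
    return sum(qty * cost for txn in txns for _sku, qty, _price, cost in txn)


def gross_margin_pct(txns):
    """Gross margin as a percentage of revenue; None if there is no revenue."""
    revenue = total_revenue(txns)
    if revenue == 0:
        return None
    return (revenue - total_cost(txns)) / revenue * 100.0


def average_basket_size(txns):
    """Average number of items per transaction (assumed definition)."""
    if not txns:
        return None
    items = sum(qty for txn in txns for _sku, qty, _price, _cost in txn)
    return items / len(txns)


print("number of sales:", len(transactions))
print("gross margin %:", round(gross_margin_pct(transactions), 1))
print("average basket size:", average_basket_size(transactions))
```

Notice that the number of sales alone says nothing about whether the store made money, while the margin calculation needs cost data. That is exactly the kind of data you have to confirm you actually have before committing to a metric.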

As you’re defining your metrics, you need to find out if you have that data. The worst thing that can happen is that you are relying on some data for your metric and you don’t have it. Back to my previous point, if you can’t measure it, you can’t improve on it. I’ve heard some people say, if you can’t measure it, it’s like you didn’t do it.

Where is the data?

You’ll feel like Lewis and Clark going on an expedition to find your data, but it’s needed. Start with your engineering team. If there’s documentation on the data inputs and outputs in your system, then use that as a starting point. If there’s none, you’ll have to work in tandem with your engineers to get that information documented. I stress the documentation because what’s the point of telling the story over and over again when someone could just read it? It saves time in the long run, and it also gives you a big-picture view of areas for improvement. It will help those data scientists spend less time searching for data because you’ve already done the discovery portion of their work.

Once you’ve found the data, you need to know whether it is the right data to use and whether it has any anomalies, edge cases, etc. That matters because you have to account for those edge cases when defining your metrics and reporting on them.
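Here is a minimal sketch of what a first pass at spotting anomalies could look like. The order records, the field names, and the three checks (missing customer, negative amount, duplicate order ID) are assumptions chosen for illustration; the edge cases that matter in your data will depend on your business.

```python
from collections import Counter

# Hypothetical order records with a few deliberate problems.
orders = [
    {"order_id": "A1", "customer": "acme", "amount": 120.0},
    {"order_id": "A2", "customer": "", "amount": 80.0},
    {"order_id": "A2", "customer": "globex", "amount": 80.0},
    {"order_id": "A3", "customer": "initech", "amount": -15.0},
]


def quality_report(rows):
    """Flag rows with a missing customer, a negative amount, or a reused ID."""
    id_counts = Counter(row["order_id"] for row in rows)
    return {
        "missing_customer": [r["order_id"] for r in rows if not r["customer"]],
        "negative_amount": [r["order_id"] for r in rows if r["amount"] < 0],
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
    }


report = quality_report(orders)
for check, order_ids in report.items():
    print(f"{check}: {order_ids}")
```

A flagged row is not automatically wrong. A negative amount may be a legitimate refund, for example. The value of the report is that it forces you to decide, and document, how each case is treated in your metrics.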

I’ll jump a few steps ahead and go from walking to jogging. Once you have the basics down: you know where your data is, how good it is, the metrics you need for your business, then and only then can you start doing some analysis on that data. 

What kind of analysis? Let’s start with the basics: what is the state of your business today? Start with that. Then add layers to it: customer segmentation, forecasting, etc.

If you can’t get the simple answers, don’t bother with the complex ones. Data science is definitely here to stay and it's not just a buzzword. Nevertheless, I don't want you to rush into something without understanding the basics. Walk first, then run.
