Not all data teams are created equal
This post is about data teams and how they can best collaborate. Let’s start by answering a few questions.
What are the different kinds of data teams?
There are data engineering teams. There could be one or many data engineering teams in an organization. Data engineers are responsible for the data pipelines: the infrastructure of how data flows through your systems. This could include ETL/ELT code, data warehousing, data marts, and BI tool infrastructure. They may or may not be responsible for the governance of the data.
There are data analytics teams. At some places this could be a business analytics team, divided into further teams by business function. This is the one team where you will see different practices: centralized vs. decentralized. Analytics is responsible for understanding the insights that the data provides. This could be through exploratory analysis, by building out dashboards, or by finding insights from such analysis and providing them to other teams. Data analysts are not just there to provide you with data; they are the experts on the data and explain the trends and insights that are being seen.
There are data scientists, the newer kids on the data block. With the rise of Big Data came the rise of ML and AI. The use of data evolved from simple analytics in Excel to sophisticated models in Python, R, etc. that learn and can predict what comes next. Data scientists play a role in that development as a hybrid of engineering and analytics. They can scrub data clean and, as I would say, make magic out of disparate data sets.
Do we always see these teams in every organization?
Not all organizations are the same, so we are not going to assume that every organization has all these teams. Let's talk about how these teams could appear in your organization. For example, data engineering may not even exist in the organization as a separate role. The work could be done by software engineers. The skill sets are similar, but the titles may differ in responsibility. Data engineering is a relatively new term. For many years data engineers lived within software engineering, on teams that worked on data warehousing, database administration (DBA), or infrastructure pipelines. As the use of data has matured, so has the need for a separate role with the specific skill sets that data engineering requires. Such skill sets include an auditing mindset, database performance and tuning, and, what I think is a very important skill, data modeling with the user in mind. An experienced data engineer knows to ask who is going to be using the tables, what the performance needs are, and how that data needs to be accessed and stored.
Moving on to analytics, this team may exist only on the business side with no support from tech. Analysts in a decentralized model will have limited visibility into what other analysts are working on and at times end up repeating the same analysis.
Data scientists may be part of the analytics organization with no separation of responsibilities. They may not even exist at all. This all depends on the data needs of the organization.
When it comes to organizational structure, these three teams may or may not report up to the same person, such as a CTO.
This separation of teams may sound like it’s suited for large organizations. How does this work in a smaller organization (fewer than 100 people)? In smaller organizations, I’ve seen that analytics and data engineering can be the same people. That’s because the organization can't afford to staff entire teams dedicated to those tasks. It could also mean that the organization is in its infancy and building fast is a priority.
Why have these teams?
These teams are the pillars on which a data-driven company stands. Without a team dedicated to the data pipelines, you won’t have clean, reliable, and trustworthy data. Without analytics, you won't have a single source of metrics and reporting. Without data science, you won’t be able to use your historical data for predictive analysis or NLP.
How should these teams work?
Collaboration is key. One way to think about collaboration is through office space layout. This depends on the floor layout (open floor vs. cubes vs. flex space) and on whether the teams are in the same geographic location. If this is a global company, do members of the teams work out of different locations?
Let’s take a simple example and see whether the way these teams work could be improved.
Example: Same office building, analytics team is centralized, and all three teams report up to the Chief Technology Officer. In this organizational structure, these teams are probably already working well, since the barriers that could break down communication (different reporting chains, conflicting priorities, different geographic locations) don’t exist. A few things could still throw this structure out of whack. For example, all of the data engineers sit on the 25th floor, analytics is on the 50th floor, and data science is on the 39th floor. This configuration doesn’t work because going to another floor becomes an inconvenience for people. We’ve all been there before; even when it means going just one floor up, you’d rather Slack or message someone because it’s quicker. Although messaging is used all the time now as a form of communication, things may get lost in translation even in this day and age.
The only thing I would recommend in this example is that if you can’t move the teams onto the same floor, you look at two options. First, you can create smaller teams that mix analysts, engineers, and data scientists. They would work on a project together and see it from beginning to end. Second, you can invest heavily in a communication strategy between the teams. This could mean daily stand-ups in which everyone talks about what they are working on (this could be troublesome for very large teams), or bi-weekly meetings in which people share updates and present the work they’ve done. It could also just mean a data team event where team members get to know each other outside the walls of the building.
Now that we’ve covered a simple scenario, let’s take a look at a more complicated one.
Let’s say the organization is a global company. Analysts are located in NY, engineers in Ireland, and data scientists in California. This is a tougher scenario because it involves different geographic locations. If you haven’t been in this situation, it can be tricky to manage. How do you get people to communicate when others are sleeping? Although some of the issues are larger in scale (outsourcing versus offshoring), we can look at it through the lens of what systems you need to have in place so that collaboration still occurs. Some tactics that can increase collaboration:
Common messaging channel: whether you use Skype, Teams, Slack, or another tool, you need to have a common “room” or group chat for all your teams. Set the purpose of the group chat and invite everyone who is part of those groups. As part of on-boarding, make sure new hires are added to those groups.
Cadence on what you’re working on: if you follow an agile development model, ensure that there is a clean active sprint that people have visibility into, and tag people along the way who may be impacted by or interested in the work being done.
Common knowledge space that everyone contributes to: just like the common messaging channel, this would be a wiki space or intranet page that is maintained by these teams. This is important because, as new team members come on board, if you don’t have great documentation, every single new hire will have similar if not the same questions. Having a knowledge space reduces the learning curve.
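To make the "when others are sleeping" problem from this scenario concrete, here is a minimal sketch that counts the hours in which the three locations are at work at the same time. Everything in it is an assumption for illustration: the team names, a 09:00 to 17:00 local working day, and fixed standard-time UTC offsets (daylight saving is ignored).

```python
# Assumed standard-time UTC offsets in hours; daylight saving is ignored.
UTC_OFFSETS = {
    "analysts_ny": -5,
    "engineers_ireland": 0,
    "scientists_california": -8,
}

# Assumed local working day: 09:00 up to (not including) 17:00.
WORK_START, WORK_END = 9, 17


def working_hours_utc(offset_hours):
    """Return the set of UTC hour slots (0-23) that fall in the local working day."""
    return {(hour - offset_hours) % 24 for hour in range(WORK_START, WORK_END)}


def shared_hours(teams):
    """Return the sorted UTC hour slots during which every listed team is at work."""
    slots = [working_hours_utc(UTC_OFFSETS[team]) for team in teams]
    return sorted(set.intersection(*slots))


if __name__ == "__main__":
    print("NY + Ireland:", shared_hours(["analysts_ny", "engineers_ireland"]))
    print("NY + California:", shared_hours(["analysts_ny", "scientists_california"]))
    print("Ireland + California:", shared_hours(["engineers_ireland", "scientists_california"]))
    print("All three:", shared_hours(list(UTC_OFFSETS)))
```

Under these assumptions, New York shares a few hours with each of the other two locations, but Ireland and California share none, so there is no slot in which all three teams are online together. That is why the tactics above lean on tools that work asynchronously rather than on a single shared meeting.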
We covered two scenarios, and there are more flavors along that spectrum. What I want you to take away is that not all data teams are created equal. Even if your company doesn’t have these teams, or maybe it has more (data ops, DBA, data testers, etc.), I want you to realize that one size doesn’t fit all.
If you are part of these data teams at your company, reflect on a few things:
How are your teams working?
Do you know what each team is working on?
Do you know what everyone’s role is?
Are there areas of improvement? Could you collaborate better?
Does on-boarding new people take a long time?
What can you start doing today to make it better?
If you are managing these teams, ask yourself this:
Is it clear to the organization what your team does?
Do your stakeholders/partners know what your team is accomplishing?
If the answer to that is no, then start by creating a communication strategy for your team:
Send out your team’s release notes if you are in some form of development cadence.
If not, send out a monthly newsletter to all your stakeholders, or the whole company, on the top things that your team has accomplished that month.
Meet with your most important business partners and go over with them what your plan is for next month so they know what value you are bringing to them and also what they should be expecting. It gives them a level of excitement and everyone appreciates transparency.
Resources
https://www.oreilly.com/ideas/data-engineering-a-quick-and-simple-definition
https://www.northeastern.edu/graduate/blog/data-analytics-vs-data-science/