Launching and Scaling Data Science Teams: Three Years Later

An update on building and managing your data science team

Ian Macomber
8 min read · Feb 15, 2022

Three years ago I published a four-part Medium series called “Launching and Scaling Data Science Teams”, intended to be best-practice playbooks for building analytics into your organization. The project was part of my business school program; true to typical MBA form, I wrote ten thousand words advising companies on challenges I had not faced myself.

Over the last three years, I’ve had the privilege of leading the analytics organization at Drizly. During this time, Drizly hit massive COVID scale, completed a Series C fundraise, suffered a data breach, and sold to Uber for $1.1B. Meanwhile, the analytics team grew from five people in Boston to twenty-five all over the US, dealt with the challenges of a rapidly shifting privacy and security landscape, and supported two due diligence data rooms and an acquisition, all while transitioning to a modern data stack and philosophy.

There were many growing pains. For the most part, we were able to right the ship thanks to fast iterations from the team, kind advice from mentors, and great feedback from stakeholders. Having learned these lessons the hard way, here are some takeaways about building analytics into organizations today that I didn’t know three years ago.

Focus on distribution and adoption:

From Justin Kan: “First time founders are obsessed with product. Second time founders are obsessed with distribution.” A founder can be so excited about what they’ve built that they assume the rest of the world will feel the same way. Founders may believe they have built a product so obviously great and necessary, it does not need to be sold or marketed, and customers will line up to figure out how it works — no training manual necessary. Analytics professionals are often in the same boat. If your team has a tendency to focus on building v3, v4, v5 of something, while being surprised or disappointed that v1 hasn’t been adopted yet, you need to focus on distribution.

This lesson can be extremely frustrating to learn, but is universally true. If you have not socialized the product, process, or next steps with your stakeholders to the point where they are willing to implement your work, you have not done your job. Selling your vision is essential to having your product adopted.

To achieve distribution, build for who your stakeholder truly is, not for the stakeholder you want them to be. You may wish your stakeholders could write their own SQL for that “quick pull”, took a passionate interest in dashboarding, understood the bias-variance tradeoff, or even answered their emails faster. That would certainly make analytics work easier to adopt. But your stakeholders are just as busy as you are, and a model can only have value when it’s deployed. It’s on you to bridge the analytics distribution gap, and to set the standard that work is done when it’s being used in production, regardless of why it’s blocked.

Security, privacy, and compliance are your responsibility and your advantage:

I’ve had two stints in consumer analytics: Wayfair from 2013–2017, Drizly from 2019–2022. During this time period, the California Consumer Privacy Act (CCPA) was passed, amended, and went into effect. Additionally, Apple released iOS 14 (killing IDFAs, fingerprinting) and iOS 15 (private relay, email pixel security). The last decade witnessed a seismic change in how society perceives data privacy, how regulators restrict data usage, how marketers assess the efficacy of their campaigns, and what tech platforms and partners allow and enforce.

The analytics culture in 2013–2017 in consumer tech was largely one of minimally compliant solutions. Analytics teams consumed as much data as they could, built complex non-interpretable models, assumed no one would look under the hood, and thought security and compliance slowed down the pace of iteration. Today, that assumption needs an amendment. If they want, companies can still consume data haphazardly, build complex models, and assume no one will look under the hood. This will be true, until the second the company inks a big partnership, raises a big round, or has a massive exit. Diligence into data practices comes hard and fast, and you have to be compliant when it matters most. Increasingly successful company outcomes are correlated with increasing levels of data scrutiny.

As an example: in a company’s earlier stages, it’s more common for a variety of teams to add consumer tracking pixels and SDKs without intentionality or process. Some for ad vendors, some for measurement partners, some for strategic partnerships, some for behavioral data, some for prod eng telemetry. Without a centralized inventory or owner of pixels and SDKs, no one person is sure which third parties get what data when. This may never be an issue! But when it does become an issue, it may threaten to keep a deal from closing.
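To make the idea of a centralized inventory concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `Tracker` fields, the vendor names, and the data fields are assumptions, not a description of any real company's setup. It only shows that once every pixel and SDK is recorded in one place, "which third parties get what data" and "who owns this" become simple lookups.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tracker:
    """One pixel or SDK on the site or app (all fields are illustrative)."""
    name: str                     # hypothetical pixel or SDK name
    vendor: str                   # third party that receives the data
    owner: str                    # accountable internal team, "" if nobody
    purpose: str                  # why it was added
    data_shared: Tuple[str, ...]  # fields sent to the vendor


def data_by_vendor(inventory: List[Tracker]) -> Dict[str, List[str]]:
    """Answer 'which third parties get what data' across all trackers."""
    shared = defaultdict(set)
    for tracker in inventory:
        shared[tracker.vendor].update(tracker.data_shared)
    return {vendor: sorted(fields) for vendor, fields in shared.items()}


def unowned(inventory: List[Tracker]) -> List[str]:
    """List trackers that no internal team has claimed."""
    return [tracker.name for tracker in inventory if not tracker.owner]


# Made-up entries; a real inventory would come from an audit of the site and app.
INVENTORY = [
    Tracker("ads_pixel", "AdVendorA", "marketing", "ad attribution",
            ("email_hash", "order_value")),
    Tracker("measurement_sdk", "MeasureCo", "", "campaign measurement",
            ("device_id",)),
    Tracker("partner_pixel", "AdVendorA", "partnerships", "strategic partnership",
            ("order_value", "zip_code")),
]

print(data_by_vendor(INVENTORY))
print(unowned(INVENTORY))
```

Even a list this small surfaces the two questions diligence tends to ask: what each vendor receives in total, and which trackers have no owner.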

Analytics leaders have a career advantage if they understand these tradeoffs and can communicate the rapidly shifting landscape of security, privacy, and compliance. CEOs / CMOs / CTOs may not understand the nuances, but they will understand that for many companies, the cost in dollars and focus of a data breach or non-compliance can be fatal. This means today’s analytics teams cannot be indifferent or reactive on topics of data security, privacy, and compliance. They must be experts, and drive cross-functional data standards and culture for the rest of the company.

Develop a perspective and network on what, when, and how to build vs. buy:

Most analytics leaders started as analysts (shocking!). When you’re twenty-two, you don’t spend a lot of time thinking or being asked about whether the current data warehouse and BI tool will support the next few years of growth. If you spend the early stages of your career at a megatech company, you may not receive any exposure whatsoever to commercially available tools, because your stack is entirely homegrown. This was me: the first time I led a data warehouse evaluation, negotiation, and implementation, I had never been in a room (or Zoom) with a vendor. I had no framework to assess what I was looking for, or how to communicate tens of thousands of dollars in spend to a leadership team.

Think of the rise of the modern data stack: Fivetran/Stitch, Snowflake/GCP, dbt, Looker, Census/Hightouch were, for the most part, not commercially prevalent or even founded five years ago. The last five years have seen brand new categories of data tools invented, funded at massive valuations, implemented, and disrupted by open source alternatives. It is a certainty that analytics teams in 2027 will use a piece of technology that isn’t available today. There will be too many vendors to assess them all. Analytics leaders must develop a strong framework and network to guide the decisions they make.

The framework I use for build/buy has two points, loosely paraphrased from a Zack Kanter podcast. The first: build what is both necessary and strategic, buy what is necessary and not strategic, ignore everything else. The second: build what you must do better than anyone else, buy what you believe someone else will compound faster than you can. Using Fivetran as an example of both: first, every company that wants to build marketing attribution needs to move spend data from Google Ads into a data warehouse. This is necessary, but not strategic: moving data from Google Ads to a data warehouse extremely well is not a competitive differentiator. Second, Fivetran will keep getting better at moving data from Google Ads to a data warehouse, because they will only exist as a company if they can increase reliability and features and decrease cost over the long run. Compare that to building a homegrown Google Ads pipeline. That homegrown pipeline will not improve every day, and will require a data engineer to drop everything when it inevitably breaks. Code is not an asset, code is a liability.
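The first point of the framework is simple enough to write down as a tiny decision function. This is only a sketch of that one rule as stated above. The function name and the boolean inputs are my own illustrative choices, and deciding whether something is actually "necessary" or "strategic" remains a judgment call that no code can make for you.

```python
def build_or_buy(necessary: bool, strategic: bool) -> str:
    """First rule of the framework: build what is necessary and strategic,
    buy what is necessary and not strategic, ignore everything else."""
    if not necessary:
        return "ignore"
    return "build" if strategic else "buy"


# The Google Ads example from the text: necessary, but not strategic.
print(build_or_buy(necessary=True, strategic=False))  # buy
```

The second point (who will compound faster) is deliberately left out of the sketch, since it depends on a forecast about a vendor rather than on two yes/no questions.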

There are too many vendors to consider individually. This means you need to develop a network and community to build your perspective. On the network side, look for mentors at companies 2–10x yours in terms of analytics headcount and data size and ask what they’re using. On the community side, leverage company and open source Slack communities, meetups, and events to find others grappling with problems similar to yours. Two great questions to ask yourself are “If our headcount / data / company grows 10x, what breaks first?” and “What is keeping my team from deploying twice as fast?” Answer those questions, estimate the cost of not solving these problems, and use your network and framework to decide what, how, and when to build and buy.

Understanding business context and narrative can be your superpower or your ceiling:

The last few years have seen a push towards running analytics organizations like product teams. This can mean adopting software engineering best practices, both for developing codebases (unit testing, CI/CD, data platforms) and cultures (agile, roadmaps, O/KRs). Today’s analytics teams act like engineers: focusing on proactive product development instead of reactive insights-as-a-service. However, engineers often (for better or worse) have the luxury of delegating prioritization and business context to product managers and senior leadership. The challenge for analytics leaders is transitioning their teams to engineering development standards while also raising the bar on business context for their team and company.

Bad analytics teams wait for a very specific question, and provide a very specific answer (SQL pull dumped to .csv, shared via email, lost to time). A modern “Data Product” team takes a bit more time to build a model that scales and can be combined modularly to answer similar types of questions in the future, which creates operational leverage. However, you can run your data team like a product team without understanding the unit economics of your business, or what keeps your CEO up at night. A great analytics team doesn’t just build data products; it understands the business context and sets the narrative for the entire company about how to measure what matters.

Understanding the narrative that companies want to spin and knowing how to back it up with data is a superpower for analytics leaders. If you understand the vision your CEO wants to set (and what that means for the unit economics), you will inherently be more capable of using data to support that story than anyone else at the company, and invited into more rooms and meetings. If you don’t understand the vision, you will always be seen as a support function for other teams. Analytics leaders expand their sphere of influence by using both data and business context to help set the product roadmap, the marketing and ops O/KRs, the company goals, and the definition of success.

The push towards analytics teams adopting product engineering standards is good. However, engineering teams are not known for focusing on the incrementality of marketing dollars, how to structure sales incentives, where operating leverage comes from, how an investor wants to see margin trending, or what metrics the CEO looks at every day. Don’t let running analytics as a product team prevent you from spending time to understand (and set!) the business context and narrative for your team and company.

Ian Macomber

Leading Analytics @ramp. Intersection of Data and Business Leadership. Previously @drizly, @harvardHBS, @zillow, @wayfair, @dartmouth