Why...

...You probably shouldn't outsource data work

2 Jul 2023 1833h

I was having a conversation with David and the subject of outsourcing came up.

I’ll start by stating that I am not against outsourcing. There are definitely situations where it makes sense, especially for resource-constrained organizations that can’t possibly cover every single function by themselves. (On a personal level, hosting this site on SquareSpace is also a form of outsourcing.)

Outsourcing does work for one-off projects where it doesn’t make sense for an organization to hire long-term. So, if your data project really is a one-off, it probably makes sense to outsource the work.

Problem 1: Most data projects are not one-off, but recurring.

Most data projects are complex to some extent, and require an intimate understanding of how the data was generated, how the data is processed, and what business questions need to be answered. Is this knowledge easily transferred? How is the organization going to respond to changing needs for the data in the future if the knowledge is not retained within the organization, but leaves together with the third party?

Problem 2: Quality of the work is correlated with ownership/understanding.

Some might find what I’m going to say next controversial: data work is uninteresting without context and/or ownership. It is difficult to trudge through data cleaning, pre-processing, etc. without seeing what the end goal is. That’s not even considering that decisions made on how to process the data are affected by the end goal(s). I can’t speak for my peers, but I find it hard to work on projects where there is no clearly defined question/hypothesis/problem.

So is it any surprise that a third-party contractor (who may or may not be underpaid) delivers unoptimized or sub-par implementations after the problem has been explained poorly or incorrectly?

So here’s what I think can be done better: stakeholders need to communicate more if they are outsourcing, and at least give some ownership of the decisions to the people actually dealing with the data. Be upfront about objectives: if there is ambiguity that needs to be resolved with a first cut of the data, say so and be clear about what is needed to resolve the ambiguity. Everyone will be clearer and happier, with less time wasted.