Working on several projects as a Data Science consultant, I’ve realized the need to spread the word about project planning in this field. This is neither a defense of Agile nor an open letter criticizing project managers (PMs) who prefer other methodologies. It’s more a post to help analysts who struggle with deadlines when their manager’s Gantt chart is not really helpful.
There are tons of different project management methodologies (see a nice article here), and different ways to apply Agile (doing or being Agile). However, in this article I’ll focus on comparing the Waterfall and Agile (doing) approaches, as in my experience they’ve been the most commonly used in big organizations.
The traditional waterfall methodologies have many advantages, such as good traceability of progress across the project life cycle and ease of use for the PM. But although these make life easier for project managers, they create many challenges for the actual ‘doers’, especially if the project requires data exploration.
First, waterfall is designed for projects whose requirements and scope are understood beforehand and whose tasks have a clear, well-defined order. Data exploration usually implies a loop between the data analyst and the subject-matter expert (SME) for the data source. The analyst tries to gain understanding and leverage knowledge, while the SME is often (especially in big data projects) not aware of the nastiness and complexity of the data. Understanding some complex datasets takes weeks or even months. Fixing the process before these iterations have happened is like starting to construct a building without proper ground and soil testing.
Planning or designing the project pipeline without really understanding the complexity of the problem makes it hard to set proper deadlines. This usually leads to rushing while coding or analyzing data. Rushed coding may cause bugs or performance bottlenecks in the code, while rushed analysis could produce a loose understanding of the business rules in the underlying data.
On many occasions, the waiting times in the iterations between the analyst and the SME delay the communication of findings to project managers. Sometimes this information doesn’t reach senior VPs and stakeholders, so the plan remains the same and the impact only shows later, when things haven’t gone as expected. In Waterfall, each of these delays would impact every subsequent task, whereas in Scrum it may only impact the individual performing the task, usually from the next sprint onwards.
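To make that difference concrete, here is a toy sketch in Python. It is not a real scheduling model: the `Task` class, the task names, the owners and the five-day delay are all made up for illustration, and the "Scrum-like" function simply assumes that a delay only pushes back later work by the same person.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    owner: str           # hypothetical team member doing the task
    delay_days: int = 0  # hypothetical extra days, e.g. waiting on the SME


def waterfall_slip(tasks):
    """Strictly sequential plan: every task inherits all earlier delays."""
    slip, total = {}, 0
    for task in tasks:
        total += task.delay_days
        slip[task.name] = total
    return slip


def per_owner_slip(tasks):
    """Toy Scrum-like plan: a delay only pushes back its owner's later work."""
    slip, total_by_owner = {}, {}
    for task in tasks:
        owner_total = total_by_owner.get(task.owner, 0) + task.delay_days
        total_by_owner[task.owner] = owner_total
        slip[task.name] = owner_total
    return slip


# Made-up plan: only the data exploration is delayed, by 5 days.
tasks = [
    Task("explore data", owner="analyst", delay_days=5),
    Task("build pipeline", owner="engineer"),
    Task("train model", owner="analyst"),
    Task("build dashboard", owner="designer"),
]

print(waterfall_slip(tasks))  # every task slips by 5 days
print(per_owner_slip(tasks))  # only the analyst's tasks slip
```

In this simplified picture, the same five-day wait moves every task in the sequential plan, while in the per-person plan the engineer and the designer keep their dates. Real projects have dependencies between people too, so take it as an intuition rather than a forecast.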
Data always come with red flags. These can stem from the data-generating process (issues on either side of the process), from biases in the data (e.g. corner cases not displayed in early samples)… Testing is therefore also a challenge. Agile may not be the perfect solution for every project, but it helps in these kinds of scenarios because it makes it easy to move things around.
In the end, everything depends on the goal of your project. Agile could help with certain time constraints, but if your project is very well defined, another methodology could still help you meet the deadlines. Sometimes the cost of picking the wrong one is a stressful environment when something doesn’t go as planned.
If you’re still at an early stage of the project, have a conversation with your project manager. Make sure both of you are on the same page before the cascading failures start!