Working on several projects as a Data Science consultant, I’ve realized the need to spread the word about project planning in this field. This is neither a defense of Agile nor an open letter criticizing project managers (PMs) who prefer other methodologies. It’s more of a post to help analysts who struggle with deadlines when their manager’s Gantt chart is not really helpful.

There are tons of different project management methodologies (see a nice article here) and different ways to apply Agile (doing or being Agile). However, in this article, I’ll focus on comparing the Waterfall and Agile (doing) approaches, as, in my experience, they’ve been the most commonly used in big organizations.

Traditional Waterfall methodologies have many advantages, such as good traceability of progress inside the project life cycle and ease of use for the PM. But although these methodologies make life easier for PMs, they create many challenges for the actual ‘doers’, especially if the project requires data exploration.

      • First, Waterfall is designed for projects whose requirements and scope are understood beforehand, and whose tasks have a clear and well-defined order. Data exploration usually implies a loop between the data analyst and the subject-matter expert (SME) for the data source. The analyst tries to gain understanding and leverage the SME's knowledge, while the SME is often (especially in big data projects) not aware of the nastiness and complexity of the data. Understanding some complex datasets takes weeks or months in some cases. Setting a fixed process before these iterations may be compared to starting to construct a building without proper ground testing.

     

      • Planning or designing the project pipeline without really understanding the complexity of the problem makes it hard to set proper deadlines. This usually leads to rushing while coding or analyzing data. Rushed coding may cause bugs or performance bottlenecks in the code developed, while rushed analysis may leave the team with only a loose understanding of the business rules in the underlying data.

     

      • On many occasions, the waiting times on iterations between the analyst and the SME delay the communication of findings to the PMs. Sometimes this information doesn’t reach senior VPs and stakeholders, so the planning remains the same, and the impact only surfaces later, when things haven’t gone as expected. In Waterfall, each of these delays impacts every subsequent task, whereas in Scrum a delay may only affect the individual performing the task, usually from the next sprint onwards.

     

    • Data always has red flags. These can come from the data-generating processes (issues on either side of the process) or from biases in the data (e.g., corner cases not displayed in early samples). Testing is, therefore, also a challenge. Agile may not be the perfect solution for every project, but it helps in these kinds of scenarios because it makes it easy to move things around.
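
To make the earlier point about delays concrete, here is a minimal toy sketch in Python. It is not a real planning tool: the function names, task durations and delay values are all hypothetical, and the sprint model assumes that backlog items are independent of each other, which real projects only approximate.

```python
from itertools import accumulate


def waterfall_finish_days(durations, delays):
    """Finish day of each task when every task waits for the previous one.

    durations: planned length of each task in days, in execution order.
    delays: hypothetical mapping {task index: extra days}.
    """
    actual = [d + delays.get(i, 0) for i, d in enumerate(durations)]
    return list(accumulate(actual))


def slipped_tasks_waterfall(durations, delays):
    """Indices of tasks that finish later than planned in a strict chain."""
    planned = list(accumulate(durations))
    actual = waterfall_finish_days(durations, delays)
    return [i for i, (p, a) in enumerate(zip(planned, actual)) if a > p]


def slipped_tasks_sprint(delays):
    """Indices of tasks that slip in the toy sprint model.

    Assumption: backlog items are independent, so a delayed item rolls
    over to the next sprint without moving any other item.
    """
    return sorted(i for i, extra in delays.items() if extra > 0)


durations = [5, 3, 4, 2]   # four tasks, planned days each (made-up numbers)
delays = {1: 2}            # the second task takes two extra days

print(slipped_tasks_waterfall(durations, delays))  # [1, 2, 3]
print(slipped_tasks_sprint(delays))                # [1]
```

In the chained plan, a single two-day delay pushes back the delayed task and everything after it. Under the independence assumption, the sprint model only moves the delayed item.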

     

In the end, everything depends on the goal of your project. Agile could help with certain time constraints, but if your project is very well defined, another methodology could still help you meet the deadlines. Sometimes the cost of picking the wrong one is a stressful environment when something doesn’t go as planned.

If you’re still in an early stage of the project, have a conversation with your project manager. Make sure both of you are on the same page before the cascading failures start!