
Agile vs Waterfall in Data Science: Frenemies?

Frenemy: A person with whom one is friendly despite a fundamental dislike or rivalry

Agile development, formally launched with the Agile Manifesto in 2001, has been rapidly growing in popularity in the data science world.

For those who are new to it, Agile is a collaborative development approach in which cross-functional teams design and build end-to-end working solutions in short time boxes, review them with business owners, then keep refining them. The Agile approach favours rapid development over project governance and document production.

What is Waterfall?

On the opposite side, Waterfall development takes a ‘big bang’ approach, with a series of project stages leading to a significant end-to-end solution release. The stages typically include scope, requirements, design, build, test and deployment. This approach requires significant project governance, including reporting, risk management, handoffs and documentation.

In theory, every short time-boxed build in Agile contains all of these stages.

The advantages of Agile

Agile works really well in the data science world for many reasons:

  1. Data scientists and the business owner agree on the high-level requirements (called a backlog) early in the development lifecycle, and these are continually reviewed. The business owner has frequent opportunities to review the model/solution being delivered, and to make decisions and changes throughout the model build. This matters because gathering and documenting detailed requirements in a meaningful way is often the most difficult part of a data science project: the business owner may not have a detailed view of the data quality, the precise business outcome to be modelled or how the model outputs will be integrated with decision systems.
  2. Agile data science produces evolving models and solution releases which are very user-focused as a result of frequent review and direction from the business owner.
  3. The business owner gains a strong sense of ownership by working extensively and directly with the project team throughout the project.

The advantages of Waterfall

However, there are a number of factors that come up in data science projects that are still Waterfall in nature (and sometimes rightly so):

  1. Scope/Contracts – The typical large-scale data science project still requires control and risk management, particularly around consumer data rights, business case measurement, and contract and payment management where third parties are involved. This requires the agreement and documentation of scope, activities, dependencies, deliverables, risks, assumptions and plans.
  2. ‘Big ticket’ design – This needs to be completed early in the development lifecycle, as big-ticket items typically require business prioritisation and investment, and can have significant lead times. Big tickets can include data use approvals, access to new data sources, integration with enterprise solutions and changes to operational processes and teams.
  3. Organisational planning and culture – Large organisations may have enterprise-wide approaches to change and project management. Data science projects may have to dovetail into these approaches, which tend to be Waterfall in nature. These can include RAID management (Risks, Assumptions, Issues and Dependencies), programme governance reporting and budget tracking.
  4. Organisational resource management – Agile is a high-paced, collaborative approach, but it depends on commitment from the organisation’s business subject matter experts. Highlighting dependencies and bottlenecks is key to keeping on track with the project milestones.

Agile first, project aware

So are Agile and Waterfall friends or enemies? Can they work together, and is that a benefit? I believe the answer is a resounding yes. There are massive benefits to data science projects in an iterative, collaborative approach with the clear objective of working solutions delivered early and frequently.

However, data rights, access and use need to be carefully assessed and managed. The integration of data science solutions with customer-facing enterprise systems and teams needs to be planned and tested thoroughly. Investment, business cases and value in data science need to be tracked and communicated.

In my experience, an Agile-first but project-aware approach has generated great outcomes for the business owner and the project.

How to manage the two together

Agile and Waterfall frenemy good practice:

  • The early Agile sprints should include:
    • An agreed project vision and scope.
    • Development of a ‘lite’ design and architecture to aid planning and highlight ‘big ticket’ dependencies. Flush out key dependencies such as data privacy, data sourcing, data use, development backlogs and hidden project costs. The technical ecosystem of a data science project involves many variants and possible customisations – data (both structured and unstructured), software (SAS/SQL/R/Python to name a few), techniques (e.g. supervised, unsupervised, reinforcement learning) and enterprise application integration (such as marketing automation, risk decisioning and digital platforms).
    • A ‘lite’ version of project governance set up including milestones, dependencies, documents and budget tracking which takes into account the organisation’s project management approaches and contractual commitments.
  • Agreement with the business owner on the key use cases or user stories to deliver, which are then added to the project backlog.
  • Socialising the project approach/playbook, including tools and techniques, early on.
  • Highlighting demands on business subject matter experts and likely project schedules.
  • During the subsequent sprints, create project ‘air traffic control’ to track milestones, progress, dependency issues and resource bottlenecks. Check in with the business owner team to ensure their ‘NPS’ (Net Promoter Score) is where you want it.
  • During the project, demo the working model/solution widely: with the business owners, the downstream users of the solution and the integration developers. Feedback is good and will improve the solution. Make the business owner a ‘superstar’, as we like to do at Merkle.
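
For teams that prefer to keep their ‘air traffic control’ close to the code, the tracking of milestones, dependency issues and resource bottlenecks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool prescribed by this article: the Milestone fields, the three helper functions and the sample plan are all hypothetical, and a real project would more likely use its organisation's own project management tooling.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Milestone:
    """One trackable item on the project plan (hypothetical structure)."""
    name: str
    due: date
    done: bool = False
    # Names of other milestones that must be finished first
    depends_on: list[str] = field(default_factory=list)
    # Business subject matter expert whose time this milestone needs
    sme: str = ""


def blocked(milestones):
    """Open milestones that are waiting on an unfinished dependency."""
    finished = {m.name for m in milestones if m.done}
    return [
        m.name
        for m in milestones
        if not m.done and any(dep not in finished for dep in m.depends_on)
    ]


def overdue(milestones, today):
    """Open milestones whose due date has already passed."""
    return [m.name for m in milestones if not m.done and m.due < today]


def sme_load(milestones):
    """Count open milestones per SME to highlight resource bottlenecks."""
    load = {}
    for m in milestones:
        if not m.done and m.sme:
            load[m.sme] = load.get(m.sme, 0) + 1
    return load


# Illustrative plan only; names, dates and owners are made up.
plan = [
    Milestone("Scope agreed", date(2024, 1, 5), done=True),
    Milestone("Data use approval", date(2024, 1, 15), sme="Privacy lead"),
    Milestone("Model v1 demo", date(2024, 2, 1),
              depends_on=["Data use approval"], sme="Marketing analyst"),
    Milestone("Decisioning integration", date(2024, 3, 1),
              depends_on=["Model v1 demo"], sme="Privacy lead"),
]

print("Blocked:", blocked(plan))
print("Overdue:", overdue(plan, date(2024, 1, 20)))
print("SME load:", sme_load(plan))
```

Reviewing these three views at each sprint boundary is one lightweight way to surface the ‘big ticket’ dependencies (here, the data use approval) before they stall the model build.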