Managing incoming requests in software engineering: Balancing dynamic and predictable work
A cloud platform team at Grover used a hybrid approach combining different management frameworks for effective workload management and delivering quality services.
7.4.2023

What is the difference between dynamic and predictable workloads?

In software engineering, incoming requests can come in many forms, such as ad-hoc support cases, feature requests, or product initiatives. Understanding the dynamics of these requests is crucial for delivering high-quality software on time and within budget.

One common approach to managing incoming requests is to use a project management framework, such as Kanban or Scrum. Both frameworks have their strengths and weaknesses and can be used to manage different types of incoming requests, as Henrik Kniberg and Mattias Skarin show in Kanban and Scrum - Making the Most of Both.

Kanban is a visual management system that helps teams understand and manage their work. It is particularly well suited for managing dynamic incoming requests, such as ad-hoc support cases or bug fixes. In a Kanban system, work items are represented as cards on a board, and teams use columns to represent the different stages of work, such as "To Do", "In Progress", and "Done". This visual representation helps teams understand the flow of work and make informed decisions about prioritization and scheduling.
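To make the board idea concrete, here is a minimal sketch of a Kanban board as a data structure. The column names come from the paragraph above; the class, its methods, and the card titles are hypothetical and only illustrate how cards flow from one stage to the next.

```python
from dataclasses import dataclass, field

# Column names taken from the paragraph above; their order defines the flow.
COLUMNS = ("To Do", "In Progress", "Done")


@dataclass
class KanbanBoard:
    """A board maps each column name to the cards currently in it."""

    columns: dict[str, list[str]] = field(
        default_factory=lambda: {name: [] for name in COLUMNS}
    )

    def add(self, card: str) -> None:
        """New work always enters the board in the first column."""
        self.columns[COLUMNS[0]].append(card)

    def advance(self, card: str) -> str:
        """Move a card to the next stage and return the new column name."""
        for index, name in enumerate(COLUMNS[:-1]):
            if card in self.columns[name]:
                self.columns[name].remove(card)
                next_name = COLUMNS[index + 1]
                self.columns[next_name].append(card)
                return next_name
        raise ValueError(f"{card!r} is not on the board or is already done")


board = KanbanBoard()
board.add("Fix failing deploy")    # hypothetical card titles
board.add("Rotate expired token")
board.advance("Fix failing deploy")
```

After these calls, "Fix failing deploy" sits in "In Progress" while "Rotate expired token" is still in "To Do", which is exactly the at-a-glance view a physical or Jira board gives the team.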

Scrum is an Agile framework that helps teams deliver high-quality software in a predictable manner. It is particularly well suited for managing predictable incoming requests, such as feature requests or product initiatives. In Scrum, teams work in sprints, with each sprint having a defined goal and deliverables. This approach helps teams stay focused on delivering value and helps stakeholders understand what to expect.

Well, what about platform teams? Which framework should you pick?

For a cloud platform team, managing incoming requests while continuously improving the platform can be a challenging task. On the one hand, you have to provide timely support to your users and address any issues that arise. On the other hand, you also need to focus on building new features and improving the platform to meet the evolving needs of your users. Balancing predictable and unpredictable workloads is key to delivering high-quality services to your users.

One approach is to use Scrum, where each sprint has a defined goal and deliverables, to prioritize and manage the development of new features and improvements to the platform, while handling unpredictable workloads such as support requests with a more flexible framework, such as Kanban.

In practice, cloud platform teams can have a dedicated support team, or rotate people on support duty, to handle incoming support requests and triage them based on their priority. This team or person can use a Kanban board to manage the flow of work and prioritize requests based on their impact and urgency. The rest of the team can focus on delivering new features and improvements to the platform, using a Scrum-based approach.

How does the Cloud Platform team work at Grover?

HashiCorp’s Co-founder and CTO, Armon Dadgar, explains in What is a Platform team and What Problems do They Solve how most companies evolve from a startup phase in which everyone does everything separately, and guess what? Grover was no exception. In the beginning, no one was taking care of the platform and teams were pushing their code everywhere, until we decided to centralize a bunch of things within a specific team: the Cloud Platform team, initially called the DevOps Team, of course.

Well, the problems started because we didn’t have a proper team, but a single person who was randomly jumping from one fire to another, fixing this and that while building common infrastructure. So, the first action was to hire a formal team and an engineering manager to push our infrastructure to the next level. It was back in March 2022 that the final member of the team arrived, and we started organizing our work by first identifying what kind of activities we were doing, what we were responsible for, and how much the engineering community needed us to do their day-to-day jobs.

A few weeks into March, we decided to create what we called AdHoc duty, which is basically the way we found to keep support for our platform going while letting the other three engineers on the team focus on the initiatives we wanted to move forward.

The AdHoc duty rotated every week. We also decided to work on initiatives in a one-week cycle, which can be called a sprint, where we plan what we would like to commit to delivering. Planning follows the backlog prioritization made by the Cloud Platform engineering manager, who always aligns the backlog with the broader engineering priorities and themes. For example, in 2022 the Engineering themes were Platforms, Speed, Scalability, and Security, and for 2023 the themes are Accelerate, Simplify, and Collaborate.
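A weekly rotation like this is easy to compute rather than track by hand. The sketch below is not the tooling we used; it is a minimal illustration that assumes a four-person roster (one engineer on AdHoc duty, three on initiatives), placeholder names, and an arbitrary anchor date for the start of the rotation.

```python
from datetime import date

# Hypothetical roster: placeholder names for a four-person team
# (one engineer on AdHoc duty, the other three on initiatives).
ROSTER = ["engineer-a", "engineer-b", "engineer-c", "engineer-d"]

# Arbitrary Monday used as the start of the rotation (an assumption).
ROTATION_START = date(2022, 1, 3)


def adhoc_duty_for(day: date, roster: list[str] = ROSTER) -> str:
    """Return who is on AdHoc duty in the Monday-to-Sunday week containing `day`."""
    # Counting whole weeks since a fixed Monday keeps the rotation fair
    # across year boundaries, unlike ISO week numbers, which restart yearly.
    weeks_elapsed = (day - ROTATION_START).days // 7
    return roster[weeks_elapsed % len(roster)]


def initiative_engineers(day: date, roster: list[str] = ROSTER) -> list[str]:
    """Return everyone who is shielded from AdHoc requests that week."""
    on_duty = adhoc_duty_for(day, roster)
    return [name for name in roster if name != on_duty]
```

Calling `adhoc_duty_for(date.today())` at the start of sprint planning tells the team who handles support that week, and `initiative_engineers` gives the people who can commit to initiative work.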

No predictability, comms everywhere, and no time to focus on long-term initiatives

While the Cloud Platform team was focused on our engineering community’s needs, keeping everyone happy and the platform operating nicely, something else was happening as a side effect: everyone was added to a ton of different Slack channels while also getting requests from people dropping by their desks or through direct messages. The Jira project had many tickets that would never be worked on, and the ones that were there had no description.

Even though everything was working and the team was happier with the challenge, we lacked the capacity to push long-term initiatives, as well as the ability to predict how long our initiatives would take, because we didn’t have any data. So, as soon as we created the AdHoc duty, we also created a separate Jira project, where we organize incoming support requests by High, Medium, and Low priority. We also pushed all comms toward a single Slack channel, where one of our team members found a nice way to create Jira tickets directly from a Slack thread. This helped us organize the requests and keep our customers happy and informed.
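The ordering of that support queue can be sketched in a few lines. The three priority levels come from our Jira project; everything else here (the ticket keys, the summaries, and the first come, first served tie-break within a priority) is an assumption made for illustration, not a description of our Jira configuration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Lower value sorts first, so HIGH is handled before MEDIUM and LOW.
    HIGH = 0
    MEDIUM = 1
    LOW = 2


@dataclass(frozen=True)
class SupportRequest:
    key: str             # hypothetical ticket key
    summary: str
    priority: Priority
    received_order: int  # position in which the request arrived


def triage(requests: list[SupportRequest]) -> list[SupportRequest]:
    """Order requests by priority, then by arrival (assumed tie-break)."""
    return sorted(requests, key=lambda r: (r.priority, r.received_order))


inbox = [
    SupportRequest("ADHOC-1", "Need access to staging logs", Priority.LOW, 1),
    SupportRequest("ADHOC-2", "Production deploy is failing", Priority.HIGH, 2),
    SupportRequest("ADHOC-3", "New service needs a database", Priority.MEDIUM, 3),
    SupportRequest("ADHOC-4", "Checkout API returns 502", Priority.HIGH, 4),
]
queue = triage(inbox)
```

With this ordering, the person on AdHoc duty picks up ADHOC-2 and ADHOC-4 first, then ADHOC-3, and finally ADHOC-1.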

We also started paying more attention to our backlog by holding an optional Backlog Grooming session every two weeks, which started by cleaning up all the old and invalid tickets. By the end of this activity, we knew which initiatives we had to work on based on the engineering themes.

With data coming in for almost a year, we could tell how much time we were spending on initiatives and on AdHoc requests. That helped us shift gears and move the team toward a more structured Scrum customization, where we can focus more on initiatives than on AdHoc requests. The data also showed us where our users were struggling to use the platform on their own and needed our help because they lacked access, tools, or knowledge. Those findings grew our backlog of initiatives with work that keeps empowering users while keeping everything fast, cheap, reliable, and secure.
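The kind of measurement described here boils down to a ratio per period. As a rough sketch, assuming a worklog export of (week, kind of work, hours) tuples, the share of time spent on AdHoc requests could be computed as follows. The data format and the numbers are made up for the example; they are not our real figures.

```python
from collections import defaultdict

# Hypothetical export: (week label, kind of work, hours logged).
# Both the format and the values are invented for this sketch.
WORKLOG = [
    ("2023-W01", "adhoc", 30.0),
    ("2023-W01", "initiative", 90.0),
    ("2023-W02", "adhoc", 48.0),
    ("2023-W02", "initiative", 72.0),
]


def adhoc_share_by_week(worklog):
    """Return {week: fraction of logged hours spent on AdHoc requests}."""
    totals = defaultdict(float)
    adhoc = defaultdict(float)
    for week, kind, hours in worklog:
        totals[week] += hours
        if kind == "adhoc":
            adhoc[week] += hours
    # Skip weeks with no logged hours to avoid dividing by zero.
    return {week: adhoc[week] / totals[week] for week in totals if totals[week] > 0}


shares = adhoc_share_by_week(WORKLOG)
```

In this made-up sample, AdHoc work takes a quarter of the logged hours in the first week and 40% in the second. Tracking that fraction over time is what lets a team decide whether one person on AdHoc duty is enough or whether the split needs recalibrating.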

Well, to summarize the actions we took to move from where we were to where we are now:

  1. Understand where we are and what we have;

  2. Split dynamic and predictable demands;

  3. Create the rules of engagement for your Cloud Platform team by:

    1. Limiting the surface of contact to a single point;

    2. Shielding the rest of the team from incoming AdHoc requests when they’re not on AdHoc duty;

  4. Craft and maintain a backlog of activities that maps to the company’s goals;

  5. Measure and recalibrate the amount of time each person spends on AdHoc requests and initiatives.

Conclusion

Managing predictable and unpredictable workloads is a critical challenge for cloud platform teams. Using a hybrid approach that combines Agile methodologies and flexible frameworks, such as Scrum and Kanban, can help teams effectively manage their workload and deliver high-quality services to their users. By having a dedicated support team or a support person, cloud platform teams can balance their focus on support requests and platform improvements, and ensure that they are delivering value to their users.

Author: Daniel Fernandes
