5 steps for transferring on-call rotation from DevOps to Development
Historically, DevOps, or more generally someone from the Ops/IT team, has been tasked with being on-call to handle issues when applications crash. This allocation of duties made plenty of sense at the time, as they were responsible for how code was deployed and, more generally, for issues in the infrastructure. However, the rise of CI/CD has caused a seismic shift, dialing the volume of code being pushed out by Dev up to 11. The new scale and rate of code deployments have given Dev more of an impact than ever before. Now, the Devs can essentially bypass DevOps when it comes to making changes to the code. The same automated processes that improve productivity have also reduced DevOps’ ability to act as an effective gatekeeper.
Addressing the impact of changes in development and deployment on incident response
The challenge, though, is that while Dev has these new powers and independence, DevOps is still responsible for fixing issues in the applications. Increasingly, these issues arise from code that DevOps had no role in either writing or reviewing before it was integrated. So when they do get that 3 AM phone call to come in and fix the problems, they may find themselves starting from zero in trying to figure out where to begin.
This is far from an ideal way to solve issues, especially when speed and efficiency are at a premium. It is also not a good use of human capital.
As our friends in Dev gain more ownership over their code, including the ability to push it more autonomously to production, we should be thinking about how we can empower them to play a bigger role in responding to issues as part of the on-call team.
After all, with great power comes great responsibility.
The question for DevOps leadership, though, is how best to transition parts of these responsibilities over to Dev in a way that gives them the tools they need to succeed. This transfer is partly a cultural shift and partly a technological one.
My on-call rotation duty sharing checklist
Here are my five steps that can help to make this a more effective process that both Dev and DevOps can get on board with.
Get buy-in from Dev
The key to any transfer of responsibility is to have a partner willing to work with you. Therefore the first step is to talk to your Dev team and involve them in the process from the get-go. They are a stakeholder, so make sure to get their buy-in.
Do not assume that just because you have told them they will be playing a bigger role in maintaining the application, they understand that you are reducing your footprint in some of these areas. Failure to communicate properly will lead to the ball getting dropped.
You need to explain that this is not just a transfer of more work. They are also gaining more capabilities that will empower them to have a bigger impact and be more independent within the organization.
Give them the opportunity to succeed — Know the state of your infrastructure
Our goal here is to help Dev take more responsibility for the code that they write. This means that we have to take responsibility for our infrastructure and make sure that we are not the cause of the problems impacting app behavior.
Take the time in the transition to assess your current state. Check to see where issues are coming from.
Are they coming from the database? Is it the firewalls or somewhere else in the infrastructure?
Do not expect to pass infrastructure alerts over to Dev. They lack visibility over your infrastructure and it will only serve to frustrate them. It is not their domain and it can quickly sour important trust that you have with them.
The bottom line here is to make sure that you hand them over a system that is working the way it should and that they have a shot at succeeding at running well.
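One way to picture the split described above is a small routing rule that keeps infrastructure alerts with DevOps and sends only application-level alerts to Dev. This is a minimal sketch, not a real alerting integration: the `Alert` fields, the source labels in `INFRA_SOURCES`, and the rotation names are all hypothetical and would need to match whatever your alerting tool actually emits.

```python
from dataclasses import dataclass

# Hypothetical source labels; a real alerting setup would define its own.
INFRA_SOURCES = {"database", "firewall", "network", "host"}


@dataclass(frozen=True)
class Alert:
    source: str   # component that raised the alert, e.g. "database"
    service: str  # application or system the alert concerns
    message: str


def route_alert(alert: Alert) -> str:
    """Return the (hypothetical) on-call rotation to page for this alert."""
    # Infrastructure alerts stay with DevOps, since Dev lacks visibility there.
    if alert.source in INFRA_SOURCES:
        return "devops-oncall"
    # Everything else is treated as application-level and goes to Dev.
    return "dev-oncall"
```

Keeping the rule this explicit makes it easy to review together with Dev, so both sides can see exactly which alerts land on which rotation.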
Provide them with the right visibility tools for the job
A good attitude for stepping up to the plate is not enough for your Dev to succeed in handling their new responsibilities. They need the tools that will give them the necessary visibility over the applications to identify what and where the problems are.
Just handing them creds to access the dashboards that your DevOps crew is using is not the best option. Those dashboards are likely far too high-level to be of any real use to them. Remember, simplicity is key here, so keep it focused on what they need to do their job.
Take your 30,000 ft comprehensive dashboard and slice and dice it to give them the relevant insights that they will need. Try to keep it narrow in scope and as actionable as possible.
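As a rough illustration of narrowing a broad dashboard down to what a Dev team needs, the sketch below filters a list of panels to those that belong to services the team owns and that are marked as actionable. The `Panel` type, its fields, and the `actionable` flag are assumptions made for this example; real dashboard tools expose their own models and filtering options.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Panel:
    title: str
    service: str      # service the panel reports on
    actionable: bool  # True if an on-call developer can act on it directly


def dev_view(panels: Iterable[Panel], owned_services: Set[str]) -> List[Panel]:
    """Keep only actionable panels for services the Dev team owns."""
    return [
        panel
        for panel in panels
        if panel.service in owned_services and panel.actionable
    ]
```

For example, a team that owns only a hypothetical `orders` service would see its error-rate panel but not cluster-wide capacity graphs.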
Grant them access
Talking about empowering Dev is great, but meaningless unless you are willing to give them the actual access that they need to make an impact. Failure to do so means just adding them into the chain of responsibility without giving them the ability to really accomplish anything. Eventually, they will just end up passing the work back to DevOps, defeating the purpose of the exercise — leaving everyone more frustrated and sleep-deprived.
Provide them with the permissions to view logs, make rollbacks, or have general access to critical resources or tools that will allow them to fix what is broken.
This is however a balancing act. Dev probably only needs access to a subset of the tools that fall under DevOps’ purview.
These include:
- Specific Jenkins jobs they can trigger in order to do a rollback
- The ability to deploy an environment all by themselves
- AWS console / Kubernetes edit permissions for specific resources
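The idea of granting a narrow subset of access can be sketched as an explicit allowlist that is checked before an action is performed. This is only an illustration of the principle: the tool, action, and resource names below are hypothetical, and in practice the grants would live in the access controls of Jenkins, AWS, or Kubernetes rather than in application code.

```python
# Hypothetical allowlist of (tool, action, resource) grants for Dev on-call.
DEV_ONCALL_GRANTS = frozenset({
    ("jenkins", "trigger", "rollback-orders-service"),
    ("deploy", "create", "dev-environment"),
    ("kubernetes", "edit", "deployment/orders-service"),
})


def is_allowed(tool: str, action: str, resource: str) -> bool:
    """Return True only if this exact grant is on the allowlist."""
    # Anything not explicitly listed is denied by default.
    return (tool, action, resource) in DEV_ONCALL_GRANTS
```

Denying by default keeps the balance described above: Dev can fix what is broken in their own services without inheriting everything under DevOps' purview.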
Iterate together and build out best practices over time
Do shared shifts at the start and see where the gaps are.
This is a long-term process (that will hopefully pay dividends), so communication and collaboration are key, as is finding the right partners/champions in Dev who are willing to play ball with you.
Identify your most enthusiastic partners in Dev, the ones who will help you get those early wins that show this is the right direction for your organization. Be patient and actually listen to their feedback. Then act on it to make this a system that they will actually use. If you shape it the way that you want it and not how they can effectively implement it, then you dramatically reduce your chances for success.
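A shared-shift schedule for the early phase could be generated along the following lines, pairing one developer with one DevOps engineer each week. This is a minimal sketch under stated assumptions: weekly shifts, simple round-robin order, and placeholder names. A real rotation would also need to account for time zones, vacations, and your paging tool's own scheduling features.

```python
from itertools import cycle
from typing import List, Sequence, Tuple


def shared_shifts(
    devs: Sequence[str], devops: Sequence[str], weeks: int
) -> List[Tuple[int, str, str]]:
    """Pair one developer with one DevOps engineer for each week."""
    if not devs or not devops:
        raise ValueError("both teams need at least one person on the rotation")
    # Cycle through each team independently so everyone takes a turn.
    dev_cycle, ops_cycle = cycle(devs), cycle(devops)
    return [
        (week, next(dev_cycle), next(ops_cycle))
        for week in range(1, weeks + 1)
    ]
```

Calling `shared_shifts(["ana", "ben"], ["omar"], 3)` yields three weekly pairings in which the two developers alternate alongside the same DevOps engineer.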
Make Shifting Left a win-win for all
As a general rule, it will always be cheaper, less complex, and faster to fix issues closer to the source.
The Shift Left approach that empowers your Dev team to fix their issues earlier on in the SDLC has a role to play in how we deal with problems that emerge later on in the cycle.
Organizations that are early adopters of Shifting Left increase their ability to reduce downtime, have a more capable Dev team, and hopefully a better rested DevOps team that can focus on their own set of challenges.