ChatOps ... A Love Story

Page content

If you couldn’t tell from the title, I am a complete homer when it comes to ChatOps. I’ll admit that I’ve lost all objectivity. ChatOps has had a profound impact on me as a leader, our teams of Infrastructure Engineers, and the sister teams that went on a journey with us.

As we all stand on the shoulders of those before us, thank you to some of the pioneers of ChatOps who taught us and were very generous with their time. Perhaps most importantly, thank you for validating our ideas and for inspiring us to put our own spin on ChatOps. I believe there are many correct ways to implement ChatOps. I hope you will forge your own path as you apply the fundamentals of PDCA.

These ChatOps articles are a great place to start your ChatOps education:

ChatOps can be so much more than the technical details.

As I reflect I’ve been most proud of the collaboration, people, and teaching our organization to learn. At it’s best it has been a pure manifestation of C.A.L.M.S (Culture, Automation, Lean, Metrics, Sharing).

Our DevOps journey was positively impacted by ChatOps. In fact, ChatOps was often a tireless change agent in helping new members understand how to work while also helping our veterans to learn to contribute in new ways. In fact, ChatOps was so instrumental to our progress that we even had a birthday party for our ChatBot (RYU).

One of my favorite stories during this time occurred when we had a team outing at top golf…unfortunately, we were in the middle of storage migration. Rather than miss the event this engineer created ChatOps scripts to execute the migration as well as monitor the progress.

100 Days Challenge

Onboarding

ChatOps absolutely shines in helping new members get up to speed. They can observe how we interact and become familiar with our internal customers, peers, and services within the organization.

We were inspired by the Etsy day 1 program. If you aren’t familiar, Etsy requires new team members deploy on their first day. Similarly, our new engineers commit on day 1 to our production ChatOps environment. The great thing about having new hires deploy day 1 is that we reinforce how excited we are to have them join and how much confidence we have in the safety of our processes. Can you imagine the conversation at dinner for our new engineer and their family?

Because ChatOps is a shared console, pairing can be used to train or coach team members. Pairing sessions perfectly lend themselves to a crawl/walk/run approach … you might think of it as training wheels for your new hires. In addition, new hires tend to ask great questions that help our veterans become better mentors.

To make onboarding easier, we developed a process that autogenerates coffee scripts from PowerShell after commit using a PowerShell Wrapper process. New members can focus on the mechanics of git, branching, and, check-in using a simple one liner. We’ve found this to help team members come up to speed quicker as it abstracts complexity. When team members require more features or control they can move to working with coffee scripts. In fact, many engineers will start this process by editing the system generated coffee script.

ChatOps is gentle (and self contained) introduction to our technology stack of Git, GitLab CI, PowerShell, Hubot, and CHEF. While working with their first scripts we explain how the sausage is being made so they will have a conceptual model for how we are implementing CI/CD, DevOps, Pull Requests. This same conceptual model is transferrable to many other aspects of practices.

Because ChatOps is such a compelling capability, it created additional incentive for our long-tenured IT Pros to work in a different way.

Finding friction

If our experience is represenative, you should reasonably expect to encounter numerous examples of friction during your transformation. At the time we were working in cross functional infrastructure squads composed of various disciplines of infrastructure engineering (storage, compute, orchestration, databases, etc.). During any given week members of the squad were going outside their traditional areas of focus. In this case ChatOps was both an identifier of friction and a means to address the friction. Afer you understand your sources of friction you can do something about it.

  • Permissions - Over many years our engineers built systems to operate themselves. As you might imagine, elevated permissions were required to operate these systems. What do you think happened when we started working as cross functional squads? You got it, roadblock after roadblock and lots of frustration. Our ChatOps practice created a sense of urgency (bias for action) that forced us to confront and ultimatley minimize these islands of permissions walls. We were finding friction (such as permissions) so frequently that we even created a ChatOps script to log it for future prioritization and action.

  • Dependencies - In the Linux world we have powerful package management systems (rpm, apt, yum, etc.). As a mostly Windows shop we could use Chocolately to manage packages, yet we’d still find ourselves fighting issues due to dependencies, path variables, etc. Often times someone would grab a PowerShell script from version control (or some shared drive - shhhh don’t tell anyone) and then have to install many modules to be able to run it. As we moved into CHEF it was common to see our leads having to troubleshoot Ruby, Gems, PowerShell Modules, PS Gallery issues, etc. Moving to artifactory and ChatOps gave us stability we could not have achieved otherwise.

  • Training & Experience - How do newer members of the team gain the training and experience they need to operate within the team (especially in a high risk environment)? It can be incredibly frustrating for members to feel like they are not full fledged members of the team. ChatOps can help reduce this type of friction by pairing. Furthermore, newer members (or lowly managers) can search for how a given command has been used. In this way someone can learn the commands as a self-paced tour through working examples.

  • Context - How many times have you been added to a disjointed email thread? How many times have you had to catch a senior leader up during an incident by summarizing what they missed? You might find that both of these scenarios are dramatically reduced as you start working in your ChatOps environment. ChatOps minimizes these poor context situations because we typically see: Stimulus - Discussion - Agreement - Action - Outcome - Discussion (sometimes memes too). Others in channel receive indirect training because they can see how another engineer has worked a given issue. In addition, leaders can gain a sense for how teams are collaborating and problem solving together by skimming Chat threads.

  • Timezone & Language - Prior to ChatOps we’d see routine cross time-zone requests take multiple days to satisfy. Remote engineers and admins are frustrated because they have to jump through a million hoops to get anything done. Not only has ChatOps encouraged more decentralized execution, but the outcomes have been superior to what we could come up with thousands of miles away. I believe we’ve achieved better outcomes because teams have been empowered to solve issues in real time, which leads to happier stakeholders.

  • Organization structure - When routine requests go through multiple channels/esclations we may experience friction as well as waits/waste. ChatOps can enable more productive cross-team interactions including new patterns of collaboration that don’t need to be initiated by management. Management doesn’t have to be the bottleneck or on the critical path! In effect, the organization is developing organic networks that can lubricate our other activities.

  • Tribal Knowledge - Prior to ChatOps we had scripts in email accounts, on hard drives, sometimes in version control but often disconnected from any context of why they exist, how often they are used, or who typically runs them. As engineers started to move scripts to the shared Chat environment others would ask questions about how to use the scripts (which intiated a virtuous cycle of documentation, discussion, and increased usage). As we gained more users of Chat these commands would be refactored and improved. As our velocity of new commands increased we created a Chat command that would enable a user to discover new commands (ryu get new commands).

Supporting the learning organization

As we accelerated our ChatOps practice, it has encouraged more scripts to be created and shared. At this point we have well over 500 scripts spanning all major technologies such as :

  • Active Directory
  • Airwatch
  • All things retail (see some examples below)
  • Azure
  • Carbon Black
  • Certificates Management
  • CHEF
  • Commvault
  • Docker
  • EMC
  • GitLab
  • Informatica
  • ITSM (Tickets, etc)
  • Jenkins
  • Licensing
  • NetScaler (backup, restore, config compare, VIPs, etc.)
  • Network Managment
  • SAP
  • SCCM
  • SDWAN
  • Security Group Administration
  • Sensu
  • SharePoint
  • SQL
  • Tableau
  • Teradata
  • User Provisioning / Management
  • VDI
  • vmware
  • VSTS

As mentioned, a key unlock for us was the wrapper process for PowerShell scripts. The first wrapper was so well received that we eventually created a Python wrapper as well. This was a big hit with the Network engineers! Our practice has made it safe for engineers to learn, apply, and improve their coding practices.

Examples of ChatOps

Prologue : After I first released this blog I’ve had a number of folks say “What about this command? I can’t live without it!”. Consequently, I’ve added a number of commands. This small interaction reinforces my thesis that ChatOps has become a community of users committed to continuous improvement that are evolving the system in ways not originally planned.

Service Desk / NOC visibility & empowerment

I love these 3 examples as they provide a comprehensive view of operations and enable even the newest of team members to gain an understanding of current status and contribute to our vision of more transparency, greater cross-training, and decentralized execution. I also love that they are some of our newest scripts and the engineers who wrote them were looking to solve a problem.

100 Days Challenge

100 Days Challenge

100 Days Challenge

Testing a server

100 Days Challenge

Jenkins Menu

This example was created by an engineer to enable adhoc jenkins jobs, especially by stakeholders who didn’t have access to Jenkins.

100 Days Challenge

Cloud Activities

This example ensures that every new repository is created in a consistent manner. If new requirements emerge, new instances will be created without having to re-train users.

100 Days Challenge

Retail Operations

Here’s an example of creating a new retail store. With this scripting we get a high degree of standardization/repeatability.

100 Days Challenge

Software Packaging

This example was created by a Tier 2 admin in Asia who wanted to manage his application packaging process better. Once you have an extensible platform, if you get out of the way your teams will do great things.

100 Days Challenge

CHEF cookbook promotion

As we have matured our ChatOps practices our engineers have automated the things that cause them the greatest pain (like having to cart around a laptop). This is an example of being able to promote chef cookbooks through our test environments. 100 Days Challenge

SQL Performance tuning

Ask a friendly DBA, this is a pretty sweet capability. One of the best features of ChatOps is that team members can create the capabilities they need to be effective.

100 Days Challenge 100 Days Challenge

Grafana

100 Days Challenge 100 Days Challenge