FISPAN Speeds Up MTTR by 67% With Komodor

Company size:

51 - 200 employees

Industry:

Financial services

Komodor Installation:

3 clusters/375 services

About FISPAN

FISPAN’s contextual business banking platform makes it simple for banks to offer commercial banking services embedded within ERP and business applications. FISPAN enables banks to provide a best-in-class commercial banking experience by removing friction and adding value to the systems that clients rely on to run their businesses every day.

The Challenge

FISPAN provides technology solutions for major banking firms that serve millions of clients, such as the customers of J.P. Morgan Chase. By using Kubernetes and a distributed system architecture, the team is able to remain agile while scaling up and delivering continuously. Operating at this scale requires FISPAN’s team to be highly sensitive to downtime and to resolve all issues quickly, without compromising on security or reliability.

The Problem

FISPAN’s infrastructure team was struggling to pinpoint the root cause of the outages it was experiencing and spent hours troubleshooting recurring incidents with its existing tools.

The team suspected that a non-Kubernetes resource was consuming memory needed by the K8s resources, resulting in out-of-memory terminations (‘OOMKilled’) that eventually crashed the master node. However, the team had very little visibility into its nodes and couldn’t pinpoint the problem.

FISPAN’s team reached out to Komodor looking for a troubleshooting platform that would help them better understand node-related issues affecting their services. They also wanted to simplify their troubleshooting process and increase the number of developers on the team who could troubleshoot Kubernetes issues.

The Solution

After a two-week trial, it was evident that Komodor’s troubleshooting platform was the ideal solution for FISPAN. Komodor addressed the team’s needs in two main ways:

  1. Providing the infrastructure team with complete and constant visibility of node statuses and node-related health events, detecting faulty or unready nodes, and correlating node issues with the impact on the services. In addition to providing much-needed visibility, this also allowed FISPAN’s team to quickly exclude irrelevant reasons for outages, immediately steering the on-call engineers in the right direction.
  2. Offering detailed step-by-step instructions for remediation of common Kubernetes errors (e.g., ‘OOMKilled’). This reduced resolution time across the board and also closed expertise gaps, allowing FISPAN to expand its on-call roster. It also significantly reduced escalations and freed up valuable time for the more senior members of the team, who had been handling most of the issues until then.
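
To make the first point more concrete, here is a minimal sketch of what correlating node health with service impact can look like. It is not Komodor’s implementation. It works on plain dictionaries shaped like Kubernetes Node and Pod objects (the `Ready` condition, `spec.nodeName`, and a container’s `lastState.terminated.reason` are standard fields). The function names are hypothetical, and using the `app` label to identify a service is an assumption made for illustration.

```python
from collections import defaultdict


def unready_nodes(nodes):
    """Return the names of nodes whose Ready condition is not "True"."""
    result = []
    for node in nodes:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            result.append(node["metadata"]["name"])
    return result


def oom_killed(pod):
    """True if any container in the pod was last terminated as OOMKilled."""
    for cs in pod.get("status", {}).get("containerStatuses", []):
        terminated = cs.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            return True
    return False


def impact_by_node(nodes, pods):
    """Group affected services by node.

    A service counts as affected if its pod runs on an unready node
    or if one of its containers was OOMKilled.
    """
    bad = set(unready_nodes(nodes))
    report = defaultdict(
        lambda: {"unready": False, "services": set(), "oom_killed": set()}
    )
    for pod in pods:
        node_name = pod.get("spec", {}).get("nodeName")
        meta = pod["metadata"]
        # Assumption: the "app" label names the service; fall back to pod name.
        service = meta.get("labels", {}).get("app", meta["name"])
        killed = oom_killed(pod)
        if node_name in bad or killed:
            entry = report[node_name]
            entry["unready"] = node_name in bad
            entry["services"].add(service)
            if killed:
                entry["oom_killed"].add(service)
    return dict(report)
```

Given node and pod listings exported from a cluster as JSON, `impact_by_node(nodes, pods)` returns one entry per problem node, so an on-call engineer can see at a glance whether an outage lines up with an unready node or with OOMKilled containers, and rule out nodes that are healthy.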
