[Webinar] Troubleshooting in Fast-Paced Environments w/ Coralogix

Itiel Shwartz

Co-Founding CTO @Komodor

Oded David

Head of DevSecOps @Coralogix

Widespread adoption of agile methodologies, CI/CD pipelines, distributed architectures, and more has enabled software development to reach a rate and scale that would have seemed unimaginable just a few years ago.

Of course, along with the benefits of new methodologies and technologies comes a new set of troubleshooting challenges that need to be addressed as well. In this panel discussion, we talked about the new challenges in accelerated pipelines and how to overcome them.

[Beginning of transcript]

Tali: Thank you everyone for joining us. Today we’re going to be talking about the acceleration of software development and how that’s affected our ability to troubleshoot issues that arise. My name is Tali. I’ll be moderating the conversation today. I am the Director of Content here at Coralogix. Coralogix is a real-time streaming analytics platform that enables teams to manage their observability data without needing to index it. We’re joined today by Oded David, who is our Head of DevSecOps, and Itiel Shwartz, who is the co-founder and CTO at Komodor. Oded, would you like to introduce yourself?

Oded: Hi, everybody. My name is Oded. I’m leading the DevSecOps team at Coralogix. I’m really happy to be here.

Tali: Thank you. And Itiel, would you like to introduce yourself and tell us a little bit about Komodor?

Itiel: Happy to be here. So, my name is Itiel. I’m the CTO and co-founder of Komodor. We are the first Kubernetes troubleshooting platform. Basically, we help Dev and DevOps people to understand, given an issue, given an incident, what changes happened in their system. And when I say changes, it’s anything from code change to feature flag to deployment in Kubernetes. So, everything is in one place. And this is us and we will also show a quick demo on that.

Tali: Great. Look forward to that. Now, a little bit about what we’re going to be covering today. We’re going to be talking just a little bit about what’s actually stopping teams from moving faster and delivering code more frequently. And we will talk quite a bit about what issues are introduced as we start to move faster, and where new tools and processes come into that. Now, before we jump in, I want to remind everybody that this is a live session. There is a Q&A box, and I will be bringing up the questions as we go, so please do feel free to send those as they come up. Now, let’s go ahead and get started. So, before we talk about the issues with troubleshooting in faster-paced environments, I just wanted to touch a little bit on what you guys think is stopping teams from moving faster. What’s getting in the way of that?

Itiel: So maybe I’ll start. I think there are two main obstacles to moving faster. The first one isn’t even technology; it’s more of a culture change. People need to trust their tools and their system to move faster. I see it a lot with more old-fashioned companies that used to release once every month, or once every couple of months: when they think about releasing once per day, or even once per week, they’re afraid. They’re like, “It’s going to break. I already have all of these processes in place, and once we release once a day, what will happen? Everything will break,” and so on. So, I think culture is a very big barrier to moving faster. And even companies that are trying to make the shift don’t really pay a lot of attention to changing this specific part. Oded?

Oded: I tend to agree. On top of that, I see that boundaries are being broken for everything that involves developers or any R&D in the company. It means that any environment is basically production. The main line of thinking is that your product needs to be up and running 24 hours a day, seven days a week. On top of that, there’s the need to stay up to date on everything, the need to add to customers as you continue to develop, the need to understand the product. You have so many new disciplines to understand that they take up your time, and you pay with a lot of context switching.

Tali: I think that makes sense. What do you guys think, looking more at the technical side of things, in terms of technical debt and troubleshooting in traditional environments? How does that impact the situation?

Itiel: Yeah, so it’s a great point. Troubleshooting, as I said, is like the silent killer. It’s the thing that, day by day, you just spend so much time on without really thinking about it. Every time a developer or a DevOps engineer, it doesn’t really matter, troubleshoots the system and tries to understand why things don’t work, they’re not writing new features. And we see in a lot of companies that, as they progress and grow bigger, the percentage of time they spend troubleshooting their system, instead of writing new features, is around 50% or even more. All of this time could be saved if they could just troubleshoot faster, write better tests, or have better monitoring, and we could use that time to write new features and improve the overall speed of the company and the R&D team. Oded?

Oded: That being said, there is another aspect that has accelerated in the last few years: the open-source industry. You now use a lot of open source as part of your product. Troubleshooting is not the traditional kind, where I used to write code and check it and had one, two, three, five people testing the code. Now I have open source, and I need to be in line with the industry and understand its mindset, and you need better tools to be able to understand the behavior of that open source as well.

Tali: Interesting. Any other last thoughts on this? Maybe — I don’t know. I mean, you talked about fear, maybe Itiel. Do you think that there’s maybe like fear of adopting new tools that are kind of required there, or maybe — I don’t know.

Itiel: I think it’s everything. It’s fear of new tools, ’cause I already know the old tools, and fear of doing things differently. And I think it’s very obvious for companies that have traditional QA, where you can’t release to production without QA checking your code. Obviously, if you release 10 times a day or 20 times a day, you can’t really have QA in the middle; it doesn’t make any sense. You have to automate this part, and you have to make sure that the system is always up and running and working as expected. And I think, like Oded said earlier, the roles have changed. In the old days, I was a developer, I wrote the code, and I counted on QA to catch my bugs and stop them before they moved to production. Now, not only am I writing my code, I’m also writing the tests and making sure everything passes. And if there is an issue, I’m the one who gets woken up in the middle of the night. A lot of people have their comfort zone, where they just write code and that’s it, and everything else is other people’s problem. And yeah, I think it is a very big issue in the adoption of new tools and in moving faster.

Tali: We have a question from the audience. But just before I take that, I’m curious, Oded, what your thoughts are there on — in terms of what you were saying about kind of not being able to have QA and really shifting — I mean, that’s a big shift in terms of the process and what people are used to. Any thoughts on that?

Oded: So, in Coralogix, I’ve seen it over the last couple of years, as you grow to become a bigger company and you stabilize the product and improve it. You begin, you write the code, as Itiel said, you test it, you see everything is okay, and it’s great. And then you grow, and you have more people that need to understand your line of thinking. You have more people that you expect to work as fast as they used to work. You have more departments that need to be aligned with one another, to understand what is happening and what the new features are. So, a developer is not only a developer; he becomes much more. And that’s true for anyone in the organization, even customer success. Customer success suddenly needs to understand the product. But this structure creates silos, because I’m writing a feature, and I need to be an expert on this feature, because this feature also includes a lot of open-source technologies. It includes the documentation, it includes explaining it to customers, it includes even writing articles, it includes webinars, etc. Those features basically become features that are used in the industry, or the industry adjusts to use those features. So, customer success also needs to have those skills, and sales, and marketing. Everybody needs to have all the skills of the organization in order to keep this product alive. It makes everything work faster, but it also prevents it from moving faster. And if you don’t grow into it, you basically stay behind.

Tali: That’s a bit tricky there. I like this question and it’s actually going to kind of take us to the next topic. So, I’ll go ahead and change the slide. So, the next — and this is kind of like addressing the bulk of what we’re talking about here. What issues are introduced when we move faster? And the question from the audience, do you think microservices are going to increase in number and how can devs handle that scale?

Itiel: Yeah. So, I think microservices were born out of the need to move faster, basically to allow different teams in the organization to write their own code without depending on the monolith, where one change obviously impacts all of the monorepo and all of the monolith. So, microservices were born out of the need to move fast. And I do see the industry moving more and more into microservices, trying to break things into smaller parts. With the adoption of both Kubernetes and serverless, we do see the trend, and I don’t see it declining anytime soon. Regarding how devs can handle that scale, I think in the next slide we are going to talk a little bit about the tools, and I’ll try to better explain how it really relates to microservices then. And Oded, do you want to answer what issues are introduced when we move faster?

Oded: Sure. So, you mentioned the microservices architecture. One more thing that causes these issues is the need to manage multiple production environments. Today you have regulation: GDPR, HIPAA, PCI, etc. Your clusters and your environments change. There is no longer a single meaning to “production environment.” In the past, I had one production environment, or one production environment deployed on the customer’s premises. Today, I have three, four, five. The requirement for 24/7 means I need high availability at all times, which means that I need to make sure that I have no single focal point in my system. Microservices are a good example of that, because the idea behind microservices is to avoid this focal point, to avoid the failure mechanism of one service that rules them all. Then there is the need for environment-agnostic metrics. We used to think that we have a metric that shows us issues on an environment. But now we have five environments, so I need to change my line of thinking and build something that works the same across multiple environments, where each environment has different customers and different behavior. All those issues are introduced when we move faster. There is also the ability to mitigate faster, to make the system heal or start functioning again as fast as possible. SLO, SLI: a lot of concepts were added and are needed in order to avoid those issues, or because of those issues.
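
To make the SLI/SLO idea Oded mentions a little more concrete, here is a minimal sketch in Python of computing the same availability metric per environment and comparing it to one shared target. The environment names, request counts, and the 99.9% target are made-up illustration values, not figures from the webinar, and the function names are hypothetical.

```python
def availability_sli(good_requests, total_requests):
    """SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # no traffic: treat the environment as fully available
    return good_requests / total_requests


def error_budget_remaining(sli, slo_target):
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    allowed_failure = 1.0 - slo_target
    if allowed_failure <= 0:
        raise ValueError("slo_target must be below 1.0")
    return 1.0 - (1.0 - sli) / allowed_failure


SLO_TARGET = 0.999  # assumed target, identical for every environment

# Hypothetical (good, total) request counts per production environment.
traffic = {
    "eu-prod": (99_950, 100_000),
    "us-prod": (99_800, 100_000),
}

report = {}
for env, (good, total) in traffic.items():
    sli = availability_sli(good, total)
    report[env] = (sli, error_budget_remaining(sli, SLO_TARGET))

for env, (sli, budget) in report.items():
    print(f"{env}: SLI={sli:.4f} budget_remaining={budget:+.0%}")
```

With these sample numbers, the first environment has spent about half of its error budget while the second has overspent it, even though both are judged by the same definition. That is the point of a metric that does not depend on which environment it runs in.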

Tali: So, those are interesting points. Do you think then that the actual troubleshooting challenges, if we’re talking strictly about troubleshooting challenges in these fast-paced environments, are related more to the speed or more to the environment?

Oded: So, it’s basically chicken and egg. We expect people to move as fast as possible because the market is changing, the market is growing, the market is moving faster, so features are required. New technologies invented by AWS are now introduced to the public, and they let me run faster too. But by using those technologies to make my work faster and my product better, I’m basically causing new issues, a new line of troubles. Multiple environments are an example of this, because of regulation or scale or the number of customers. All of these are issues that are introduced, and most of them are because we move faster. Itiel, do you have something to add?

Itiel: Yeah. I think like, you asked, like — Tali, do you mind repeating your questions?

Tali: Yeah. So, I’m wondering if we’re looking — Well, I mean, we talked already about what’s kind of preventing us from moving fast. But if we were looking at a company that’s moving fast with frequent deployments, do you think that the issues, that the troubleshooting challenges are coming more from the speed at which they’re deploying or the kind of environment, or ecosystem…

Itiel: So, I think each on its own is quite bad, but the combination of the two is what makes things even harder. What we need to do is compare the old state to the new state. Once, you had a very big bulk of changes that happened, I don’t know, once a month, and everything was really well documented, ’cause we were waterfall and every new feature that was going to be added was rigorously documented. But now, because of the amount of changes and the speed of changes, it’s not really clear: is it in production, is it behind a feature flag, was it deployed five minutes ago, an hour ago, a day ago? So, we kind of lost track of how the system really changed. The production system became like a living organism that keeps on changing really, really fast, and no one in the organization really knows what changed over a given period of time. So, this is regarding the speed part. The other part is the environment. Like we already said, in order to move faster, we broke the monolith into a lot of different pieces, different environments, different clouds, and so on. And what happened is, again, if in the old days all I needed to do was check one release note that was very massive, now I need to go to each one of those small, tiny units and understand what happened there and what changed there. So, maybe I need to SSH into each machine and read the log file, or I need to somehow track release notes over dozens of different GitHub repositories, or I need to check the logs in AWS and Azure and GCP. So, each one of those issues, the speed and the complexity of the system, is quite bad on its own. But when you have a really complex system that changes really rapidly, once you have an issue, and people in organizations have a lot of issues nowadays, you are going to be like a detective, trying to figure out what changed, where, and so on.
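
The "detective work" Itiel describes amounts to merging change records from several tools into one ordered view and then asking what changed shortly before an incident. The following is a minimal sketch of that idea in Python; the `ChangeEvent` type, the source labels, the service names, and the timestamps are all hypothetical and do not represent any vendor's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    timestamp: datetime
    source: str   # illustrative label, e.g. "github" or "kubernetes"
    service: str
    summary: str


def build_timeline(*sources):
    """Merge per-tool event lists into one list ordered by time."""
    merged = [event for events in sources for event in events]
    return sorted(merged, key=lambda e: e.timestamp)


def changes_before(timeline, service, incident_time, window):
    """Changes to `service` within `window` before the incident, newest first."""
    start = incident_time - window
    hits = [
        e for e in timeline
        if e.service == service and start <= e.timestamp <= incident_time
    ]
    return list(reversed(hits))


# Made-up sample data.
t0 = datetime(2021, 1, 1, 12, 0)
github = [ChangeEvent(t0, "github", "rest-api", "merge: remove unused code")]
kubernetes = [
    ChangeEvent(t0 + timedelta(minutes=5), "kubernetes", "rest-api", "image updated"),
    ChangeEvent(t0 + timedelta(minutes=40), "kubernetes", "rest-api", "replicas 5 -> 1"),
    ChangeEvent(t0 + timedelta(minutes=41), "kubernetes", "billing", "image updated"),
]

timeline = build_timeline(github, kubernetes)
recent = changes_before(
    timeline, "rest-api", t0 + timedelta(minutes=45), timedelta(minutes=30)
)
for event in recent:
    print(event.timestamp.isoformat(), event.source, event.summary)
```

Sorting once and filtering by service and time window keeps the question "what changed just before this broke?" to a single lookup instead of a tour through every tool.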

Tali: Good one. Yeah, absolutely. Okay. Let’s move on to the tools. Obviously, with what we’re all doing at Coralogix and Komodor, we’re a bit biased about the tools that are suited to these challenges. But I was hoping to get more of a high-level view: what role do these different tools and processes play in alleviating some of the pains that have been introduced?

Oded: Okay. So, from my experience, and I can tell you from the Coralogix side, in our team we work a lot to create stability and availability. When we work in Kubernetes specifically, that comes with tools that are enablers for this stability: KEDA, which is event-driven and allows autoscaling of our pods, horizontal pod autoscaling. Cluster autoscaler, which allows us to automatically add nodes to the cluster and remove nodes from the cluster based on traffic, based on load, based on the metrics that we use. Event exporter, which takes all the events out of Kubernetes and gives us input on what happened. Node problem detector and node termination handler, to handle the pricing, the number of instances, the stability. Those are all enablers that make our products more stable.

But when there is an issue, and if we go back for a second to the issue that we had before, and this is the part where I’m a bit biased, we need a way to take all the information that comes from those tools and understand what is happening in our cluster. Because if I have something like an autoscaler that starts to scale instances up, then down, then up, then down, something is happening. Now I have cost. Now I have stability issues. Now I have issues that I didn’t even think of. So, I need a way to gather this information, to troubleshoot with this information, and to proactively do something when something happens. And I’m sure that when Itiel explains it, you’ll see it also works well with the system that they developed. This is where a product like Coralogix comes in: it takes those logs and helps us do this mitigation, get this understanding, and add context to the problem. Itiel?
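
The scale-up, scale-down, scale-up pattern Oded describes can be spotted from a plain series of node counts. Here is a minimal Python sketch, assuming the counts have already been extracted from autoscaler logs or events; the function names and the threshold of two reversals are arbitrary illustration choices, not a recommended setting.

```python
def count_direction_changes(counts):
    """Count how many times a series of node counts reverses direction."""
    reversals = 0
    last_direction = 0  # +1 after a scale-up, -1 after a scale-down, 0 before any change
    for previous, current in zip(counts, counts[1:]):
        if current == previous:
            continue  # no scaling happened between these two samples
        direction = 1 if current > previous else -1
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals


def is_flapping(counts, max_reversals=2):
    """Flag a series as flapping when it reverses more often than allowed."""
    return count_direction_changes(counts) > max_reversals


steady_growth = [3, 4, 5, 5, 6]
oscillating = [3, 5, 3, 5, 3, 5]

print(is_flapping(steady_growth))
print(is_flapping(oscillating))
```

A check like this is only a trigger for investigation. The cause still has to be found in the surrounding logs and events.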

Itiel: Yeah, yeah. So, I’m going to comment on Komodor very briefly. But I will say that if we look at it from a historical standpoint, the first thing people really needed, as the cloud really got them moving faster, was a way to release faster, basically to ship their code faster. And we saw the adoption of tools such as Kubernetes, obviously, but also Lambda functions and so on, just to make sure we can ship the code faster. Afterwards, great tools like Coralogix, or Kibana, or Prometheus allowed us to monitor this new system easily. So, not only do we have Kubernetes and Lambda to ship the code faster, now we have really good open-source or hosted solutions that help us understand what is currently happening in our system. Because like we said, everything’s moving fast, and we need to have the right tools to make sure we stay in control while moving fast.

And I would say that here at Komodor, we see ourselves as the other missing link in this new world, basically allowing you to understand very easily what changed in your system, and by whom, given the speed of changes that we currently see in the modern stack. But I will say that releasing fast is a real issue. Even with tools like Argo CD, which is another open-source tool that helps you ship faster, we see how companies that use those modern, open-source tools get a very big increase in the velocity of changes, and how, by using a lot of different tools from different areas, you can achieve a speed that was once unimaginable. Here at Komodor, we are quite a young startup, but we release like 15 times a day already. And that is because we use such great tools to make sure everything works as expected, and to make sure that no matter how fast and when we release, things don’t really break.

Tali: That’s a good segue I would say to a question that we have from the audience. Isn’t using so many different tools just making things more difficult and complex when building fast? So, maybe Itiel, you can talk a little bit about the tools and how you guys have settled on a stack that works for you.

Itiel: Yeah, sure. So, how did we choose the stack that works for us? I will say there was a lot of trial and error, to be honest. We shifted quite a lot. Like I said, we’re a young company, and we believe in moving fast. We changed pretty much everything, from the database to our CI/CD tool. We did stay on Kubernetes, so this is the only constant over time, but other than that we pretty much changed everything. I think that when choosing a new tool, the main thing is to evaluate it first, to make sure it works as expected and solves a real issue for you. I will say that even if you make the right choices, and I really think that we made a lot of good choices that allow us to move faster, the amount of new tools and new systems is hard to manage. Going into 10 or 15 different tools once you have an issue is a real problem, and we do see it a lot with our customers. Basically, it’s one of the things that Komodor helps you with. But it is an issue, and I think the ecosystem now looks like it’s flooded with new tools. Once you use all of these tools, you’re just tired of jumping between tabs, trying to scrape the relevant data from each and every one of them. Oded, did you want to add anything?

Oded: Can you repeat, so we can make it —

Tali: So, the question was, isn’t using so many different tools just making things more difficult and complex when building fast? So, if you could just talk a little bit about maybe how you’ve chosen tools that work for your team?

Oded: Okay. So, it’s an interesting question, because we are heavy users of Kubernetes. All of our infrastructure is Kubernetes. We run all our data stores and all our services on Kubernetes. And when you start to manage this kind of system, on the one hand it gives you a lot and helps you. But on the other hand, it has a lot of needs. Like a child, it needs a lot of things. Then you start using open source. You say, either I reinvent the wheel, which in most cases is not needed, or I use an existing tool. And you start with one tool, and you say, oh, it’s great, and then it becomes two, three, four, five, seven, 20. Today, we run over 60 different services on Kubernetes that are not related directly to our product. They give us all the infrastructure and management that we need, but they’re not related directly to Coralogix as a product. Without them, Coralogix probably wouldn’t be able to hold the current scale, but they’re not part of the development that is happening in-house. Add to this all the need to open PRs and add everything that you need to the existing product, and it creates a lot of issues, but I don’t see a way that you would not use those tools. So, it’s really interesting.

Tali: Or maybe it’s, yeah, finding a good way to connect them or just make the usage as streamlined as possible even if you do need to add. So, maybe like the answer is don’t be afraid of adding new tools, but figure out a way to make them work for you. Yeah.

Oded: Yeah, you start from a line of thinking of how you’re going to monitor them. When you understand how you’re going to observe or monitor the system, then you know that you can use the tool, at least in production environments.

Tali: Great. I think we can move on. We’re going to be looking at a sample troubleshooting flow. And it’s actually a really nice segue from that topic, because we’re going to be looking at how Coralogix and Komodor can work together without causing a lot of headaches, or making anything more difficult or complex. So, before I pass it on to Oded to take over and look directly in the Coralogix platform, I’ll just go quickly through the flow. When an event occurs, an alert is triggered by Coralogix and sent to the Slackbot. From the Slackbot, we can open the Komodor platform directly from the alert, easily drill down to find the root cause, quickly resolve the issue, and keep moving. Right? That’s the idea. So, Oded, take it away.

Oded: Sure. So, I’ll share Coralogix’s product with you pretty quickly. As we discussed, we have a lot of events coming into our system. On a daily basis, we get around a few million logs from all of our environments. That’s around one terabyte of data, and going over it manually is basically impossible. So, what we have is a system that allows us to collect all those logs, aggregate them, and take proactive action based on changes that are happening in the system. So, let’s take a scenario that fits with our friends from Komodor. We get a lot of events in our system, and we are creating an alert. This alert is for one of our services, the REST API. The idea behind it is that once a specific log appears, it sends a critical alert to Komodor. We did the preparation in the beginning: we collected the logs, we have an alert, and the alert fires. And when the alert fires, it gets to Komodor, and from there, Itiel will continue to demonstrate the solution.
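
The rule Oded describes, "once a specific log appears for this service, send a critical alert", can be sketched generically as a pattern match over log records. The Python below is not the Coralogix alert API; the `AlertRule` class, the record fields, the pattern, and the sample log lines are all assumptions made for illustration.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    service: str
    pattern: str   # regular expression searched for in the log message
    severity: str

    def evaluate(self, log_record):
        """Return an alert payload if the record matches this rule, else None."""
        if log_record.get("service") != self.service:
            return None
        message = log_record.get("message", "")
        if not re.search(self.pattern, message):
            return None
        return {
            "alert": self.name,
            "severity": self.severity,
            "service": self.service,
            "message": message,
        }


# Hypothetical rule and log records.
rule = AlertRule(
    name="rest-api unavailable replicas",
    service="rest-api",
    pattern=r"unavailable replicas",
    severity="critical",
)
records = [
    {"service": "rest-api", "message": "GET /health 200"},
    {"service": "billing", "message": "too many unavailable replicas"},
    {"service": "rest-api", "message": "too many unavailable replicas"},
]

alerts = []
for record in records:
    payload = rule.evaluate(record)
    if payload is not None:
        alerts.append(payload)

# In a real setup the payload would be delivered to a webhook; here it is only printed.
print(json.dumps(alerts, indent=2))
```

Scoping the rule to one service avoids firing on the same message from an unrelated service, which is why the second sample record produces no alert.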

Itiel: Okay. Great. Thank you very much, Oded. So, I’ll continue with your flow. This is the Komodor platform, for all of you who don’t know Komodor. Komodor, like I said, is a Kubernetes-native troubleshooting platform. Basically, we collect all of the changes and alerts from your system and allow you to get a glance at the higher level: what is currently happening in my system? But also, for each service, we build a comprehensive timeline, including all of the changes, alerts, config changes, and so on, all in one place. So, like Oded said, in this example we had an issue with the REST API. It’s one of the services in our system, and I can see that it currently has only one out of one replica. I can see the full history for the last 24 hours for this service. And using Komodor, I can see everything that changed in this particular service over time. So, I’m going to click on an example of a deploy event. I can see everything that changed from the Kubernetes side: I can see that the image changed in this deployment, and I can expand it to see all of the changes. But more than that, Komodor knows how to match different tools into the same screen, into the same context. So, I can see what GitHub changes happened in this specific deployment.

So, now let me go over time, and I can see that it was deployment after deployment, and each of them changed something in the image. And okay, we’re removing unused code. Sounds good. And the last deployment didn’t really change any image. The only thing that changed is the replica count: it changed from five to one. I can, again, zoom in and see that this is what changed. And right after this change, two things happened. One of them is a health issue. I can zoom in here and see that the pod, the only pod left, is running out of memory. And I can also see the alert that I got from Coralogix that helped me identify the issue, and it tells me that the service REST API has too many unavailable replicas at the moment. So, the idea behind Komodor is to take all of this data from different tools into one place and to give our users a very simple way of looking at things, to troubleshoot without necessarily needing to open each and every one of those tools, and only go there when needed. For example, when you need to read the logs, you can go to Coralogix directly, view them, and troubleshoot at the app level. So, that’s pretty much it. I will just show explore services, which is Komodor’s way of viewing not only a specific service but the whole cluster, basically all of the services and their current health status. So, yeah, that’s pretty much about us.
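
The key step in the demo is noticing that between two deployments the image stayed the same and only the replica count moved from five to one. A generic way to surface that kind of difference is a recursive diff over two configuration snapshots. The sketch below uses deliberately simplified dictionaries rather than real Kubernetes Deployment manifests, and the image tag is a made-up placeholder.

```python
def diff_specs(old, new, prefix=""):
    """Return {dotted.path: (old_value, new_value)} for every differing field."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}.{key}" if prefix else key
        before, after = old.get(key), new.get(key)
        if isinstance(before, dict) and isinstance(after, dict):
            # Recurse into nested sections and keep the full path to each field.
            changes.update(diff_specs(before, after, path))
        elif before != after:
            changes[path] = (before, after)
    return changes


# Simplified, hypothetical snapshots of two consecutive deployments.
previous = {"spec": {"replicas": 5, "image": "rest-api:v42"}}
current = {"spec": {"replicas": 1, "image": "rest-api:v42"}}

changes = diff_specs(previous, current)
for path, (before, after) in changes.items():
    print(f"{path}: {before} -> {after}")
```

Reporting only the fields that differ is what lets a reviewer see at a glance that the replica count, not the code, was the last thing to change before the health issue.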

Tali: So, just to clarify on the workflow here, so once the alert is set up, and that would just be like a one-time thing, right, Oded? Like, once the alert is set up, the user wouldn’t need to open both platforms. [crosstalk] Then it would be like a really easy, you know, links always available not like — Yeah, okay.

Itiel: No need for a lot of manual work, or any, really. You only configure it once and it just works.

Tali: Awesome. Okay. Yeah, question, how does Slack come into play? That’s a good question.

Itiel: Yeah. [crosstalk] No, no, go for it.

Oded: Cool. So, I can tell you that, for us, Slack is really important day-to-day. The concept of Slack is basically using channels and being online all the time. But beyond that, it’s a bunch of integrations. Take, for example, Coralogix. In our scenarios, we are able to create events directly from Slack into Coralogix, which allows us to see manual changes that happen in real time as part of the flow of data. We are using it to get pop-ups on things that happen in our production, in our deployment process, etc. Basically, as I said, Slack today is an interface for, or an extension of, any tool that is being used, whether Coralogix, Komodor, or any other tool in this department. And Itiel, if you have something to add, feel free.

Tali: One more question before I hand it off to you, Itiel. So, Oded, you’re saying that the connection between Slack and Coralogix actually goes in both directions, is that correct?

Oded: Yes.

Tali: So, you’re sending alert data to Slack, but we’re also able to ingest data from Slack and incorporate it to add more context to the more traditional data. Okay. Itiel?

Itiel: Yeah, yeah. I will say that one of the cool things that we see in Komodor comes from Slack being so strong. Obviously Komodor has a great integration with Slack too: we have a Slackbot, and you can send alerts from Komodor to Slack as well. The very cool thing is the ability to troubleshoot, or to get a very good glance at what is currently happening in your system, using Slack. Because everything is being sent into Slack, whether it’s release notes from Komodor or alerts from Coralogix, you can get a good glimpse without opening any other tool, only the tool that is already open, which is Slack, and you can understand what is currently happening in your system. I will say that our product also tries to enrich existing alerts in Slack by giving our users all of the context they need to troubleshoot efficiently. This context might be: you had this deployment, or you turned on this feature flag, or changed this configuration flag. But also, FYI, you have those alerts from Coralogix, and those changes in this specific area of your system. So, pay attention to that when you try to troubleshoot.
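
Enriching an alert with recent changes before it reaches Slack can be sketched as plain message construction. The Python below builds a JSON body of the kind Slack incoming webhooks accept (a `text` field using Slack's `*bold*` and `<url|label>` link syntax) and does not send anything. The alert fields, the change summaries, and the `example.com` link are hypothetical, and this is not how either vendor's Slackbot is implemented.

```python
import json


def build_slack_message(alert, recent_changes, details_url):
    """Build a Slack incoming-webhook body that pairs an alert with recent changes."""
    lines = [f"*{alert['name']}* ({alert['severity']}) on `{alert['service']}`"]
    if recent_changes:
        lines.append("Recent changes:")
        lines.extend(f"- {change}" for change in recent_changes)
    else:
        lines.append("No recent changes recorded.")
    # Slack renders <url|label> as a clickable link.
    lines.append(f"<{details_url}|Open timeline>")
    return {"text": "\n".join(lines)}


# Hypothetical alert and context.
alert = {
    "name": "Too many unavailable replicas",
    "severity": "critical",
    "service": "rest-api",
}
recent_changes = ["deploy: replicas 5 -> 1", "feature flag turned on: new-checkout"]

payload = build_slack_message(
    alert, recent_changes, "https://example.com/services/rest-api"
)
print(json.dumps(payload, indent=2))
```

Putting the changes next to the alert in one message is what saves the reader from opening a second tool just to learn what was deployed.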

Tali: Awesome. Great. Do you have any other questions from the audience? We’ll kind of hang out for a few more minutes, let some of those maybe lingering questions come out.

Itiel: And if I have a minute while people are thinking about questions, I think one of the biggest things that we see in organizations today is that some people have become the knowledge hub of the organization. Basically, they are the bottleneck. They are the only ones in the organization who know how to use all of these different tools and how to troubleshoot using them. What we try really hard to do is liberate this knowledge: to take the data from all of these different tools and showcase it very simply in order to free up the bottlenecks, who are basically the busiest people in the organization. A lot of the time it’s the DevOps or the DevSecOps people. We want to liberate their knowledge and the things that only they know how to do at the moment. Using Komodor, every developer can be just as good as, or at least very similar to, an expert DevOps engineer, and get all of the data and all of the context they need in a couple of minutes, instead of asking Oded every time, “Do you know why my system is down?”

Tali: Right. Yeah. Okay. Yeah, we did have a couple of questions come in. The first one is for you, Itiel. What does the onboarding of Komodor look like, and does it replace APMs?

Itiel: Yeah. So, we don’t replace APMs. We work perfectly with existing tools and APMs, whether it’s Coralogix, Datadog, New Relic, and so on. And the onboarding experience is very easy. Basically, you install a Kubernetes agent, which is one pod per cluster, and integrate with the Komodor platform, and our users usually see the first value within a couple of minutes. So, a couple of minutes from installation, you have this tool that already tracks everything that happens inside the cluster and notifies you when there are issues.

Tali: Awesome. Oded, I would like you to also go into a little bit of the integration and the setup with Coralogix, and then I have a follow-up question, you can just kind of flow into it. Are there any events in the system that don’t trigger an alert or that become invisible using traditional monitoring tools? Oded?

Oded: So, basically, triggering an alert is something that is based on the company’s knowledge or the people in the company. There are some cases where you want an alert to fire based on an event that is happening in the system. These abilities are either reflected in Coralogix automatically, by some kind of AI that we provide, or through people’s manual knowledge. There are cases where I would not like to wake someone at night if there is an issue that can wait for the next day. There are cases where I would like to know about something that is happening, but I wouldn’t consider it an alert: cost exhaustion, or some kind of change in the system that might be reasonable. So, there are cases where alerts are not triggered. In most cases this is decided manually by the team or by the company.
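
The kind of manually decided policy described here (page someone now, wait for the next day, or just make the event visible) might look something like the sketch below. It is an illustration under stated assumptions, not a Coralogix feature: the severity labels, the 22:00 to 07:00 night window, and the `route` function are hypothetical.

```python
from datetime import time
from enum import Enum


class Action(Enum):
    PAGE = "page on-call now"
    NEXT_DAY = "queue for the next day"
    NOTIFY = "make visible, no alert"


def route(severity: str, local_time: time) -> Action:
    """Decide what to do with an event, given its severity and the local time."""
    # Assumed quiet hours: 22:00 up to (but not including) 07:00.
    night = local_time >= time(22, 0) or local_time < time(7, 0)

    if severity == "critical":
        # Critical issues always page, day or night.
        return Action.PAGE
    if severity == "warning":
        # Warnings can wait until morning if they arrive at night.
        return Action.NEXT_DAY if night else Action.PAGE
    # Anything else is worth knowing about but is not an alert.
    return Action.NOTIFY
```

For example, `route("warning", time(3, 0))` returns `Action.NEXT_DAY`, while the same warning at 14:00 returns `Action.PAGE`.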

Tali: What are your thoughts also on traditional monitoring tools, looking at the amount of data being generated by these modern applications and systems? Can you talk a little bit about that? I mean, events would be invisible if the data isn’t being analyzed, right?

Oded: So, the cool thing about the Coralogix system, and probably about Itiel’s system, is the fact that we have an endless number of integrations. Which means that you can take any part of your organization and basically integrate it into Coralogix. And if we look at traditional systems, we considered logs as logs; today, we consider them as events. And an event has more power. An event is something that can be driven. It can affect my system. In Coralogix, the cool thing is that as you add integrations to the system, you get more visibility, you get more observability, you get the ability to create more proactive actions that are meaningful to customers, meaningful for us. I can tell you that by using our system, we basically stabilized our own product. We’re basically dogfooding our own product, which is pretty cool.

Tali: Yeah, there’s been a lot of inception, chicken-and-egg concepts going on in this discussion. Great. And in terms of the setup and all the integrations that you mentioned?

Oded: So, in Coralogix, you basically open an account with a username and password, or email in our case. And once you open an account, you have a Coralogix account, and you just start sending data. It’s transparent, it’s simple, it’s something that is done almost automatically for you. You just need to send data or decide which data you want to send. And once you send this data, we take it, we run it through, we enrich it, and we help you make better decisions.
