Hello and welcome back to the Rawkode Academy. I am your host, David Flanagan, although you may know me from across the internet as Rawkode. Today I’m going to guide you through Komodor.

If you’re not familiar with Komodor, it is a SaaS-based product to help you troubleshoot and debug your Kubernetes clusters, something I have a few opinions about. I don’t often cover SaaS-based products; however, Komodor just this week announced their new free tier, meaning you don’t need to pay to get started with Komodor. Not only that, Komodor sponsored my time to produce the advanced scheduling demo. And lastly, they’ve also committed to being on an episode of Klustered, where I’m going to put them through the test: I’m going to give them a broken cluster, and they’re convinced they can use Komodor to fix it. So thank you, Komodor, for sponsoring the advanced scheduling video, and thank you for joining me on Klustered very, very soon.
But today, let’s focus on the tutorial. I’ll show you how you can get started. We’re going to start from the beginning, but also showcase some advanced use cases for Komodor. The first thing we need to do is go to komodor.com. From here, feel free to read the marketing material, or go to resources, pricing, documentation, whatever you want. I’m going to start by logging in, for which I use my Google account.
Immediately we’re presented with the service list. This is a list of all of the microservices, or maybe huge services, who knows, deployed to your Kubernetes cluster. Now, you won’t see this list right away; I see it because I’ve already added my first Kubernetes cluster. But let me walk you through the process for doing that yourself.

Down at the bottom left you will see the Integrations button. When you select this, you can click on Add a cluster. You can give it whatever name you want and hit Next. This will give you the command that you copy and run in your terminal; it will add a Helm repository and deploy the Helm chart with the Komodor agent. Then you click Next, where it will wait for the connection and confirm it. From there, go back to the home page and you’ll see the Kubernetes services from your cluster.
Now, there’s a couple of nice things on this page right off the bat. First, all my services are healthy, hooray. Secondly, we get a good overview of the workloads running in this cluster. You can see I have a bunch of Prometheus stuff, I’ve got 1Password Connect with GitOps, cert-manager, [inaudible], lots of cool stuff.

Now, if things weren’t all healthy, we could either exclude the healthy ones or we could filter on their health. If you have more than one cluster, you can filter by that too. And if you only want to take a look at a particular namespace, in my case let’s just take a look at my community namespace, you’ll see that I’m only running a single service. If I want to view the platform and the community namespace, I can do so as well. If you want to filter by workload type, we can click on DaemonSet and see DaemonSets. Just the basic things that you would expect from a service overview.

The last thing I’ll point out on this page is at the top right. Here we can sort by a few options. The default is health, which makes sense: if there’s something that is unhealthy in your cluster, you want to see that first. The other view that I’ve been enjoying over the last few days is namespace. It’s a good way to break it down by namespace without specifically filtering on a namespace itself. And if you’re only worried about things that have changed recently, go to last modified and you’ll see the most recent resources that have been modified within Komodor. I deployed Prometheus today, so we can see Prometheus front and center.

And that’s the service overview. It’s not life-changing, but it’s very valuable, with just enough functionality to maybe pry kubectl out of your hands when things go wrong.
So let’s see what else we can do with Komodor. We also have the Jobs option on the left, although I have no jobs in my cluster. This is just the same as Services: if you are using the Job object or the CronJob object, you will see them listed here.

Next we have Events. This will show you all the events from your Kubernetes cluster. This is something that can typically be quite overwhelming to do from the kubectl command line, because events come fast and furious in a Kubernetes cluster. And when we have an abundance of information, bringing a visual layer to that information is how we develop understanding. So let’s see how we can understand the events within a Kubernetes cluster with Komodor.
Much like the service page, we have the ability to filter these events on cluster and namespace. However, now we can also filter by individual service, and we can filter by the event type. We have the ability to filter on the status of the event, as well as deploy details and availability reasons. We’ll get into more of these in just a moment, but first let’s take a look at my platform namespace.

Here we can see all the events as Komodor was deployed to my cluster and went through the discovery phase, that is, discovering all the workloads and resources within my cluster. From here we can click on the service name. That slides out a nice kind of popover modal dialog, meaning we don’t really lose our original context when we’re debugging, which I think is really important for a debugging tool. So, a very nice addition. We have the service name and the health status; you can see all the events for the service, as well as the pods, the nodes they’re scheduled on, and some additional information, which gives us access to the labels and annotations on the service.
Okay, let’s pop into the monitor namespace and we’ll select our Grafana service. Currently we only see information about Grafana, which we’d expect. We can see the events again, the nodes, pods, and our labels and annotations.

Now, before we take a look at the best practice recommendations, let’s pop back over to events and see that here we have the Related Resources button. This is quite nice because it allows us to select other resources within the same namespace if we want to be able to collectively group them and view their events together. So I’ll pick on Kubernetes services and I’ll mark this one as related. I’ll pop over to ConfigMaps, where I’ll select the API server one, and I’ll pick one more, which is to pop over to Secrets and select finder config. We apply the selection, and there you’ll see that the events listed for this resource include the related resources.

Now, I think this feature could be improved. I’d love to see Komodor scan the YAML for referenced ConfigMaps, Secrets, and Services with matching selectors and hook this up for me. However, doing it manually, if there’s a few resources that I do want to group collectively, isn’t exactly the end of the world. So it’s a cool feature, and one that could have some really interesting improvements over time.
So let’s go back to the information screen, and we’ll see these best practice warnings. When we click this we have a bunch of checks, and here we can see that our deployment has one replica. This is a warning just because if we lose that replica, we’ve lost our service, so maybe you want to run two or three. However, you know your services better than any tool can, so feel free to use the Ignore button. For Grafana, maybe we’ve determined that we only ever want one, and we’re happy for that to be offline if something goes wrong. We can just say ignore for 14 days, 30 days, 90 days, or forever. Perhaps I’m not ready to make a decision on whether this is good or bad yet, so I’ll ignore it for a couple of weeks.

Next we have a critical warning telling us that this workload has no liveness probe. If we expand it, it tells us that liveness probes are designed to ensure that an application stays in a healthy state; when a liveness probe fails, the pod will be restarted. This is pretty neat behavior: the kubelet monitors our workloads, and if it needs to kick them, it kicks them. So you should always try and have some of these best practices whenever possible, and Komodor brings that front and center. I’m not going to ignore that one because, you know what, I should have a liveness probe on that workload.

Now we’ve got some passed ones here: we have a readiness probe, and we have CPU and memory constraints. The last one is the pull policy. It’s not good practice to have an image pull policy of Always; it’s usually preferred to set it to IfNotPresent. It just means that when the workload restarts you don’t need to go to an image registry and see if it can be pulled down. Always usually means you’re using some sort of alias tag system, and again we want to get away from that as much as possible. So it’s not critical, but it is a warning that maybe you need to update this. And I think this is a nice way to gain more insight and understanding of the services within our cluster.
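The checks described above can be approximated in a few lines of Python. This is a minimal sketch, not Komodor’s implementation: the function name `check_best_practices`, the severity labels for the checks that passed in the demo, and the sample `grafana` manifest are all assumptions made for illustration. The field names (`replicas`, `livenessProbe`, `readinessProbe`, `resources.limits`, `imagePullPolicy`) are the standard Kubernetes Deployment fields.

```python
from typing import Dict, List


def check_best_practices(deployment: Dict) -> List[str]:
    """Return findings for a Deployment manifest given as a plain dict."""
    findings = []
    spec = deployment.get("spec", {})

    # A single replica means losing one pod takes the service offline.
    # Kubernetes defaults to 1 replica when the field is omitted.
    if spec.get("replicas", 1) < 2:
        findings.append("warning: fewer than 2 replicas")

    pod_spec = spec.get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if "livenessProbe" not in container:
            findings.append(f"critical: {name} has no liveness probe")
        if "readinessProbe" not in container:
            # Severity here is an assumption; the demo only shows this check passing.
            findings.append(f"warning: {name} has no readiness probe")
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(f"warning: {name} has no {resource} limit")
        if container.get("imagePullPolicy") == "Always":
            findings.append(f"warning: {name} uses imagePullPolicy Always")
    return findings


# Hypothetical manifest loosely mirroring the Grafana workload in the demo.
grafana = {
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{
            "name": "grafana",
            "image": "grafana/grafana:latest",
            "imagePullPolicy": "Always",
            "readinessProbe": {"httpGet": {"path": "/api/health", "port": 3000}},
            "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
        }]}},
    }
}

for finding in check_best_practices(grafana):
    print(finding)
```

For this sample manifest the sketch reports the single replica, the missing liveness probe, and the Always pull policy, which matches the three findings discussed in the walkthrough.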
So let’s see what else we can do from the events page. I’m not going to filter on an individual service; I don’t think that shows us anything new. But if we scroll down we can filter on the event type. So let’s filter by one of these event types and see what information we get back.

Let’s start with one of the most common ones, which is config change. This is going to tell you when a ConfigMap is created, modified, or deleted within your cluster. So let’s create one. Here I have cm.yaml, which is a ConfigMap called rawkode. If we go to the terminal, we can apply this to our monitor namespace. Then let’s make a quick change to that ConfigMap and say that we no longer want key: value; instead we want name: David. Go to our terminal and apply this one more time.
And let’s go visualize this with Komodor. Right away we can see that a ConfigMap was created in the monitor namespace called rawkode. If we click on this, it’s all green, because this was the first time this ConfigMap was created. We then have our change, and this time we can see that key: value was removed and name: David was added. And if you want to view this in more detail, you can expand the diff. We can see the data changed, along with some metadata about the resource as well.
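To make the idea of a config change diff concrete, here is a small Python sketch that compares the `data` section of a ConfigMap before and after an edit. It is only an illustration of the concept; the function name `diff_configmap_data` and the `+`/`-` output format are assumptions, and Komodor’s own diff view may be computed differently.

```python
from typing import Dict, List


def diff_configmap_data(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List removed ('-') and added ('+') entries between two data maps."""
    lines = []
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            lines.append(f"- {key}: {old[key]}")
        elif key not in old:
            lines.append(f"+ {key}: {new[key]}")
        elif old[key] != new[key]:
            # A changed value is shown as a removal followed by an addition.
            lines.append(f"- {key}: {old[key]}")
            lines.append(f"+ {key}: {new[key]}")
    return lines


# The two versions of the ConfigMap data applied in the walkthrough.
before = {"key": "value"}
after = {"name": "David"}

print("\n".join(diff_configmap_data(before, after)))
```

Run against the two versions from the demo, this prints one removed entry (`key: value`) and one added entry (`name: David`).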
This is one of these really simple but very valuable features. When things go wrong on a Kubernetes cluster, it’s not because the resources haven’t changed; it’s because of our changes that things sometimes go wrong. Human error is probably still the biggest cause of problems in a Kubernetes cluster. So it’s crucial that you understand when config changes in your cluster and how that can have a cascading effect on the workloads within your cluster. Your ability to see those changes as they happen will substantially lower your mean time to recovery.
So, beyond config change, let’s filter on availability issues. Now, availability issues give us an understanding of when a workload was unavailable, perhaps because the pod was being restarted or the probes were failing. If we take a look at the Grafana one here, you can see that this pod was unhealthy, and why it was unhealthy: well, because it was ContainerCreating. Of course it’s not healthy if it’s still creating. What’s also nice here is that it shows you each of the containers and the status for them too.

If you want, you can click the Live Pods and Logs button. This will show us the current pod in our cluster for that selector, where we can pop it open and go to logs. So it’s nice having logs right front and center when required. If we pop back to details, we can see the conditions that tell us our pod is healthy. We have the containers, the images they’re running, the pull policy, the ports, the mounts, and the arguments: all the useful information that you need. We have the ability to see the tolerations, the volumes, and of course the events associated with this workload. If you’re a fan of the kubectl describe command, you can click Describe and get the exact output on the screen, like so.

So to reiterate: from the events page, we saw a pod with availability issues, we went to the current instance of this pod, we saw there is no problem, and we had all the information we’d need to debug if there was something wrong.
Now, the rest of the Komodor UI is pretty much more of the same. We can break down all of the resources. We can see nodes; we can click on a node and see all of its conditions, the capacity, and the allocatable resources across CPU, memory, storage, etc. We have the ability to cordon and drain a node if we wish. For workloads it’s the same: we have deployments, so we can click, we can edit, we can scale, we can restart. You can do this for most of the resources within your cluster. For storage we can see storage classes; for config we go to ConfigMaps and we can see them. Pretty much, you’re getting a visual representation of everything you can do with the kubectl command. We can even list the custom resource definitions within our cluster, and search.

I’m not going to spend any more time going through this, because these web pages are dashboards, and as we all know, dashboards are not to be looked at until something goes wrong. So how do we get information from Komodor to give us alerts when our attention is needed? For that, we have monitors.
We can expand our cluster and see that we have some rules already in place; these are shipped by default by Komodor. We have an availability monitor. This will let us know if any of our workloads are at less than 80% availability for more than 10 seconds: if we need 10 pods in our deployment and for more than 10 seconds we have 7 or fewer, we’ll get an alert.
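The availability rule just described (below 80% of desired replicas for more than 10 seconds) can be sketched as follows. This is an assumption-laden illustration rather than Komodor’s monitor logic: the function name `availability_alert`, the sample format, and the way the duration is measured between samples are all hypothetical.

```python
from typing import List, Tuple


def availability_alert(
    samples: List[Tuple[float, int]],
    desired: int,
    threshold_pct: int = 80,
    min_duration_s: float = 10.0,
) -> bool:
    """Return True if availability stayed below the threshold for too long.

    samples holds (timestamp in seconds, available replicas), oldest first.
    """
    breach_started = None
    for timestamp, available in samples:
        # Integer arithmetic avoids rounding issues: 7 of 10 is below 80%, 8 of 10 is not.
        below = available * 100 < desired * threshold_pct
        if not below:
            breach_started = None  # recovered, reset the timer
        elif breach_started is None:
            breach_started = timestamp  # first sample of a new breach
        elif timestamp - breach_started > min_duration_s:
            return True
    return False


# 10 desired pods, dropping to 7 available from t=5s and staying there.
sustained = [(0, 10), (5, 7), (10, 7), (16, 7)]
# The same drop, but the workload recovers before 10 seconds have passed.
recovered = [(0, 7), (5, 7), (8, 10), (20, 7)]

print(availability_alert(sustained, desired=10))
print(availability_alert(recovered, desired=10))
```

With 10 desired pods, 8 available never counts as a breach, while 7 or fewer does, which is the arithmetic behind the example in the walkthrough.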
If our CronJobs are failing, we get another. We can get alerts for when deployments are updated, and we can also get alerts for when our nodes are not healthy. So let’s take a look at one of these alerts and then configure our own.

Here is the deployment alert. This is going to let us know whenever a workload is modified within our cluster. Using the integrations that you have configured with Komodor, you can use these as destinations. We can use a standard webhook or publish a message to Slack. I have a channel called SRE, and I’m going to click Save. So now if we modify a deployment, we should get a notification to my Slack channel.
So let’s test it. I have my Slack here, but first we need a deployment change. I’m just going to do this through the Komodor UI. I’m going to go to deployments and we’ll modify the cert-manager deployment. We can click on Edit YAML, where I’m just going to add a new label. We hit Apply. We can see that we have new events, and we can see our manual action to edit a deployment. We click on it and we see the change.

Even though we can see the deployment changed here, this will not trigger a Slack notification, because it uses the resource’s generation rather than its resource version. Which is good, because do we really need a notification that a label changed on a workload when the workload itself was not restarted, rescheduled, or modified?
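The distinction between generation and resource version can be sketched in Python. In Kubernetes, `metadata.resourceVersion` changes on every write to an object, while `metadata.generation` on a Deployment only increments when the spec changes, so a label-only edit leaves the generation alone. The function `should_notify` and the sample metadata values below are hypothetical; the transcript says only that the alert keys off the generation, not how Komodor implements it.

```python
def should_notify(old_meta: dict, new_meta: dict) -> bool:
    """Notify only when the spec changed, i.e. the generation moved."""
    return new_meta["generation"] != old_meta["generation"]


# Hypothetical metadata snapshots of the same Deployment over time.
before = {"generation": 1, "resourceVersion": "1001", "labels": {}}
# Adding a label is a metadata-only write: new resourceVersion, same generation.
label_edit = {"generation": 1, "resourceVersion": "1002", "labels": {"team": "sre"}}
# Adding an environment variable changes the pod template, so the generation bumps.
env_edit = {"generation": 2, "resourceVersion": "1003", "labels": {"team": "sre"}}

print(should_notify(before, label_edit))
print(should_notify(label_edit, env_edit))
```

Comparing resource versions instead would fire on both edits, which is the noisy behavior the generation-based check avoids.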
Now let’s make one more change to the resource. This time we’re going to add an environment variable, which is going to have a value of “hi”. Now this will trigger a new generation, and if we pop over to Slack we’ll see the notification in the SRE channel. If we click on this, it takes us directly to the event with the change. We can see that the revision and generation of this resource went from one to two because of an environment variable addition. This workload change is also denoted here by the new deploy event, which tells us that the image didn’t change but other aspects did. So, given that, we have a pretty sophisticated troubleshooting and debugging tool here.
It’s also worth noting that Komodor has pretty elevated, privileged access to our cluster, and as such we need to be able to trust it. Luckily, Komodor provides the ability to protect some of our more sensitive information from being leaked through the Komodor UI.

Let’s go to resources, workloads, and pods. If I select the default namespace, I have a new super-secret workload. If we click on this, there’s not a lot to see here, but if we go to the logs, we have a password. How do we prevent Komodor from leaking such information? Now, it doesn’t have to be just standard-out logging, although when applications crash, sometimes they do dump the environment, revealing some very sensitive information. How do we also redact this from ConfigMaps and other sources of sensitive information? Fortunately, by default Komodor hashes all the information that it pulls from Secret resources. But we do need to put in a couple of extra steps to protect standard out and/or ConfigMaps, although I hope you’re not storing too much sensitive information in a ConfigMap.
I’m going to configure this through Komodor. The first thing we want to do is go to configuration and ConfigMaps, where I can select the Komodor namespace. Here we have the Kubernetes watcher config, and I’m going to go straight to the edit page. You’ll see we have two settings here, on lines 20 and 21, called redact and redact logs. These take a list of expressions to redact from Kubernetes resources and from log data. Now, this accepts regex pattern matching, like you can do with any sophisticated logging library, but I’m going to keep it very simple for this demo. I’m going to explicitly say that I want my password123 omitted. And we’ll add one more: this time we’ll do a regex match for anything with password=. Let me open the matcher up to .* and we’ll stick a space on the end.
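Pattern-based redaction of this kind can be sketched with Python’s standard `re` module. This is not how the Komodor agent is implemented; the `redact` function, the `<REDACTED>` replacement token, and the exact patterns are assumptions for illustration. In particular, the sketch uses `password=\S+` (everything up to the next whitespace) as a tighter stand-in for the `.*` matcher mentioned in the demo.

```python
import re
from typing import List

REDACTED = "<REDACTED>"  # assumed placeholder; the real agent's output may differ


def redact(line: str, patterns: List[str]) -> str:
    """Apply each regex pattern in turn, replacing matches with a placeholder."""
    for pattern in patterns:
        line = re.sub(pattern, REDACTED, line)
    return line


patterns = [
    re.escape("password123"),  # a literal value to omit
    r"password=\S+",           # any key=value pair whose key is "password"
]

print(redact("login ok password123 done", patterns))
print(redact("password=hello-not-secret user=david", patterns))
```

A literal pattern only protects one known value, while the key-based pattern protects whatever follows the marker, which is why a logging convention with a consistent marker makes redaction much easier to configure.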
So now we have to kick the Komodor agent so that it reloads its configuration. We can just delete it, and already we can see we have a new one running. So let’s go back to our pod in the default namespace, where we have our super-secret workload, and if we view the logs, our password123 has been redacted.

So let’s modify this at the deployment level, where we can say Edit YAML. We have password123, but let’s also add password= hello not secret, like so. That caused this pod, which we can see here, to be terminated and replaced with a new one. And if we pop open the logs here, you can see that both values have been properly redacted.
So this is a very important feature, but also a very cumbersome feature, because security is hard; it’s never easy. It would probably be worthwhile for your team or organization to have a convention to, well, not log sensitive values, but if you do, always make sure there’s some marker in place so you can configure tools like Komodor and other logging systems to redact that information as fast as possible. And you can even run the container locally to test your redactions before pushing them to your Kubernetes cluster.
If I go to my terminal, I have a justfile. It’s like a Makefile; however, it allows positional arguments on targets and it’s generally just a little bit nicer to work with. From here we can say redact, and you can already see the autocomplete here and the documentation from the justfile: we provide a redaction phrase and a log line. So I’m going to say let’s redact rawkode, and then the log line that I want to test is “I am Rawkode, hear me type”. just pulls down a container image, does a little bit of plumbing, and then shows you the input log and the output, so you can see what it was before and after the redaction.

This means that you can test your regex patterns all you want. Say you want to do password=.*?rawkode, because maybe we got it wrong, and then in our test we pass password=blah rawkod, without any E. So this won’t redact, but we have a problem we can fix. Let’s run that again with the E on the end. And, oh, it’s still broken. Well, clearly I don’t know how to spell rawkode. There we go: when you do it right, it works. That string was redacted, and we were able to test the regex.
Now this is really cool: you can actually plumb into a shell script a whole bunch of example log lines that you have from your application, with redactions that you know always have to be satisfied, and hook this into your CI system. That way you know right away whenever you’ve got secrets leaking that should be redacted.
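A CI check along these lines could look like the Python sketch below. It does not use the container or the justfile from the demo; instead it applies the regex patterns directly with the standard `re` module and reports any sample log line in which a known secret survives. The function name `find_leaks`, the sample lines, and the secret `hunter2` are all hypothetical.

```python
import re
from typing import List


def find_leaks(log_lines: List[str], patterns: List[str], secrets: List[str]) -> List[str]:
    """Return the log lines in which a known secret survives redaction."""
    leaks = []
    for line in log_lines:
        redacted = line
        for pattern in patterns:
            redacted = re.sub(pattern, "<REDACTED>", redacted)
        if any(secret in redacted for secret in secrets):
            leaks.append(line)
    return leaks


# The pattern expects the marker "rawkode"; the second line misspells it,
# mirroring the typo in the walkthrough, so its secret is not redacted.
patterns = [r"password=.*?rawkode"]
sample_lines = [
    "user=david password=hunter2-rawkode",
    "user=david password=hunter2-rawkod",
]
secrets = ["hunter2"]

leaks = find_leaks(sample_lines, patterns, secrets)
for line in leaks:
    print(f"LEAK: {line}")
```

In a CI job, the script would exit with a non-zero status whenever `leaks` is non-empty, so a pattern that stops matching fails the build before the secret reaches a real cluster.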
So that’s a quick overview of Komodor. There’s an awful lot to love, and it gives you great visibility into the wonderfully complex system that is Kubernetes, running our wonderfully complex applications, which are our microservices.

This was just part one. There’s going to be a part two of this video dropping early next week, and in part two we’ll be taking a look at more of the Komodor integrations. We’ll be seeing how to integrate your source control via GitHub, and we’ll also take a look at hooking up to Sentry for exception tracking, and Grafana and Alertmanager, giving you full visibility across all of your observability stack. Then at the end of next week part three will drop, where we take a look at two final features: one, the humble webhook, and how we can get information from Komodor to do whatever the hell we please; and then one of my favorite features, the vcluster integration, deploying Komodor to all your virtual and multi-tenant Kubernetes environments.

We’ll be back next week with the next video. Until then, have a wonderful day, and I’ll see you all soon.