Webinars
You can also view the full presentation deck here.
*This is an auto-generated transcript
1:45
Nemlig and we have built a multi-cluster on-prem platform that we call
1:51
Kubernemlig because platforms need names and
1:57
today we’re going to talk about DNS and more specifically multi-cluster DNS
2:03
and how we’ve tried to solve that it’s amazing it’s amazing how you made
2:09
the company name actually become the name of the product and the internal platform that allows you to put
2:16
in great solutions I think on-prem is always something that is interesting for people to hear about because most of the knowledge
2:22
out there is mostly just about cloud public clouds and going into yeah
2:29
that’s very interesting and so before we deep dive maybe explain a
2:35
little bit about the challenges that you had for sure um yes so uh I joined about two years
2:43
ago and uh yeah let’s on this devops journey whatever that really is and I
2:49
tried to look at what are the pain points of the company currently um one thing I saw is that the way we
2:57
managed and operated IT infrastructure and software we developed
3:03
ourselves and software we also had to operate in regard to running the business in general from
3:08
third-party vendors I saw that it was very scattered around
3:14
virtual machines bare metal servers a bit of containers but the containers
3:19
running in a on something we call I don’t even know if that’s a term out there but we call
3:24
it Docker hosts so yeah Docker containers with the Docker uh engine
3:30
but running on single hosts for some environments so no really robust uptime and so on so
3:37
I thought uh what can I do to streamline uh the way we run things uh so I set out to
3:45
create a platform based on kubernetes as we know it’s a
3:51
distributed system for orchestrating containers and because we for one thing
3:56
all new software we develop at Nemlig we we do it for containers anyway that was
4:03
already a theme before I came so I thought hey we’re already doing this for new
4:09
things and have for some years and also the older stuff was also being developed to run in
4:16
a container so we need to orchestrate this and also create a
4:21
platform uh that can host and run like the general
4:27
things components software and so on we need to run and there you go with Kubernemlig
4:32
and one of the challenges was then to actually get DNS registered in an
4:37
automated way when I started a user was supposed to send an email and then
4:46
he waited and then I say up to four days but I actually experienced that it
4:51
took up to four days because there’s some first-level handling involved you
4:57
know that the old ticket gas ticket Bingo or whatever we call it these days
5:03
uh so it could take many days totally unproductive days of you know okay then
5:10
you figure out something else to do but you’re still waiting for this DNS record which is a really simple
5:16
thing just to connect to your service running somewhere right and besides that as you also pinpoint here
5:22
it was very manual no automation at all
5:28
um and then in a container orchestrated world
5:33
and with kubernetes you really have to um
5:40
be in control and being able to know what’s going on when
5:45
it comes to DNS because with kubernetes it’s always DNS right you have DNS in so many layers you have the kube-dns
5:52
service you have uh NodeLocal DNS you have the DNS outside the
6:00
cluster and of course if you have stuff coming in from the public internet you also have you know the public
6:06
DNS infrastructure so there’s a lot of layers of DNS so I wanted um a robust and performant and
6:13
manageable way of yeah operating a DNS service for the different needs we had at the
6:20
time um so yeah I think that’s I I think it’s really amazing because
6:27
DNS is usually the things that back in the days someone will open a ticket and
6:34
someone else will like edit manually the the configuration file to add the
6:39
records that you wanted and when you come to kubernetes or Docker where everything is faster everything
6:45
spawns and goes away and you need this service discovery ability quick and fast it’s
6:51
becoming necessary to make it shorter and faster on the other side we know that DNS
6:58
problems are actually I think the second most common cause of
7:04
infrastructure downtime because when you don’t have DNS you don’t have anything so it’s also this critical component
7:12
that when you don’t have it right like your whole infrastructure is going down yeah and maybe if we have time I could
7:20
come back to that we really had the perfect storm of uh of DNS failure that
7:27
really cascaded into like certificates not renewing
7:34
so uh yeah that was a nice incident or nice uh in hindsight
7:41
yeah yeah if you learn from them they’re really good I agree I agree so let’s talk maybe a
7:48
little bit about the goals what you’re trying to achieve after you discover all these challenges and problems
7:56
yes um yeah so we set out with the horizon that we just talked about
8:02
the scene which we just elaborated on we set out to achieve these goals of automating DNS making it an
8:10
out-of-the-box platform service on Kubernemlig um and pretty much something that that is
8:17
there and of course we have to update the components of the DNS stack we have but we also
8:25
don’t have to you know push it along every day and nurture it so it should be somewhat
8:32
robust and live by itself uh in most cases
8:37
and also we later on found that oh okay we started out with some fully
8:42
qualified domain names of X Dot
8:49
plus the part of the name that whatever subdomain we’re in control of and we wanted to abstract away uh from the
8:56
developers what cluster your service is actually running on because one they don’t care two it’s extra
9:04
characters you type in in your browser’s uh search field
9:11
uh and also it doesn’t
9:16
give meaning to them what is this whatever cluster they’re like okay I know it’s my
9:24
service in test but I don’t care which cluster specifically I just want to hit this thing and work with it right
9:31
so that was also a part of uh of our journey later on
9:37
and some of these decisions uh have to be very pragmatic because starting out
9:43
it was me and two others one uh one student at a
9:50
university so only 15 hours a week and another one that was pretty green not as in greenfield but green and
9:57
new in the world of cloud native and kubernetes so I had to make some decisions that
10:03
okay this really has to be able to live on its own and be
10:08
fairly hands-off as I write here I think that’s interesting because
10:15
sometimes when people think about cloud native maybe in an organization that
10:20
started on cloud native it’s pretty nice but when you take a
10:26
look at organizations that haven’t started there and then you need to create everything on your own it’s becoming like
10:32
you’re doing everything on your own and and changing a lot of things within
10:38
the organization that are not related to technology and and that’s and that’s amazing and I
10:45
think the challenges that you break through in terms of bringing a new platform to the organization and also
10:51
solving each part of it on your own and with low resources that’s amazing it’s
10:58
really amazing I agree and and one thing with we haven’t mentioned here is but I
11:03
just think of now that we talk about it is low resources also in the same sense of
11:09
Finance like like you know budget uh
11:14
there wasn’t really any you know budget for buying all types of software so I
11:21
quickly saw and and concluded that what we do needs to be open source and yes we
11:29
want to give back to certain things and and I’ve been active in different uh projects to give the feedback or
11:37
maybe a pull request or whatever but it needs to be something we could start out with and then maybe you know buy the
11:44
Enterprise option or whatever and I think DNS is like a good example
11:49
because CoreDNS is one of the most uh commonly adopted open source DNS
11:58
solutions I think like most or almost all of the deployments of kubernetes I know
12:03
with a lot of our customers rely on CoreDNS and it’s amazing that you picked this
12:09
open source solution which is a core infrastructure component which 10 years ago I’m not sure if people
12:16
would intend to put this infrastructure on an open source solution that you were able to
12:22
bring in uh by the way if someone got any
12:27
questions feel free to drop them in the chat we will be happy to
12:34
answer the questions and help you understand much better uh Nemlig’s goals challenges and what we are
12:41
going to discuss later in this talk
12:47
so maybe let’s jump in for the people that are not DNS expert DNS is like it’s
12:52
complex and as you said you have multiple configurations multiple layers within the node outside of the node in
12:59
the cluster outside of the cluster so maybe we will make sure that everyone
13:05
gets the same basic of understanding about DNS and then we will be able to show what solution did you put in place
13:13
in order to get this multi-cluster DNS and it’s not only that it’s multi-cluster
13:21
I think what you mentioned in here the fact that it’s fast the fact that it’s dynamic and the fact that
13:28
developers just don’t care where it runs that gives the power to people
13:35
that don’t need to understand and learn and put in a lot of like tribal knowledge
13:42
so let’s jump in about kubernetes and DNS at its basic
13:49
so why we should even need a DNS within the cluster
13:54
so I would start with the fact that any kubernetes cluster got some sort of DNS
14:02
um why is that first of all you need the internal dynamic resolution pods are going up or down services are
14:08
distributed pods are distributed and you need this fast resolution because it’s
14:14
so dynamic you need something that will help your services to resolve it even
14:20
just internally to understand where the relevant pods are what the relevant IP is and it also relates to
14:27
service discovery which helps you to understand I don’t care about how many pods are behind this service I just want
14:33
to hit a service and make sure I get it in place but all of that is just inside the
14:39
cluster when we are talking about outside of the cluster it’s a different level and Lars is going to explain what
14:45
the challenges are there later the next thing is cache you want to cache
14:51
your requests because it will make everything in the system flow faster and upstream forwarding if you don’t have the
14:58
the right service you definitely want to forward it to some other DNS that should
15:05
have the answer for it so this is why and and there are many reasons for it
15:10
but those are the four main reasons why you should have DNS within the cluster uh I would say that in some
15:18
architectures as Lars referenced before uh you have like a local DNS agent on
15:25
each one of your nodes we see that when you’re going into a large scale architecture uh you have a lot of
15:32
DNS traffic within the cluster and then you want to cache it on the node level which is interesting
15:41
jumping a little bit deeper let’s talk about the flow like what actually happens when one of our services one of
15:49
our pods need another service in the cluster so we got the billing service and the credit card service both of them
15:55
are running in the same cluster but the pods on the credit card service they are at party you don’t know where they are
16:01
you just want to hit the service endpoint and get the results so what happens is that in the cluster
16:08
we have the cluster CoreDNS which always communicates with etcd to understand
16:14
the IPs of the pods and where they are running so when a request flows through the
16:19
service then the cluster CoreDNS can send the billing service the right
16:24
IP and everything will flow into the internal networking of the cluster it
16:31
doesn’t matter if it’s an overlay network or the cloud provider network everything flows on the network of the cluster
16:37
itself and this is the internal very basic uh flow the service asks for a record
16:46
CoreDNS replies and then you hit the right service pod
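To make the in-cluster lookup described above a bit more concrete, here is a minimal Python sketch. It is not how CoreDNS is implemented: a plain dict stands in for what cluster DNS learns about Services, and the service name, namespace and IP are made up for illustration. The only real convention used is the usual `<service>.<namespace>.svc.<cluster-domain>` naming, with `cluster.local` assumed as the cluster domain.

```python
# Illustrative only: a dict stands in for the cluster DNS view of Services.
CLUSTER_DOMAIN = "cluster.local"  # assumed default; a cluster may use another


def service_fqdn(service, namespace, cluster_domain=CLUSTER_DOMAIN):
    """Build the in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Hypothetical record: the credit card service from the example above.
SERVICE_IPS = {
    service_fqdn("credit-card", "payments"): "10.43.0.25",
}


def resolve_internal(name):
    """Return the Service IP, or None when the name is unknown (NXDOMAIN)."""
    return SERVICE_IPS.get(name.rstrip(".").lower())
```

The billing service never needs to know which pods sit behind the name; it only asks for the Service name and connects to whatever IP comes back.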
16:51
but what happens when our pod needs to go outside of the cluster so our billing service needs not only the
16:59
other service that we got it may also need some banking service on another cluster another external service maybe
17:07
it’s a different company that you pay for them in order to use this API
17:12
so then your CoreDNS needs to forward the request or tell
17:18
you where the DNS servers are that can actually help you to find the answer and actually
17:25
kubernetes will help your pods to have this configuration in place
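The configuration mentioned here is the resolver setup a pod receives (a search list and an `ndots` option). As a rough sketch of how a stub resolver turns a short name into candidate queries, assuming the commonly seen search list and `ndots:5` (your cluster may be configured differently, and the names below are invented):

```python
def candidate_names(name, search, ndots=5):
    """Return the fully qualified names a stub resolver would try, in order.

    Sketch of the usual rule: a name with a trailing dot is taken as is;
    a name with fewer than `ndots` dots goes through the search list first.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Assumed search list for a pod in a hypothetical "payments" namespace.
SEARCH = ["payments.svc.cluster.local", "svc.cluster.local", "cluster.local"]
```

With these assumptions, an external name such as `api.bank.example.com` is first tried against every search domain and only then as an absolute name, which is one reason cluster-level caching and forwarding matter.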
17:31
um and when the billing service actually goes out it will flow into the organization DNS
17:37
or like the Google DNS whatever name server that you
17:43
pick for that and that’s basically the external cluster resolution flow when our pods actually
17:49
aim to go for an external service so in the cluster we have the internal
17:55
request and the external request when our service actually requests some information
18:01
from other service but what happens maybe if we are taking
18:06
a look at the other way around we we need some other way to
18:13
[Music] actually for the billing service of someone else
18:18
to able to get an answer from our cluster from our
18:23
service from our pods and sometimes you even don’t know where the pods and the
18:29
service are running and that’s a big problem and Lars
18:37
this is what you wanted to achieve and maybe you can elaborate a little bit about how you did achieve that
18:46
[Laughter] maybe you could show the slide of the
18:51
stack and how did Nemlig solve it or more specifically how I solved it yeah
19:00
just to give an overview at a component let’s say stack level
19:05
we have a kubernetes cluster uh we’re running the k3s distribution
19:13
we have Active Directory which is outside the kubernetes platform
19:20
to a large extent Nemlig is a Microsoft house
19:25
but um so we have Active Directory and we have CoreDNS uh running as an authoritative
19:32
DNS instance so we’re exposing it outside the cluster via a network
19:38
Service object and then we have a very important
19:43
um service workload here k8s_gateway by a company called Ori a British aka
19:53
UK company and then we have of course the internal kubernetes
19:59
service it’s also a CoreDNS instance so exactly CoreDNS is all over the place
20:05
and then we are considering and I’m reading up on it and so on to see if we
20:11
actually need the NodeLocal DNSCache service as well
20:16
to yeah make resolving more performant so it doesn’t need to go to
20:21
another node to reach the internal kubernetes pod to get something resolved but um
20:28
we might actually not need it because we’re running the kube-dns service as a DaemonSet on highly available clusters
20:34
so we actually have a CoreDNS pod on all nodes so it’s like not needed so
20:40
yeah that was the the stack if you go to the next one then
20:47
um with this stack let me go through the flow of the um of the DNS packet on the
20:56
UDP protocol um I think this is the best way and easiest
21:02
way to do it so let’s say we have some service called uh for-my-service on the
21:08
production tier of uh Kubernemlig and .tld stands for top level domain we have a
21:15
specific domain assigned to the kubernetes platform that our authoritative
21:21
DNS instance serves on production all the environments we have we have an authoritative DNS instance
21:27
um servicing that tier for uh for DNS resolving from the outside
21:33
so in the Active Directory um setup we have in Nemlig we have
21:39
a zone per um environment pointing to the authoritative DNS service on the management
21:46
cluster for that environment and the management clusters have some specific workloads that host different services
21:53
that the entire tier slash environment needs from kubernetes to service the downstreams
22:01
that’s worker clusters so a packet for
22:08
for-my-service.prod.tld goes from let’s say Guy’s computer uh
22:14
to the Active Directory DNS server because the Active Directory DNS servers are
22:21
registered or configured on your laptop to be the ones that you use to get DNS resolved
22:28
and the Active Directory setup says yeah that’s cool I know where this domain is
22:33
but I’m not authoritative for it because that’s delegated to over here and over here is the production management
22:39
cluster and now we are hitting the authoritative DNS instance of the management
22:45
cluster and this is a CoreDNS instance um that is configured to be authoritative
22:52
over um everything under prod dot whatever subdomain
23:01
right so um it says thank you I’m receiving a DNS
23:07
request to resolve this FQDN and it goes through its uh you know the
23:14
CoreDNS configuration and you know if you read up in the
23:19
CoreDNS documentation you can figure out how it’s parsing this configuration so it finds the most
23:27
specific uh hit in the configuration so the most closely matching uh zone
23:35
among its server zones for for-my-service.prod.tld
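The "most closely matching zone" idea can be sketched as a longest-suffix match on DNS labels. This is an illustration of the selection rule only, not CoreDNS code, and the zone names are placeholders (`example` stands in for the real domain, which the talk does not spell out).

```python
def best_zone(qname, zones):
    """Pick the configured zone that matches the most trailing labels of qname.

    Returns None when no zone is a suffix of the query name. Matching is done
    per label, so "notprod.example" does not match the zone "prod.example.".
    """
    labels = qname.rstrip(".").lower().split(".")
    best, best_len = None, -1
    for zone in zones:
        stripped = zone.rstrip(".").lower()
        zone_labels = stripped.split(".") if stripped else []  # "." is the root
        n = len(zone_labels)
        if n <= len(labels) and labels[len(labels) - n:] == zone_labels and n > best_len:
            best, best_len = zone, n
    return best


# Placeholder zones: a catch-all root zone plus two more specific ones.
ZONES = [".", "example.", "prod.example."]
```

A query for `for-my-service.prod.example` lands in `prod.example.` even though the root and `example.` zones also match, because the more specific zone wins.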
23:40
and now we hit that one and in there we have configured an external CoreDNS
23:46
plugin in the CoreDNS world we have internal plugins and external ones
23:52
and this external plugin is so it’s made by some company uh open source plugin
23:58
given back to the CoreDNS project it’s called fanout and what it does is
24:06
very basic but very nice so let’s say I have a DNS server a
24:12
DNS server instance running on downstream cluster Y and Y is a downstream cluster uh and there’s a downstream
24:19
DNS server running on cluster X so someone trying to reach
24:25
for-my-service is then hitting Active Directory then the prod management
24:30
cluster authoritative DNS server and that DNS request now hits all downstream clusters
24:37
on the test or the sorry in this case the production tier so both the X and Y
24:44
downstream clusters in this case receive this request
24:51
it’s configured uh with the external IP of the k8s_gateway CoreDNS
24:58
instance by this company called Ori o r i
25:06
um and what this does is it’s really cool really cool service it it takes the
25:11
request and then it uh queries the kubernetes API
25:18
is there a Service of the type LoadBalancer or an Ingress
25:25
with this uh name and it can resolve the name in different ways you can put a
25:31
CoreDNS hostname annotation on your LoadBalancer Service or Ingress or it
25:36
can be the actual host name if it’s an Ingress or it can be a combination of you know
25:43
service name plus namespace and so on uh and
25:48
if there’s no Service of the type LoadBalancer or Ingress on the
25:54
downstream cluster let’s say in this case it’s uh cluster X that doesn’t have the for-my-service workload
26:03
it will be an NXDOMAIN which is an RFC DNS uh response a proper response but
26:11
the other downstream cluster Y hey its k8s_gateway CoreDNS service found a service
26:18
with the IPv4 of this for-my-service and we return that response to the
26:25
authoritative DNS server on the management cluster for the production tier and that
26:30
is then returned to the end user so that’s the journey of a DNS packet uh
26:38
trying to be resolved to an IPv4 on the
26:43
production tier of Kubernemlig that is a long journey uh explaining it
26:48
but it it happens really quick that was one of the things I was
26:54
concerned about with the fanout plugin because you’re sending this query to potentially a lot of downstream
27:00
clusters we have four maybe on one of the tiers so it’s
27:06
not that heavy but it’s pretty fast because DNS is UDP
27:12
and it’s a very small packet and then of course
27:17
with these queries there’s cache involved when it comes to DNS so it’s not pounding the
27:24
clusters too hard so we’re okay but the fanout plugin is actually the
27:31
plugin that made it possible to totally abstract away what cluster
27:37
for-my-service is running on because now it’s just yeah for-my-service.prod.tld and
27:44
every downstream cluster will be queried and the one actually having the service will now respond
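The fan-out behaviour described here (ask every downstream cluster, use the answer from the one that actually has the service) can be sketched in a few lines of Python. This only illustrates the idea: the real plugin speaks DNS over the network, whereas here each downstream is a plain function, and the cluster names, hostname and IP are invented.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_cluster(records):
    """Stand-in for a downstream DNS server: returns an IP, or None for NXDOMAIN."""
    def lookup(qname):
        return records.get(qname)
    return lookup


def fanout(qname, downstreams):
    """Query all downstream clusters in parallel; the first positive answer wins."""
    if not downstreams:
        return None
    with ThreadPoolExecutor(max_workers=len(downstreams)) as pool:
        futures = [pool.submit(lookup, qname) for lookup in downstreams.values()]
        for future in as_completed(futures):
            answer = future.result()
            if answer is not None:
                return answer
    return None  # every cluster answered NXDOMAIN


# Hypothetical tier: cluster X has nothing, cluster Y runs the workload.
DOWNSTREAMS = {
    "cluster-x": make_cluster({}),
    "cluster-y": make_cluster({"for-my-service.prod.example": "10.0.1.20"}),
}
```

The caller never names a cluster: moving the workload from Y to X only changes which downstream returns the positive answer.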
27:50
which is nice for the end user because yeah you don’t have the cluster-specific part of the name
27:56
and you can just reach your service it’s automated and it’s nice for us as platform operators because
28:04
we are more free to move the service to another cluster for example if downstream cluster Y is having a bad
28:09
day we can move it to cluster X and invalidate the cache in this case if we really
28:14
think that it’s important the TTL is 300 on our records or wait for the
28:21
cache to be invalidated by itself and then now the response will point to cluster X so that’s also a nice
28:28
feature for us to have that’s that’s really impressive like the
28:33
ability to use this plugin in order to get response for where where the actual
28:39
service is really is is a key feature in this solution but how it looks from the
28:45
developer perspective like when they trigger the CI CD they don’t care where it runs
28:51
no um so we’re using uh we like to go by
28:56
the GitOps paradigm so there’s an um Argo CD that’s the product we ended
29:03
up with on um all clusters and yeah it’s the usual you know you
29:09
configure it to integrate with your version control system
29:14
and so on and and they get a cluster assigned to do that
29:19
and then Argo CD will just you know deploy the workload a part of that workload is
29:26
either a LoadBalancer type Service or mostly an Ingress because most of the services are RESTful APIs
29:34
um and that Ingress will just you know have an FQDN resulting in an Ingress
29:41
object being created and that’s it because now the user whoever
29:48
can type it in the browser or whatever and it will follow this
29:54
journey of the packet and now it will reach downstream cluster X if
30:02
the service with that FQDN something dot prod dot
30:08
tld is there you get it right after it’s deployed and the Ingress object is created
30:15
it’s working wow wow in just a matter of seconds from the deploy time to the time that
30:21
you get your URL they can click on it basically exactly yeah amazing we have
30:26
one question from the audience so the question is what was the main
30:33
reason for having two clusters okay we have more than two clusters
30:39
actually we have we have as many as we want it was basically uh
30:47
a capability I wanted I didn’t want to be limited by the amount of clusters
30:53
because I like um
31:00
the capability of you know shifting things away from from
31:05
one cluster to another there might be some you know security reasons there might be
31:14
performance reasons other isolation reasons this
31:20
fundamental capability gives me now the option to say oh maybe I should have a
31:27
StatefulSet cluster only because we all know StatefulSets are yeah a little more challenging
31:33
right so if I dedicate this cluster to stateful
31:39
workloads at least this is the only cluster where I know that it’s extra cumbersome because you’re
31:46
running these and they have these specific quirks and all that so
31:51
that’s some of the reasons um and also if you want to scale and we want to
31:58
scale you know autoscale with nodes as well as pods HPA VPA but it’s really
32:04
if you need to scale with like huge like say 100 gigabyte
32:10
30 CPUs type nodes it’s going to take a longer time it’s going to be a heavier
32:16
cluster and so we try to have smaller nodes also because if you have huge nodes you
32:22
have more pods it will take longer to drain longer time to scale so if you can have smaller nodes yes you could have
32:29
more uh nodes in one cluster but still you can you have more wiggle
32:34
room moving these workloads around when you have uh yeah
32:40
this feature of running as many clusters as you need
32:46
I think that when you’re talking about uh high availability and the
32:52
flexibility of our environment this is where multi-cluster or things like that come into place
32:59
but I would say that when you’re building a platform we see that it was
33:05
not very common before but now even on development and staging uh teams used to
33:11
have multiple clusters because you don’t want to get this message or in slack on
33:17
teams like please do not merge the staging cluster is a little bit broken I will fix it and then you will merge
33:24
afterwards you want this velocity and the flexibility of moving one service or
33:30
testing it on another cluster without affecting anything on your testing
33:35
system or the CI/CD pipeline it gives you this availability and and that’s super
33:41
interesting and I think we see that trend of even multi-cluster on staging in lower environment as well
33:48
yeah and then you could go totally crazy with a project like vcluster where you have clusters within the cluster
33:54
yeah yeah I I would say that we um a few months ago we did a webinar only
34:02
about virtual cluster and the ability to spin up ephemeral environment and use
34:08
Komodor for that in order to give access to people in that if you want it’s on the YouTube channel
34:15
and just before we are done Lars anything else
34:21
uh um yeah I mean I can I can talk about this
34:29
very beautiful incident if there aren’t any more questions
34:34
I could elaborate on that incident and what we learned from it we don’t
34:40
have any at the moment if someone has questions just drop them as you did
34:46
drop it onto one um so
34:52
a little while ago unfortunately I was so wise to choose to let’s have some
34:59
caching on that thing you know extra cache so we had another external plugin
35:05
called redisc r-e-d-i-s-c which basically gives you the
35:11
opportunity with CoreDNS it’s an external CoreDNS plugin to
35:18
store the DNS cache in a Redis cluster and
35:24
when it worked it was pretty pretty cool because queries would just be of course answered
35:30
from the cache of the authoritative DNS server so it wouldn’t even go to the downstream clusters
35:37
then one day something happened that something was the OOM killer
35:45
and uh that killed at some point the CSI
35:50
manager the manager of the CSI we use and that killed the storage for
35:57
Redis now you have no cache but I had read the documentation and yeah I thought even in this case it
36:05
would be a no-op as you say no operation it would just go on and query the downstream clusters like
36:11
really query them directly okay that’s okay that’s a pretty robust system but it turns out that it’s only a no-op
36:19
if Redis is not available when CoreDNS is started up
36:25
so if it started up with the Redis cache it would try to hit the Redis cache and then find out after a nice
36:32
long timeout of many hundred seconds that ah Redis is not available
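For contrast with the failure mode just described, here is a generic sketch of a cache lookup that degrades to a no-op when its backend is unreachable. This is not how the redisc plugin works and not what was deployed; it is a common circuit-breaker style guard, written under the assumption that the backend raises `ConnectionError` or `TimeoutError` quickly. The class name, cooldown value and backend interface are all hypothetical.

```python
import time


class GuardedCache:
    """Wrap an external cache so that an outage behaves like a cache miss."""

    def __init__(self, backend, cooldown=30.0, clock=time.monotonic):
        self.backend = backend        # any object with a get(key) method
        self.cooldown = cooldown      # seconds to skip the backend after a failure
        self.clock = clock            # injectable for testing
        self.skip_until = float("-inf")

    def get(self, key):
        if self.clock() < self.skip_until:
            return None               # backend recently failed: do not even try
        try:
            return self.backend.get(key)
        except (ConnectionError, TimeoutError):
            self.skip_until = self.clock() + self.cooldown
            return None               # treat the failure as a miss


def resolve(qname, cache, query_downstreams):
    """Answer from cache when possible, otherwise ask the downstream clusters."""
    cached = cache.get(qname)
    if cached is not None:
        return cached
    return query_downstreams(qname)
```

The point of the guard is that a dead cache costs at most one failed call per cooldown window instead of a long timeout on every query.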
36:56
so that made the the life of things really miserable because it ended up
37:02
because this incident was happening over some hours and we
37:07
have very short-lived TTLs on our certificates we can come back to that another day if that’s a subject we
37:13
want to talk about but this very short-lived TTL of 24 hours means that
37:18
if something is um yeah going on for too long and it can’t resolve
37:23
the CA that we’re running then your certificate is not renewed resulting in
37:30
certificates not being renewed for some specific domains and people not being able to put
37:36
items in their basket on nemlig.com so that was pretty great so um
37:42
what we learned from that was to not run this plugin because
37:47
it’s pretty shitty to have the cache go down with these
37:55
prolonged timeouts um but then when I dug into the code of the plugin I found that oh
38:01
it’s using some Go libraries to actually integrate with Redis and that
38:07
was deprecated like one and a half years ago or well yeah so um
38:13
and then I actually reached out to the Slack community of CoreDNS uh the
38:20
CoreDNS channel I think it’s on the CNCF community on Slack and asked what do other
38:26
people do here and reached out to some of the maintainers and it was pretty
38:31
much like yeah you don’t really need this cache thing I mean so now we’re okay we
38:38
have several instances several replicas of the authoritative DNS and we just use the internal cache
38:45
plugin so each pod each one of the replicas will have its own internal
38:50
cache and it works it works beautifully uh of course the thing is that maybe you have
38:58
the record cached in this replica but because of load balancing that replica was hit
39:04
and that pod did not have the DNS record in cache so now it goes downstream for an answer but
39:09
we’re not seeing any performance issues with that so far at least
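[Editor's sketch, not from the talk] The per-replica caching behaviour described above can be illustrated with a small Python model. Everything here is a hypothetical stand-in: the `Replica` class, the round-robin load balancer, the record name and the address are assumptions for illustration, and TTL expiry is deliberately left out. It is not how CoreDNS implements its cache plugin; it only shows why each replica pays one upstream lookup per record before its own cache starts answering.

```python
import itertools


class Replica:
    """Hypothetical DNS replica that keeps its own in-memory cache."""

    def __init__(self, name):
        self.name = name
        self.cache = {}            # qname -> answer, private to this replica
        self.upstream_queries = 0  # how often this replica had to go upstream

    def resolve(self, qname, upstream):
        # Serve from the local cache when possible.
        if qname in self.cache:
            return self.cache[qname], "hit"
        # Otherwise ask upstream once and remember the answer locally.
        self.upstream_queries += 1
        answer = upstream[qname]
        self.cache[qname] = answer
        return answer, "miss"


def simulate(num_replicas, queries, upstream):
    """Send queries through a round-robin load balancer (an assumption)."""
    replicas = [Replica(f"replica-{i}") for i in range(num_replicas)]
    balancer = itertools.cycle(replicas)
    results = []
    for qname in queries:
        replica = next(balancer)
        _, outcome = replica.resolve(qname, upstream)
        results.append((replica.name, outcome))
    return replicas, results


# Hypothetical record; six identical queries spread over three replicas.
UPSTREAM = {"shop.example.internal": "10.0.0.10"}
replicas, results = simulate(3, ["shop.example.internal"] * 6, UPSTREAM)

for name, outcome in results:
    print(name, outcome)
```

In this toy model the first query to reach each replica is a miss and later ones are hits, so the extra cost of not sharing a cache is bounded by the number of replicas per record (per TTL period in a real setup).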
39:15
So, um, yeah, I guess it's a "test your things really well" in all
39:21
cases; it's also a "don't over-engineer when it's not needed".
39:27
Um, I think those are some of the learnings we have from that, and we went away from it, and
39:35
yeah, I'm all happy now. I think it's an amazing lesson
39:40
learned, with the fact that, first of all, when you're using some open source
39:46
plugins, the downside is they can get deprecated or stop being useful, and you need to keep pace when you're using cloud
39:51
native software. But on the other side, it's really nice that when you ask the community something, it's actually
39:58
there to help you. Unfortunately it was after the incident, and you learned a lot from it, but it's
40:04
good that you have someone to go ask and get a real response and action. Yeah, sure.
40:10
So before we are done, we have one last question from the audience.
40:15
[Applause] Um, they ask about what the
40:20
performance impact is of this kind of multi-cluster architecture. So what I'm
40:26
saying is, with multiple clusters, DNS, the plugins, it's not
40:31
always the obvious one. Maybe you can explain a little bit whether there is one and what the performance impact is
40:38
of this kind of architecture. Yeah, so, um,
40:44
very clearly we are using more resources, because we have several
40:50
control planes. And we have two options: we have the
40:55
highly available cluster, and we have the, I call it the app cluster, because it
41:01
would only be running low-risk workloads,
41:07
not workloads needing high
41:13
performance, and those app clusters would only have one control plane node.
41:18
I don't think you actually have that with Kubernetes distributions other than k3s, not that I know
41:26
of at least. It's an option where you can run a single control plane node, uh, a
41:31
Kubernetes control plane. Yeah, so we have these two options. We only have one cluster running the app cluster version
41:38
of things, to save resources and because it's only
41:43
running a specific performance workload that runs three times a day. But for the
41:50
other clusters, the more regular ones, where we want the uptime, the robustness, the full failure-domain
41:56
promises of the distributed system that Kubernetes is, yes, we are using some more resources,
42:02
because, you know, you need at least three nodes on the control plane
42:08
side, so we have these extra nodes. But besides that, there are the advantages of the
42:14
capabilities it gives us: spreading out load, um, limiting the failure domain,
42:20
the blast radius. If something goes totally bonkers on some cluster, it won't touch the workers in other clusters, and
42:28
this isolation capability it gives us is, I think, pretty nice.
42:34
Um, and also, because it's Kubernetes, yes, I mean because it's k3s, you
42:43
can have etcd embedded on the control plane nodes, um,
42:48
and k3s in general is really lightweight. But I am looking into both other
42:55
distributions and also ways of having
43:02
multiple clusters integrated in a more
43:09
tightly coupled way, so you could have one master control plane and some more
43:15
lightweight downstream clusters with less of it, where each one thinks it has a control plane
43:21
but it's more or less controlled by this puppet master up here. And you have different projects out
43:27
there, and I'm looking into different ones. Also, you could do something like Cluster Mesh from, uh, from Isovalent,
43:32
the creators of Cilium. So I think I've answered the question,
43:38
to some extent at least. Yeah, that's amazing. So before we are
43:46
done, Lars, I really want to thank you for joining us and sharing your knowledge,
43:52
the challenges that you had and the solutions that you put in place. We're really trying to bring in people who
43:58
will share their own experience, knowledge that is not commonly shared on the internet and not really easy to
44:05
find. So thank you very much for your time, thank you everyone for joining us. Thank you for the opportunity.
44:12
I hope you learned something. I learned a lot, and I know that everyone who was live, and since we are going
44:18
to upload it to YouTube, will be able to watch it again and learn more from your DNS
44:26
journey. Thank you, bye everyone.