You can also view the full presentation deck here.
*This is an auto-generated transcript
…and we have built a multi-cluster on-prem platform that we call Jupiter — mainly because platforms need names — and today we're going to talk about DNS, more specifically multi-cluster DNS, and how we've tried to solve it.

It's amazing how you've named both the product and the internal platform that lets you put a great solution in place. I think on-prem is always something people are interested in hearing about, because most of the knowledge out there is mostly about public clouds. That's very interesting. So before we dive deep, maybe explain a little bit about the challenges you had.

Sure.
I joined about two years ago, on this DevOps journey — whatever that really is — and I tried to look at what the pain points of the company were. One thing I saw was that the way we managed and operated IT infrastructure — both the software we developed ourselves and the third-party software we had to operate to run the business in general — was very scattered: virtual machines, bare-metal servers, a bit of containers. And the containers were running on something we call Docker hosts — I don't even know if that's a term out there — containers with the Docker engine, but running on single hosts for some environments, so no really robust uptime.

So I thought: what can I do to streamline the way we run things? I set out to create a platform based on Kubernetes, which as we know is a distributed system for orchestrating containers. For one thing, all the new software we developed was already being built for containers — that was already a theme before I came — and the older stuff was also being reworked to run in a container. So we needed to orchestrate this, and also create a platform that could host and run the general components and software we need.

And one of the challenges was to actually get DNS records registered in an automated way. When I started, a user was supposed to send an email and then wait — I say up to four days, and I actually experienced that it took up to four days — because there's some first-level handling involved, you know, the old ticket ping-pong, or whatever we call it these days. So it could take many days, totally unproductive days, where you figure out something else to do but you're still waiting for this DNS record, which is a really simple thing — just to connect to your service running somewhere.

And besides that, as you point out here, it was very manual — no automation at all.
And then, in a container-orchestrated world with Kubernetes, you really have to be in control and able to know what's going on when it comes to DNS, because with Kubernetes there are so many layers: you have the CNI, you have the kube-dns service, you have node-local DNS, you have the name servers outside the cluster, and of course if you have traffic coming in from the public internet you also have the public DNS infrastructure. So there are a lot of layers of DNS, and I wanted a robust, performant and manageable way of operating a DNS service for the different needs we had at the time.

I think it's really amazing, because DNS is usually the thing where, back in the day, someone would open a ticket and someone else would manually edit a configuration file to add the records you wanted. When you come to Kubernetes or Docker, where everything is faster and everything spawns and goes away, you need this service discovery ability quickly, so it becomes necessary to make it shorter and faster. On the other side, we know that DNS problems are, I think, the second most common cause of infrastructure downtime — because when you don't have DNS, you don't have anything. It's this critical component that, when it's gone, your whole infrastructure goes down.

Yeah — and maybe, if we have time, I could come back to the perfect storm of DNS failure we had, which cascaded into certificates not renewing. That was a nice incident — in hindsight.

Yeah, if you learn from them, they're really good. I agree. So let's talk a little bit about the goals — what you were trying to achieve after you discovered all these challenges and problems.
Yes. So with the horizon we just talked about — the scene I elaborated on — we set out to achieve these goals: above all, making DNS an out-of-the-box platform service on our Kubernetes platform, something that is simply there. Of course we have to update the DNS components we run, but we shouldn't have to push it along and nurture it every day; it should be pretty robust and live by itself in most cases.

Later on we also found something else. We had started out with fully qualified domain names of the form service.cluster.subdomain — whatever subdomain we're in control of — and we wanted to abstract away from the developers which cluster their service is actually running on. Because, one, they don't care; two, it's extra characters to type into your browser's address bar; and it doesn't give any meaning to them. It's more like: "okay, I know my service is in test, but I don't care which cluster specifically, I just want to hit this thing and work with it." So that was also part of our journey later on.

And some of these decisions had to be very pragmatic, because starting out it was me, one university student — so only 15 hours a week — and another person who wasn't green in general, but was green and new in the world of cloud native and Kubernetes. So I had to make decisions such that this really has to be able to live on its own and be hands-off, as I've written here.
I think that's interesting, because sometimes when people think about cloud native, an organization that started out cloud native has it pretty nice. But when you look at an organization that didn't start there, and you need to create everything on your own, it becomes: you're doing everything yourself, and you're changing a lot of things within the organization that aren't related to technology. That's amazing — the challenges you broke through in bringing a new platform into the organization, solving each part of it on your own, and with low resources. It's really amazing.

I agree. And one thing we haven't mentioned, but which I'm thinking of now that we talk about it, is that low resources was also true in the financial sense — budget. There wasn't really any budget for buying all kinds of software, so I quickly concluded that what we do needs to be open source. And yes, we want to give back — I've been active in different projects, giving feedback or maybe a pull request — but it needed to be something we could start out with, and then maybe buy the enterprise option later.

I think DNS is a good example, because CoreDNS is one of the most commonly adopted open source DNS solutions — I think most, or almost all, of the Kubernetes deployments I know of with our customers rely on CoreDNS. And it's amazing that you picked this open source solution, which is a core infrastructure component — ten years ago I'm not sure people would have trusted a piece of infrastructure like this to an open source solution — and you were able to bring it in.

By the way, if anyone has questions, feel free to drop them in the chat. We'll be happy to answer them and help you understand the goals, the challenges, and what we're going to discuss later in this talk.
So maybe let's jump in. For people who are not DNS experts: DNS is complex, and as you said, there are multiple configurations and multiple layers — on the node, outside the node, in the cluster, outside the cluster. So let's make sure everyone gets the same basic understanding of DNS, and then we'll show the solution you put in place to get this multi-cluster DNS. And it's not only about multi-cluster — I think what you mentioned here is the fact that it's fast, it's dynamic, and developers just don't care where their service runs. That gives power to people who shouldn't need to learn a lot of tribal knowledge.

So let's jump into Kubernetes and DNS at its most basic: why do we even need DNS within the cluster?

I would start with the fact that any Kubernetes cluster has some sort of DNS. Why is that? First of all, you need internal dynamic resolution: pods are going up and down, services and pods are distributed, and you need fast resolution because everything is so dynamic — you need something that helps your services resolve where the relevant pods are and what the relevant IPs are, even just internally. It also relates to service discovery: I don't care how many pods are behind this service, I just want to hit the service and land in the right place. But all of that is just inside the cluster; outside the cluster it's a different level, and Lars will explain the challenges there later. The next thing is caching: you want to cache your requests, because it makes everything in the system flow faster. And upstream forwarding: if you don't have the right record, you definitely want to forward the query to some other DNS server that should have the answer. There are many reasons, but those are the four main reasons you have DNS within the cluster.

I would also say that in some architectures — Lars will reference this — you have a local DNS agent on every one of your nodes. We see that when you go into a large-scale architecture you get a lot of DNS traffic within the cluster, and then you want to cache it at the node level, which is interesting.
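For readers following along, those four roles usually all show up in the cluster's CoreDNS configuration. Here is a minimal sketch of what such a Corefile typically looks like — the values are illustrative defaults, not the configuration discussed in this webinar:

```yaml
# Sketch of a typical in-cluster CoreDNS configuration (illustrative values).
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        # Dynamic resolution / service discovery: answers for Services and Pods
        # are built from the Kubernetes API, so they follow pods coming and going.
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        # Caching: keep recent answers in memory so repeated lookups are fast.
        cache 30
        # Upstream forwarding: anything the cluster is not authoritative for is
        # handed to the node's resolvers (or whichever upstream you choose).
        forward . /etc/resolv.conf
        loop
        reload
    }
```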
Jumping a little deeper, let's talk about the flow — what actually happens when one of our services, one of our pods, needs another service in the cluster. Say we have a billing service and a credit-card service, both running in the same cluster, but the pods of the credit-card service are distributed — you don't know where they are, you just want to hit the service endpoint and get the result. What happens is that the cluster's CoreDNS always stays in sync with the cluster state (via the API server, which is backed by etcd), so it knows which IPs sit behind each service and where the pods are running. So when a request flows through the service, the cluster CoreDNS can hand the billing service the right IP, and everything flows over the internal networking of the cluster — it doesn't matter whether that's an overlay network or the cloud provider's network, everything stays on the cluster's own network. That's the internal, very basic flow: the service asks, CoreDNS replies, and you hit the right pod behind the service.
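To make that concrete, here is a small sketch of the billing/credit-card example as Kubernetes objects. The service names and namespace are made up for illustration, not taken from the webinar:

```yaml
# Hypothetical credit-card Service used in the example above.
apiVersion: v1
kind: Service
metadata:
  name: credit-card
  namespace: payments
spec:
  selector:
    app: credit-card          # whichever pods currently back the service
  ports:
    - port: 443
      targetPort: 8443
# From the billing pods this is reachable as:
#   credit-card.payments.svc.cluster.local
# CoreDNS answers with the Service's ClusterIP; kube-proxy (or the CNI) then
# balances the connection across whichever credit-card pods exist right now.
```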
But what happens when our pod needs to go outside the cluster? Our billing service doesn't only need the other services we run — it may also need a banking service on another cluster, or some external service, maybe an API from a different company that you pay to use. Then your CoreDNS needs to forward the request, or tell you which DNS servers can actually answer it — and Kubernetes gives your pods this configuration. So when the billing service goes out, the query flows to the organization's DNS, or Google's DNS, or whatever name server you picked. That's basically the external resolution flow, when a pod aims for an external service. So inside the cluster we have both internal requests and external requests, whenever our services request information from other services.
18:01
from other service but what happens maybe if we are taking
18:06
a look at the other way around we we need some other way to
18:13
[Music] actually for the billing service of someone else
18:18
to able to get an answer from our cluster from our
18:23
service from our pods and sometimes you even don’t know where the pods and the
18:29
service running on and that’s a big problem and large
18:37
this is what you want to achieve and maybe you can elaborate a little bit about how you did achieve that
18:46
[Laughter] maybe you could show the slide of the
18:51
stack ing how did let me solve it almost specifically I solve it yeah
Just to give an overview of the components at the stack level: we have a Kubernetes cluster — we're running the k3s distribution. We have Active Directory, which sits outside the Kubernetes platform; to a large extent we're a Microsoft house. We have CoreDNS running as an authoritative DNS instance, which we expose outside the cluster via a Service object of type LoadBalancer. Then we have a very important workload: the k8s_gateway CoreDNS instance, from a company called Ori, a UK company. And then of course there is the internal Kubernetes DNS service, which is also a CoreDNS instance — so CoreDNS is all over the place.

We're also considering — I'm still reading up on it — whether we actually need the node-local DNS cache service as well, to make resolution more performant so a query doesn't need to go to another node to reach the internal Kubernetes DNS. But we might not need it, because we run the kube-dns/CoreDNS service as a DaemonSet on the highly available clusters, so we already have a CoreDNS pod on every node and it may simply not be needed. So yeah, that was the stack. If we go to the next slide…
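For orientation, exposing a dedicated authoritative CoreDNS outside the cluster is usually just a LoadBalancer Service in front of the CoreDNS deployment, so that an external resolver (here, Active Directory) has an address to delegate a zone to. The names and namespace below are assumptions for illustration, not the actual manifests:

```yaml
# Sketch: expose an authoritative CoreDNS deployment outside the cluster so an
# external DNS (e.g. Active Directory) can delegate a zone to it.
apiVersion: v1
kind: Service
metadata:
  name: coredns-authoritative
  namespace: dns-system
spec:
  type: LoadBalancer            # gives the DNS server an address reachable from outside
  selector:
    app: coredns-authoritative
  ports:
    - name: dns-udp
      protocol: UDP
      port: 53
      targetPort: 53
    - name: dns-tcp
      protocol: TCP
      port: 53
      targetPort: 53
```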
With this stack, let me go through the flow of a DNS packet over the UDP protocol — I think that's the best and easiest way to explain it. Let's say we have a service called my-service on the production tier, with a fully qualified name like my-service.prod.tld, where "tld" stands for the top-level domain — a specific domain assigned to the Kubernetes platform. In our Active Directory setup we have a zone per environment, delegated to the authoritative CoreDNS service running on the management cluster for that environment. The management clusters run specific workloads that provide the services the entire tier/environment needs from Kubernetes, serving the downstream worker clusters.

So a packet for my-service.prod.tld — let's say from your laptop — first goes to Active Directory, because the Active Directory DNS servers are the ones configured on your laptop as resolvers. The Active Directory setup says: "Yeah, that's cool, I know this domain, but I'm not authoritative for it, because it's delegated over here" — and "over here" is the production management cluster. Now we hit the authoritative CoreDNS instance on that management cluster, which is configured to be authoritative over everything under prod and our domain. It receives the DNS request to resolve this name and goes through its CoreDNS configuration — if you read the CoreDNS documentation you can see how it parses the configuration — and it finds the most specific hit, the server zone that best matches my-service.prod.tld.
Now we hit that zone, and in there we have configured an external CoreDNS plugin — CoreDNS has internal plugins and external ones. This one is an open source plugin, made by a company and contributed back to the CoreDNS project, and it's called fanout. What it does is very basic but very nice. Let's say I have DNS server instances running on the downstream clusters — one downstream DNS server on cluster X, another on cluster Y. Someone trying to reach my-service hits Active Directory, then the authoritative DNS server on the production management cluster, and that DNS request now fans out to all the downstream DNS servers on that tier — so in this case both the X and Y downstream clusters receive the request.
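A rough sketch of what that authoritative zone with fanout can look like is shown below. The zone name and the downstream gateway IPs are placeholders, and fanout is an external plugin, so it has to be built into the CoreDNS image you run:

```yaml
# Sketch: the authoritative CoreDNS zone fanning queries out to the downstream
# clusters in parallel (illustrative names and addresses).
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-authoritative
  namespace: dns-system
data:
  Corefile: |
    prod.example.com:53 {
        errors
        log
        # Replicate each query to the downstream DNS gateways; the first
        # successful answer from any of them is returned to the client.
        fanout . 10.20.0.10:53 10.20.1.10:53
        cache 30
    }
```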
The fanout plugin is configured with the external IPs of the k8s_gateway CoreDNS instances — that's the plugin by the company called Ori, o-r-i. And what k8s_gateway does is really cool: it takes the request and then looks up, via the Kubernetes API, whether there is a Service of type LoadBalancer, or an Ingress, with this name. You can resolve the name in different ways: you can put a CoreDNS hostname annotation on your LoadBalancer Service or Ingress, or it can be the actual host name if it's an Ingress, or a combination of service name and namespace, and so on.

If there is no LoadBalancer Service or Ingress with that name on a downstream cluster — say cluster X doesn't have the my-service workload — the answer is NXDOMAIN, the standard DNS "non-existent domain" response. But the other downstream cluster, Y, says: "hey, this k8s_gateway CoreDNS service found a Service with an IPv4 address for my-service", and returns that response to the authoritative DNS server on the management cluster for the production tier, which then returns it to the end user.

So that's the journey of a DNS packet trying to be resolved to an IPv4 address on the production tier of our Kubernetes platform. It's a long journey to explain, but it happens really quickly.
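The downstream side of that journey might look roughly like the sketch below: a k8s_gateway-enabled CoreDNS serving the delegated zone, plus a LoadBalancer Service advertising the name it wants to be reachable under. The zone, namespace, annotation key and hostnames are assumptions for illustration (check the k8s_gateway documentation for the exact annotation it supports):

```yaml
# Sketch: downstream cluster. k8s_gateway answers for the delegated zone by
# querying the Kubernetes API for matching Ingresses / LoadBalancer Services.
apiVersion: v1
kind: ConfigMap
metadata:
  name: k8s-gateway
  namespace: dns-system
data:
  Corefile: |
    prod.example.com:53 {
        errors
        # Answer from this cluster's own Ingress/LoadBalancer objects; anything
        # not found here gets an NXDOMAIN back to the fanout caller.
        k8s_gateway prod.example.com
    }
---
# A LoadBalancer Service can advertise the name it should resolve under,
# e.g. via a hostname annotation (annotation key shown here as an assumption).
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: prod
  annotations:
    coredns.io/hostname: my-service.prod.example.com
spec:
  type: LoadBalancer
  selector:
    app: my-service
  ports:
    - port: 443
      targetPort: 8443
```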
26:54
concerned about with the fan out plugin because you’re stealing this inquiry to to potentially a lot of Downstream
27:00
Clauses we have four maybe on one of the tiers so it’s
27:06
not that heavy but it’s pretty fast because you have dnfs UDP
27:12
and it’s a very small package and then of course
27:17
these queries there’s cash involved when it comes to DNS so it’s not counting the
27:24
Clusters too hard so we’re okay but the fan of talking is actually the
27:31
plugin that made it possible to tally abstract the way what cluster for
27:37
my service is running on because now you’re just yeah for my service but product CLD and
27:44
it every Downstream cluster will be queried and the one actually having the service will now respond
27:50
but like for the end use because yeah you don’t have the cluster specific part of the name
27:56
and you could just reach your service bill it’s automated and it’s nice for us it’s platform operators because we can
28:04
we are more free to move the service to another cluster for example if Downstream cluster why is having a bad
28:09
day we can move it to Cluster X if validated this in this case if we really
28:14
think that it’s important the GTL is 300 on our records and so or wait for the
28:21
cash to be invalidated itself and then now the the response will go to Trust Banks so that’s also nice
That's really impressive. The ability to use this plugin to get a response from wherever the service actually is, is a key feature of this solution. But how does it look from the developer's perspective? When they trigger the CI/CD, they don't care where it runs.

No. We like to go by the GitOps paradigm, so there's Argo CD — that's the product we ended up with — on all clusters. It's the usual setup: you configure it to integrate with your version management system and so on, a team gets a cluster assigned, and Argo CD just deploys the workload. Part of that workload is either a LoadBalancer-type Service or, mostly, an Ingress, because most of the services are RESTful APIs. That Ingress will just carry an FQDN, resulting in an Ingress object being created — and that's it. Now a user, whoever it is, types it into their browser, the packet follows the journey we just described, reaches the downstream cluster where the service is running, and my-service.prod.tld resolves right after the deploy, once the Ingress object has been created. It's working.

Wow — so in just a matter of seconds from deploy time, you get your URL and can click on it, basically.

Exactly. Yeah.

Amazing.
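From the developer's side, the whole interaction can be as small as the sketch below: an Ingress with the desired FQDN, applied by the GitOps tool as part of the workload. Host, namespace and service names are placeholders, not the actual manifests discussed here:

```yaml
# Sketch: the developer-facing piece - an Ingress carrying the desired FQDN.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
  namespace: prod
spec:
  rules:
    - host: my-service.prod.example.com   # no cluster name anywhere in the FQDN
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 443
# As soon as this object exists, the downstream k8s_gateway can answer for the
# host above, so the name resolves without any ticket or manual DNS change.
```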
We have one question from the audience. The question is: what was the main reason for having two clusters?

Okay — we have more than two clusters, actually; we have as many as we want. It was basically a capability I wanted: not to be limited by the number of clusters, because I like the ability to shift things away from one cluster to another. There might be security reasons, there might be performance reasons, other isolation reasons. This fundamental capability gives me the option to say: maybe I should have a StatefulSet-only cluster, because we all know stateful workloads are a little more challenging. If I dedicate one cluster to stateful workloads, at least that's the only cluster where I know things are extra cumbersome, because it runs that design and those specific workloads. So that's some of the reasons.

And also, if you want to scale — and we do want to scale, with nodes as well as pods, HPA, VPA — if you need to scale with huge nodes, say 100-gigabyte, 30-CPU type nodes, it's going to take a longer time and it's going to be a heavier cluster. So we try to have smaller nodes, also because with huge nodes you have more pods per node, and draining and scaling take longer. With smaller nodes, yes, you could have more nodes in one cluster, but you still have more wiggle room for moving workloads around when you have this ability to run as many clusters as you need.
I think when you're talking about high availability and the flexibility of your environment, that's where multi-cluster and things like it come into play. But I would also say that when you're building a platform — we see it wasn't very common before, but now even for development and staging, teams often have multiple clusters — because you don't want to get that message on Slack or Teams: "please do not merge, the staging cluster is a little bit broken, I'll fix it, you can merge this afternoon." You want velocity, and the flexibility of moving one service, or testing it on another cluster, without affecting anything in your test system or your CI/CD pipeline gives you that. That's super interesting, and I think we see that trend of multi-cluster even in staging and lower environments as well.

Yeah, and then you could go totally crazy with a project like vcluster, where you have clusters within the cluster.

Yeah — a few months ago we did a webinar just about virtual clusters and the ability to spin up ephemeral environments, and to use Komodor to give people access to them; if you're interested, it's on the YouTube channel. And just before we wrap up — anything else?
Yeah — I can talk about this very beautiful incident, if there aren't any more questions. I could elaborate on that incident and what we learned from it.

We don't have any at the moment — if someone has a question, drop it in the chat as before.

So, a little while ago — I had been so wise as to choose to have some extra caching — we were running another external plugin called redisc (r-e-d-i-s-c), which basically gives you the opportunity, with CoreDNS (it's an external CoreDNS plugin), to store the DNS cache in a Redis cluster. When it worked, it was pretty cool, because queries would just be answered from the cache of the authoritative DNS server, so they wouldn't even go to the downstream clusters.

Then one day something happened, and that something was the OOM killer. At some point it killed the CSI manager — the manager of the CSI driver we use — and that killed the storage for the Redis cluster. Now you have no cache. I had read the documentation and thought that even in this case it would be a no-op, as they say — no operation — and CoreDNS would just go on and query the downstream clusters directly. Okay, that's a pretty robust system. But it turns out it's only a no-op if Redis is unavailable when CoreDNS starts up. If CoreDNS started up with the Redis cache available, it keeps trying to hit the Redis cache and only finds out, after an insanely long timeout of many hundreds of seconds, that Redis is not available.
That made life really miserable, because — and this incident played out over some hours — we run very short TTLs on our certificates (we can come back to that another day if it's a subject we want to talk about). A very short TTL, 24 hours, means that if something goes on for too long and the CA we're running can't be resolved, your certificate is not renewed, resulting in the certificates for some specific domains not being renewed — and people not being able to put items in their basket on our webshop. So that was pretty great.

What we learned from that was not to run this plugin, because it's pretty awful to have the cache go down with these prolonged timeouts. And when I dug into the code of the plugin, I found that the way it integrates with Redis uses Go libraries that were deprecated about one and a half years ago. So I reached out to the CoreDNS channel on the Slack community — I think it's the CNCF community Slack — asked what other people do here, and reached out to some of the maintainers, and the answer was pretty much: "yeah, you don't really need this cache thing."

So now we're okay with it: we have several instances, several replicas, of the authoritative DNS service, and we just use the internal cache plugin, so each replica has its own in-memory cache — and it works beautifully. Of course, maybe one replica has the record cached, but because of load balancing another replica gets hit, and that pod doesn't have the DNS record in its cache, so it goes and asks downstream — but we're not seeing any performance issues with that, so far at least.

So I guess the lessons are that CoreDNS caches things really well in all cases, and also: don't over-engineer when it's not needed. I think those are some of the learnings we took from that. We moved away from it, and yeah, I'm all happy now.
I think it's an amazing lesson learned. First of all, when you're using open source plugins, the downside is that they can get deprecated or stop being useful, and you need to keep pace when you're using cloud native software. On the other side, it's really nice that when you ask the community something, it's actually there to help you. Unfortunately it was after the incident, and you learned a lot from it, but it's good that you have someone to go and ask and get a real response and action.

Yeah, sure.

So before we're done, we have one last question from the audience.
They ask about the performance impact of this kind of multi-cluster architecture — multiple DNS clusters, the fanout plugin — it's not always obvious. Maybe you can explain a little: is there a performance impact with this kind of architecture, and what is it?
Yeah — so, very clearly, we are using more resources, because we have several control planes. We have two flavors of cluster: the highly available cluster, and what I call the "app cluster", which only runs low-risk workloads that don't need high availability or high performance, and those app clusters have only one control plane node. I don't think Kubernetes distributions other than k3s actually offer that — not that I know of, at least — an option where you run a single control plane node, a single API server. So we have these two options, and we only have one cluster running the app-cluster version of things, to save resources, and because it only runs a specific performance workload that executes three times a day.

For the other clusters — the more regular ones, where we want the uptime, the robustness, the full promises of the distributed system that Kubernetes is — yes, we're using more resources, because you need at least three nodes on the control plane side, so we carry those extra nodes. But beyond that, because of the advantages and capabilities it gives us — spreading out load, spreading out the failure domain, so the blast radius of something going totally bonkers on one cluster won't touch the workloads on other clusters — I think the isolation this gives us is pretty nice.
And also, because it's Kubernetes — or rather because it's k3s — you can have etcd embedded on the control plane nodes, and k3s in general is really lightweight. But I am looking into other distributions, and also into ways of having multiple clusters integrated in a more tightly coupled way, where you have one master control plane and some more lightweight downstream clusters — each downstream cluster more or less thinks it has a control plane, but it's really controlled by this puppet master on top. There are different projects out there and I'm looking into several of them; you could also do something like ClusterMesh from Isovalent, the creators of Cilium. So I think I've answered the question, to some extent at least.

Yeah, that's amazing.
So before we're done, Lars, I really want to thank you for joining us and sharing your knowledge, the challenges you had, and the solution you put in place. We really try to bring in people who will share their own experience — knowledge that isn't commonly shared on the internet and isn't easy to find. So thank you very much for your time, and thank you everyone for joining us.

Thank you for the opportunity. I hope you learned something.

I learned a lot, and I know that everyone who was live — and we're going to upload the recording to YouTube — will be able to watch it again and learn more from your DNS journey. Thank you — bye, everyone.