Event-driven multicluster autoscaling with Calisti

Friday, April 22nd, 2022
Multi-cluster service meshes have emerged as an architecture pattern to enable high availability, fault isolation, and failover for distributed applications. In my experience, this setup can also empower teams to run services across various cloud providers and on-premise resources. Why would this be advantageous? Well aside from mitigating potential vendor-lock, it can enable teams to optimize discrepancies in infrastructure resource availability, scalability, and cost. Within a hybrid-cloud mesh, for instance, we can manage the tradeoff between scalability and cost by splitting or replicating workloads across on-prem and cloud-hosted clusters.
In this blog, we will explore a dynamic, or event-driven, method for replicating workloads across a multi-cluster service mesh. We will create an event-driven autoscaler, utilizing service-mesh properties and APIs from Calisti in addition to Kubernetes’ client-go. This model may be of particular interest for hybrid cloud meshes to implement cloud-bursting, where demand spikes trigger a burst of on-prem services into the cloud. With this in mind, we will design the autoscaler for a primary/peer service mesh setup. Building upon the concept of Kubernetes' horizontal-pod-autoscaler, we can ingest host-level as well as application-level metrics to inform scaling events.
Section 1: Multi-Cluster Mesh Setup
Let us begin by installing Calisti on our primary and peer clusters, creating a multi-cluster, single mesh control plane.
On our peer cluster:
smm install -a
On our primary cluster:
smm install -a
smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE>
smm istio cluster status
Check out the Calisti docs for more installation and usage details. Once both clusters are attached, forming a single mesh, we can deploy an application spanning the mesh. As we will see, cross-cluster service discovery enables microservices in both clusters to communicate with each other; this is a feature of the namespace sameness concept, which dictates that all services across the clusters are shared by default.
smm demoapp install -s frontpage,catalog,bookings,postgresql
smm -c <PEER_CLUSTER_KUBECONFIG_FILE> demoapp install -s movies,payments,notifications,analytics,database,mysql --peer
Now that our mesh is set up, we can turn to implementing the autoscaler. As mentioned, we will use two primary building blocks: Calisti and Kubernetes' client-go. We will define the autoscaler as a control loop with a listener, an informer, and a work queue. To utilize service-level metrics, the listener will periodically query Calisti's graphql API. The informer will then compare these metrics against scaling and reconciliation policies to determine whether our microservices need to be replicated across the mesh.
Section 2: Policies
Before we implement these components, we will first define an event policy configuration. Since we aim to prioritize resource availability and cost, our policies could take three levels of metrics into account: provider metrics (cost), host metrics (resource availability), and service metrics (health). For the following examples, we will create a policy for service metrics, namely service requests-per-second.
kind: multi-cluster-autoscaler
metadata:
  namespace: smm-demo
spec:
  groupVersionKind:
    kind: Deployment
  selector:
    app: bookings
  policy:
    type: "rps-burst"
    burst-value: 100
    reconcile-value: 60
    throttle-delay: 120
This event policy indicates that our control loop should watch requests-per-second metrics for the bookings microservice, of type Deployment. The event is triggered if over 100 requests-per-second (rps) are measured for a period of 120 seconds, at which point the controller should replicate, or burst, bookings into the peer cluster. Conversely, if the bookings deployments receive less than 60 rps for 120 seconds, the traffic and microservice should be scaled back to the primary cluster. Note that these policies could be implemented as Kubernetes custom-resource-definitions, but for simplicity, we will stick with yaml configs.
Section 3: Metrics
For all host and service level metrics we can utilize Calisti's graphql API. For instance, we can retrieve the requests-per-second received by the bookings microservice by sending a query to http://127.0.0.1:50500/api/graphql. Note that the Calisti dashboard must be running to access the API via localhost.
{
  service(name: "bookings", namespace: "smm-demo") {
    metrics(evaluationDurationSeconds: 5) {
      latencyP50
      rps
    }
  }
}
...
{
  "data": {
    "service": {
      "metrics": {
        "latencyP50": 0.08447085452073383,
        "rps": 30.281078348468693
      }
    }
  }
}
If our informer determines the policy is met, enqueuing an event, we will employ our own multi-cluster replication controller to replicate or reconcile runtime objects across clusters. Additionally, we will create a Calisti virtual service and route rule to split application traffic accordingly. The core of the autoscaler implementation lies in the Kubernetes multi-cluster replication controller. We will discuss two implementations: one that solely utilizes Kubernetes' client-go scaffolding, and another that builds upon Calisti's internal cluster-registry-controller.
Section 4: Multi-Cluster Replication Controller
Let's first look at how we can create a multi-cluster replication controller using Kubernetes' client-go library. Our control flow will be as follows: upon replication or scale-out, retrieve the desired runtime spec from the primary cluster, then apply the in-memory spec to the peer cluster. Upon scale-back, simply remove the resources from the peer cluster. For the following example, we will show sample code from the Deployment multi-cluster replication handler. Note that the implementation is practically identical for all k8s core types, given their distinct clientset interfaces.
Given an app label or deployment name from the informer, we can retrieve the desired runtime object spec.
func (d *DeploymentHandler) GetDeploymentsByAppLabel(
	cl *kubernetes.Clientset,
	ns string,
	app string) (*app.DeploymentList, error) {
	client := cl.AppsV1().Deployments(ns)
	deployments, err := client.List(context.TODO(), metav1.ListOptions{
		LabelSelector: fmt.Sprintf("app=%s", app),
	})
	if err != nil {...}
	return deployments, nil
}
We then must be able to create a deployment given the in-memory specification that was retrieved.
func (d *DeploymentHandler) createDeployment(
	cl *kubernetes.Clientset,
	ns string,
	deployment *app.Deployment,
) error {
	client := cl.AppsV1().Deployments(ns)
	_, err := client.Create(context.TODO(),
		deployment, metav1.CreateOptions{})
	if err != nil {...}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second*60)
	defer cancel()
	// signal when pods are available
	err = WaitForRCPods(cl, ctx, deployment.Spec.Template.Labels["app"], ns, int(*deployment.Spec.Replicas))
	log.Println("all replicas up")
	return err
}
Providing a clientset and deployment spec, we create the deployment and wait for all pod replicas to be available, or interrupt after 60 seconds. We can ensure pods are up by watching the status of pods that belong to the k8s replication controller, in this case, the deployment.
watch, err := cl.CoreV1().Pods(ns).Watch(context.TODO(), metav1.ListOptions{
	LabelSelector: fmt.Sprintf("app=%s", rcLabel),
})
...
for event := range watch.ResultChan() {
	p, ok := event.Object.(*cv1.Pod)
	if !ok {...}
	// check status of pods
	switch p.Status.Phase {
	case "Pending":
		...
	case "Running":
		...
	}
}
We will now tie these two primary functionalities together to complete the cross-cluster replication. First, retrieve the desired spec using the primary cluster’s clientset, then apply the spec to the peer cluster using the peer cluster’s clientset.
func (d *DeploymentHandler) Replicate(
	clSource,
	clTarget *kubernetes.Clientset,
	ns string,
	application string) []error {
	deployments, err := d.GetDeploymentsByAppLabel(clSource, ns, application)
	if err != nil {...}
	for _, deployment := range deployments.Items {
		deepCpy := deployment.DeepCopy()
		deepCpy.ResourceVersion = "" // could add uuid tag or peer-cluster id
		err = d.createDeployment(clTarget, ns, deepCpy)
		if err != nil {...}
	}
	return nil
}
Each time a runtime object is replicated to a peer cluster, we must also replicate any corresponding services. This will enable our virtual service to seamlessly split traffic between the primary and replicated services. Service-type replication can be done in the same manner as our Deployment handler examples.
With these client methods, we can dynamically move Kubernetes resources between participating clusters. As mentioned, we can also achieve the cross-cluster replication functionality by building a control layer on top of Calisti's internal cluster-registry-controller. The registry controller is responsible for synchronizing Kubernetes resources across clusters according to certain rules, defined by a custom-resource-definition (CRD). For instance, the following ResourceSyncRule CRD may be used to synchronize or copy the matched Secret to all participating clusters.
apiVersion: clusterregistry.k8s.cisco.com/v1alpha1
kind: ResourceSyncRule
metadata:
  name: test-secret-sink
spec:
  groupVersionKind:
    kind: Secret
    version: v1
  rules:
    - match:
        - objectKey:
            name: test-secret
            namespace: cluster-registry
Using these rules, we can redefine our multi-cluster replication control flow. Upon replication or scale-out, create a ResourceSyncRule for the desired runtime objects and associated services on the primary cluster; this will synchronize these objects to the peer cluster(s). Upon scale-back, remove the ResourceSyncRules on the primary cluster and remove the associated resources on the peer cluster.
To utilize the cluster-registry-controller for replication, we will first generate a clientset for the cluster-registry public CRDs using Kubernetes' code-generator. This will give us a type-safe method for listing, creating, and deleting the defined custom-resource-definitions. With the generated ResourceSyncRule clientset, creating and deleting the CRD is no different from core Kubernetes objects.
ruleSpec := clusterregistryv1alpha1.ResourceSyncRule{...}
rule, err := ruleCRDClient.Create(context.TODO(), ruleSpec, metav1.CreateOptions{})
The cluster-registry-controller takes on the burden of replicating resources into the peer cluster, but we will still use client-go to remove objects upon reconcile. We can reuse a helper function, shared by both implementations, to retrieve the correct deletion function for all resource types.
func GetDeleter(cl *cls.Clientsets, kind, ns string) (func(context.Context, string, metav1.DeleteOptions) error, error) {
	var deleter func(context.Context, string, metav1.DeleteOptions) error
	switch kind {
	case "ResourceSyncRule":
		deleter = cl.ResourceSyncRuleV1(ns).Delete
	case "Deployment":
		deleter = cl.AppsV1().Deployments(ns).Delete
	case "Statefulset":
		deleter = cl.AppsV1().StatefulSets(ns).Delete
	case "Daemonset":
		deleter = cl.AppsV1().DaemonSets(ns).Delete
	case "Pod":
		deleter = cl.CoreV1().Pods(ns).Delete
	case "Service":
		deleter = cl.CoreV1().Services(ns).Delete
	default:
		return nil, fmt.Errorf("unsupported kind: %v", kind)
	}
	return deleter, nil
}
Upon a reconcile policy trigger, we call the deleter for the ResourceSyncRule in the primary cluster and for the replicated core-type resources in the peer cluster.
func (r *ResourceSyncHandler) Reconcile(clPrimary, clPeer *cls.Clientsets, resourceName, kind, ns string) error {
	deleter, err := GetDeleter(clPrimary, "ResourceSyncRule", ns)
	if err != nil {...}
	ruleName := rulePrefix + resourceName
	err = deleter(context.TODO(), ruleName, metav1.DeleteOptions{})
	if err != nil {...}
	deleter, err = GetDeleter(clPeer, kind, ns)
	if err != nil {...}
	err = deleter(context.TODO(), resourceName, metav1.DeleteOptions{})
	return err
}
Section 5: Traffic Shifting
The final piece of this autoscaler implementation is traffic shifting. When a policy is met and resources are replicated, we will create a Calisti virtual service. The virtual service will split traffic between the microservice in the primary cluster and the replicated version in the peer cluster. We can define destination weights to tell the virtual service how much traffic to send to each of the two microservices. This can be accomplished with a graphql mutation query. Here we have a sample mutation query that creates a virtual service with two service destinations and their weights. Note that these services are in separate clusters.
applyHTTPRoute(
  input: {
    selector: {
      namespace: "smm-demo"
      hosts: ["bookings"]
    }
    rule: {
      route: [
        {
          destination: { host: "bookings", port: { number: 8080 } }
          weight: 75
        }
        {
          destination: { host: "bookings-repl", port: { number: 8080 } }
          weight: 25
        }
      ]
    }
  }
)
In our autoscaler controller, we can either use a golang graphql client or marshal this query into JSON and send an HTTP POST request to the Calisti graphql API. Note that when sending a request to the API outside of the graphql console, we will need to provide the authentication cookie generated by the smm dashboard command. To acquire this cookie, we can inspect any request from the Calisti UI to the Calisti graphql API.
Once the virtual service takes effect, we should see the replicated microservice appear in the service mesh as it handles application traffic.
Section 6: Autoscaler Demo
Now that we have defined the core components, let's run the completed autoscaler and apply a sample requests-per-second event policy for the bookings microservice. For demonstration purposes, we will choose rps values relative to Calisti's demo-app traffic generator: a burst-value of 40 rps and a reconcile-value of 20 rps. Prior to execution, we can confirm that the bookings microservice is within our primary cluster.
replctl apply bookings-controller.yaml
To quickly test the controller, we can force a cross-cluster replication event by generating additional load on the bookings service via Calisti's per-service HTTP load generator. Specifying the service, port, and method, we will generate 100 requests-per-second for a period of 30 seconds. Checking the controller logs, we should eventually see an event triggered as the service's 5-second average for requests-per-second surpasses the burst-value.
…
burst triggered for app=bookings
2022/03/27 14:21:55 deployments created and being evaluated
2022/03/27 14:21:55 Waiting for 2 pods to be running.
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:58 pod status: Pending
2022/03/27 14:21:59 pod status: Pending
2022/03/27 14:22:01 all replicas up
2022/03/27 14:22:01 setting v-service route...
2022/03/27 14:22:01 route set.
We can verify that the virtual service and destination rules were added to the bookings service.
We can see that there is a route policy splitting traffic between the bookings service and the bookings-repl service in the peer cluster. If we again check the topology of the mesh, we should see the new bookings deployment in the peer cluster.
The topology confirms that the new deployment is up and is routing traffic to downstream microservices in the peer cluster.
Since we added only a short burst of artificial HTTP load, the received requests-per-second will eventually fall back below our event policy's reconcile-value. This will trigger a reconcile, or scale-back, event, removing the traffic rule and deployment from the peer cluster.
Summary
This blog highlighted how a service mesh framework, namely Calisti, can be leveraged to dynamically scale or replicate services across a multi-cluster mesh. Using Calisti's graphql API, we were able to seamlessly extract service-level metrics to inform scaling events. Furthermore, utilizing Kubernetes' client-go and Calisti's cluster-registry-controller, we were able to replicate and reconcile Kubernetes objects across clusters. This is intended to be a starting point for anyone interested in service meshes, cloud-native automation, and of course, Kubernetes.
References
Calisti docs