Kubernetes on AKS: Production Best Practices

Author

Artan Ajredini

CEO & Cloud Architect

31 March 2025

Introduction: Production AKS is Different

Running Kubernetes in a demo is straightforward. A single node pool, default settings, and a few kubectl apply commands and you have something working. Production is a different challenge entirely. The configuration decisions you make when provisioning a cluster are often difficult and costly to change later — node pool architecture, networking model, identity strategy, and autoscaling configuration all have long-term implications.

AKS abstracts the Kubernetes control plane, but everything else is your responsibility: node sizing, pod resource limits, network policies, secret management, and high availability. This guide covers the production best practices we apply on every AKS cluster we deploy — not theory, but the specific configuration choices that prevent incidents.

The most expensive AKS mistakes are not the ones that cause outages — they are the configuration decisions made in week one that force a cluster rebuild in month six. Get the foundations right from the start.

  • Separate system and user node pools to protect cluster-critical components from noisy workloads.
  • Set resource requests and limits on every pod — without them, one bad deployment can starve the entire node.
  • Use Pod Disruption Budgets to guarantee availability during node upgrades and autoscaling events.
  • Replace pod-managed identities (AAD Pod Identity) with Azure Workload Identity; the older approach is deprecated, and relying on it is a security risk.
  • Deploy across availability zones for resilience against single datacenter failures.

Cluster Configuration Best Practices

Separate system and user node pools

AKS requires at least one system node pool, which hosts cluster-critical add-ons such as CoreDNS and the metrics server. If application workloads share that pool and consume all available CPU or memory, these components can be throttled or evicted and the cluster becomes unstable. Always separate system and user node pools, and taint the system pool with CriticalAddonsOnly=true:NoSchedule so that application pods are not scheduled onto it.

bash
# System node pool — cluster-critical components only
az aks nodepool add \
  --cluster-name myAKSCluster \
  --resource-group myRG \
  --name system \
  --mode System \
  --node-count 3 \
  --node-vm-size Standard_D2s_v3 \
  --zones 1 2 3 \
  --node-taints CriticalAddonsOnly=true:NoSchedule

# User node pool — application workloads
az aks nodepool add \
  --cluster-name myAKSCluster \
  --resource-group myRG \
  --name apps \
  --mode User \
  --node-count 3 \
  --node-vm-size Standard_D4s_v3 \
  --zones 1 2 3 \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 10
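
As a complement to the taint on the system pool, application pods can be pinned to the user pool explicitly. A minimal sketch, assuming the pool is named apps as above and that nodes carry the AKS-applied kubernetes.azure.com/agentpool label (worth confirming with kubectl get nodes --show-labels on your cluster):

yaml
# Pod template fragment: schedule only onto the "apps" user node pool
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.azure.com/agentpool: apps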

Resource requests and limits

Kubernetes schedules pods based on resource requests — the guaranteed minimum CPU and memory a pod needs. Without requests, the scheduler places pods arbitrarily, leading to overloaded nodes and evictions. Without limits, a single runaway pod can consume an entire node's resources.

yaml
# Every container must have requests and limits defined
spec:
  containers:
    - name: api
      image: myregistry.azurecr.io/api:1.0.0
      resources:
        requests:
          cpu: 100m       # 0.1 CPU cores reserved for scheduling
          memory: 128Mi   # 128 MiB reserved
        limits:
          cpu: 500m       # max 0.5 CPU cores; throttled above this
          memory: 256Mi   # max 256 MiB; OOMKilled if exceeded

Use a LimitRange in each namespace to enforce default requests and limits for pods that do not specify them. This prevents a missing resources block from silently deploying with no constraints.

yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        cpu: 200m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi

Pod Disruption Budgets

During node upgrades, autoscaling scale-down events, or voluntary evictions, Kubernetes may need to terminate pods. Without a Pod Disruption Budget (PDB), it can terminate all replicas of a deployment simultaneously — causing a full outage. A PDB guarantees that a minimum number of pods remain available during disruption.

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2        # at least 2 pods must remain running
  selector:
    matchLabels:
      app: api
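
A fixed minAvailable only works while the Deployment runs more replicas than the budget demands; with exactly two replicas, minAvailable: 2 would block every node drain. One possible alternative, shown here as a sketch rather than a recommendation for every workload, expresses the budget as maxUnavailable so it keeps working as the replica count changes:

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  maxUnavailable: 1      # evict at most one pod at a time
  selector:
    matchLabels:
      app: api

A PodDisruptionBudget may set minAvailable or maxUnavailable, but not both.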

Network Policies for micro-segmentation

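By default, every pod in a Kubernetes cluster can communicate with every other pod. In production, apply Network Policies to restrict traffic to only what is explicitly needed: deny all, then allow specific paths. Network Policies are only enforced if the cluster runs a network policy engine (Azure Network Policy Manager, Calico, or Cilium); without one, the policies below are accepted by the API server but have no effect.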

yaml
# Deny all ingress by default for a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes: [Ingress]
---
# Allow ingress to the API only from the ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
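
The same deny-then-allow pattern can be applied to outbound traffic. The sketch below is an example under stated assumptions rather than a drop-in policy: it assumes cluster DNS runs in kube-system with the k8s-app: kube-dns label (the case for CoreDNS on AKS), and the policy names are illustrative. Once egress is denied, each workload needs further allow rules for the destinations it legitimately calls.

yaml
# Deny all egress by default for a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes: [Egress]
---
# Allow DNS lookups so pods can still resolve service names
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53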

Workload Identity and Secret Management

Applications running on AKS frequently need to access other Azure services: Key Vault for secrets, Storage for files, Service Bus for messaging. The wrong way to do this is to put service principal credentials in environment variables or Kubernetes Secrets: Secret values are only base64-encoded, so anyone with read access to Secrets in the namespace can decode them, and long-lived credentials have to be rotated by hand. The right way is Azure Workload Identity.

Azure Workload Identity

Azure Workload Identity allows a Kubernetes pod to authenticate to Azure services using a federated identity — no client secrets, no certificates, no credentials to rotate. The pod's Kubernetes service account is linked to an Azure Managed Identity via OIDC federation. When the pod calls an Azure SDK, it automatically gets a token.

yaml
# 1. Annotate the Kubernetes service account with the managed identity client ID
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-service-account
  namespace: production
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"

---
# 2. Label the pod to use workload identity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: api-service-account
      containers:
        - name: api
          image: myregistry.azurecr.io/api:1.0.0

csharp
// In the application — DefaultAzureCredential picks up the workload identity token automatically
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;

var client = new SecretClient(
    new Uri("https://my-keyvault.vault.azure.net/"),
    new DefaultAzureCredential()   // uses workload identity when running on AKS
);

var secret = await client.GetSecretAsync("DatabaseConnectionString");

Secrets Store CSI Driver

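The Secrets Store CSI Driver mounts Azure Key Vault secrets directly into pods as files, so they never need to be stored as Kubernetes Secret objects. Secrets are fetched from Key Vault when the pod starts. If the driver's secret rotation feature is enabled, mounted files are also refreshed periodically when the value in Key Vault changes. Exposing a value as an environment variable requires syncing it into a Kubernetes Secret, which gives up that benefit, so prefer file mounts where the application allows it.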

yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: keyvault-secrets
  namespace: production
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "<managed-identity-client-id>"
    keyvaultName: "my-keyvault"
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: DatabaseConnectionString
          objectType: secret
        - |
          objectName: ApiKey
          objectType: secret
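
A SecretProviderClass has no effect until a pod mounts it as a CSI volume. A minimal sketch of the pod template side, assuming the api Deployment and api-service-account from earlier; the volume name and mount path are illustrative choices:

yaml
spec:
  template:
    spec:
      serviceAccountName: api-service-account
      containers:
        - name: api
          image: myregistry.azurecr.io/api:1.0.0
          volumeMounts:
            - name: keyvault-secrets-vol        # illustrative name
              mountPath: /mnt/secrets-store     # illustrative path
              readOnly: true
      volumes:
        - name: keyvault-secrets-vol
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: keyvault-secrets

Each object listed in the SecretProviderClass then appears as a file under the mount path, for example /mnt/secrets-store/ApiKey.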

High Availability and Autoscaling

A production AKS cluster must survive the failure of a single node, availability zone, or even a temporary Azure platform issue — without an outage. High availability on AKS is achieved through a combination of multi-zone node pools, the cluster autoscaler, and the Horizontal Pod Autoscaler.

Multi-zone node pools

Deploy node pools across all three availability zones in your Azure region, where the region offers them. AKS attempts to balance nodes across the selected zones. Combined with topology spread constraints that keep replicas evenly distributed across zones, your application survives a full zone outage.

yaml
# Force replicas to spread across availability zones
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api

Cluster Autoscaler

The Cluster Autoscaler adds nodes when pods cannot be scheduled due to insufficient resources, and removes nodes when they are underutilised. Configure it with a sensible min/max range and tune the scale-down delay to avoid aggressive deprovisioning that causes pod churn.

bash
# Cluster autoscaler profile — applied at cluster level
az aks update \
  --resource-group myRG \
  --name myAKSCluster \
  --cluster-autoscaler-profile \
    scale-down-delay-after-add=10m \
    scale-down-unneeded-time=10m \
    scale-down-utilization-threshold=0.5 \
    max-graceful-termination-sec=600

Horizontal Pod Autoscaler (HPA)

The HPA scales the number of pod replicas based on CPU utilisation, memory, or custom metrics. It works in tandem with the Cluster Autoscaler: HPA requests more pods, the autoscaler adds more nodes to accommodate them.

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when avg CPU > 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
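
If replica counts oscillate under bursty load, the autoscaling/v2 API also accepts a behavior block that slows scale-down. The values below are illustrative assumptions, not tuned recommendations; the fragment would sit under spec in the HorizontalPodAutoscaler above:

yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # act on the highest recommendation from the last 10 minutes
      policies:
        - type: Percent
          value: 50                     # remove at most 50% of current replicas
          periodSeconds: 60             # per 60-second window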

Cluster upgrades without downtime

AKS releases new Kubernetes versions regularly. Staying within the supported version window (the latest GA minor version and the two before it, N through N-2) is required for Microsoft support. Use surge upgrades (the node pool's max surge setting) to provision extra nodes before draining old ones. This avoids capacity constraints during upgrades and, combined with PDBs and multiple replicas, allows rolling upgrades without downtime.

Closing Thoughts

Production AKS is not complicated — but it requires deliberate configuration from the start. Separate your node pools, set resource requests and limits on every pod, deploy across availability zones, and replace any pod-managed identities with Azure Workload Identity. These changes alone will make your cluster significantly more reliable, secure, and cost-efficient.

Add Pod Disruption Budgets before you enable automatic upgrades, configure the Cluster Autoscaler with conservative scale-down settings, and use the Secrets Store CSI Driver to keep credentials out of Kubernetes Secrets. The teams that invest in these foundations in week one never have to deal with the painful cluster rebuilds that come from skipping them.
