Must-know Azure Kubernetes best practices and features for better resiliency
@MaheskBlr
Overview
AKS-specific best practices learned from working with multiple customers
Mostly about Day-2 challenges and how to solve them
Upcoming features, SLAs, node pools, and Availability Zones for maximum resiliency
What’s Your Kubernetes Maturity?
https://www.cncf.io/blog/2021/01/12/whats-your-kubernetes-maturity/
1. Multi-tenancy
• Namespace - logical isolation boundary
• Scheduling - use resource quotas, PDBs, and advanced features like taints and tolerations, node selectors, and node and pod affinity or anti-affinity
• Networking - use network policies to control the flow of traffic in and out of pods
• Authentication and authorization - use RBAC and AAD, Pod Identities, and Azure Key Vault
• Containers - use the Azure Policy Add-on to enforce pod security, plus security contexts and image scanning
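As a minimal sketch of the network policy point, the manifest below only admits traffic to one workload from one other workload in the same namespace. The policy name, the `app: backend` label, and port 8080 are hypothetical; `dev-apps` and `nginx-frontend` are borrowed from the other examples in this deck. It assumes the cluster was created with a network policy engine enabled, since policies are not enforced otherwise.

```yaml
# Hypothetical example: names, labels, and the port are illustrative only.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: dev-apps
spec:
  # Pods this policy applies to
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    # Only pods labeled app=nginx-frontend in the same namespace may connect
    - from:
        - podSelector:
            matchLabels:
              app: nginx-frontend
      ports:
        - protocol: TCP
          port: 8080
```

Once a pod is selected by any Ingress policy, traffic not explicitly allowed is dropped, so a policy like this also acts as a default deny for that workload.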
2. Enforce Resource Quota
Best practice guidance - Plan and apply resource quotas at the namespace level. If pods don't define resource requests and limits, reject the deployment. Monitor resource usage and adjust quotas as needed.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-app-team
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"
$ kubectl apply -f dev-app-team-quotas.yaml --namespace dev-apps
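For illustration, here is a sketch of a pod that would be admitted into a namespace governed by a CPU and memory quota, because it declares its requests and limits explicitly. The pod name, container name, image, and the specific values are assumptions chosen to fit well inside a 10 CPU / 20Gi quota.

```yaml
# Hypothetical pod: name, image, and sizes are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo
  namespace: dev-apps
spec:
  containers:
    - name: web
      image: nginx
      resources:
        # Requests count against the cpu/memory entries of the quota
        requests:
          cpu: 250m
          memory: 256Mi
        # Limits cap what the container may actually consume
        limits:
          cpu: 500m
          memory: 512Mi
```

A pod that omits the `resources` section would typically be rejected at admission time in such a namespace, unless a LimitRange supplies default values.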
3. Use Pod Disruption Budgets (PDBs)
Best practice guidance - To maintain the availability of applications, define Pod Disruption Budgets (PDBs) to make sure that a minimum number of pods stay available in the cluster during voluntary disruptions such as node upgrades.
# policy/v1 replaces policy/v1beta1, which was removed in Kubernetes 1.25
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: nginx-frontend
$ kubectl apply -f nginx-pdb.yaml
4. Use Node Affinity, Inter-pod Affinity and Anti-affinity
Best practice guidance: Control the scheduling of pods on nodes using node selectors, node affinity, or inter-pod affinity. These settings allow the Kubernetes scheduler to logically isolate workloads, such as by hardware in the node.
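A sketch of how node affinity and pod anti-affinity can be combined in one Deployment. The `hardware: highmem` node label is an assumption (nodes would need to be labeled that way first), as are the Deployment name and image. The node affinity rule is a hard requirement, while the anti-affinity rule is only a preference so that scheduling still succeeds on small clusters.

```yaml
# Hypothetical Deployment: the node label, names, and image are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-frontend
  template:
    metadata:
      labels:
        app: nginx-frontend
    spec:
      affinity:
        # Hard rule: only schedule onto nodes labeled hardware=highmem
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: hardware
                    operator: In
                    values:
                      - highmem
        # Soft rule: prefer not to place two replicas on the same node
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: nginx-frontend
                topologyKey: kubernetes.io/hostname
      containers:
        - name: nginx
          image: nginx
```

Swapping the `topologyKey` to `topology.kubernetes.io/zone` would spread replicas across availability zones instead of across nodes.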
5. Use Kube-Advisor
Scans a cluster and reports on the issues it finds, for example pods that don't have resource requests and limits in place.
Best practice guidance - Regularly run the latest version of the kube-advisor open source tool to detect issues in your cluster. If you apply resource quotas on an existing AKS cluster, run kube-advisor first to find pods that don't have resource requests and limits defined.
https://github.com/Azure/kube-advisor
6. AKS - Uptime SLA
Uptime SLA is an optional feature to enable a financially backed, higher
SLA for a cluster.
99.95% availability of the Kubernetes API server endpoint for clusters that use Availability Zones (AZ).
99.9% availability for clusters that don't use AZ.
AKS uses master node replicas across update and fault domains to
ensure SLA requirements are met.
7. Create an AKS cluster across availability zones
az group create --name myResourceGroup --location eastus2
az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --generate-ssh-keys \
    --vm-set-type VirtualMachineScaleSets \
    --load-balancer-sku standard \
    --node-count 3 \
    --zones 1 2 3
8. Have more than one node pool
az aks nodepool add \
    --resource-group aksdayconf-rg \
    --cluster-name OpsTeamAKScluster \
    --name mynodepool \
    --node-count 3
az aks nodepool list --resource-group aksdayconf-rg --cluster-name OpsTeamAKScluster
9. Azure Policy
Continuous compliance is a must: maintain compliance proactively rather than reactively.
Achieve real-time cloud compliance at scale with consistent resource governance. Azure Policy has quite an exhaustive list of policies here:
https://github.com/azure/azure-policy
Best of all, you can roll out custom policies on your resources. The rules are written in a declarative style.
10. Auto Scale Cluster nodes and pods
As demand for resources changes, the number of cluster nodes or pods that run your services can automatically scale up or down.
Use both the Horizontal Pod Autoscaler (HPA) and the cluster autoscaler.
This approach to scaling lets the AKS cluster automatically adjust to demand and run only the resources needed.
az aks nodepool add \
    --resource-group aksdayconf-rg \
    --cluster-name OpsTeamAKScluster \
    --name mynodepool \
    --enable-cluster-autoscaler \
    --min-count 5 \
    --max-count 10 \
    --no-wait
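The cluster autoscaler handles nodes; the pod side is covered by a HorizontalPodAutoscaler. The sketch below assumes a Deployment named `nginx-frontend` exists and that its containers declare CPU requests, since utilization is measured against those requests. The HPA name, replica bounds, and the 70% target are illustrative values, not recommendations from the deck.

```yaml
# Hypothetical HPA: target name, replica bounds, and threshold are illustrative only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-frontend-hpa
spec:
  # Workload whose replica count is adjusted
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-frontend
  minReplicas: 3
  maxReplicas: 10
  metrics:
    # Scale out when average CPU use exceeds 70% of the requested CPU
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

When the HPA adds pods that no longer fit on the existing nodes, they stay pending, and that is the signal the cluster autoscaler uses to add nodes within the configured min and max counts.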
11. Start and Stop AKS Cluster
1) az extension add --name aks-preview
2) az extension update --name aks-preview
3) az feature register --namespace "Microsoft.ContainerService" --name "StartStopPreview"
4) az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/StartStopPreview')].{Name:name,State:properties.state}"
5) az provider register --namespace Microsoft.ContainerService
6) az aks stop --name OpsTeamAKScluster --resource-group aksdayconf-rg
7) az aks start --name OpsTeamAKScluster --resource-group aksdayconf-rg
https://docs.microsoft.com/en-us/azure/aks/start-stop-cluster
* The state of a stopped cluster is preserved for up to 12 months; only VMSS-backed clusters are supported
12. AKS Cluster Capacity Planning
1. How many nodes do I need in my AKS cluster?
2. Does the size of the subnet of my nodes matter?
3. How many pods could be run on the cluster?
https://techcommunity.microsoft.com/t5/core-infrastructure-and-security/azure-kubernetes-service-cluster-capacity-planning/ba-p/1474990
13. Use AKS Diagnostics
14. Use Azure Advisor
14.1 Use Azure Advisor
15. Use AKS Periscope
When things do go wrong, AKS customers need a tool to help them diagnose the issue and collect the logs necessary to troubleshoot it.
https://github.com/Azure/aks-periscope
16. Production Checklist
1. Regions - Select the region based on your compliance requirements; you cannot change it later
2. Version - Select the most stable version for production
3. Use node pools and Availability Zones - run a minimum of 2 pods and use AZ
4. Services - Prefer an Ingress over exposing every service as a Load Balancer
5. VM type - Select an appropriate VM type; you can add new node pools later but cannot change the VM type of an existing pool
6. Capacity - Plan max pods per cluster, max pods per node, pod requests (CPU/memory), and pod limits (CPU/memory)
7. Networking - Prefer Azure CNI over kubenet (unless the organization restricts the IP addresses that can be assigned to the subnet)
8. API server access - Restrict via IP whitelisting; storage and databases - use managed/PaaS services as much as possible
9. Monitoring - Use Prometheus, Filebeat, or Azure Monitor (easy to implement)
10. Node restarts - Use Kured to automate node reboots after OS patching
Azure Kubernetes Service solution journey
AKS DevOps must links
- AKS Current preview features: https://aka.ms/aks/preview-features
- AKS Release notes: https://aka.ms/aks/releasenotes
- AKS Public roadmap: http://aka.ms/aks/roadmap
- AKS Known-issues: https://aka.ms/aks/knownissues
- AKS Feature Requests: https://aka.ms/aks/feature-requests
- AKS Public FAQ: https://aka.ms/aks/public-faq
MahesKBlr
Q&A - Thank you
https://www.linkedin.com/in/mfcmahesh/ Maheshk@microsoft.com
https://www.the-aks-checklist.com/
Increase your application availability with pod anti-affinity settings in Azure Kubernetes Service
https://www.danielstechblog.io/increase-your-application-availability-with-pod-anti-affinity-settings-in-azure-kubernetes-service/
Vertical Pod Autoscaling: The Definitive Guide
https://povilasv.me/vertical-pod-autoscaling-the-definitive-guide/
Kubernetes Networking
https://dominik-tornow.medium.com/kubernetes-networking-22ea81af44d0
A Guide to the Kubernetes Networking Model
https://sookocheff.com/post/kubernetes/understanding-kubernetes-networking-model/
We'll build a baseline infrastructure that deploys an Azure Kubernetes Service (AKS) cluster. This article includes recommendations for networking, security, identity, management, and monitoring of the cluster based on an organization's business requirements.
• https://github.com/mspnp/aks-secure-baseline

Must Know Azure Kubernetes Best Practices And Features For Better Resiliency by Maheshkumar R
