CoreDNS and Service Discovery Deep Dive¶
Master Kubernetes DNS resolution and service discovery with CoreDNS. Learn DNS patterns, troubleshooting techniques, and advanced configuration strategies essential for the CKA exam and production Kubernetes environments.
Introduction¶
In the dynamic world of Kubernetes, where pods are ephemeral and IP addresses constantly change, DNS becomes the critical foundation for reliable service communication. Without DNS, your microservices architecture would crumble—applications couldn't discover each other, load balancing would fail, and the entire orchestration system would become unmanageable.
CoreDNS serves as Kubernetes' DNS server, automatically creating DNS records for every Service and Pod in your cluster. When a pod needs to communicate with a backend service, it doesn't need to know the service's IP address—it simply queries DNS for backend-service and CoreDNS returns the appropriate ClusterIP. This seamless service discovery is what makes Kubernetes' dynamic environment practical for real-world applications.
For the Certified Kubernetes Administrator (CKA) exam, DNS troubleshooting is a critical skill. You'll encounter scenarios where applications can't communicate, services aren't resolving, or DNS queries are failing. Understanding CoreDNS architecture, configuration, and debugging techniques is essential for both the exam and production operations.
In this comprehensive guide, you'll master:
- DNS architecture: CoreDNS components, plugins, and how DNS resolution works
- Service DNS patterns: ClusterIP services, headless services, and SRV records
- Pod DNS configuration: DNS policies, search domains, and FQDN construction
- CoreDNS configuration: Corefile structure, plugin chains, and custom DNS entries
- Advanced features: NodeLocal DNSCache (2025), DNS policies, and performance optimization
- Troubleshooting mastery: Systematic debugging workflows using nslookup, dig, and CoreDNS logs
- CKA exam skills: Fast DNS testing, common scenarios, and time-saving techniques
CKA Exam Relevance: DNS and service discovery appear across multiple exam domains—Services & Networking (20%), Troubleshooting (30%), and Cluster Architecture (25%). You must demonstrate proficiency in diagnosing DNS issues, understanding CoreDNS configuration, and validating service discovery under exam time pressure.
Whether you're preparing for the CKA certification or building production-grade Kubernetes expertise, understanding CoreDNS is fundamental. Let's explore how DNS makes Kubernetes networking work seamlessly.
DNS in Kubernetes Architecture¶
CoreDNS Overview¶
CoreDNS replaced kube-dns as the default Kubernetes DNS server starting in Kubernetes 1.13, and for good reason. Built in Go and designed as a modular, plugin-based DNS server, CoreDNS is faster, more flexible, and significantly easier to configure than its predecessor.
At its core, CoreDNS runs as a Deployment in the kube-system namespace, typically with two replicas for high availability. Each CoreDNS pod watches the Kubernetes API server for Service and Endpoint changes, automatically creating and updating DNS records as resources are created, modified, or deleted.
Here's the basic architecture:
graph TB
subgraph "Application Pod"
App[Application Container]
Resolver[/etc/resolv.conf]
end
subgraph "kube-system Namespace"
CoreDNS1[CoreDNS Pod 1]
CoreDNS2[CoreDNS Pod 2]
ConfigMap[CoreDNS ConfigMap]
end
subgraph "Kubernetes API"
APIServer[API Server]
ServiceRegistry[Services]
EndpointRegistry[Endpoints]
end
App -->|DNS Query| Resolver
Resolver -->|UDP/TCP 53| CoreDNS1
Resolver -->|UDP/TCP 53| CoreDNS2
CoreDNS1 -->|Watch| APIServer
CoreDNS2 -->|Watch| APIServer
APIServer -->|Service Events| ServiceRegistry
APIServer -->|Endpoint Events| EndpointRegistry
ConfigMap -->|Configuration| CoreDNS1
ConfigMap -->|Configuration| CoreDNS2
style App fill:#e1f5ff
style CoreDNS1 fill:#fff4e1
style CoreDNS2 fill:#fff4e1
style ConfigMap fill:#f0f0f0
Key Components:
- CoreDNS Deployment: Runs the DNS server pods with resource limits and anti-affinity rules
- CoreDNS Service: ClusterIP service (usually 10.96.0.10) that kubelet configures as the nameserver in each pod's /etc/resolv.conf
- CoreDNS ConfigMap: Contains the Corefile configuration defining DNS resolution behavior
- Kubernetes Plugin: CoreDNS plugin that watches the API and generates DNS records
- Cache Plugin: In-memory cache for DNS queries to reduce API load
- Forward Plugin: Forwards non-cluster queries to upstream DNS servers
DNS Record Types in Kubernetes¶
CoreDNS creates several types of DNS records automatically:
Service DNS Records (A/AAAA):
<service-name>.<namespace>.svc.<cluster-domain> → ClusterIP
Example: nginx.default.svc.cluster.local → 10.96.100.50 (ClusterIP)
Headless Service Records (A/AAAA):
<service-name>.<namespace>.svc.<cluster-domain> → Pod IPs
<pod-hostname>.<service-name>.<namespace>.svc.<cluster-domain> → Pod IP
Example: nginx.default.svc.cluster.local → 10.244.1.10, 10.244.2.15 (Pod IPs)
Pod DNS Records (A/AAAA):
<pod-ip-with-dashes>.<namespace>.pod.<cluster-domain> → Pod IP
Example: 10-244-1-10.default.pod.cluster.local → 10.244.1.10
SRV Records (Service Discovery with Port Information):
_<port-name>._<protocol>.<service-name>.<namespace>.svc.<cluster-domain> → priority weight port target
Example: _http._tcp.nginx.default.svc.cluster.local → 0 100 80 nginx.default.svc.cluster.local
DNS Resolution Flow¶
Understanding how DNS queries flow through the system is critical for troubleshooting:
sequenceDiagram
participant App as Application Pod
participant Kubelet as kubelet
participant CoreDNS as CoreDNS Pod
participant Cache as DNS Cache
participant API as Kubernetes API
participant Upstream as Upstream DNS
Note over App,Kubelet: Pod Creation
Kubelet->>App: Inject /etc/resolv.conf<br/>nameserver 10.96.0.10<br/>search default.svc.cluster.local
Note over App,CoreDNS: DNS Query: "backend"
App->>CoreDNS: Query "backend" (short name)
CoreDNS->>CoreDNS: Apply search domains:<br/>backend.default.svc.cluster.local
CoreDNS->>Cache: Check cache
alt Cache Hit
Cache->>CoreDNS: Return cached IP
CoreDNS->>App: 10.96.200.100
else Cache Miss
CoreDNS->>API: Query Service "backend" in "default"
API->>CoreDNS: ClusterIP: 10.96.200.100
CoreDNS->>Cache: Store in cache (30s TTL)
CoreDNS->>App: 10.96.200.100
end
Note over App,Upstream: External Query: "google.com"
App->>CoreDNS: Query "google.com"
CoreDNS->>CoreDNS: Not cluster domain
CoreDNS->>Upstream: Forward to 8.8.8.8
Upstream->>CoreDNS: 142.250.185.46
CoreDNS->>App: 142.250.185.46
Resolution Steps:
- Application initiates query: The app container queries DNS using the nameserver from /etc/resolv.conf
- Search domain expansion: If querying a short name (e.g., backend), the resolver appends search domains
- CoreDNS receives query: The query reaches the CoreDNS service (typically 10.96.0.10:53)
- Cache check: CoreDNS checks its in-memory cache for the record
- Kubernetes plugin lookup: If cache miss, the kubernetes plugin queries the API for Service/Endpoint data
- Response construction: CoreDNS builds the DNS response with the appropriate A/AAAA records
- Cache storage: Result is cached with TTL (typically 30 seconds for cluster records)
- External forwarding: Non-cluster queries are forwarded to upstream DNS servers
Search Domains and FQDN Construction¶
Every pod's /etc/resolv.conf contains search domains that enable short-name DNS queries:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Search Domain Behavior:
- Query: backend → Tries backend.default.svc.cluster.local, then backend.svc.cluster.local, then backend.cluster.local
- Query: backend.production → Tries backend.production.default.svc.cluster.local, then backend.production.svc.cluster.local, etc.
- Query: backend.production.svc.cluster.local → Still has fewer dots than ndots:5, so the search list is tried first; append a trailing dot for a direct FQDN query
ndots Configuration:
The ndots:5 option determines when search domains are applied. If a query has fewer than 5 dots, search domains are tried first. This means:
- backend (0 dots) → Search domains applied
- backend.production (1 dot) → Search domains applied
- backend.production.svc.cluster.local. (trailing dot marks the name as absolute) → Direct query, no search
Important: Always use the trailing dot (.) for external domains to avoid unnecessary search domain queries:
# Inefficient - tries search domains first
curl http://api.example.com/data
# Efficient - direct query
curl http://api.example.com./data
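The resolver's candidate ordering can be modeled in a few lines of Python. This is a simplified sketch of glibc-style search-list semantics (not the actual resolver implementation): trailing-dot names are absolute, and the dot count versus ndots decides whether the search list or the bare name is tried first.

```python
def expand_query(name, search_domains, ndots=5):
    """Return candidate FQDNs in the order a glibc-style resolver tries them.

    Simplified model: a trailing dot marks the name absolute (queried as-is);
    otherwise names with fewer than `ndots` dots go through the search list
    first, while names with at least `ndots` dots are tried directly first.
    """
    if name.endswith("."):
        return [name]  # absolute name, no search expansion
    expanded = [f"{name}.{domain}" for domain in search_domains]
    if name.count(".") >= ndots:
        return [name] + expanded  # direct attempt first, search as fallback
    return expanded + [name]      # search list first, bare name last

search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
print(expand_query("backend", search))
print(expand_query("api.example.com", search))
```

Running this shows the three wasted candidates generated for an external name like api.example.com, which is exactly what the trailing-dot trick avoids.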
CoreDNS Configuration Deep Dive¶
Corefile Structure¶
CoreDNS is configured via a ConfigMap named coredns in the kube-system namespace. The ConfigMap contains a Corefile—a configuration format similar to nginx or Apache—that defines DNS server behavior through a plugin chain.
Let's examine a typical Corefile:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf {
max_concurrent 1000
}
cache 30
loop
reload
loadbalance
}
Corefile Sections Explained:
- .:53: Server block listening on port 53 for all domains (. matches everything)
- Plugin order matters: Plugins are executed in the order listed (errors → health → ready → kubernetes → ...)
- Each plugin: Configures specific DNS behavior (logging, health checks, Kubernetes records, caching, etc.)
CoreDNS Plugins¶
Let's break down each plugin and its purpose:
1. errors Plugin¶
Enables error logging to stdout. Essential for troubleshooting DNS issues.
2. health Plugin¶
Exposes a health check endpoint at :8080/health. The lameduck period keeps CoreDNS serving queries for 5 seconds after shutdown begins, allowing in-flight clients to drain gracefully.
3. ready Plugin¶
Exposes a readiness check at :8181/ready. This endpoint returns 200 OK when all plugins are ready to serve queries.
4. kubernetes Plugin¶
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
This is the core plugin that provides Kubernetes DNS functionality:
- cluster.local: The cluster domain for DNS records
- in-addr.arpa ip6.arpa: Reverse DNS zones for PTR lookups
- pods insecure: Enables pod DNS records using the pod IP address (10-244-1-10.default.pod.cluster.local); insecure mode doesn't verify pod existence (faster but less strict)
- Use pods verified in production to validate that pods actually exist
- fallthrough: Pass queries for the reverse DNS zones to the next plugin if not found
- ttl 30: DNS records have a 30-second time-to-live
Advanced Options:
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods verified
endpoint_pod_names
upstream
ttl 30
fallthrough in-addr.arpa ip6.arpa
}
- endpoint_pod_names: Use pod names from Endpoints instead of generated names
- upstream: Resolve external names referenced by ExternalName Services via upstream servers (deprecated; recent CoreDNS versions handle this by default)
5. prometheus Plugin¶
Exposes Prometheus metrics at:9153/metrics for monitoring DNS performance, query rates, and cache hit ratios.
6. forward Plugin¶
Forwards non-cluster DNS queries to upstream DNS servers:
- . /etc/resolv.conf: Forward all non-matched queries to the nameservers in the CoreDNS pod's /etc/resolv.conf
- Alternative: forward . 8.8.8.8 8.8.4.4 (use specific DNS servers)
- max_concurrent 1000: Limit concurrent upstream queries to prevent overwhelming upstream servers
Custom Upstream Example:
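As a sketch, a forward block pinned to specific resolvers might look like this (the resolver addresses and tuning values are illustrative):

```
forward . 1.1.1.1 8.8.8.8 {
    max_concurrent 1000
    policy sequential    # try the resolvers in the order listed
    health_check 5s      # probe upstream health every 5 seconds
}
```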
7. cache Plugin¶
Caches DNS responses for 30 seconds to reduce load on the Kubernetes API and upstream DNS servers.
Advanced Cache Configuration:
- success 9984 30: Cache successful responses for 30 seconds (max 9984 entries)
- denial 9984 5: Cache NXDOMAIN responses for 5 seconds
- prefetch 10 60s 10%: Prefetch popular records nearing expiry (queried 10+ times in 60s, remaining TTL under 10%)
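Put together, an expanded cache stanza in the Corefile would look like this (a sketch; the capacities and TTLs shown are the commonly cited values, tune for your cluster):

```
cache 30 {
    success 9984 30
    denial 9984 5
    prefetch 10 60s 10%
}
```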
8. loop Plugin¶
Detects and prevents DNS forwarding loops that could cause infinite query cycles.
9. reload Plugin¶
Watches the Corefile for changes and automatically reloads configuration without restarting CoreDNS pods.
10. loadbalance Plugin¶
Randomizes the order of A/AAAA records in DNS responses, providing basic round-robin load balancing.
Custom DNS Records¶
You can add custom DNS entries for internal services or override external domains:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
# Custom static DNS entries
hosts {
192.168.1.100 internal-db.company.local
192.168.1.101 legacy-app.company.local
fallthrough
}
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
Static Host Entries:
The fallthrough directive ensures that queries not matching static entries continue down the plugin chain.
Rewrite Plugin for DNS Aliasing¶
Create DNS aliases or rewrite queries:
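A minimal rewrite rule added to the Corefile's server block might look like this (a sketch using the rewrite plugin's name directive):

```
rewrite name api.internal.local backend.default.svc.cluster.local
```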
This rewrites queries for api.internal.local to backend.default.svc.cluster.local.
Advanced Rewrite:
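A hedged sketch of a regex-based rewrite that maps an entire internal domain onto cluster services and rewrites the answer section back to the original name (the internal.local domain is illustrative):

```
rewrite stop {
    name regex (.*)\.internal\.local {1}.default.svc.cluster.local
    answer name (.*)\.default\.svc\.cluster\.local {1}.internal.local
}
```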
Service Discovery Patterns¶
ClusterIP Service DNS¶
The most common service type uses a stable ClusterIP for internal load balancing:
apiVersion: v1
kind: Service
metadata:
name: nginx
namespace: production
spec:
type: ClusterIP
selector:
app: nginx
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080
DNS Resolution:
graph LR
Query[DNS Query] -->|nginx| SearchDomain[Search Domain Expansion]
SearchDomain -->|nginx.production.svc.cluster.local| CoreDNS
CoreDNS -->|Kubernetes Plugin| API[API Server]
API -->|Service ClusterIP| Response[10.96.100.50]
Response --> Client[Client Pod]
style Query fill:#e1f5ff
style CoreDNS fill:#fff4e1
style Response fill:#d4edda
Available DNS Names (from same namespace production):
nginx → 10.96.100.50 (short name)
nginx.production → 10.96.100.50 (namespace qualified)
nginx.production.svc → 10.96.100.50 (service scoped)
nginx.production.svc.cluster.local → 10.96.100.50 (FQDN)
From Different Namespace:
# Short name fails - only works within same namespace
curl http://nginx/
# Error: could not resolve host
# Must use namespace-qualified name
curl http://nginx.production/
# Success: 200 OK
# FQDN always works
curl http://nginx.production.svc.cluster.local/
# Success: 200 OK
SRV Records for Port Discovery:
dig SRV _http._tcp.nginx.production.svc.cluster.local
; ANSWER SECTION:
_http._tcp.nginx.production.svc.cluster.local. 30 IN SRV 0 100 80 nginx.production.svc.cluster.local.
SRV record format: priority weight port target
Headless Service DNS¶
Headless services (with clusterIP: None) return pod IPs directly instead of a stable ClusterIP, enabling direct pod-to-pod communication:
apiVersion: v1
kind: Service
metadata:
name: nginx-headless
namespace: production
spec:
clusterIP: None # Headless service
selector:
app: nginx
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080
DNS Behavior:
graph TB
subgraph "DNS Query Flow"
Query[Query: nginx-headless.production.svc.cluster.local]
CoreDNS[CoreDNS]
API[Kubernetes API]
end
subgraph "Pod Endpoints"
Pod1[nginx-0<br/>10.244.1.10]
Pod2[nginx-1<br/>10.244.2.15]
Pod3[nginx-2<br/>10.244.3.20]
end
Query --> CoreDNS
CoreDNS --> API
API -->|Endpoint IPs| CoreDNS
CoreDNS -->|A Records| Response["10.244.1.10<br/>10.244.2.15<br/>10.244.3.20"]
Response -.-> Pod1
Response -.-> Pod2
Response -.-> Pod3
style Query fill:#e1f5ff
style CoreDNS fill:#fff4e1
style Response fill:#d4edda
DNS Resolution Returns All Pod IPs:
nslookup nginx-headless.production.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: nginx-headless.production.svc.cluster.local
Address: 10.244.1.10
Name: nginx-headless.production.svc.cluster.local
Address: 10.244.2.15
Name: nginx-headless.production.svc.cluster.local
Address: 10.244.3.20
Individual Pod DNS (for StatefulSets):
<pod-name>.<service-name>.<namespace>.svc.<cluster-domain> → Pod IP
Example with StatefulSet:
nslookup nginx-0.nginx-headless.production.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: nginx-0.nginx-headless.production.svc.cluster.local
Address: 10.244.1.10
Use Cases for Headless Services:
- StatefulSets: Stable DNS names for individual pods (mongo-0, mongo-1, mongo-2)
- Database clusters: Direct communication with specific database replicas
- Custom load balancing: Applications implementing their own load balancing logic
- Service discovery: Discovering all pod endpoints for a service
Pod DNS Configuration¶
Pods have automatic DNS records based on their IP address:
Default Pod DNS:
<pod-ip-with-dashes>.<namespace>.pod.<cluster-domain> → Pod IP
Example: 10-244-1-10.default.pod.cluster.local → 10.244.1.10
Pod with Hostname and Subdomain:
apiVersion: v1
kind: Pod
metadata:
name: web
namespace: production
spec:
hostname: web-server
subdomain: nginx-headless
containers:
- name: nginx
image: nginx:1.25
DNS becomes: web-server.nginx-headless.production.svc.cluster.local → Pod IP (a headless Service named nginx-headless must exist in the same namespace for this record to be created).
This is particularly useful for StatefulSets where predictable DNS names are required.
Troubleshooting DNS Issues¶
Systematic Debugging Workflow¶
When facing DNS issues, follow this methodical approach:
graph TD
Start[DNS Issue Reported] --> Check1{Can pod resolve<br/>cluster services?}
Check1 -->|No| Debug1[Test DNS from pod]
Check1 -->|Yes| Check2{Can pod resolve<br/>external domains?}
Debug1 --> Test1[kubectl exec -it pod -- nslookup kubernetes.default]
Test1 --> Result1{Success?}
Result1 -->|No| Fix1[Check CoreDNS pods,<br/>resolv.conf, network policy]
Result1 -->|Yes| Check2
Check2 -->|No| Debug2[Check upstream DNS]
Check2 -->|Yes| Check3{Specific service<br/>not resolving?}
Debug2 --> Test2[kubectl exec -it pod -- nslookup google.com]
Test2 --> Result2{Success?}
Result2 -->|No| Fix2[Check forward plugin,<br/>upstream DNS servers,<br/>network connectivity]
Result2 -->|Yes| Check3
Check3 -->|Yes| Debug3[Verify service exists]
Debug3 --> Test3[kubectl get svc service-name -n namespace]
Test3 --> Result3{Service exists?}
Result3 -->|No| Fix3[Create service or<br/>fix service name]
Result3 -->|Yes| Debug4[Check endpoints]
Debug4 --> Test4[kubectl get endpoints service-name -n namespace]
Test4 --> Result4{Endpoints ready?}
Result4 -->|No| Fix4[Check pod selector,<br/>pod readiness,<br/>pod labels]
Result4 -->|Yes| Debug5[Check DNS record]
Debug5 --> Test5[kubectl exec -it pod -- nslookup service.namespace.svc.cluster.local]
Test5 --> Result5{Resolves correctly?}
Result5 -->|No| Fix5[Check CoreDNS logs,<br/>restart CoreDNS,<br/>check Corefile config]
Result5 -->|Yes| Fix6[Issue may be<br/>application-level,<br/>not DNS]
style Start fill:#e1f5ff
style Fix1 fill:#f8d7da
style Fix2 fill:#f8d7da
style Fix3 fill:#f8d7da
style Fix4 fill:#f8d7da
style Fix5 fill:#f8d7da
style Fix6 fill:#fff3cd
Essential DNS Testing Commands¶
Test DNS from a debug pod:
# Create a debug pod with DNS tools
kubectl run debug-dns --image=nicolaka/netshoot -it --rm -- /bin/bash
# Inside the pod:
# Test cluster DNS
nslookup kubernetes.default
nslookup kubernetes.default.svc.cluster.local
# Test external DNS
nslookup google.com
# Use dig for detailed information
dig kubernetes.default
dig @10.96.0.10 kubernetes.default # Query CoreDNS directly
# Test specific service
nslookup nginx.production.svc.cluster.local
# Test headless service
nslookup nginx-headless.production.svc.cluster.local
# Test SRV records
dig SRV _http._tcp.nginx.production.svc.cluster.local
Check pod's DNS configuration:
# View resolv.conf
kubectl exec -it pod-name -- cat /etc/resolv.conf
# Expected output:
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5
# Verify DNS connectivity
kubectl exec -it pod-name -- ping -c 3 10.96.0.10
Common DNS Issues and Solutions¶
Issue 1: Pod Cannot Resolve Cluster Services¶
Symptoms:
nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find kubernetes.default: NXDOMAIN
Debugging Steps:
# 1. Check CoreDNS pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-5d78c9869d-abcde 1/1 Running 0 5d
coredns-5d78c9869d-fghij 1/1 Running 0 5d
# 2. Check CoreDNS service exists and has endpoints
kubectl get svc -n kube-system kube-dns
kubectl get endpoints -n kube-system kube-dns
# 3. Check CoreDNS logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
# 4. Verify pod's resolv.conf points to CoreDNS
kubectl exec -it pod-name -- cat /etc/resolv.conf
# 5. Test DNS directly from CoreDNS service IP
kubectl exec -it pod-name -- nslookup kubernetes.default 10.96.0.10
Common Causes:
- CoreDNS pods not running (check pod status)
- Wrong nameserver in /etc/resolv.conf (kubelet configuration issue)
- Network policy blocking DNS traffic (port 53 UDP/TCP)
- CoreDNS service ClusterIP changed (check service YAML)
Solutions:
# Restart CoreDNS pods
kubectl rollout restart deployment coredns -n kube-system
# Check network policies blocking DNS
kubectl get networkpolicy -A
kubectl describe networkpolicy <policy-name>
# Verify kubelet DNS configuration
kubectl get configmap kubelet-config -n kube-system -o yaml | grep clusterDNS
Issue 2: Cannot Resolve External Domains¶
Symptoms:
nslookup google.com
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find google.com: NXDOMAIN
Debugging Steps:
# 1. Check CoreDNS forward configuration
kubectl get configmap coredns -n kube-system -o yaml | grep -A 5 forward
# 2. Check CoreDNS can reach upstream DNS
kubectl exec -it -n kube-system <coredns-pod> -- cat /etc/resolv.conf
kubectl exec -it -n kube-system <coredns-pod> -- nslookup google.com
# 3. Test from CoreDNS pod directly
kubectl exec -it -n kube-system <coredns-pod> -- ping -c 3 8.8.8.8
Common Causes:
- Upstream DNS servers unreachable (firewall/network issue)
- Wrong upstream DNS configuration in the Corefile
- No internet connectivity from cluster nodes
- DNS forwarding disabled or misconfigured
Solutions:
# Update upstream DNS servers in Corefile
kubectl edit configmap coredns -n kube-system
# Change:
# forward . /etc/resolv.conf
# To:
# forward . 8.8.8.8 8.8.4.4 1.1.1.1
# CoreDNS will auto-reload with reload plugin
Issue 3: Service Exists But DNS Doesn't Resolve¶
Symptoms:
kubectl get svc nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx ClusterIP 10.96.100.50 <none> 80/TCP 5m
nslookup nginx.default.svc.cluster.local
** server can't find nginx.default.svc.cluster.local: NXDOMAIN
Debugging Steps:
# 1. Check if service has endpoints
kubectl get endpoints nginx
NAME ENDPOINTS AGE
nginx <none> 5m # No endpoints!
# 2. Check pod labels match service selector
kubectl get svc nginx -o yaml | grep -A 5 selector
kubectl get pods -l app=nginx --show-labels
# 3. Check pods are ready
kubectl get pods -l app=nginx
# 4. Force DNS cache clear by restarting CoreDNS
kubectl rollout restart deployment coredns -n kube-system
Common Causes:
- Service selector doesn't match any pods (label mismatch)
- Pods exist but aren't ready (readiness probe failing)
- Stale DNS cache (wait 30s or restart CoreDNS)
- Service in the wrong namespace
Solutions:
# Fix service selector to match pod labels
kubectl edit svc nginx
# Check why pods aren't ready
kubectl describe pod <pod-name>
# Verify labels match
kubectl get pods --show-labels
kubectl get svc nginx -o jsonpath='{.spec.selector}'
Issue 4: Headless Service Returns No IPs¶
Debugging:
# Check endpoints exist
kubectl get endpoints nginx-headless
# Check pods are ready and have IPs
kubectl get pods -l app=nginx -o wide
# Verify service is truly headless
kubectl get svc nginx-headless -o yaml | grep clusterIP
# Should show: clusterIP: None
# Test DNS resolution
kubectl run debug --image=nicolaka/netshoot -it --rm -- nslookup nginx-headless.default.svc.cluster.local
CoreDNS Log Analysis¶
View CoreDNS logs:
# Follow CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns -f
# Filter for errors
kubectl logs -n kube-system -l k8s-app=kube-dns | grep -i error
# Look for specific query
kubectl logs -n kube-system -l k8s-app=kube-dns | grep "nginx.default"
Enable query logging (debugging only - high volume):
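To turn query logging on, add the log plugin near the top of the server block in the coredns ConfigMap; the reload plugin picks up the change automatically. A sketch:

```
.:53 {
    log      # log every query to stdout; very verbose, remove after debugging
    errors
    # ...rest of the plugin chain unchanged...
}
```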
Common Error Patterns:
[ERROR] plugin/errors: 2 example.com. A: read udp 10.96.0.10:53->8.8.8.8:53: i/o timeout
→ Upstream DNS unreachable
[ERROR] plugin/errors: 2 nginx.default.svc.cluster.local. A: no such service
→ Service doesn't exist
[INFO] NXDOMAIN: nginx.production.svc.cluster.local. A: 127.0.0.1:53
→ Service not found (check namespace, service name)
Advanced DNS Topics¶
NodeLocal DNSCache (2025 Best Practice)¶
NodeLocal DNSCache is a DaemonSet that runs a DNS caching agent on cluster nodes, significantly improving DNS performance and reliability by:
- Reducing CoreDNS load (especially in large clusters)
- Minimizing DNS query latency (queries stay on local node)
- Improving reliability (local cache survives CoreDNS pod restarts)
- Reducing network hops (no DNAT traversal)
Architecture:
graph TB
subgraph "Node 1"
Pod1[Application Pod]
NodeCache1[NodeLocal DNSCache<br/>169.254.20.10]
Kubelet1[kubelet]
end
subgraph "Node 2"
Pod2[Application Pod]
NodeCache2[NodeLocal DNSCache<br/>169.254.20.10]
Kubelet2[kubelet]
end
subgraph "kube-system Namespace"
CoreDNS1[CoreDNS Pod]
CoreDNS2[CoreDNS Pod]
end
Pod1 -->|DNS Query| NodeCache1
Pod2 -->|DNS Query| NodeCache2
NodeCache1 -->|Cache Miss| CoreDNS1
NodeCache1 -->|Cache Miss| CoreDNS2
NodeCache2 -->|Cache Miss| CoreDNS1
NodeCache2 -->|Cache Miss| CoreDNS2
Kubelet1 -.->|Configures<br/>169.254.20.10| Pod1
Kubelet2 -.->|Configures<br/>169.254.20.10| Pod2
style Pod1 fill:#e1f5ff
style Pod2 fill:#e1f5ff
style NodeCache1 fill:#d4edda
style NodeCache2 fill:#d4edda
style CoreDNS1 fill:#fff4e1
style CoreDNS2 fill:#fff4e1
Installation (Kubernetes 1.18+):
# Download NodeLocal DNSCache manifest
kubectl apply -f https://k8s.io/examples/admin/dns/nodelocaldns.yaml
# Verify DaemonSet running on all nodes
kubectl get daemonset node-local-dns -n kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE
node-local-dns 3 3 3 3 3
How It Works:
- NodeLocal DNSCache listens on link-local IP 169.254.20.10
- kubelet configures pods to use 169.254.20.10 as nameserver
- Cache miss queries are forwarded to ClusterIP 10.96.0.10 (CoreDNS)
- Cache hit queries are served immediately from node-local cache
Performance Benefits:
- 50-90% reduction in DNS query latency
- Up to 70% reduction in CoreDNS CPU usage
- Improved reliability during CoreDNS pod restarts
DNS Policies¶
Pods can use different DNS policies to control DNS resolution behavior:
apiVersion: v1
kind: Pod
metadata:
name: dns-example
spec:
dnsPolicy: ClusterFirst # Default
containers:
- name: test
image: nginx
DNS Policy Options:
| Policy | Behavior | Use Case |
|---|---|---|
| ClusterFirst | Use cluster DNS (CoreDNS); non-cluster domains are forwarded upstream | Default, most common |
| ClusterFirstWithHostNet | Like ClusterFirst, but for pods with hostNetwork: true | Pods using host networking |
| Default | Inherit DNS config from the node's /etc/resolv.conf | Legacy workloads |
| None | No automatic DNS configuration (must specify dnsConfig) | Custom DNS setup |
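The ClusterFirstWithHostNet case trips people up: a hostNetwork pod left on the default policy falls back to the node's resolver and silently loses cluster DNS. A minimal sketch:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-net-dns
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet  # without this, the pod uses the node's resolv.conf
  containers:
    - name: app
      image: nginx:1.25
```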
Custom DNS Configuration:
apiVersion: v1
kind: Pod
metadata:
name: custom-dns
spec:
dnsPolicy: None
dnsConfig:
nameservers:
- 1.1.1.1
- 8.8.8.8
searches:
- my-namespace.svc.cluster.local
- svc.cluster.local
- cluster.local
options:
- name: ndots
value: "2"
- name: timeout
value: "5"
containers:
- name: test
image: nginx
ndots Configuration Tuning¶
The ndots:5 default can cause performance issues for external domain queries, as the DNS resolver tries all search domains before the direct query:
Problem:
# Query: api.example.com (1 dot, less than ndots:5)
# DNS queries attempted:
1. api.example.com.default.svc.cluster.local (NXDOMAIN)
2. api.example.com.svc.cluster.local (NXDOMAIN)
3. api.example.com.cluster.local (NXDOMAIN)
4. api.example.com (SUCCESS)
This creates 3 unnecessary queries!
Solution 1: Reduce ndots:
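One way to lower ndots without giving up cluster DNS is to keep the ClusterFirst policy and override only the option. A sketch (the value 2 is a common choice; tune for your workload):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: low-ndots
spec:
  dnsPolicy: ClusterFirst     # keep cluster DNS for service discovery
  dnsConfig:
    options:
      - name: ndots
        value: "2"            # names with 2+ dots skip the search list
  containers:
    - name: app
      image: nginx:1.25
```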
Solution 2: Use FQDNs with a trailing dot (for example, curl http://api.example.com./data) so the resolver queries the name directly and skips the search list.
Recommendation: For microservices making many external API calls, reducing ndots to 2-3 can significantly improve DNS performance.
CKA Exam DNS Skills¶
Fast DNS Testing Techniques¶
Quick service resolution test:
# From any pod, test service DNS
kubectl exec -it <pod-name> -- nslookup kubernetes.default
# Create temporary debug pod
kubectl run tmp --image=nicolaka/netshoot --rm -it -- nslookup nginx.production
One-liner DNS verification:
# Test multiple DNS names quickly
for svc in kubernetes nginx backend; do
kubectl exec -it <pod-name> -- nslookup $svc 2>&1 | grep -E "Name:|Address:"
done
Check DNS end-to-end:
# Verify full DNS chain: pod → CoreDNS → service → endpoint
kubectl run test --image=busybox --rm -it -- wget -O- http://nginx.default.svc.cluster.local
Common CKA Exam Scenarios¶
Scenario 1: "Application can't connect to database service"
# Fast troubleshooting workflow
# 1. Verify service exists
kubectl get svc database
# 2. Check endpoints
kubectl get endpoints database
# 3. Test DNS from application pod
kubectl exec -it app-pod -- nslookup database
# 4. If DNS fails, check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
Scenario 2: "Fix CoreDNS configuration to use custom upstream DNS"
# Edit Corefile
kubectl edit configmap coredns -n kube-system
# Change forward plugin
# FROM:
forward . /etc/resolv.conf
# TO:
forward . 8.8.8.8 1.1.1.1
# CoreDNS auto-reloads (no restart needed with reload plugin)
Scenario 3: "Create headless service for StatefulSet"
apiVersion: v1
kind: Service
metadata:
name: mongo
spec:
clusterIP: None # Headless!
selector:
app: mongo
ports:
- port: 27017
targetPort: 27017
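For the service above to yield stable per-pod names, a StatefulSet must reference it via serviceName. A minimal sketch (image and replica count are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo          # must match the headless Service name
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo            # must match the Service selector
    spec:
      containers:
        - name: mongo
          image: mongo:7
          ports:
            - containerPort: 27017
```

Pods then resolve as mongo-0.mongo.<namespace>.svc.cluster.local, mongo-1.mongo..., and so on.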
Time-Saving Tips¶
- Use short names within a namespace: curl http://backend instead of backend.default.svc.cluster.local
- Keep a debug pod ready: kubectl run debug --image=nicolaka/netshoot --rm -it -- /bin/bash
- Alias common checks:
- Know the CoreDNS restart command: kubectl rollout restart deployment coredns -n kube-system
- Test DNS quickly: kubectl exec -it <pod> -- nslookup kubernetes.default
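For the "alias common checks" tip, something like the following works well (the alias names are hypothetical; adapt to taste):

```shell
# Hypothetical shortcuts for repeated DNS checks (add to ~/.bashrc in the exam terminal)
alias kdns='kubectl get pods -n kube-system -l k8s-app=kube-dns'
alias kdnslogs='kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50'
alias kcorefile='kubectl get configmap coredns -n kube-system -o yaml'
```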
Conclusion¶
DNS and service discovery are the invisible glue that makes Kubernetes networking seamless. CoreDNS automatically creates DNS records for every Service and Pod, enabling applications to discover each other using simple, memorable names instead of ephemeral IP addresses. Understanding CoreDNS architecture, configuration, and troubleshooting is essential for both the CKA exam and production Kubernetes operations.
Key Takeaways:
✅ CoreDNS is Kubernetes' DNS server, running as a Deployment in kube-system with plugin-based configuration
✅ Service DNS follows predictable patterns: <service>.<namespace>.svc.<cluster-domain> (e.g., nginx.production.svc.cluster.local)
✅ Headless services return pod IPs instead of ClusterIP, enabling direct pod-to-pod communication
✅ Search domains enable short names within namespaces, but can impact external query performance
✅ Corefile configuration defines DNS behavior through plugin chains (kubernetes, forward, cache, etc.)
✅ Systematic troubleshooting follows the DNS resolution flow: pod → CoreDNS → API → endpoints
✅ NodeLocal DNSCache (2025 best practice) dramatically improves DNS performance and reliability
✅ CKA exam success requires fast DNS testing skills and confident troubleshooting workflows
Practice Exercises:
- Create a ClusterIP service and test DNS resolution from different namespaces
- Deploy a headless service with StatefulSet and verify individual pod DNS names
- Modify Corefile to use custom upstream DNS servers (1.1.1.1, 8.8.8.8)
- Troubleshoot a broken DNS scenario (intentionally delete CoreDNS pods)
- Configure custom DNS policy with reduced ndots value
- Deploy NodeLocal DNSCache and verify improved query latency
- Use dig/nslookup to inspect SRV records for service port discovery
Further Learning:
- CoreDNS Official Documentation
- Kubernetes DNS Specification
- NodeLocal DNSCache Guide
- DNS Performance Best Practices
Master CoreDNS and service discovery, and you'll have the confidence to troubleshoot any DNS issue in your CKA exam or production environment. DNS is fundamental—get it right, and everything else becomes easier. Now go practice with real clusters and build that muscle memory!