Building a K8s Cluster and Deploying a Web Application
[toc]
Preface
This post records the full process of building a single-node Kubernetes cluster from scratch with Kind and successfully deploying a Go web application on it: cluster setup, image builds, application deployment, and the various problems hit along the way together with their solutions.
Environment Setup
System environment
- OS: Ubuntu 22.04.5 LTS
- CPU: 4 cores
- Memory: 8 GB
- Docker version: 28.3.3
- Kind version: v0.29.0
Installing Docker
See the Runoob tutorial: www.runoob.com/docker/ubuntu-docke...
Installing Go
See my earlier post on WSL + Linux development setup.
Installing Kind
There are many ways to install Kind; since I'm a Go developer, I'll install it with Go:
go install sigs.k8s.io/kind@v0.29.0
For more on Kind, see the official docs: kind.sigs.k8s.io/
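go install places the binary in $(go env GOPATH)/bin (or $GOBIN if set), so make sure that directory is on your PATH. A quick check that the install worked:
kind version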
Installing kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin
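To confirm the binary is in place and executable:
kubectl version --client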
Network environment
- The server is located in mainland China, with several Docker registry mirrors configured
- Mirrors include Aliyun, Tsinghua, and USTC
- Outbound network access goes through a proxy
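Docker's registry mirrors live in /etc/docker/daemon.json. For reference, a minimal sketch is below; the mirror endpoint is a placeholder rather than a recommendation, and the heredoc overwrites any existing daemon.json, so merge by hand if you already have one:
# Minimal daemon.json with a single (hypothetical) mirror endpoint
cat > /etc/docker/daemon.json << 'EOF'
{
  "registry-mirrors": ["https://mirror.example.com"]
}
EOF
# Restart Docker to pick up the change
systemctl restart docker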
Setting Up the Kind Cluster
1.1 Checking the initial cluster state
First, list any existing kind clusters:
kind get clusters
Output (I had created two clusters earlier while testing; a fresh Kind install will show none):
dev
kind
So there are two clusters: dev and kind.
1.2 Diagnosing cluster problems
Checking the kind cluster revealed a problem with the Calico network plugin:
kubectl get pods -n kube-system
Output:
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7498b9bb4c-dfjfp 0/1 ImagePullBackOff 0 10h
calico-node-bp4df 0/1 Init:ImagePullBackOff 0 10h
coredns-674b8bbfcf-j28hf 1/1 Running 0 10h
coredns-674b8bbfcf-z9wkq 1/1 Running 0 10h
etcd-kind-control-plane 1/1 Running 1 (11h ago) 10h
kindnet-hvxwh 1/1 Running 0 10h
kube-apiserver-kind-control-plane 1/1 Running 1 (11h ago) 10h
kube-controller-manager-kind-control-plane 1/1 Running 1 (11h ago) 10h
kube-proxy-9qvpq 1/1 Running 0 10h
kube-scheduler-kind-control-plane 1/1 Running 1 (11h ago) 10h
Symptoms:
- calico-kube-controllers and calico-node are stuck in ImagePullBackOff
- The error message points to a failed proxy connection: dial tcp 127.0.0.1:7890: connect: connection refused
Root cause analysis:
- The proxy settings conflict with the registry mirror configuration
- The tangled network environment makes image pulls fail
Detailed error information:
kubectl describe pod calico-kube-controllers-7498b9bb4c-dfjfp -n kube-system
Error output:
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Normal   Pulling  15m (x130 over 10h)   kubelet  Pulling image "docker.io/calico/kube-controllers:v3.25.0"
  Warning  Failed   15m (x130 over 10h)   kubelet  Failed to pull image "docker.io/calico/kube-controllers:v3.25.0": failed to pull and unpack image "docker.io/calico/kube-controllers:v3.25.0": failed to resolve reference "docker.io/calico/kube-controllers:v3.25.0": failed to do request: Head "https://registry-1.docker.io/v2/calico/kube-controllers/manifests/v3.25.0": proxyconnect tcp: dial tcp 127.0.0.1:7890: connect: connection refused
  Normal   BackOff  20s (x2897 over 10h)  kubelet  Back-off pulling image "docker.io/calico/kube-controllers:v3.25.0"
  Warning  Failed   20s (x2897 over 10h)  kubelet  Error: ImagePullBackOff
1.3 Recreating the cluster
Delete the broken clusters and create a fresh one:
# Delete the existing clusters
kind delete cluster --name dev
kind delete cluster --name kind
# Recreate the cluster on a stable version
kind create cluster --name dev --image kindest/node:v1.23.6
Output:
Creating cluster "dev" ...
✓ Ensuring node image (kindest/node:v1.23.6) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-dev"
You can now use your cluster with:
kubectl cluster-info --context kind-dev
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
Why this works:
- Uses the stable v1.23.6 node image
- Installs the default kindnet network plugin automatically
- Sidesteps Calico's image pull problems
1.4 Verifying the cluster
# Check cluster status
kubectl cluster-info --context kind-dev
kubectl get nodes
kubectl get pods -n kube-system
Output:
# Cluster info
Kubernetes control plane is running at https://127.0.0.1:33427
CoreDNS is running at https://127.0.0.1:33427/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
# Node status
NAME STATUS ROLES AGE VERSION
dev-control-plane Ready control-plane,master 4m39s v1.23.6
# System pods
NAME READY STATUS RESTARTS AGE
coredns-64897985d-gg4ng 1/1 Running 0 4m26s
coredns-64897985d-smmdx 1/1 Running 0 4m26s
etcd-dev-control-plane 1/1 Running 0 4m42s
kindnet-6x6jx 1/1 Running 0 4m26s
kube-apiserver-dev-control-plane 1/1 Running 0 4m41s
kube-controller-manager-dev-control-plane 1/1 Running 0 4m41s
kube-proxy-9xrkn 1/1 Running 0 4m26s
kube-scheduler-dev-control-plane 1/1 Running 0 4m40s
Verification:
- The control plane is running
- All core pods are healthy
- The kindnet network plugin is in place and stable
Developing the Go Web Application
2.1 Application code
Initialize a Go module:
go mod init ${project_name}
For example:
go mod init github/iceymoss/go-web
Then create main.go (vim main.go), paste in the code below, and run go mod tidy to fetch the dependencies:
package main

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

func Pong(c *gin.Context) {
    c.JSON(http.StatusOK, gin.H{
        "name":   "ice_moss",
        "age":    18,
        "school": "家里蹲大学",
    })
}

func main() {
    r := gin.Default()
    r.GET("/ping", Pong)
    r.Run("0.0.0.0:8080")
}
2.2 The Dockerfile
Create a Dockerfile in the project root:
# Build stage
FROM golang:1.21-alpine AS builder
# Set the working directory
WORKDIR /app
# Use a China-local Go module proxy (works around network issues)
RUN go env -w GOPROXY=https://goproxy.cn,direct
# Pre-download dependencies (leverages Docker layer caching)
COPY go.mod go.sum ./
RUN go mod download
# Copy the source code
COPY . .
# Build a static binary
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags='-w -s' -o go-web-app
# Runtime stage (minimal image)
FROM alpine:3.18
# Install CA certificates
RUN apk add --no-cache ca-certificates
# Create a non-root user
RUN adduser -D -u 10001 appuser
USER appuser
# Copy the binary from the build stage
COPY --from=builder --chown=appuser /app/go-web-app /app/go-web-app
# Set environment variables
ENV GIN_MODE=release \
    PORT=8080
# Expose the port
EXPOSE 8080
# Start the app
ENTRYPOINT ["/app/go-web-app"]
- Multi-stage build keeps the image small
- A China-local Go proxy fixes dependency downloads
- A non-root user improves security
- Build flags (-w -s) shrink the binary
2.3 Building the image
docker build -t go-web:latest .
Build output:
[+] Building 337.3s (17/17) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 854B 0.0s
=> [internal] load metadata for docker.io/library/alpine:3.18 3.5s
=> [internal] load metadata for docker.io/library/golang:1.21-alpine 4.3s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [builder 1/7] FROM docker.io/library/golang:1.21-alpine@sha256:2414035b086e3c42b99654c8b26e6f5b1b1598080d65fd03c7f499552ff4dc94 47.5s
=> => resolve docker.io/library/golang:1.21-alpine@sha256:2414035b086e3c42b99654c8b26e6f5b1b1598080d65fd03c7f499552ff4dc94 0.0s
=> => extracting sha256:c6a83fedfae6ed8a4f5f7cbb6a7b6f1c1ec3d86fea8cb9e5ba2e5e6673fde9f6 0.1s
=> => extracting sha256:41db7493d1c6f3f26428d119962e3862c14a9e20bb0b8fefc36e7282d015d099 0.0s
=> => extracting sha256:54bf7053e2d96c2c7f4637ad7580bd64345b3c9fabb163e1fdb8894aea8a9af0 3.5s
=> => extracting sha256:4579008f8500d429ec007d092329191009711942d9380d060c8d9bd24c0c352c 0.0s
=> => extracting sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1 0.0s
=> [stage-1 1/4] FROM docker.io/library/alpine:3.18@sha256:de0eb0b3f2a47ba1eb89389859a9bd88b28e82f5826b6969ad604979713c2d4f 9.6s
=> => resolve docker.io/library/alpine:3.18@sha256:de0eb0b3f2a47ba1eb89389859a9bd88b28e82f5826b6969ad604979713c2d4f 0.1s
=> => extracting sha256:44cf07d57ee4424189f012074a59110ee2065adfdde9c7d9826bebdffce0a885 3.4MB 9.4s
 => => extracting sha256:44cf07d57ee4424189f012074a59110ee2065adfdde9c7d9826bebdffce0a885 0.1s
 => [stage-1 2/4] RUN apk add --no-cache ca-certificates 322.7s
 => [builder 2/7] WORKDIR /app 1.9s
 => [builder 3/7] RUN go env -w GOPROXY=https://goproxy.cn,direct 0.3s
 => [builder 4/7] COPY go.mod go.sum ./ 0.2s
 => [builder 5/7] RUN go mod download 35.1s
 => [builder 6/7] COPY . . 0.1s
 => [builder 7/7] RUN CGO_ENABLED=0 GOOS=linux go build -ldflags='-w -s' -o go-web-app 15.4s
 => [stage-1 3/4] RUN adduser -D -u 10001 appuser 0.3s
 => [stage-1 4/4] COPY --from=builder --chown=appuser /app/go-web-app /app/go-web-app 0.1s
 => exporting to image 0.1s
 => => exporting layers 0.1s
 => => writing image sha256:40a377fbf28772e04665a7ce2cb8ceebbcc6f4da3a6102eeb294627cef56047b 0.0s
 => => naming to docker.io/library/go-web:latest 0.0s
Build summary:
- Image size: 15 MB (very lightweight)
- Build time: about 5 minutes
- Completed with no errors
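To double-check the final size yourself:
docker images go-web:latest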
2.4 Local testing
# Run the container
docker run -d --name go-web-test -p 8080:8080 go-web:latest
# Check the container status
docker ps
# Hit the API
curl http://localhost:8080/ping
Output:
# Container started
2cbdbdf8a448b64c43f9c005dda4ab79c08e7722ea0d2b558543d6e80cb0814a
# Container status
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2cbdbdf8a448 go-web:latest "/app/go-web-app" 6 seconds ago Up 6 seconds 0.0.0.0:8080->8080/tcp, [::]:8080->8080/tcp go-web-test
# API response
{"age":18,"name":"ice_moss","school":"家里蹲大学"}
Results:
- The container starts successfully
- The API returns the expected JSON
- The application works as intended
Deploying to Kubernetes
3.1 Creating the K8s manifests
deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: go-web-deployment
labels:
app: go-web
spec:
replicas: 2
selector:
matchLabels:
app: go-web
template:
metadata:
labels:
app: go-web
spec:
containers:
- name: go-web
image: go-web:latest
ports:
- containerPort: 8080
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "100m"
livenessProbe:
httpGet:
path: /ping
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ping
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
service.yaml:
apiVersion: v1
kind: Service
metadata:
name: go-web-service
spec:
selector:
app: go-web
ports:
- protocol: TCP
port: 80
targetPort: 8080
type: ClusterIP
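Before applying anything, both manifests can be validated client-side without touching the cluster:
# Client-side dry run: parses and validates, creates nothing
kubectl apply -f deployment.yaml --dry-run=client
kubectl apply -f service.yaml --dry-run=client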
3.2 Deploying the application
# Apply the manifests
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
Output:
deployment.apps/go-web-deployment created
service/go-web-service created
3.3 First pitfall: image pull failure
Symptoms:
kubectl get pods
NAME                                READY   STATUS             RESTARTS   AGE
go-web-deployment-cfc7cccbd-ckgrp   0/1     ImagePullBackOff   0          24s
go-web-deployment-cfc7cccbd-xfdhq   0/1     ImagePullBackOff   0          24s
Check the deployment:
kubectl get deployments
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
go-web-deployment   0/2     2            0           13s
Error message:
Failed to pull image "go-web:latest": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/go-web:latest"
Analysis:
- K8s tried to pull the image from Docker Hub
- but the image only exists locally
- and a Kind cluster cannot see the host's Docker images directly
Solution:
# Load the local image into the kind cluster
kind load docker-image go-web:latest --name dev
# Verify the image is present on the node
docker exec dev-control-plane crictl images | grep go-web
Output:
# Image load
Image: "go-web:latest" with ID "sha256:40a377fbf28772e04665a7ce2cb8ceebbcc6f4da3a6102eeb294627cef56047b" found to be already present on all nodes.
# Verify the image
docker.io/library/go-web latest 40a377fbf2877 15.5MB
3.4 Second pitfall: the image pull policy
Symptoms:
Even with the image loaded into the cluster, the pods still went into ImagePullBackOff.
Root cause:
kubectl get deployment go-web-deployment -o yaml | grep imagePullPolicy
imagePullPolicy: Always
The Always policy makes K8s try to pull the image from a remote registry every time.
Solution:
# Change the image pull policy to Never
kubectl patch deployment go-web-deployment -p '{"spec":{"template":{"spec":{"containers":[{"name":"go-web","imagePullPolicy":"Never"}]}}}}'
# Restart the rollout
kubectl rollout restart deployment/go-web-deployment
Output:
# Policy patched
deployment.apps/go-web-deployment patched
# Rollout restarted
deployment.apps/go-web-deployment restarted
# Pod status
NAME READY STATUS RESTARTS AGE
go-web-deployment-7b85d85d64-nd5zp 1/1 Running 0 29s
go-web-deployment-7b85d85d64-vbqjt 1/1 Running 0 19s
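In hindsight, it is cleaner to set the policy in deployment.yaml from the start so no patch is needed. A sketch of the relevant container fields, matching the manifest from section 3.1:
containers:
- name: go-web
  image: go-web:latest
  imagePullPolicy: Never   # image is side-loaded with kind load, never pulled
  ports:
  - containerPort: 8080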
3.5 Verifying the deployment
Pod status:
kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
go-web-deployment-7b85d85d64-nd5zp   1/1     Running   0          29s
go-web-deployment-7b85d85d64-vbqjt   1/1     Running   0          19s
Service test:
# Port forward
kubectl port-forward service/go-web-service 8080:80
Port-forward output:
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
# API test
curl http://localhost:8080/ping
Result:
{"age":18,"name":"ice_moss","school":"家里蹲大学"}
Key Takeaways
1. Network configuration
- A mainland-China environment needs suitable registry mirrors
- Too many mirror entries can conflict with each other
- Prefer a stable network plugin (kindnet over Calico here)
2. Image management
- Local images must be explicitly loaded into the Kind cluster
- Set imagePullPolicy correctly to avoid pointless remote pulls
- Use multi-stage builds to keep images small
3. Deployment best practices
- Set resource limits and health checks
- Run containers as a non-root user
- Pick a replica count that gives you availability
4. Troubleshooting approach (a few go-to commands are sketched below)
- Use kubectl describe pod to read the detailed error events
- Check the image pull policy and network configuration
- Verify cluster and image state
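A few standard kubectl commands cover most of the debugging above; <pod-name> is a placeholder:
# Detailed events for a pod (pull errors, scheduling failures, probe results)
kubectl describe pod <pod-name>
# Container logs
kubectl logs <pod-name>
# Cluster events, oldest first
kubectl get events --sort-by=.metadata.creationTimestamp
# Spot-check a deployment's pull policy
kubectl get deployment go-web-deployment -o yaml | grep imagePullPolicy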
Configuring Ingress for External Access
This section walks through configuring the NGINX Ingress Controller in the Kind Kubernetes cluster: the problems encountered, the fixes, and some best practices.
First, note the versions involved:
- Kubernetes: v1.23.6
- Ingress Controller: NGINX Ingress Controller v1.8.1
4.1 Kind configuration
To expose services externally we need to adjust the Kind cluster configuration, so write a Kind config first:
# Create kind-config.yaml
cat > kind-config.yaml << 'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
extraPortMappings:
- containerPort: 80
hostPort: 80
protocol: TCP
- containerPort: 443
hostPort: 443
protocol: TCP
- containerPort: 30000
hostPort: 30000
protocol: TCP
- containerPort: 32767
hostPort: 32767
protocol: TCP
EOF
4.2 Recreating the kind cluster
# Delete the existing cluster
kind delete cluster --name dev
# Create the cluster with the new config
kind create cluster --name dev --config kind-config.yaml --image kindest/node:v1.23.6
Output:
Creating cluster "dev" ...
✓ Ensuring node image (kindest/node:v1.23.6) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-dev"
Verify the cluster state:
kubectl cluster-info
kubectl get nodes
4.3 Redeploying the Go web app
The cluster was recreated, so the service has to be deployed again:
# Deploy
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
# Check status
kubectl get pods
kubectl get svc
Output:
NAME READY STATUS RESTARTS AGE
go-web-deployment-79bd88cb47-rldr5 1/1 Running 0 11s
go-web-deployment-79bd88cb47-ztcnl 1/1 Running 0 21s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
go-web-service ClusterIP 10.96.195.234 <none> 80/TCP 72m
Check that the web service responds:
# Port-forward test
kubectl port-forward service/go-web-service 8080:80 &
curl http://localhost:8080/ping
Output:
{"age":18,"name":"ice_moss","school":"家里蹲大学"}
4.4 Ingress Controller configuration
Preparing the images
The cluster itself has no external access configured yet, so prepare the images on the host in advance:
# Pull the Ingress Controller images
docker pull registry.k8s.io/ingress-nginx/controller:v1.8.1
docker pull registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407
# Load the images into the cluster
kind load docker-image registry.k8s.io/ingress-nginx/controller:v1.8.1 --name dev
kind load docker-image registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407 --name dev
Output:
Image: "registry.k8s.io/ingress-nginx/controller:v1.8.1" with ID "sha256:825aff16c20cc2c6039fce49bafaa0f510de0f9238da475f3de949adadb9be7f" not yet present on node "dev-control-plane", loading...
Image: "registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407" with ID "sha256:7e7451bb70423d31bdadcf0a71a3107b64858eccd7827d066234650b5e7b36b0" not yet present on node "dev-control-plane", loading...
Downloading the manifest
# Download the Ingress Controller manifest
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.2/deploy/static/provider/kind/deploy.yaml -O ingress-nginx-kind.yaml
Output:
--2025-08-02 16:09:28-- https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.2/deploy/static/provider/kind/deploy.yaml
Connecting to 127.0.0.1:7890... connected.
Proxy request sent, awaiting response... 200 OK
Length: 15997 (16K) [text/plain]
Saving to: 'ingress-nginx-kind.yaml'
ingress-nginx-kind.yaml 100%[==========================================================================================>] 15.62K --.-KB/s in 0.02s
Editing the manifest
Three key changes are needed:
- Strip the @sha256 digest from the image references
- Change the image pull policy to Never
- Make sure the node selector is correct
The edits can be made by hand, or with sed as sketched after this block:
# Change the image pull policy
imagePullPolicy: IfNotPresent
# becomes:
imagePullPolicy: Never
# Strip the sha256 digest from the images:
# image: registry.k8s.io/ingress-nginx/controller:v1.8.1@sha256:e5c4824e7375fcf2a393e1c03c293b69759af37a9ca6abdb91b13d78a93da8bd
# becomes:
# image: registry.k8s.io/ingress-nginx/controller:v1.8.1
# and likewise:
# image: registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407@sha256:543c40fd093964bc9ab509d3e791f9989963021f1e9e4c9c7b6700b02bfb227b
# becomes:
# image: registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407
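If you would rather not edit by hand, something like the following should work (a sketch assuming GNU sed; double-check the file afterwards):
# Drop every @sha256:<digest> suffix
sed -i 's/@sha256:[0-9a-f]*//g' ingress-nginx-kind.yaml
# Flip the pull policy
sed -i 's/imagePullPolicy: IfNotPresent/imagePullPolicy: Never/g' ingress-nginx-kind.yaml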
Labeling the node
# Add the label the controller's node selector expects
kubectl label node dev-control-plane ingress-ready=true
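Confirm the label took effect:
kubectl get nodes --show-labels | grep ingress-ready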
4.5 Deploying the Ingress Controller
With the manifest prepared, deploy the controller:
kubectl apply -f ingress-nginx-kind.yaml
Output:
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
serviceaccount/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
configmap/ingress-nginx-controller created
service/ingress-nginx-controller created
service/ingress-nginx-controller-admission created
deployment.apps/ingress-nginx-controller created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created
ingressclass.networking.k8s.io/nginx created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
Verify the deployment
# Check pod status
kubectl get pods -n ingress-nginx
# Check job status
kubectl get jobs -n ingress-nginx
Output:
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-t6cf9 0/1 Completed 0 6s
ingress-nginx-admission-patch-xwfdv 0/1 Completed 1 6s
ingress-nginx-controller-76789fd998-9nfdl 1/1 Running 0 6s
NAME COMPLETIONS DURATION AGE
ingress-nginx-admission-create 1/1 3s 25s
ingress-nginx-admission-patch 1/1 4s 25s
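Optionally, block until the controller reports ready before creating Ingress resources; this mirrors the readiness wait used in the upstream Kind ingress instructions:
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s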
4.6 Deploying the Ingress
Create the Ingress resource:
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: go-web-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
ingressClassName: nginx
rules:
- host: go-web.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: go-web-service
port:
number: 80
Apply the Ingress:
kubectl apply -f ingress.yaml
Output:
ingress.networking.k8s.io/go-web-ingress created
Verify the Ingress:
kubectl get ingress
kubectl describe ingress go-web-ingress
Output:
NAME CLASS HOSTS ADDRESS PORTS AGE
go-web-ingress nginx go-web.local 80 10s
4.7 Testing external access
Configure the hosts file
First, map a hostname for access from the server itself:
# Add a local hosts entry
echo "127.0.0.1 go-web.local" >> /etc/hosts
Test access
# Hit the API
curl http://go-web.local/ping
Output:
{"age":18,"name":"ice_moss","school":"家里蹲大学"}
That takes care of internal access. Next, public-internet access: you may need to check that port 80 is open in the firewall. Here I hit the server's public IP directly from my local PC:
PowerShell 7.5.2
PS C:\Users\SZ24-Storm006> curl http://14.159.10.1/ping
{"age":18,"name":"ice_moss","school":"家里蹲大学"}
Perfect! That completes the entire flow: setting up the K8s environment on the server, creating the cluster, building the image, deploying the service, deploying the gateway, and exposing it externally. Congratulations, you can now use your own server to learn on or to run simple applications. If you think that's the end of it, though, that would be hasty: below are the problems I ran into along the way and how I solved them.
4.8 Problems encountered and solutions
Problem 1: image pull failure
Symptom:
Failed to pull image "registry.k8s.io/ingress-nginx/controller:v1.8.1": rpc error: code = Unknown desc = failed to pull and unpack image
Cause: the cluster cannot reach the internet, so the pull fails.
Solution:
- Pull the image on the host
- Load it into the cluster with kind load docker-image
- Set imagePullPolicy: Never
Problem 2: image reference mismatch
Symptom:
ErrImageNeverPull: Container image "registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20230407@sha256:..." is not present
Cause: the manifest pins images with a sha256 digest, but the images loaded into the cluster carry no digest.
Solution:
- Strip the @sha256:... suffix from the image names
- Make sure the image versions match
Problem 3: node selector mismatch
Symptom:
0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector
Cause: the controller pod requires the ingress-ready=true node label.
Solution:
kubectl label node dev-control-plane ingress-ready=true
Problem 4: Job pod templates are immutable
Symptom:
The Job "ingress-nginx-admission-create" is invalid: spec.template: Invalid value: field is immutable
Cause: once a Job has been created, its pod template cannot be changed.
Solution (sketched below):
- Delete the Jobs
- Fix the image pull policy in the YAML
- Re-apply
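Concretely, for the two admission jobs created earlier (names taken from the kubectl output above):
# Delete the immutable Jobs, then re-apply the corrected manifest
kubectl delete job ingress-nginx-admission-create -n ingress-nginx
kubectl delete job ingress-nginx-admission-patch -n ingress-nginx
kubectl apply -f ingress-nginx-kind.yaml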
Key configuration points:
Image management:
- Use imagePullPolicy: Never to avoid network pulls
- Make sure image versions match
- Strip the sha256 digests
Node configuration:
- Add the ingress-ready=true label
- Map the right ports in the Kind config
Network configuration:
- Use a NodePort-type Service
- Use the hosts file for name resolution
Problem 5: the service is unreachable from the public internet
This one was trickier. The service could not be reached from the public internet, yet it was reachable locally on the server, so the whole path from the host to the go-web service was clearly fine. Here is where things stood:
Current state
- ✅ Local access: works (localhost:80)
- ❌ Public access: fails (14.159.10.1:80)
Possible causes
Public routing
First, run the go-web service directly on the server:
root@lavm-641m9i7vom:~/project/go-web# tree
.
├── Dockerfile
├── go.mod
├── go.sum
└── main.go
0 directories, 4 files
root@lavm-641m9i7vom:~/project/go-web# go run main.go
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] GET /ping --> main.Pong (3 handlers)
[GIN-debug] [WARNING] You trusted all proxies, this is NOT safe. We recommend you to set a value.
Please check https://pkg.go.dev/github.com/gin-gonic/gin#readme-don-t-trust-all-proxies for details.
[GIN-debug] Listening and serving HTTP on 0.0.0.0:8080
After opening port 8080 in the firewall, request it from my local PC:
PS C:\Users\SZ24-Storm006> curl http://14.159.10.1:8080/ping
{"age":18,"name":"ice_moss","school":"家里蹲大学"}
PS C:\Users\SZ24-Storm006>
So the leg from the public internet to the server is fine. The problem must be on the server itself: missing port-forwarding rules.
NAT configuration
Host network:
# Network interfaces
eth0: 172.16.0.8/16 (private IP)
br-ebc08f0aedad: 172.18.0.1/16 (Docker network)
# Routes
default via 172.16.0.1 dev eth0
172.18.0.0/16 dev br-ebc08f0aedad
- Interface: eth0 (172.16.0.8/16)
- Docker network: br-ebc08f0aedad (172.18.0.1/16)
- Listening: 0.0.0.0:80 (docker-proxy)
Check port 80:
netstat -tlnp | grep :80
# Output: tcp 0.0.0.0:80 0.0.0.0:* LISTEN 180819/docker-proxy
Process: docker-proxy (PID 180819)
Listening: 0.0.0.0:80 (all interfaces)
Forwarding: to the Ingress Controller container
The host-to-container plumbing also looks fine, so inspect the iptables rules:
# Look for DROP/REJECT rules
iptables -L -n | grep -E "(DROP|REJECT)"
# Result: several DROP rules found
# Inspect the DOCKER chain
iptables -L DOCKER -n --line-numbers
# Result:
# 5 ACCEPT tcp -- 0.0.0.0/0 172.18.0.2 tcp dpt:80
# 6 DROP all -- 0.0.0.0/0 0.0.0.0/0
# 7 DROP all -- 0.0.0.0/0 0.0.0.0/0
Key finding: the ACCEPT rule for port 80 in the DOCKER chain is overridden by the DROP rules that follow it.
Solution:
# Delete the DROP rules from the DOCKER chain
# (rule 6 twice: after the first delete, rule 7 becomes rule 6)
iptables -D DOCKER 6
iptables -D DOCKER 6
# Inspect the DOCKER-ISOLATION-STAGE-2 chain
iptables -L DOCKER-ISOLATION-STAGE-2 -n
# Result: more DROP rules found
# Delete those as well
iptables -D DOCKER-ISOLATION-STAGE-2 1
iptables -D DOCKER-ISOLATION-STAGE-2 1
Then I retried from my PC. Still no luck, so more digging; eventually I cleared iptables entirely:
# Flush all iptables rules
iptables -F
iptables -t nat -F
iptables -t mangle -F
iptables -X
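One caution I would add in hindsight: iptables -F also flushes the NAT and forwarding rules Docker manages for its containers. Docker regenerates them when the daemon restarts, so if container networking misbehaves after a flush, restart the daemon:
# Docker rebuilds its iptables rules on startup
systemctl restart docker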
Try the request again:
PowerShell 7.5.2
PS C:\Users\SZ24-Storm006> curl http://14.159.10.1/ping
{"age":18,"name":"ice_moss","school":"家里蹲大学"}
To summarize:
Root cause
Conflicting iptables rules blocked access on port 80:
- Docker's auto-generated iptables rules included DROP rules
- those DROP rules overrode the ACCEPT rules
- so external traffic never reached the Ingress Controller
Solution
Clear the conflicting iptables rules so traffic can be forwarded normally.
Verification
- ✅ Local access: curl http://localhost/ping works
- ✅ Public access: curl http://14.159.10.1/ping works
- ✅ Ingress Controller: working normally
- ✅ go-web service: running normally
Conclusion
Through this exercise I built a Kind Kubernetes cluster and successfully deployed a Go web application on it. The main problems were all tied to the network environment and image management; with the right configuration and pull policies the deployment ended up stable. The flow serves as a complete reference template for future containerized deployments and lays the groundwork for more complex Kubernetes application management.
This work is licensed under a CC license; reproduction must credit the author and link to this article.