StatefulSets
StatefulSet is the workload API object used to manage stateful applications.
Manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods. These Pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
- Using StatefulSets
- Limitations
- Components
- Pod Selector
- Pod Identity
- Deployment and Scaling Guarantees
- Update Strategies
- What's next
Using StatefulSets
StatefulSets are valuable for applications that require one or more of the following:
- Stable, unique network identifiers.
- Stable, persistent storage.
- Ordered, graceful deployment and scaling.
- Ordered, automated rolling updates.
In the above, stable is synonymous with persistence across Pod (re)scheduling. If an application doesn't require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application using a workload object that provides a set of stateless replicas. Deployment or ReplicaSet may be better suited to your stateless needs.
Limitations
- The storage for a given Pod must either be provisioned by a PersistentVolume Provisioner based on the requested storage class, or pre-provisioned by an admin.
- Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet. This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources.
- StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.
- StatefulSets do not provide any guarantees on the termination of Pods when a StatefulSet is deleted. To achieve ordered and graceful termination of the Pods in the StatefulSet, it is possible to scale the StatefulSet down to 0 prior to deletion.
- When using Rolling Updates with the default Pod Management Policy (OrderedReady), it's possible to get into a broken state that requires manual intervention to repair.
Components
The example below demonstrates the components of a StatefulSet.
- A Headless Service, named nginx, is used to control the network domain.
- The StatefulSet, named web, has a Spec that indicates that 3 replicas of the nginx container will be launched in unique Pods.
- The volumeClaimTemplates will provide stable storage using PersistentVolumes provisioned by a PersistentVolume Provisioner.
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
          storage: 1Gi
Pod Selector
You must set the .spec.selector field of a StatefulSet to match the labels of its .spec.template.metadata.labels. Prior to Kubernetes 1.8, the .spec.selector field was defaulted when omitted. In 1.8 and later versions, failing to specify a matching Pod Selector will result in a validation error during StatefulSet creation.
Pod Identity
StatefulSet Pods have a unique identity that is comprised of an ordinal, a stable network identity, and stable storage. The identity sticks to the Pod, regardless of which node it's (re)scheduled on.
Ordinal Index
For a StatefulSet with N replicas, each Pod in the StatefulSet will be assigned an integer ordinal, from 0 up through N-1, that is unique over the Set.
Stable Network ID
Each Pod in a StatefulSet derives its hostname from the name of the StatefulSet and the ordinal of the Pod. The pattern for the constructed hostname is $(statefulset name)-$(ordinal). The example above will create three Pods named web-0, web-1, web-2. A StatefulSet can use a Headless Service to control the domain of its Pods. The domain managed by this Service takes the form: $(service name).$(namespace).svc.cluster.local, where "cluster.local" is the cluster domain. As each Pod is created, it gets a matching DNS subdomain, taking the form: $(podname).$(governing service domain), where the governing service is defined by the serviceName field on the StatefulSet.
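The naming pattern above can be sketched as a couple of string-formatting helpers; the function names here are ours, invented for illustration, not part of any Kubernetes API:

```python
def pod_hostname(statefulset_name: str, ordinal: int) -> str:
    # Hostname pattern: $(statefulset name)-$(ordinal)
    return f"{statefulset_name}-{ordinal}"

def pod_dns(statefulset_name: str, ordinal: int, service: str,
            namespace: str = "default",
            cluster_domain: str = "cluster.local") -> str:
    # Pod DNS subdomain: $(podname).$(governing service domain), where the
    # governing service domain is $(service name).$(namespace).svc.$(cluster domain)
    governing_domain = f"{service}.{namespace}.svc.{cluster_domain}"
    return f"{pod_hostname(statefulset_name, ordinal)}.{governing_domain}"

print(pod_hostname("web", 0))      # web-0
print(pod_dns("web", 1, "nginx"))  # web-1.nginx.default.svc.cluster.local
```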
As mentioned in the limitations section, you are responsible for creating the Headless Service responsible for the network identity of the Pods.
Here are some examples of choices for Cluster Domain, Service name, StatefulSet name, and how that affects the DNS names for the StatefulSet's Pods.
Cluster Domain | Service (ns/name) | StatefulSet (ns/name) | StatefulSet Domain | Pod DNS | Pod Hostname |
---|---|---|---|---|---|
cluster.local | default/nginx | default/web | nginx.default.svc.cluster.local | web-{0..N-1}.nginx.default.svc.cluster.local | web-{0..N-1} |
cluster.local | foo/nginx | foo/web | nginx.foo.svc.cluster.local | web-{0..N-1}.nginx.foo.svc.cluster.local | web-{0..N-1} |
kube.local | foo/nginx | foo/web | nginx.foo.svc.kube.local | web-{0..N-1}.nginx.foo.svc.kube.local | web-{0..N-1} |
Note: Cluster Domain will be set to cluster.local unless otherwise configured.
Stable Storage
Kubernetes creates one PersistentVolume for each VolumeClaimTemplate. In the nginx example above, each Pod will receive a single PersistentVolume with a StorageClass of my-storage-class and 1 GiB of provisioned storage. If no StorageClass is specified, then the default StorageClass will be used. When a Pod is (re)scheduled onto a node, its volumeMounts mount the PersistentVolumes associated with its PersistentVolume Claims. Note that the PersistentVolumes associated with the Pods' PersistentVolume Claims are not deleted when the Pods, or StatefulSet are deleted. This must be done manually.
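The PersistentVolumeClaims created from a volumeClaimTemplate follow a predictable naming convention (the claim template name joined to the Pod name); a small sketch, with a helper name of our own choosing:

```python
def pvc_name(claim_template: str, statefulset_name: str, ordinal: int) -> str:
    # PVCs created from a volumeClaimTemplate are named
    # $(template name)-$(statefulset name)-$(ordinal)
    return f"{claim_template}-{statefulset_name}-{ordinal}"

# For the nginx example above, Pod web-0 gets the claim www-web-0
print(pvc_name("www", "web", 0))  # www-web-0
```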
Pod Name Label
When the StatefulSet Controller creates a Pod, it adds a label, statefulset.kubernetes.io/pod-name, that is set to the name of the Pod. This label allows you to attach a Service to a specific Pod in the StatefulSet.
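As a minimal sketch, a Service could target only web-0 from the example StatefulSet by selecting on that label (the Service name here is hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: web-0-direct
spec:
  selector:
    statefulset.kubernetes.io/pod-name: web-0
  ports:
  - port: 80
    name: web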
Deployment and Scaling Guarantees
- For a StatefulSet with N replicas, when Pods are being deployed, they are created sequentially, in order from {0..N-1}.
- When Pods are being deleted, they are terminated in reverse order, from {N-1..0}.
- Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready.
- Before a Pod is terminated, all of its successors must be completely shutdown.
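The ordering rules above can be sketched as a small simulation; the function names are ours, not part of any Kubernetes API:

```python
def deploy_order(replicas: int) -> list:
    # Pods are created sequentially, in ordinal order: web-0, web-1, ..., web-(N-1)
    return [f"web-{i}" for i in range(replicas)]

def teardown_order(replicas: int) -> list:
    # Pods are terminated in reverse ordinal order: web-(N-1), ..., web-0
    return [f"web-{i}" for i in reversed(range(replicas))]

print(deploy_order(3))    # ['web-0', 'web-1', 'web-2']
print(teardown_order(3))  # ['web-2', 'web-1', 'web-0']
```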
The StatefulSet should not specify a pod.Spec.TerminationGracePeriodSeconds of 0. This practice is unsafe and strongly discouraged. For further explanation, please refer to force deleting StatefulSet Pods.
When the nginx example above is created, three Pods will be deployed in the order web-0, web-1, web-2. web-1 will not be deployed before web-0 is Running and Ready, and web-2 will not be deployed until web-1 is Running and Ready. If web-0 should fail, after web-1 is Running and Ready, but before web-2 is launched, web-2 will not be launched until web-0 is successfully relaunched and becomes Running and Ready.
If a user were to scale the deployed example by patching the StatefulSet such that replicas=1, web-2 would be terminated first. web-1 would not be terminated until web-2 is fully shutdown and deleted. If web-0 were to fail after web-2 has been terminated and is completely shutdown, but prior to web-1's termination, web-1 would not be terminated until web-0 is Running and Ready.
Pod Management Policies
In Kubernetes 1.7 and later, StatefulSet allows you to relax its ordering guarantees while preserving its uniqueness and identity guarantees via its .spec.podManagementPolicy field.
OrderedReady Pod Management
OrderedReady pod management is the default for StatefulSets. It implements the behavior described above.
Parallel Pod Management
Parallel pod management tells the StatefulSet controller to launch or terminate all Pods in parallel, and to not wait for Pods to become Running and Ready or completely terminated prior to launching or terminating another Pod. This option only affects the behavior for scaling operations. Updates are not affected.
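This policy is set as a single field of the StatefulSet spec; a fragment only, not a complete manifest:

spec:
  podManagementPolicy: Parallel  # default is OrderedReady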
Update Strategies
In Kubernetes 1.7 and later, StatefulSet's .spec.updateStrategy field allows you to configure and disable automated rolling updates for containers, labels, resource request/limits, and annotations for the Pods in a StatefulSet.
On Delete
The OnDelete update strategy implements the legacy (1.6 and prior) behavior. When a StatefulSet's .spec.updateStrategy.type is set to OnDelete, the StatefulSet controller will not automatically update the Pods in a StatefulSet. Users must manually delete Pods to cause the controller to create new Pods that reflect modifications made to a StatefulSet's .spec.template.
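A sketch of the relevant spec fragment for opting out of automated updates (not a complete manifest):

spec:
  updateStrategy:
    type: OnDelete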
Rolling Updates
The RollingUpdate update strategy implements automated, rolling update for the Pods in a StatefulSet. It is the default strategy when .spec.updateStrategy is left unspecified. When a StatefulSet's .spec.updateStrategy.type is set to RollingUpdate, the StatefulSet controller will delete and recreate each Pod in the StatefulSet. It will proceed in the same order as Pod termination (from the largest ordinal to the smallest), updating each Pod one at a time. It will wait until an updated Pod is Running and Ready prior to updating its predecessor.
Partitions
The RollingUpdate update strategy can be partitioned, by specifying a .spec.updateStrategy.rollingUpdate.partition. If a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet's .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated, and, even if they are deleted, they will be recreated at the previous version. If a StatefulSet's .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to its .spec.template will not be propagated to its Pods. In most cases you will not need to use a partition, but they are useful if you want to stage an update, roll out a canary, or perform a phased roll out.
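The partition rule can be sketched as a small helper; the function name is ours, invented for illustration:

```python
def pods_to_update(replicas: int, partition: int) -> list:
    # Only Pods with ordinal >= partition are updated when the template
    # changes; updates proceed from the largest ordinal down to the partition.
    return [f"web-{i}" for i in range(replicas - 1, partition - 1, -1)]

print(pods_to_update(3, 1))  # ['web-2', 'web-1']
print(pods_to_update(3, 0))  # ['web-2', 'web-1', 'web-0']
print(pods_to_update(3, 3))  # [] -- partition >= replicas: nothing is updated
```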
Forced Rollback
When using Rolling Updates with the default Pod Management Policy (OrderedReady), it's possible to get into a broken state that requires manual intervention to repair.
If you update the Pod template to a configuration that never becomes Running and Ready (for example, due to a bad binary or application-level configuration error), StatefulSet will stop the rollout and wait.
In this state, it's not enough to revert the Pod template to a good configuration. Due to a known issue, StatefulSet will continue to wait for the broken Pod to become Ready (which never happens) before it will attempt to revert it back to the working configuration.
After reverting the template, you must also delete any Pods that StatefulSet had already attempted to run with the bad configuration. StatefulSet will then begin to recreate the Pods using the reverted template.
What's next
- Follow an example of deploying a stateful application.
- Follow an example of deploying Cassandra with Stateful Sets.