30-StatefulSets

StatefulSets

StatefulSet is the workload API object used to manage stateful applications.

Manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.

Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods. These Pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.

Using StatefulSets

StatefulSets are valuable for applications that require one or more of the following:

  • Stable, unique network identifiers.
  • Stable, persistent storage.
  • Ordered, graceful deployment and scaling.
  • Ordered, automated rolling updates.

In the above, stable is synonymous with persistence across Pod (re)scheduling. If an application doesn’t require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application using a workload object that provides a set of stateless replicas. Deployment or ReplicaSet may be better suited to your stateless needs.

Limitations

  • The storage for a given Pod must either be provisioned by a PersistentVolume Provisioner based on the requested storage class, or pre-provisioned by an admin.
  • Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet. This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources.
  • StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.
  • StatefulSets do not provide any guarantees on the termination of Pods when a StatefulSet is deleted. To achieve ordered and graceful termination of the Pods in the StatefulSet, it is possible to scale the StatefulSet down to 0 prior to deletion.
  • When using Rolling Updates with the default Pod Management Policy (OrderedReady), it’s possible to get into a broken state that requires manual intervention to repair.

Components

The example below demonstrates the components of a StatefulSet.

  • A Headless Service, named nginx, is used to control the network domain.
  • The StatefulSet, named web, has a Spec that indicates that 3 replicas of the nginx container will be launched in unique Pods.
  • The volumeClaimTemplates will provide stable storage using PersistentVolumes provisioned by a PersistentVolume Provisioner.
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
          storage: 1Gi

Pod Selector

You must set the .spec.selector field of a StatefulSet to match the labels of its .spec.template.metadata.labels. Prior to Kubernetes 1.8, the .spec.selector field was defaulted when omitted. In 1.8 and later versions, failing to specify a matching Pod Selector will result in a validation error during StatefulSet creation.

Pod Identity

StatefulSet Pods have a unique identity that consists of an ordinal, a stable network identity, and stable storage. The identity sticks to the Pod, regardless of which node it’s (re)scheduled on.

Ordinal Index

For a StatefulSet with N replicas, each Pod in the StatefulSet will be assigned an integer ordinal, from 0 up through N-1, that is unique over the Set.

Stable Network ID

Each Pod in a StatefulSet derives its hostname from the name of the StatefulSet and the ordinal of the Pod. The pattern for the constructed hostname is $(statefulset name)-$(ordinal). The example above will create three Pods named web-0, web-1, web-2. A StatefulSet can use a Headless Service to control the domain of its Pods. The domain managed by this Service takes the form: $(service name).$(namespace).svc.cluster.local, where “cluster.local” is the cluster domain. As each Pod is created, it gets a matching DNS subdomain, taking the form: $(podname).$(governing service domain), where the governing service is defined by the serviceName field on the StatefulSet.

As mentioned in the limitations section, you are responsible for creating the Headless Service responsible for the network identity of the Pods.

Here are some examples of choices for Cluster Domain, Service name, StatefulSet name, and how that affects the DNS names for the StatefulSet’s Pods.

Cluster Domain | Service (ns/name) | StatefulSet (ns/name) | StatefulSet Domain | Pod DNS | Pod Hostname
cluster.local | default/nginx | default/web | nginx.default.svc.cluster.local | web-{0..N-1}.nginx.default.svc.cluster.local | web-{0..N-1}
cluster.local | foo/nginx | foo/web | nginx.foo.svc.cluster.local | web-{0..N-1}.nginx.foo.svc.cluster.local | web-{0..N-1}
kube.local | foo/nginx | foo/web | nginx.foo.svc.kube.local | web-{0..N-1}.nginx.foo.svc.kube.local | web-{0..N-1}

Note: Cluster Domain will be set to cluster.local unless otherwise configured.

Stable Storage

Kubernetes creates one PersistentVolume for each VolumeClaimTemplate. In the nginx example above, each Pod will receive a single PersistentVolume with a StorageClass of my-storage-class and 1 GiB of provisioned storage. If no StorageClass is specified, then the default StorageClass will be used. When a Pod is (re)scheduled onto a node, its volumeMounts mount the PersistentVolumes associated with its PersistentVolume Claims. Note that the PersistentVolumes associated with the Pods’ PersistentVolume Claims are not deleted when the Pods or the StatefulSet are deleted. This must be done manually.
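
As a sketch only (the controller creates these objects for you; you do not write them yourself), the claim generated for web-0 from the www volumeClaimTemplate above would look roughly like this. Claims are named after the template and the Pod, so the example yields www-web-0, www-web-1, and www-web-2.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: www-web-0   # $(volumeClaimTemplate name)-$(Pod name)
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "my-storage-class"
  resources:
    requests:
      storage: 1Gi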

Pod Name Label

When the StatefulSet Controller creates a Pod, it adds a label, statefulset.kubernetes.io/pod-name, that is set to the name of the Pod. This label allows you to attach a Service to a specific Pod in the StatefulSet.
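
For example, a Service that targets only web-0 could select on that label. A minimal sketch follows; the Service name is illustrative and not part of the example above.
apiVersion: v1
kind: Service
metadata:
  name: web-0-direct   # illustrative name
spec:
  ports:
  - port: 80
    name: web
  selector:
    statefulset.kubernetes.io/pod-name: web-0   # matches only the Pod named web-0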

Deployment and Scaling Guarantees

  • For a StatefulSet with N replicas, when Pods are being deployed, they are created sequentially, in order from {0..N-1}.
  • When Pods are being deleted, they are terminated in reverse order, from {N-1..0}.
  • Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready.
  • Before a Pod is terminated, all of its successors must be completely shut down.

The StatefulSet should not specify a pod.Spec.TerminationGracePeriodSeconds of 0. This practice is unsafe and strongly discouraged. For further explanation, please refer to force deleting StatefulSet Pods.

When the nginx example above is created, three Pods will be deployed in the order web-0, web-1, web-2. web-1 will not be deployed before web-0 is Running and Ready, and web-2 will not be deployed until web-1 is Running and Ready. If web-0 should fail, after web-1 is Running and Ready, but before web-2 is launched, web-2 will not be launched until web-0 is successfully relaunched and becomes Running and Ready.

If a user were to scale the deployed example by patching the StatefulSet such that replicas=1, web-2 would be terminated first. web-1 would not be terminated until web-2 is fully shut down and deleted. If web-0 were to fail after web-2 has been terminated and is completely shut down, but prior to web-1’s termination, web-1 would not be terminated until web-0 is Running and Ready.
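
For reference, the change described above amounts to nothing more than lowering .spec.replicas; the patched fragment of the StatefulSet spec would be:
spec:
  replicas: 1   # web-2 terminates first, then web-1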

Pod Management Policies

In Kubernetes 1.7 and later, StatefulSet allows you to relax its ordering guarantees while preserving its uniqueness and identity guarantees via its .spec.podManagementPolicy field.

OrderedReady Pod Management

OrderedReady pod management is the default for StatefulSets. It implements the behavior described above.

Parallel Pod Management

Parallel pod management tells the StatefulSet controller to launch or terminate all Pods in parallel, and to not wait for Pods to become Running and Ready or completely terminated prior to launching or terminating another Pod. This option only affects the behavior for scaling operations. Updates are not affected.
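
To opt in, set the field on the StatefulSet spec; a minimal fragment (all other fields as in the example above):
spec:
  podManagementPolicy: "Parallel"   # default is "OrderedReady"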

Update Strategies

In Kubernetes 1.7 and later, StatefulSet’s .spec.updateStrategy field allows you to configure and disable automated rolling updates for containers, labels, resource request/limits, and annotations for the Pods in a StatefulSet.

On Delete

The OnDelete update strategy implements the legacy (1.6 and prior) behavior. When a StatefulSet’s .spec.updateStrategy.type is set to OnDelete, the StatefulSet controller will not automatically update the Pods in a StatefulSet. Users must manually delete Pods to cause the controller to create new Pods that reflect modifications made to a StatefulSet’s .spec.template.
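
The corresponding fragment of the StatefulSet spec:
spec:
  updateStrategy:
    type: OnDelete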

Rolling Updates

The RollingUpdate update strategy implements automated, rolling update for the Pods in a StatefulSet. It is the default strategy when .spec.updateStrategy is left unspecified. When a StatefulSet’s .spec.updateStrategy.type is set to RollingUpdate, the StatefulSet controller will delete and recreate each Pod in the StatefulSet. It will proceed in the same order as Pod termination (from the largest ordinal to the smallest), updating each Pod one at a time. It will wait until an updated Pod is Running and Ready prior to updating its predecessor.
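
The corresponding fragment, which is also the effective behavior when .spec.updateStrategy is left unspecified:
spec:
  updateStrategy:
    type: RollingUpdate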

Partitions

The RollingUpdate update strategy can be partitioned, by specifying a .spec.updateStrategy.rollingUpdate.partition. If a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet’s .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated, and, even if they are deleted, they will be recreated at the previous version. If a StatefulSet’s .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to its .spec.template will not be propagated to its Pods. In most cases you will not need to use a partition, but they are useful if you want to stage an update, roll out a canary, or perform a phased roll out.
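
For example, with the fragment below (the partition value is chosen only for illustration), an update to .spec.template would be rolled out to web-2 only; web-0 and web-1 would remain at the previous version, even if they are deleted and recreated:
spec:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # only Pods with ordinal >= 2 are updated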

Forced Rollback

When using Rolling Updates with the default Pod Management Policy (OrderedReady), it’s possible to get into a broken state that requires manual intervention to repair.

If you update the Pod template to a configuration that never becomes Running and Ready (for example, due to a bad binary or application-level configuration error), StatefulSet will stop the rollout and wait.

In this state, it’s not enough to revert the Pod template to a good configuration. Due to a known issue, StatefulSet will continue to wait for the broken Pod to become Ready (which never happens) before it will attempt to revert it back to the working configuration.

After reverting the template, you must also delete any Pods that StatefulSet had already attempted to run with the bad configuration. StatefulSet will then begin to recreate the Pods using the reverted template.
