Longhorn is a lightweight, reliable, and easy-to-use distributed block storage system for Kubernetes.
Longhorn is free, open-source software. Originally developed by Rancher Labs, it is now being developed as a sandbox project of the CNCF (Cloud Native Computing Foundation).
With Longhorn, you can:
- Use Longhorn volumes as persistent storage for distributed stateful applications in your Kubernetes cluster
- Partition your block storage into Longhorn volumes so that you can use Kubernetes volumes with or without a cloud provider
- Replicate block storage across multiple nodes and data centers to increase availability
- Store backup data in external storage such as NFS or AWS S3
- Create cross-cluster disaster recovery volumes so that data from a primary Kubernetes cluster can be quickly recovered from backup in a second Kubernetes cluster
- Schedule recurring snapshots of a volume, and schedule recurring backups to NFS or S3-compatible secondary storage
- Restore volumes from backup
- Upgrade Longhorn without disrupting persistent volumes
Longhorn comes with a standalone UI and can be installed using Helm, kubectl, or the Rancher app catalog.
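The kubectl route applies a single manifest. A minimal sketch, pinned to the same version used in the Install section below (the manifest URL follows the pattern used in the Longhorn project's release docs):

```bash
# Install Longhorn v1.6.0 with kubectl instead of Helm
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/longhorn.yaml
```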
Architecture
In the diagram below:
- There are three instances of Longhorn volumes.
- Each volume has a dedicated controller, called the Longhorn Engine, which runs as a Linux process.
- Each Longhorn volume has two replicas, and each replica is a Linux process.
- The arrows in the diagram indicate the read/write data flow between the volume, the controller instance, the replica instances, and the disks.
- By creating a separate Longhorn Engine for each volume, if one controller fails, the function of other volumes is not impacted.
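On a running cluster, these per-volume engine (controller) and replica processes live inside Longhorn's instance-manager pods, and each engine and replica is also exposed as a custom resource. A quick way to look at them, assuming the default longhorn-system namespace (the label selector is an assumption based on Longhorn's usual labeling; adjust it if your version labels pods differently):

```bash
# Instance-manager pods host the engine and replica processes for each volume
kubectl -n longhorn-system get pods -l longhorn.io/component=instance-manager -o wide

# Each volume's engine and replicas are also visible as Longhorn custom resources
kubectl -n longhorn-system get engines.longhorn.io
kubectl -n longhorn-system get replicas.longhorn.io
```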
Install
#Using the Environment Check Script
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/scripts/environment_check.sh | bash
#open-iscsi
Make sure open-iscsi is installed and the iscsid daemon is running on all the nodes.
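How you satisfy this requirement depends on the node OS. A sketch for common distributions (package and service names assumed to be the distro defaults):

```bash
# Debian/Ubuntu nodes
sudo apt-get update && sudo apt-get install -y open-iscsi

# RHEL/CentOS/Rocky nodes
# sudo yum install -y iscsi-initiator-utils

# Ensure the iscsid daemon is running now and on every boot
sudo systemctl enable --now iscsid
systemctl is-active iscsid
```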
#Install with Helm
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.6.0
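Once the chart is deployed, a quick sanity check (the namespace matches the --namespace flag used above):

```bash
# All Longhorn pods should eventually reach Running
kubectl -n longhorn-system get pods

# The install registers a default StorageClass named "longhorn"
kubectl get storageclass longhorn
```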
Longhorn Settings
| Parameter | Description | Default |
| --- | --- | --- |
| backup-target | The target used for backup. NFS and S3 are supported. See Setting a Backup Target for details. (See the Helm sketch after this table for setting this at install time.) | Example: s3://backupbucket@us-east-1/backupstore |
| backup-target-credential-secret | The Kubernetes secret associated with the backup target. See Setting a Backup Target for details. | Example: s3-secret |
| allow-recurring-job-while-volume-detached | If this setting is enabled, Longhorn automatically attaches the volume and takes a snapshot/backup when it is time to do a recurring snapshot/backup. Note that while the volume is attached automatically, it is not ready for the workload; the workload will have to wait until the recurring job finishes. | FALSE |
| create-default-disk-labeled-nodes | If no other disks exist, create the default disk automatically only on nodes with the Kubernetes label node.longhorn.io/create-default-disk=true. If disabled, the default disk will be created on all new nodes when each node is detected for the first time. This option is useful if you want to scale the cluster but don't want to use storage on the new nodes, or if you want to customize disks for Longhorn nodes. | FALSE |
| default-data-path | Default path to use for storing data on a host. Can be used with the Create Default Disk on Labeled Nodes option to make Longhorn use only nodes with specific storage mounted at, for example, /opt/longhorn when scaling the cluster. | /var/lib/longhorn/ |
| replica-soft-anti-affinity | When this setting is checked, the Longhorn Manager will allow scheduling on nodes with existing healthy replicas of the same volume; when it is un-checked, it will not. | FALSE |
| replica-auto-balance | Enable this setting to automatically rebalance replicas when an available node is discovered. The available global options are: disabled (the default; no replica auto-balancing is done), least-effort (Longhorn balances replicas for minimal redundancy), and best-effort (Longhorn tries to balance replicas for even redundancy; it does not forcefully re-schedule replicas to a zone that does not have enough nodes to support even balance, and instead re-schedules to balance at the node level). | disabled |
| storage-over-provisioning-percentage | The over-provisioning percentage defines how much storage can be allocated relative to the hard drive's capacity. With the default setting of 200, the Longhorn Manager will allow scheduling new replicas only as long as the total scheduled storage (Storage Scheduled) stays within 200% of the actual usable disk capacity (Storage Maximum - Storage Reserved). For example, with the default of 200, a 300 GiB Longhorn volume can be scheduled onto a node with only 150 GiB of usable storage. This value can be lowered to avoid over-provisioning storage; see Multiple Disks Support for details. Also, a replica of a volume may take more space than the volume's size, since snapshots need storage space as well; users can delete snapshots to reclaim space. | 200 |
| storage-minimal-available-percentage | With the default setting of 25, the Longhorn Manager will allow scheduling new replicas only after the amount of disk space to be scheduled has been subtracted from the available disk space (Storage Available) and the available disk space is still over 25% of the actual disk capacity (Storage Maximum). Otherwise the disk becomes unschedulable until more space is freed up. | 25 |
| upgrade-checker | The Upgrade Checker will check for a new Longhorn version periodically. When there is a new version available, it will notify the user in the Longhorn UI. | TRUE |
| default-replica-count | The default number of replicas when creating a volume from the Longhorn UI. For Kubernetes, update numberOfReplicas in the StorageClass (see the StorageClass sketch after this table). The recommended way of choosing the default replica count: if you have three or more nodes for storage, use 3; otherwise use 2. Using a single replica on a single-node cluster is also OK, but the high availability functionality won't be available; you can still take snapshots/backups of the volume. | 3 |
| default-data-locality | A Longhorn volume has data locality if there is a local replica of the volume on the same node as the pod using the volume. This setting specifies the default data locality when a volume is created from the Longhorn UI; for Kubernetes configuration, update dataLocality in the StorageClass (see the sketch after this table). The available modes are: disabled (the default; there may or may not be a replica on the same node as the attached volume/workload), best-effort (Longhorn tries to keep a replica on the same node as the attached volume, but will not stop the volume even if it cannot keep the replica local due to environment limitations, e.g. not enough disk space or incompatible disk tags), and strict-local (Longhorn keeps the only replica on the same node as the attached volume, which offers higher IOPS and lower latency). | disabled |
| default-longhorn-static-storage-class | The storageClassName used for persistent volumes (PVs) and persistent volume claims (PVCs) when creating a PV/PVC for an existing Longhorn volume. Note that users don't need to create the related StorageClass object in Kubernetes, since the StorageClass is only used as a matching label for PVC binding purposes. | longhorn-static |
| backupstore-poll-interval | The interval in seconds to poll the backup store for updating volumes' Last Backup field. Set to 0 to disable the polling. See Setting up Disaster Recovery Volumes for details. For more information on how the backupstore poll interval affects the recovery time objective and recovery point objective, refer to the concepts section. | 300 |
| taint-toleration | Taint tolerations for Longhorn components, including the user-deployed components (Longhorn manager, Longhorn driver, Longhorn UI) and the system-managed components (instance manager, engine image, CSI driver, etc.). | Example: nodetype=storage:NoSchedule |
| system-managed-components-node-selector | If you want to restrict Longhorn components to run only on a particular set of nodes, you can set a node selector for all Longhorn components. The Longhorn system contains user-deployed components (e.g. Longhorn manager, Longhorn driver, Longhorn UI) and system-managed components (e.g. instance manager, engine image, CSI driver, etc.), and you need to set the node selector for both. This setting only sets the node selector for the system-managed components; follow the instructions at Node Selector to change it. | Example: label-key1:label-value1;label-key2:label-value2 |
| priority-class | By default, Longhorn workloads run with the same priority as other pods in the cluster, meaning that in cases of node pressure, such as a node running out of memory, Longhorn workloads are candidates for eviction just like other pods. The Priority Class setting specifies a PriorityClass for the Longhorn workloads to run as; it can be used to give Longhorn workloads a higher priority so that they are not the first to be evicted when a node is under pressure. Note that this setting only sets the PriorityClass for the system-managed components; depending on how you deployed Longhorn, you need to set the PriorityClass for the user-deployed components in the Helm chart or deployment YAML file. | Example: high-priority |
| auto-salvage | If enabled, volumes will be automatically salvaged when all the replicas become faulty, e.g. due to a network disconnection. Longhorn will try to figure out which replica(s) are usable, then use them for the volume. | TRUE |
| auto-delete-pod-when-volume-detached-unexpectedly | If enabled, Longhorn will automatically delete a workload pod that is managed by a controller (e.g. Deployment, StatefulSet, DaemonSet, etc.) when the Longhorn volume is detached unexpectedly (e.g. during a Kubernetes upgrade, Docker reboot, or network disconnect). By deleting the pod, its controller restarts the pod and Kubernetes handles volume reattachment and remount. | TRUE |
| disable-scheduling-on-cordoned-node | When this setting is checked, the Longhorn Manager will not schedule replicas on cordoned Kubernetes nodes; when it is un-checked, the Longhorn Manager will schedule replicas on cordoned nodes. | TRUE |
| replica-zone-soft-anti-affinity | When this setting is checked, the Longhorn Manager will allow scheduling new replicas of a volume to nodes in the same zone as existing healthy replicas; when it is un-checked, it will not. | TRUE |
| node-down-pod-deletion-policy | Defines the Longhorn action when a volume is stuck with a StatefulSet/Deployment pod on a node that is down. do-nothing is the default Kubernetes behavior of never force-deleting terminating StatefulSet/Deployment pods; since the pod on the downed node isn't removed, Longhorn volumes stay stuck on nodes that are down. delete-statefulset-pod: Longhorn force-deletes terminating StatefulSet pods on downed nodes to release Longhorn volumes so that Kubernetes can spin up replacement pods. delete-deployment-pod: the same, but for Deployment pods. delete-both-statefulset-and-deployment-pod: the same, for both StatefulSet and Deployment pods. | do-nothing |
| allow-node-drain-with-last-healthy-replica | By default, Longhorn will block the kubectl drain action on a node if the node contains the last healthy replica of a volume. If this setting is enabled, Longhorn will not block kubectl drain even in that case. | FALSE |
| mkfs-ext4-parameters | Allows setting additional filesystem creation parameters for ext4. For older host kernels it might be necessary to disable the optional ext4 metadata_csum feature by specifying -O 64bit,metadata_csum. | |
| disable-replica-rebuild | This deprecated setting is replaced by the new setting Concurrent Replica Rebuild Per Node Limit; setting the new setting's value to 0 disables rebuilding. | FALSE |
| replica-replenishment-wait-interval | When a degraded volume has at least one failed replica, this interval in seconds determines how long Longhorn will wait, at most, in order to reuse the existing data of the failed replicas rather than directly creating a new replica for the volume. Warning: this wait interval only takes effect when there is at least one failed replica in the volume, and it may block rebuilding for a while. | 600 |
| concurrent-replica-rebuild-per-node-limit | This setting controls how many replicas on a node can be rebuilt simultaneously. Longhorn blocks a replica from starting to rebuild once the current rebuilding count on a node exceeds the limit. When the value is 0, replica rebuilding is disabled. | 5 |
| disable-revision-counter | Allows the engine controller and engine replica to disable the revision counter file update on every data write. This improves the data path performance. See Revision Counter for details. | FALSE |
| system-managed-pods-image-pull-policy | This setting defines the image pull policy of Longhorn system-managed pods, e.g. instance manager, engine image, CSI driver, etc. Note that a new image pull policy only applies after the system-managed pods restart. The definition is exactly the same as in Kubernetes. The available options are: always (every time the kubelet launches a container, it queries the container image registry to resolve the name to an image digest; if it has a container image with that exact digest cached locally, it uses the cached image, otherwise it pulls the image with the resolved digest and uses that image to launch the container), if-not-present (the image is pulled only if it is not already present locally), and never (the image is assumed to exist locally; no attempt is made to pull it). | if-not-present |
| allow-volume-creation-with-degraded-availability | This setting allows users to create and attach a volume that doesn't have all of its replicas scheduled at the time of creation. | TRUE |
| auto-cleanup-system-generated-snapshot | Longhorn generates a system snapshot during replica rebuilds. If the user hasn't set up a recurring snapshot schedule, all the system-generated snapshots would be left in the replica and would have to be deleted manually; this setting allows Longhorn to automatically clean up system-generated snapshots after a replica rebuild. | TRUE |
| concurrent-automatic-engine-upgrade-per-node-limit | This setting controls how Longhorn automatically upgrades volumes' engines to the new default engine image after upgrading the Longhorn manager. The value specifies the maximum number of engines per node that are allowed to upgrade to the default engine image at the same time. If the value is 0, Longhorn will not automatically upgrade volumes' engines to the default version. | 0 |
| backing-image-cleanup-wait-interval | This interval in minutes determines how long Longhorn will wait before cleaning up the backing image file when there is no replica on the disk using it. | 60 |
| backing-image-recovery-wait-interval | This interval in seconds determines how long Longhorn will wait before re-downloading the backing image file when all disk files of this backing image have become failed or unknown. | 300 |
| guaranteed-engine-manager-cpu | This integer value indicates what percentage of the total allocatable CPU on each node will be reserved for each engine manager pod. For example, 10 means 10% of the total CPU on a node will be allocated to each engine manager pod on that node. This helps maintain engine stability during high node workload. | 12 |
| guaranteed-replica-manager-cpu | Similar to guaranteed-engine-manager-cpu, this integer value indicates what percentage of the total allocatable CPU on each node will be reserved for each replica manager pod. For example, 10 means 10% of the total CPU on a node will be allocated to each replica manager pod on that node. This helps maintain replica stability during high node workload. | 12 |
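Most of the settings above can be given initial values at install time through the Helm chart's defaultSettings values, or inspected and changed later through the Setting custom resources. A sketch, assuming the longhorn/longhorn chart used in the Install section (camelCase keys such as defaultSettings.backupTarget mirror the kebab-case setting names; verify them against your chart version):

```bash
# Seed backup-target and backup-target-credential-secret at install/upgrade time
helm upgrade --install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace --version 1.6.0 \
  --set defaultSettings.backupTarget="s3://backupbucket@us-east-1/backupstore" \
  --set defaultSettings.backupTargetCredentialSecret="s3-secret"

# Inspect a live setting via its custom resource
kubectl -n longhorn-system get settings.longhorn.io backup-target
```

The per-volume counterparts of default-replica-count and default-data-locality are StorageClass parameters, as noted in the table. A minimal sketch; the class name longhorn-fast is made up for illustration:

```bash
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"        # per-class override of default-replica-count
  dataLocality: "best-effort"  # per-class override of default-data-locality
EOF
```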