Release Notes
Contents
- 4.0.10: Fixed Issues, Known Issues
- 4.0.9: Fixed Issues, Known Issues
- 4.0.8: Fixed Issues, Known Issues
- 4.0.7: Fixed Issues, Known Issues
- 4.0.6: Fixed Issues, Known Issues
- 4.0.5: Fixed Issues, Known Issues
- 4.0.4: Fixed Issues, Known Issues
- 4.0.3: Fixed Issues, Known Issues
- 4.0.2: Fixed Issues, Known Issues
- 4.0.1: Fixed Issues, Known Issues
- 4.0.0: Features and Enhancements (Installation and Upgrade: Modular Architecture; Clusters: Declarative Cluster Lifecycle Management with Cluster API; Operator and Extensions: Full Capability Visibility; Log Query Logic Optimization; ElasticSearch Upgrade to 8.17; ALB Authentication; ALB Supports ingress-nginx Annotations; Kubevirt Live Migration Optimization; LDAP/OIDC Integration Optimization; Source to Image (S2I) Support; On-Premises Registry Solution; GitOps Module Refactoring; Namespace-Level Monitoring; Crossplane Integration; Virtualization Updates; Ceph Storage Updates; TopoLVM Updates), Fixed Issues, Known Issues
4.0.10
Released: 2026-05-27
Fixed Issues
- Fixed an issue where kubeconfig client certificates could expire in long-running clusters or clusters older than one year, causing platform 500 errors, kubectl command failures, and Apollo component access issues. The controller now detects and automatically renews expiring client certificates in <cluster>-kubeconfig Secrets, and control-plane /root/.kube/config now uses client certificates with about 10-year validity.
- Fixed an issue during rolling upgrades where existing workload-cluster kubeconfig Secrets or ClusterCredential client certificates were not automatically rotated when expired or expiring within 30 days, which could cause workload-cluster access failures. The controller now detects this condition, reissues the client certificates, and updates the corresponding Secret or credential.
- Fixed an issue after upgrading from 3.16.2 to 4.0.x where the apiserver.crt generated by the upgrade script lacked the Authority Key Identifier (AKI) extension. Newer Go TLS clients, such as Jenkins pipelines, could reject the certificate and fail to connect to the Kubernetes API server. Upgrade-generated apiserver.crt certificates now include AKI.
- Fixed an issue where Pods could not be created normally after Multus was uninstalled.
- Fixed a security issue where image-registry referenced Secrets through environment variables in external S3 storage scenarios, preventing sensitive information from being exposed.
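The rotation condition described in the certificate items above (expired, or expiring within 30 days) can be illustrated with a minimal sketch. It shows only the decision itself; the function name, the constant, and the assumption that the certificate's expiry timestamp is already available are all hypothetical and do not represent the controller's actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold mirroring the "expiring within 30 days" wording.
ROTATION_WINDOW = timedelta(days=30)


def needs_rotation(not_after: datetime, now: datetime,
                   window: timedelta = ROTATION_WINDOW) -> bool:
    """Return True if a certificate is already expired or expires within `window`."""
    # A negative remaining lifetime (already expired) also satisfies the check.
    return not_after - now <= window


if __name__ == "__main__":
    now = datetime(2026, 5, 27, tzinfo=timezone.utc)
    # Expires in two weeks: inside the rotation window.
    print(needs_rotation(datetime(2026, 6, 10, tzinfo=timezone.utc), now))
    # Expires in a year: no rotation needed yet.
    print(needs_rotation(datetime(2027, 5, 27, tzinfo=timezone.utc), now))
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check deterministic and easy to test.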
Known Issues
- In certain cases, users may find that the operations automatically recorded by the platform in a ResourcePatch resource do not match the actual modifications they made to the component. As a result, when the ResourcePatch controller applies the patch, the component may undergo unexpected changes.
Workaround: Users should manually modify the ResourcePatch resource to ensure it reflects the intended changes.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state. The containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, and the log entries appear out of order.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If you encounter this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart and the hook fails while the chart is being uninstalled (triggered by deleting the template application), the application cannot be deleted. Investigate the cause and resolve the hook execution failure first.
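The defaultMode workaround listed above (removing every defaultMode declaration before creating the application) can be sketched as follows. This is an illustration only: the helper name is hypothetical, and it assumes the YAML has already been parsed into Python dictionaries and lists by a YAML library of your choice. With the field removed, Kubernetes applies its default file mode (0644) to ConfigMap and Secret volumes.

```python
from typing import Any


def strip_default_mode(node: Any) -> Any:
    """Return a copy of a parsed manifest with every `defaultMode` key removed."""
    if isinstance(node, dict):
        # Drop the key at this level and recurse into the remaining values.
        return {key: strip_default_mode(value)
                for key, value in node.items()
                if key != "defaultMode"}
    if isinstance(node, list):
        return [strip_default_mode(item) for item in node]
    return node


if __name__ == "__main__":
    # Hypothetical fragment of a Deployment's pod spec.
    pod_spec = {
        "volumes": [
            {"name": "config",
             "configMap": {"name": "app-config", "defaultMode": 420}},
            {"name": "creds",
             "secret": {"secretName": "app-secret", "defaultMode": 256}},
        ]
    }
    print(strip_default_mode(pod_spec))
```

The function returns a new structure instead of mutating its input, so the original manifest stays available for comparison.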
4.0.9
Released: 2026-02-10
Fixed Issues
- Fixed an issue where the olm-registry pod would continuously restart, preventing the OperatorHub from functioning properly. This was caused by the `seccompProfile: RuntimeDefault` security configuration added during CIS compliance hardening, which blocked the `clone` syscall required by CGO operations. The seccomp profile has been adjusted to allow necessary syscalls while maintaining security compliance. Fixed in ACP 4.0.9.
- Fixed a performance issue where the permission validation during native application creation became extremely slow (10+ seconds) when the cluster had 60+ operators installed. Fixed in ACP 4.0.9.
- Fixed an issue where installing the marketplace plugin on workload clusters would fail. Fixed in ACP 4.0.9.
- Fixed an issue where the egress gateway could not route traffic from Pods in the same subnet as the egress gateway. Fixed in ACP 4.0.9.
Known Issues
- Fixed an issue where kubeconfig client certificates could expire in long-running clusters or clusters older than one year, causing platform 500 errors, kubectl command failures, and Apollo component access issues. The controller now detects and automatically renews expiring client certificates in <cluster>-kubeconfig Secrets, and control-plane /root/.kube/config now uses client certificates with about 10-year validity.
- Fixed an issue during rolling upgrades where existing workload-cluster kubeconfig Secrets or ClusterCredential client certificates were not automatically rotated when expired or expiring within 30 days, which could cause workload-cluster access failures. The controller now detects this condition, reissues the client certificates, and updates the corresponding Secret or credential.
- Fixed an issue after upgrading from 3.16.2 to 4.0.x where the apiserver.crt generated by the upgrade script lacked the Authority Key Identifier (AKI) extension. Newer Go TLS clients, such as Jenkins pipelines, could reject the certificate and fail to connect to the Kubernetes API server. Upgrade-generated apiserver.crt certificates now include AKI.
- In certain cases, users may find that the operations automatically recorded by the platform in a ResourcePatch resource do not match the actual modifications they made to the component. As a result, when the ResourcePatch controller applies the patch, the component may undergo unexpected changes.
Workaround: Users should manually modify the ResourcePatch resource to ensure it reflects the intended changes.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state. The containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, and the log entries appear out of order.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Fixed an issue where Pods could not be created normally after Multus was uninstalled.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If you encounter this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart and the hook fails while the chart is being uninstalled (triggered by deleting the template application), the application cannot be deleted. Investigate the cause and resolve the hook execution failure first.
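The StatefulSet metering discrepancy listed above comes down to measuring one span from the earliest start to the latest end instead of summing the intervals in which the Pod actually ran. A minimal sketch of the difference follows; the function names and the interval representation are assumptions made for illustration, not the platform's metering code.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end) of one running period


def metered_span(intervals: List[Interval]) -> timedelta:
    """Earliest start to latest end, ignoring gaps (the behavior described above)."""
    earliest = min(start for start, _ in intervals)
    latest = max(end for _, end in intervals)
    return latest - earliest


def actual_runtime(intervals: List[Interval]) -> timedelta:
    """Sum of the periods during which the Pod was really running."""
    return sum((end - start for start, end in intervals), timedelta())


if __name__ == "__main__":
    day = datetime(2026, 2, 10)
    # The Pod ran 00:00-02:00, was stopped, then ran again 10:00-12:00.
    runs = [(day, day + timedelta(hours=2)),
            (day + timedelta(hours=10), day + timedelta(hours=12))]
    print(metered_span(runs))    # 12:00:00
    print(actual_runtime(runs))  # 4:00:00
```

In this example the Pod would be metered for 12 hours although it ran for only 4.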
4.0.8
Released: 2026-01-07
Fixed Issues
- Fixed an issue where a user who had been automatically disabled by the system after a long period without logging in was automatically disabled again after an administrator manually reactivated the account.
- Fixed an issue where ovn-central could not recover automatically after one of the OVN database files was manually deleted.
- Fixed an ovn-central race condition in which two leaders could exist at the same time, interrupting the cluster network.
Known Issues
- Fixed an issue where kubeconfig client certificates could expire in long-running clusters or clusters older than one year, causing platform 500 errors, kubectl command failures, and Apollo component access issues. The controller now detects and automatically renews expiring client certificates in <cluster>-kubeconfig Secrets, and control-plane /root/.kube/config now uses client certificates with about 10-year validity.
- Fixed an issue during rolling upgrades where existing workload-cluster kubeconfig Secrets or ClusterCredential client certificates were not automatically rotated when expired or expiring within 30 days, which could cause workload-cluster access failures. The controller now detects this condition, reissues the client certificates, and updates the corresponding Secret or credential.
- Fixed an issue where the olm-registry pod would continuously restart, preventing the OperatorHub from functioning properly. This was caused by the `seccompProfile: RuntimeDefault` security configuration added during CIS compliance hardening, which blocked the `clone` syscall required by CGO operations. The seccomp profile has been adjusted to allow necessary syscalls while maintaining security compliance. Fixed in ACP 4.0.9.
- Fixed a performance issue where the permission validation during native application creation became extremely slow (10+ seconds) when the cluster had 60+ operators installed. Fixed in ACP 4.0.9.
- Fixed an issue where installing the marketplace plugin on workload clusters would fail. Fixed in ACP 4.0.9.
- In certain cases, users may find that the operations automatically recorded by the platform in a ResourcePatch resource do not match the actual modifications they made to the component. As a result, when the ResourcePatch controller applies the patch, the component may undergo unexpected changes.
Workaround: Users should manually modify the ResourcePatch resource to ensure it reflects the intended changes.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state. The containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, and the log entries appear out of order.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Fixed an issue where Pods could not be created normally after Multus was uninstalled.
- Fixed an issue where the egress gateway could not route traffic from Pods in the same subnet as the egress gateway. Fixed in ACP 4.0.9.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If you encounter this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart and the hook fails while the chart is being uninstalled (triggered by deleting the template application), the application cannot be deleted. Investigate the cause and resolve the hook execution failure first.
4.0.7
Released: 2025-12-10
Fixed Issues
- Before this update, the Tekton Pipeline component had a Kubernetes STIG security vulnerability, where secrets were exposed through environment variables in the tekton-hub-api deployment, violating security best practices. With this update, the secret mounting logic in environment variables has been completely removed to ensure that the tekton-hub-api deployment no longer exposes any credentials, complying with Kubernetes STIG security requirements.
- Before this update, the tekton-results-retention-policy-agent container in Tekton Results included sensitive information in environment variables, posing a security risk of exposing credentials in plaintext during container operations and logging scenarios. With this update, sensitive information has been properly secured and removed from environment variables to prevent credential leakage, ensuring that the retention-policy-agent container no longer contains plaintext passwords or tokens in its configuration, thereby enhancing the overall security posture of the Tekton Results system.
- Before this update, the PostgreSQL container in tekton-results-postgres-0 contained environment variables with sensitive information such as PASSWORD, password, TOKEN, and token, which posed a security risk when these credentials were exposed in plain text. With this update, the sensitive environment variables have been properly secured and no longer contain plain text passwords or tokens, ensuring that sensitive credentials are handled securely and not exposed in container environment variables.
- Before this update, the environment variables of the tekton-results-api container contained sensitive information, posing security risks when these credentials were exposed in plain text. With this update, sensitive environment variables have been properly protected, and passwords and token information are no longer exposed in plain text, enhancing the security of the tekton-results-api component.
- Before this update, the pipeline interface experienced multiple display issues including text display problems, poor user experience with variable completion multi-line functionality, and unstable behavior when updating triggers where parameters and workspace would sometimes appear and sometimes disappear, requiring users to reselect the pipeline to make them appear (including the Pipeline list). With this update, these display issues have been resolved. The pipeline and pipelinerun pages now display correctly with improved text rendering, enhanced variable completion multi-line functionality for better user experience, and stable trigger update behavior where parameters and workspace consistently appear without requiring pipeline reselection.
- Previously, the status field of an upmachinepool resource stored the associated machine resources without any ordering. This caused the resource to be updated on every reconcile loop, resulting in excessively large audit logs. This issue has now been fixed.
- Fixed an issue where modifying the Pod Security Policy when importing a namespace into a project did not take effect.
- Fixed an issue in the console where updating a Deployment in a specific sequence could cause the container lifecycle configuration to be unintentionally removed.
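The upmachinepool item above describes a common reconcile pitfall: when a status list is stored without a defined order, an order-sensitive comparison reports a change on every loop even though the set of machines is the same. The sketch below illustrates the idea with hypothetical function names and plain lists of machine names; it is not the actual controller code.

```python
from typing import List


def status_changed(stored: List[str], observed: List[str]) -> bool:
    """Order-sensitive comparison: a reordered list counts as a change."""
    return stored != observed


def status_changed_stable(stored: List[str], observed: List[str]) -> bool:
    """Compare after sorting, so only real membership changes trigger an update."""
    return sorted(stored) != sorted(observed)


if __name__ == "__main__":
    stored = ["machine-a", "machine-b", "machine-c"]
    observed = ["machine-c", "machine-a", "machine-b"]  # same machines, new order
    print(status_changed(stored, observed))         # True: triggers a needless update
    print(status_changed_stable(stored, observed))  # False: nothing to write
```

Writing the list in sorted order has the same effect and also keeps the stored status byte-for-byte identical between reconciles.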
Known Issues
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state. The containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, and the log entries appear out of order.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Fixed an issue where ovn-central could not recover automatically after one of the OVN database files was manually deleted.
- Fixed an ovn-central race condition in which two leaders could exist at the same time, interrupting the cluster network.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If you encounter this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart and the hook fails while the chart is being uninstalled (triggered by deleting the template application), the application cannot be deleted. Investigate the cause and resolve the hook execution failure first.
4.0.6
Released: 2025-11-03
Fixed Issues
- Fixed an issue where, after upgrading the platform from v3.x to v4.x, if a business cluster was not upgraded, the metrics created in its new custom monitoring dashboard could not be used by HPA.
Known Issues
- Previously, the status field of an upmachinepool resource stored the associated machine resources without any ordering. This caused the resource to be updated on every reconcile loop, resulting in excessively large audit logs. This issue has now been fixed.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state. The containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, and the log entries appear out of order.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Fixed an issue where ovn-central could not recover automatically after one of the OVN database files was manually deleted.
- Fixed an ovn-central race condition in which two leaders could exist at the same time, interrupting the cluster network.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- Fixed an issue in the console where updating a Deployment in a specific sequence could cause the container lifecycle configuration to be unintentionally removed.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If encountering this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart and the hook fails while the chart is being uninstalled (triggered by deleting the template application), the application cannot be deleted. Investigate the cause and resolve the hook execution failure first.
4.0.5
Released: 2025-09-30
Fixed Issues
- Previously, after uninstalling an Operator, the Operator status was incorrectly displayed as Absent, even though the Operator was actually Ready. Users had to manually re-upload the Operator using violet upload. This issue has now been resolved, and the Operator correctly appears as Ready after uninstallation.
- In some cases, installing a new Operator version after uploading it via violet upload would fail unexpectedly. This intermittent issue has been fixed.
- When an Operator or Cluster Plugin included multiple frontend extensions, the left-side navigation of these extensions could become unresponsive. The temporary workaround required users to add the annotation cpaas.io/auto-sync: "false" to the extension’s ConfigMap. This behavior has now been permanently fixed in the code.
Known Issues
- When upgrading a Redis Sentinel instance from v5 to v7, occasional split-brain incidents may occur, potentially leading to data loss.
Solution: Back up the Redis instance data before performing a cross-version upgrade.
- When cluster network anomalies occur, failure to update the primary node label of a PostgreSQL instance may result in abnormal instance status, potentially causing some new connections to fail.
- Before this update, the pipeline interface experienced multiple display issues including text display problems, poor user experience with variable completion multi-line functionality, and unstable behavior when updating triggers where parameters and workspace would sometimes appear and sometimes disappear, requiring users to reselect the pipeline to make them appear (including the Pipeline list). With this update, these display issues have been resolved. The pipeline and pipelinerun pages now display correctly with improved text rendering, enhanced variable completion multi-line functionality for better user experience, and stable trigger update behavior where parameters and workspace consistently appear without requiring pipeline reselection.
- Previously, the status field of an upmachinepool resource stored the associated machine resources without any ordering. This caused the resource to be updated on every reconcile loop, resulting in excessively large audit logs. This issue has now been fixed.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, and the logs appear out of order.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Fixed an issue where ovn-central could not recover automatically if one ovn-db file was deleted manually.
- Fixed a race condition in ovn-central that could result in two leaders and interrupt the cluster network.
- Fixed an issue where, after upgrading the platform from v3.x to v4.x, if a business cluster was not upgraded, the metrics created in its new custom monitoring dashboard could not be used by HPA.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before application creation.
- Fixed an issue in the console where updating a Deployment in a specific sequence could cause the container lifecycle configuration to be unintentionally removed.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If encountering this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If unresolved, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart and the hook fails while the template application is being deleted and the chart uninstalled, the application cannot be deleted. Investigate the cause and resolve the hook execution failure first.
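The defaultMode workaround listed above can be scripted when a manifest has many volumes. The following is a minimal sketch, assuming the YAML has already been parsed into Python dicts and lists (for example by a YAML library). The helper name strip_default_mode and the sample manifest are hypothetical and not part of the platform.

```python
from typing import Any


def strip_default_mode(node: Any) -> Any:
    """Return a copy of a parsed manifest with every defaultMode key removed."""
    if isinstance(node, dict):
        return {
            key: strip_default_mode(value)
            for key, value in node.items()
            if key != "defaultMode"
        }
    if isinstance(node, list):
        return [strip_default_mode(item) for item in node]
    return node


# Hypothetical manifest fragment used only for illustration.
manifest = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"volumes": [
        {"name": "cfg", "configMap": {"name": "app-config", "defaultMode": 420}},
        {"name": "tls", "secret": {"secretName": "app-tls", "defaultMode": 256}},
    ]}}},
}

# The original manifest is left untouched; cleaned is a new structure.
cleaned = strip_default_mode(manifest)
```

With the field removed, Kubernetes falls back to its default file mode for the mounted files, so check that the workload does not depend on stricter permissions before applying this.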
4.0.4
Released: 2025-09-01
Fixed Issues
- Previously, upgrading the cluster would leave behind CRI (Container Runtime Interface) Pods, which blocked further upgrades to version 4.1. This issue has been fixed in version 4.0.4.
Known Issues
No issues in this release.
4.0.3
Released: 2025-07-01
Fixed Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
Known Issues
- Previously, upgrading the cluster would leave behind CRI (Container Runtime Interface) Pods, which blocked further upgrades to version 4.1. This issue has been fixed in version 4.0.4.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
4.0.2
Released: 2025-06-06
Fixed Issues
- Fixed an issue where performing a node drain on a public cloud Kubernetes cluster (such as ACK) managed by the platform failed with a 404 error.
Known Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
4.0.1
Released: 2025-05-04
Fixed Issues
- Under high api-server pressure, the aggregate worker in kyverno-report-controller may occasionally fail to start, preventing proper creation of compliance reports. This results in PolicyReport resources not being created, causing the Web Console to either display no compliance violation information or only partial report data. To troubleshoot, check the kyverno-report-controller pod logs for the presence of "starting worker aggregate-report-controller/worker" messages to verify proper operation. If the worker is not running, manually restart the kyverno-report-controller as a temporary solution.
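The log check described above can be automated. The following is a minimal sketch, assuming the kyverno-report-controller pod logs have already been fetched as plain text. The function name and sample log lines are hypothetical; only the marker string comes from the note above, and the real log format may differ.

```python
WORKER_MARKER = "starting worker aggregate-report-controller/worker"


def aggregate_worker_started(log_text: str) -> bool:
    """Return True if any log line contains the worker start-up marker."""
    return any(WORKER_MARKER in line for line in log_text.splitlines())


# Hypothetical log excerpts for illustration only.
healthy_logs = (
    "I0504 10:00:01 setup complete\n"
    "I0504 10:00:02 starting worker aggregate-report-controller/worker\n"
)
stalled_logs = "I0504 10:00:01 setup complete\n"

print(aggregate_worker_started(healthy_logs))
print(aggregate_worker_started(stalled_logs))
```

If the check returns False for a running controller, the temporary solution from the note applies: restart the kyverno-report-controller.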
Known Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
- Fixed an issue where performing a node drain on a public cloud Kubernetes cluster (such as ACK) managed by the platform failed with a 404 error.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
4.0.0
Released: 2025-04-08
Features and Enhancements
Installation and Upgrade: Modular Architecture
We have completely redesigned the platform architecture to provide unprecedented flexibility, faster upgrades, and lower operational overhead.
Simplified Installation
The platform is now deployed from a lightweight core package that contains only the essential components. Once the base is in place, customers can choose exactly the Operators or cluster plugins they need, whether DevOps, Service Mesh, or other specialized capabilities, and download, upload, and install them individually.
Targeted Patches
- Patch releases include only the components that actually require fixes.
- Components without fixes remain unchanged, so the rest of the platform is not affected.
- Customers apply patches through the platform's built-in, standardized upgrade mechanism rather than manually updating individual components, which greatly simplifies maintenance and tracking.
Intelligent Upgrades
- During an upgrade, only components with new code are replaced and restarted.
- Unchanged components keep their current versions and uptime.
- This minimizes downtime and shortens the maintenance window, making the upgrade process smoother.
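The selection rule in the list above amounts to a version diff. The following is a minimal sketch, assuming each release is described by a mapping of component name to version. The function name, component names, and versions are invented for illustration and do not reflect the platform's actual upgrade mechanism.

```python
def components_to_restart(installed: dict[str, str], target: dict[str, str]) -> list[str]:
    """Return the components whose target version differs from the installed one."""
    return sorted(
        name
        for name, version in target.items()
        if installed.get(name) != version
    )


# Hypothetical component versions for illustration only.
installed = {"core": "4.0.0", "logging": "4.0.0", "monitoring": "4.0.0"}
target = {"core": "4.0.1", "logging": "4.0.0", "monitoring": "4.0.0"}

print(components_to_restart(installed, target))  # prints ['core']
```

Components absent from the result keep running untouched, which is what preserves their uptime during the upgrade.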
Independent Component Versioning
- Most Operators are released on their own schedules, separately from the core platform.
- New features and fixes become available as soon as they are ready, with no need to wait for a full platform upgrade.
- This approach speeds up delivery and lets customers benefit from improvements sooner.
Clusters: Declarative Cluster Lifecycle Management with Cluster API
On-premises clusters now use the Kubernetes Cluster API for fully declarative operations, including:
- Cluster creation
- Node scaling and joining
This Cluster API integration fits directly into your IaC pipelines, enabling end-to-end programmatic control over the cluster lifecycle.
Operators and Extensions: Full Capability Visibility
Complete Operator Catalog
OperatorHub now displays all supported Operators, regardless of whether their packages have been uploaded to the platform. This enhancement:
- Provides full visibility into platform capabilities, even in air-gapped environments
- Closes the information gap between what is available and what users know about
- Reduces discovery friction when exploring platform capabilities
Version Flexibility
Users can now select specific Operator versions during installation instead of being limited to the latest version, giving more control over component compatibility and upgrade paths.
Web Console Extensions
Operators now support anchor-based Web Console extensions, allowing function-specific frontend images to be included within an Operator and seamlessly integrated into the platform's Web Console.
Cluster Plugin Enhancements
All of the Operator improvements in visibility, version selection, and Web Console extension capabilities also apply to cluster plugins, ensuring a consistent user experience across all platform extensions.
Log Query Logic Optimization
The log query page has been optimized to address the usability and performance problems users encountered with the log query feature:
- The original radio buttons have been replaced with an advanced search component. You can now search logs the same way you search in Git.
- Independent query conditions for log content
- The time-range query criteria have been repositioned. Changing the time range no longer resets your log filter criteria.
- The log query API has been optimized to improve overall query performance
ElasticSearch Upgrade to 8.17
We upgraded ElasticSearch to 8.17 to take advantage of the community's features and improvements.
ALB Authentication
ALB now supports various authentication mechanisms, allowing users to handle authentication at the Ingress level instead of implementing it in each backend application.
ALB Supports ingress-nginx Annotations
This release adds support for common ingress-nginx annotations in ALB, including keepalive settings, timeout configurations, and HTTP redirects, improving compatibility with the community ingress-nginx.
KubeVirt Live Migration Optimization
During live migration, network interruption time has been reduced to less than 0.5 seconds, and existing TCP connections are not dropped. This optimization significantly improves the stability and reliability of virtual machine migration in production environments.
LDAP/OIDC Integration Optimization
The LDAP/OIDC integration form fields have been adjusted, mainly by removing unnecessary or duplicate fields and improving field descriptions. LDAP/OIDC integration now supports configuration through YAML, allowing user attribute mapping within the YAML file.
Source to Image (S2I) Support
- Added the Alauda Container Platform Builds operator for automated image builds from source code
- Supports Java/Go/Node.js/Python language stacks
- Simplifies application deployment through source code repositories
On-Premises Registry Solution
- ACP Registry provides a lightweight Docker Registry with enterprise-ready features
- Delivers out-of-the-box image management capabilities
- Simplifies application delivery
GitOps Module Refactoring
- ACP GitOps has been decoupled into a standalone cluster plugin architecture
- Upgraded Argo CD to v2.14.x
- Enhanced GitOps-based application lifecycle management
Namespace-Level Monitoring
- Added dynamic monitoring dashboards at the namespace level
- Provides visualization of Applications/Workloads/Pods metrics
Crossplane Integration
- Released the Alauda Build of Crossplane distribution
- Implements application-centric resource provisioning through XRD compositions
Virtualization Updates
- Upgraded to KubeVirt 1.4 for enhanced virtualization capabilities
- Optimized image handling for faster VM provisioning
- Optimized VM live migration, which can now be started directly from the UI with the migration status displayed
- Improved binding networking with dual-stack (IPv4/IPv6) support
- Added vTPM support to strengthen VM security
Ceph Storage Updates
- Metro-DR with a stretch cluster provides real-time data synchronization across availability zones
- Regional-DR with pool-based mirroring enhances data protection
TopoLVM Updates
- Added support for multipath device deployment, improving flexibility and stability
Fixed Issues
- Previously, after publishing a new Operator version, users had to wait 10 minutes before installing it. This waiting period has been reduced to 2 minutes, allowing faster installation of new Operator versions.
- On GPU nodes with multiple cards, applications using vGPU with gpu-manager occasionally fail to be scheduled.
- When using the pgpu plugin, the default RuntimeClass on the GPU node must be set to nvidia. Otherwise, applications may be unable to request GPU resources properly.
- On a single GPU card, gpu-manager cannot create multiple inference services based on vllm or mlserver at the same time.
On AI platforms, this issue occurs when gpu-manager is used to create multiple inference services; on container platforms, this issue does not occur when gpu-manager is used to create multiple smart applications.
- With mps, pods restart indefinitely when nodes are low on resources.
Known Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Under high api-server pressure, the aggregate worker in kyverno-report-controller may occasionally fail to start, preventing proper creation of compliance reports. This results in PolicyReport resources not being created, causing the Web Console to either display no compliance violation information or only partial report data. To troubleshoot, check the kyverno-report-controller pod logs for the presence of "starting worker aggregate-report-controller/worker" messages to verify proper operation. If the worker is not running, manually restart the kyverno-report-controller as a temporary solution.
- Fixed an inconsistency where Secrets created through the web console only stored the Username and Password, and lacked the complete authentication field (auth) compared to those created via kubectl create. This issue previously caused authentication failures for build tools (e.g., buildah) that rely on the complete auth data.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If encountering this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If unresolved, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure proper creation order.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, and the logs appear out of order.
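For the web console Secret issue listed above (the missing auth field), the auth value in a Docker registry credential is the Base64 encoding of username:password. The following is a minimal sketch of building a complete .dockerconfigjson payload. The registry host and credentials are placeholders and the function name is hypothetical; it is not the console's implementation.

```python
import base64
import json


def docker_config_json(registry: str, username: str, password: str) -> str:
    """Build a .dockerconfigjson payload that includes the auth field."""
    # auth is base64("username:password"), which build tools read directly.
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    }
    return json.dumps(config)


# Placeholder values for illustration only.
payload = docker_config_json("registry.example.com", "demo", "s3cret")
print(payload)
```

Decoding the auth field of an existing Secret the same way is a quick check of whether it was created with the complete credential data.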