Release Notes
Contents
- 4.0.9: Fixed Issues, Known Issues
- 4.0.8: Fixed Issues, Known Issues
- 4.0.7: Fixed Issues, Known Issues
- 4.0.6: Fixed Issues, Known Issues
- 4.0.5: Fixed Issues, Known Issues
- 4.0.4: Fixed Issues, Known Issues
- 4.0.3: Fixed Issues, Known Issues
- 4.0.2: Fixed Issues, Known Issues
- 4.0.1: Fixed Issues, Known Issues
- 4.0.0: New Features and Enhancements (modular installation and upgrade architecture; declarative cluster lifecycle management with Cluster API; full Operator & Extension capability visibility; optimized log query logic; ElasticSearch upgraded to 8.17; ALB authentication; ingress-nginx annotation support in ALB; Kubevirt live migration optimization; LDAP/OIDC integration optimization; Source to Image (S2I) support; local Registry solution; GitOps module refactoring; Namespace-level monitoring; Crossplane integration; virtualization updates; Ceph Storage updates; TopoLVM updates), Fixed Issues, Known Issues
4.0.9
Fixed Issues
- Fixed an issue where the olm-registry pod would continuously restart, preventing the OperatorHub from functioning properly. This was caused by the `seccompProfile: RuntimeDefault` security configuration added during CIS compliance hardening, which blocked the `clone` syscall required by CGO operations. The seccomp profile has been adjusted to allow necessary syscalls while maintaining security compliance. Fixed in ACP 4.0.9.
- Fixed a performance issue where the permission validation during native application creation became extremely slow (10+ seconds) when the cluster had 60+ operators installed. Fixed in ACP 4.0.9.
- Fixed an issue where installing the marketplace plugin on workload clusters would fail. Fixed in ACP 4.0.9.
- Fixed an issue where the egress gateway could not route traffic from Pods in the same subnet as the egress gateway. Fixed in ACP 4.0.9.
Known Issues
- In certain cases, users may find that the operations automatically recorded by the platform in a ResourcePatch resource do not match the actual modifications they made to the component. As a result, when the ResourcePatch controller applies the patch, the component may undergo unexpected changes.
Workaround: Users should manually modify the ResourcePatch resource to ensure it reflects the intended changes.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file can reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, resulting in out-of-order log entries.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- The default pool .mgr created by ceph-mgr uses the default CRUSH rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If you encounter this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is defined in a Helm chart, deleting the template application uninstalls the chart; if a hook fails during uninstallation, the application cannot be deleted. Investigate and resolve the cause of the hook failure first.
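The metering behavior described in the StatefulSet item above (earliest start to latest stop, with intermediate downtime ignored) can be illustrated with a small self-contained sketch; the timestamps are hypothetical:

```python
from datetime import datetime, timedelta

def metered_hours(intervals):
    """Platform behavior: spans from the earliest start to the latest stop,
    ignoring any gaps when the Pod was not running."""
    start = min(s for s, _ in intervals)
    end = max(e for _, e in intervals)
    return (end - start).total_seconds() / 3600

def actual_hours(intervals):
    """Sum of the real running intervals only."""
    return sum((e - s).total_seconds() for s, e in intervals) / 3600

# A StatefulSet Pod that ran 08:00-10:00, was stopped, then ran 16:00-18:00
day = datetime(2025, 1, 1)
runs = [
    (day + timedelta(hours=8),  day + timedelta(hours=10)),
    (day + timedelta(hours=16), day + timedelta(hours=18)),
]
print(metered_hours(runs))  # 10.0 hours metered
print(actual_hours(runs))   # 4.0 hours actually run
```

The gap between the two numbers is exactly the over-metering the known issue describes.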
4.0.8
Fixed Issues
- Fixed an issue where a user account that had been automatically disabled due to prolonged inactivity would be automatically disabled again after an administrator manually reactivated it.
- Fixed an issue where ovn-central could not automatically recover if an ovn-db file was deleted manually.
- Fixed a race condition in ovn-central where two leaders could be elected, interrupting the cluster network.
Known Issues
- The olm-registry pod could continuously restart, preventing the OperatorHub from functioning properly. This was caused by the `seccompProfile: RuntimeDefault` security configuration added during CIS compliance hardening, which blocked the `clone` syscall required by CGO operations. Fixed in ACP 4.0.9, where the seccomp profile was adjusted to allow the necessary syscalls while maintaining security compliance.
- Permission validation during native application creation became extremely slow (10+ seconds) when the cluster had 60+ operators installed. Fixed in ACP 4.0.9.
- Installing the marketplace plugin on workload clusters would fail. Fixed in ACP 4.0.9.
- In certain cases, users may find that the operations automatically recorded by the platform in a ResourcePatch resource do not match the actual modifications they made to the component. As a result, when the ResourcePatch controller applies the patch, the component may undergo unexpected changes.
Workaround: Users should manually modify the ResourcePatch resource to ensure it reflects the intended changes.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file can reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, resulting in out-of-order log entries.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- The egress gateway cannot route traffic from Pods in the same subnet as the egress gateway. Fixed in ACP 4.0.9.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- The default pool .mgr created by ceph-mgr uses the default CRUSH rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If you encounter this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is defined in a Helm chart, deleting the template application uninstalls the chart; if a hook fails during uninstallation, the application cannot be deleted. Investigate and resolve the cause of the hook failure first.
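The log-ordering hazard described in the rotation item above can be shown with a toy simulation: once rotation outruns collection, the collector reads the rotated file and the new file concurrently, and entries interleave out of timestamp order (file contents and timestamps below are illustrative):

```python
# Rotated file whose contents had not yet been collected, and the new file
# that the collector also starts reading immediately after rotation.
old_file = ["10:00:01 line A", "10:00:02 line B"]
new_file = ["10:00:03 line C", "10:00:04 line D"]

def interleaved(a, b):
    """One possible concurrent collection order: the collector alternates
    between the two files, emitting whichever entry it reads next."""
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        if j < len(b):
            out.append(b[j]); j += 1
        if i < len(a):
            out.append(a[i]); i += 1
    return out

collected = interleaved(old_file, new_file)
print(collected)                        # C, A, D, B - chaotic order
print(collected == sorted(collected))   # False: timestamp order is broken
```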
4.0.7
Fixed Issues
- Before this update, the Tekton Pipeline component had a Kubernetes STIG security vulnerability, where secrets were exposed through environment variables in the tekton-hub-api deployment, violating security best practices. With this update, the secret mounting logic in environment variables has been completely removed to ensure that the tekton-hub-api deployment no longer exposes any credentials, complying with Kubernetes STIG security requirements.
- Before this update, the tekton-results-retention-policy-agent container in Tekton Results included sensitive information in environment variables, posing a security risk of exposing credentials in plaintext during container operations and logging scenarios. With this update, sensitive information has been properly secured and removed from environment variables to prevent credential leakage, ensuring that the retention-policy-agent container no longer contains plaintext passwords or tokens in its configuration, thereby enhancing the overall security posture of the Tekton Results system.
- Before this update, the PostgreSQL container in tekton-results-postgres-0 contained environment variables with sensitive information such as PASSWORD, password, TOKEN, and token, which posed a security risk when these credentials were exposed in plain text. With this update, the sensitive environment variables have been properly secured and no longer contain plain text passwords or tokens, ensuring that sensitive credentials are handled securely and not exposed in container environment variables.
- Before this update, the environment variables of the tekton-results-api container contained sensitive information, posing security risks when these credentials were exposed in plain text. With this update, sensitive environment variables have been properly protected, and passwords and token information are no longer exposed in plain text, enhancing the security of the tekton-results-api component.
- Before this update, the pipeline interface experienced multiple display issues including text display problems, poor user experience with variable completion multi-line functionality, and unstable behavior when updating triggers where parameters and workspace would sometimes appear and sometimes disappear, requiring users to reselect the pipeline to make them appear (including the Pipeline list). With this update, these display issues have been resolved. The pipeline and pipelinerun pages now display correctly with improved text rendering, enhanced variable completion multi-line functionality for better user experience, and stable trigger update behavior where parameters and workspace consistently appear without requiring pipeline reselection.
- Previously, the status field of an upmachinepool resource stored the associated machine resources without any ordering. This caused the resource to be updated on every reconcile loop, resulting in excessively large audit logs. This issue has now been fixed.
- Fixed an issue where modifying the Pod Security Policy when importing a namespace into a project did not take effect.
- Fixed an issue in the console where updating a Deployment in a specific sequence could cause the container lifecycle configuration to be unintentionally removed.
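The pattern behind the credential fixes above is the same in each case: read secrets from a file mounted from a Kubernetes Secret volume instead of exposing them as environment variables, where they leak into `kubectl describe`, logs, and crash dumps. A minimal sketch (the path and helper are illustrative, not the actual Tekton code):

```python
import os
import tempfile

def read_secret(path):
    """Read a credential from a file mounted from a Kubernetes Secret,
    instead of exposing it through an environment variable."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read().strip()

# Simulate the mounted secret file; in a real Pod this would be a path
# like /var/run/secrets/db/password provided by a Secret volume mount.
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("s3cr3t\n")
    secret_path = f.name

password = read_secret(secret_path)  # credential never enters os.environ
print(password)  # s3cr3t
os.unlink(secret_path)
```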
Known Issues
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file can reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, resulting in out-of-order log entries.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- ovn-central cannot automatically recover if an ovn-db file is deleted manually. Fixed in ACP 4.0.8.
- ovn-central can race and elect two leaders, interrupting the cluster network. Fixed in ACP 4.0.8.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- The default pool .mgr created by ceph-mgr uses the default CRUSH rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If you encounter this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is defined in a Helm chart, deleting the template application uninstalls the chart; if a hook fails during uninstallation, the application cannot be deleted. Investigate and resolve the cause of the hook failure first.
4.0.6
Fixed Issues
- Fixed an issue where, after upgrading the platform from v3.x to v4.x, if a business cluster was not upgraded, the metrics created in its new custom monitoring dashboard could not be used by HPA.
Known Issues
- Previously, the status field of an upmachinepool resource stored the associated machine resources without any ordering. This caused the resource to be updated on every reconcile loop, resulting in excessively large audit logs. This issue has now been fixed.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file can reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, resulting in out-of-order log entries.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- ovn-central cannot automatically recover if an ovn-db file is deleted manually. Fixed in ACP 4.0.8.
- ovn-central can race and elect two leaders, interrupting the cluster network. Fixed in ACP 4.0.8.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- Updating a Deployment in the console in a specific sequence could cause the container lifecycle configuration to be unintentionally removed. Fixed in ACP 4.0.7.
- The default pool .mgr created by ceph-mgr uses the default CRUSH rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If you encounter this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is defined in a Helm chart, deleting the template application uninstalls the chart; if a hook fails during uninstallation, the application cannot be deleted. Investigate and resolve the cause of the hook failure first.
4.0.5
Fixed Issues
- Previously, after uninstalling an Operator, the Operator status was incorrectly displayed as Absent, even though the Operator was actually Ready. Users had to manually re-upload the Operator using violet upload. This issue has now been resolved, and the Operator correctly appears as Ready after uninstallation.
- In some cases, installing a new Operator version after uploading it via violet upload would fail unexpectedly. This intermittent issue has been fixed.
- When an Operator or Cluster Plugin included multiple frontend extensions, the left-side navigation of these extensions could become unresponsive. The temporary workaround required users to add the annotation cpaas.io/auto-sync: "false" to the extension’s ConfigMap. This behavior has now been permanently fixed in the code.
Known Issues
- When upgrading a Redis Sentinel instance from v5 to v7, occasional split-brain incidents may occur, potentially leading to data loss.
Solution: Back up the Redis instance data before performing a cross-version upgrade.
- When cluster network anomalies occur, a failure to update the primary node label of a PostgreSQL instance may leave the instance in an abnormal state, potentially causing some new connections to fail.
- Before this update, the pipeline interface experienced multiple display issues including text display problems, poor user experience with variable completion multi-line functionality, and unstable behavior when updating triggers where parameters and workspace would sometimes appear and sometimes disappear, requiring users to reselect the pipeline to make them appear (including the Pipeline list). With this update, these display issues have been resolved. The pipeline and pipelinerun pages now display correctly with improved text rendering, enhanced variable completion multi-line functionality for better user experience, and stable trigger update behavior where parameters and workspace consistently appear without requiring pipeline reselection.
- Previously, the status field of an upmachinepool resource stored the associated machine resources without any ordering. This caused the resource to be updated on every reconcile loop, resulting in excessively large audit logs. This issue has now been fixed.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file can reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, resulting in out-of-order log entries.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- ovn-central cannot automatically recover if an ovn-db file is deleted manually. Fixed in ACP 4.0.8.
- ovn-central can race and elect two leaders, interrupting the cluster network. Fixed in ACP 4.0.8.
- Fixed an issue where, after upgrading the platform from v3.x to v4.x, if a business cluster was not upgraded, the metrics created in its new custom monitoring dashboard could not be used by HPA.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before creating the application.
- Updating a Deployment in the console in a specific sequence could cause the container lifecycle configuration to be unintentionally removed. Fixed in ACP 4.0.7.
- The default pool .mgr created by ceph-mgr uses the default CRUSH rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If you encounter this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If the issue persists, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure the proper creation order.
- When a pre-delete or post-delete hook is defined in a Helm chart, deleting the template application uninstalls the chart; if a hook fails during uninstallation, the application cannot be deleted. Investigate and resolve the cause of the hook failure first.
4.0.4
Fixed Issues
- Previously, upgrading the cluster would leave behind CRI (Container Runtime Interface) Pods, which blocked further upgrades to version 4.1. This issue has been fixed in version 4.0.4.
Known Issues
No known issues in this release.
4.0.3
Fixed Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
Known Issues
- Previously, upgrading the cluster would leave behind CRI (Container Runtime Interface) Pods, which blocked further upgrades to version 4.1. This issue has been fixed in version 4.0.4.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
4.0.2
Fixed Issues
- Fixed an issue where performing a node drain on a public cloud Kubernetes cluster (such as ACK) managed by the platform failed with a 404 error.
Known Issues
- Master nodes in HA clusters using Calico could not be deleted. Fixed in ACP 4.0.3.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
4.0.1
Fixed Issues
- Under high api-server pressure, the aggregate worker in kyverno-report-controller may occasionally fail to start, preventing proper creation of compliance reports. This results in PolicyReport resources not being created, causing the Web Console to either display no compliance violation information or only partial report data. To troubleshoot, check the kyverno-report-controller pod logs for the presence of "starting worker aggregate-report-controller/worker" messages to verify proper operation. If the worker is not running, manually restart the kyverno-report-controller as a temporary solution.
Known Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
- Fixed an issue where performing a node drain on a public cloud Kubernetes cluster (such as ACK) managed by the platform failed with a 404 error.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
4.0.0
New Features and Enhancements
Installation and Upgrade: Modular Architecture
We have completely redesigned the platform architecture to deliver unprecedented flexibility, faster upgrades, and lower operational overhead.
Streamlined Installation
The platform is now deployed from a compact base package containing only the essential components. After installing the foundation, customers can pick exactly the Operators or cluster plugins they need, such as DevOps, Service Mesh, or other specialized capabilities, and download and install them individually.
Targeted Patches
- Patch releases include only the components that actually require fixes.
- Components without fixes remain unchanged, guaranteeing the rest of the platform stays intact.
- Customers apply patches through the platform's built-in standard upgrade mechanism instead of updating individual components by hand, which simplifies maintenance and control.
Intelligent Upgrades
- During an upgrade, only components with new code are replaced and restarted.
- Unchanged components keep their current versions and uptime.
- This minimizes downtime and shortens the maintenance window for smoother upgrades.
Independent Component Versioning
- Most Operators follow their own release schedules, separate from the platform core.
- New features and fixes become available as soon as they are ready, without waiting for a full platform upgrade.
- This approach accelerates delivery and lets customers receive improvements sooner.
Clusters: Declarative Cluster Lifecycle Management with Cluster API
On-premises clusters now use the Kubernetes Cluster API for fully declarative operations, including:
- Cluster creation
- Scaling and node joining
This seamless Cluster API integration plugs directly into your IaC pipelines, enabling end-to-end programmatic cluster lifecycle management.
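The declarative model above can be sketched with a minimal upstream Cluster API manifest. The resource names and the referenced control-plane and infrastructure kinds below are illustrative assumptions, not ACP-specific resources:

```yaml
# Minimal Cluster API sketch: the Cluster kind and the
# cluster.x-k8s.io/v1beta1 API group are upstream Cluster API.
# All names and the provider kinds here are placeholders.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: demo-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["10.244.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: demo-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerCluster        # placeholder provider kind
    name: demo-infra
```

Because the cluster is just another Kubernetes resource, scaling or node changes become edits to manifests tracked in Git, which is what makes the IaC integration possible.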
Operator & Extension: Full Capability Visibility
Complete Operator Catalog
The OperatorHub now displays all supported Operators, whether or not their packages have been uploaded to the platform. This improvement:
- Provides full visibility into platform capabilities, even in air-gapped environments
- Eliminates information gaps between what is available and what users know about
- Reduces friction when exploring platform capabilities
Version Flexibility
Users can now select specific Operator versions at installation time instead of being limited to the latest release, giving greater control over component compatibility and upgrade paths.
Web Console Extensions
Operators now support anchor-based Web Console extensions, allowing feature-specific frontend images to be bundled with Operators and seamlessly integrated into the platform Web Console.
Cluster Plugin Improvements
All of the Operator visibility, version selection, and Web Console extension improvements also apply to cluster plugins, providing a consistent user experience across all platform extensions.
Log Query Logic Optimization
The log query page has been optimized to address the usability and performance issues users encountered with the log query feature:
- The original radio buttons have been replaced with an advanced search component. You can now search logs the same way you search in Git.
- Independent query conditions for log content
- The time-range criteria have been repositioned. Changing the time range no longer resets your log filters.
- The log query API has been optimized to improve overall query performance
ElasticSearch Upgrade to 8.17
We upgraded ElasticSearch to version 8.17 to take advantage of new features and community improvements.
ALB Authentication
ALB now supports multiple authentication mechanisms, allowing authentication to be handled at the Ingress level rather than implemented in every backend application.
ALB Supports ingress-nginx Annotations
This release adds ALB support for common ingress-nginx annotations, including keepalive settings, timeout configuration, and HTTP redirects, improving compatibility with the community ingress-nginx controller.
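As an illustration, here is a standard Ingress using common community ingress-nginx annotation keys. Which exact subset ALB honors should be verified against the ALB documentation; the host and service names are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress
  annotations:
    # Standard community ingress-nginx annotation keys.
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  rules:
    - host: demo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-svc
                port:
                  number: 80
```

With annotation compatibility, existing Ingress manifests written for ingress-nginx can be pointed at ALB with fewer changes.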
Kubevirt Live Migration Optimization
During live migration, network interruption time has been reduced to under 0.5 seconds, and existing TCP connections are preserved.
This optimization significantly improves the stability and reliability of virtual machine migrations in production environments.
LDAP/OIDC Integration Optimization
The LDAP/OIDC integration form fields have been adjusted, including removal of unnecessary or duplicate fields and improved descriptions.
LDAP/OIDC integration now supports YAML-based configuration, allowing user attribute mapping to be performed inside the YAML file.
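As a purely illustrative example of YAML-based attribute mapping, here is a Dex-style OIDC connector; the `claimMapping` field comes from Dex, and ACP's actual schema may differ, so treat all field names as assumptions:

```yaml
# Illustrative only: a Dex-style OIDC connector showing how user
# attribute (claim) mapping can be expressed in YAML.
connectors:
  - type: oidc
    id: corp-oidc
    name: Corporate SSO
    config:
      issuer: https://idp.example.com
      clientID: acp-client
      clientSecret: <secret>        # elided
      redirectURI: https://acp.example.com/dex/callback
      claimMapping:
        email: mail                 # map the IdP "mail" claim to email
        groups: memberOf            # map "memberOf" to group membership
```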
Source to Image (S2I) Support
- Added the Alauda Container Platform Builds operator for automated image builds from source code
- Supports Java/Go/Node.js/Python stacks
- Simplifies application deployment from source code repositories
Local Registry Solution
- ACP Registry provides a lightweight Docker Registry with enterprise features
- Delivers ready-to-use image management capabilities
- Simplifies application delivery
GitOps Module Refactoring
- ACP GitOps has been extracted into a standalone cluster plugin architecture
- Upgraded Argo CD to v2.14.x
- Improved GitOps-based application lifecycle management
Namespace-Level Monitoring
- Introduced dynamic namespace-level monitoring dashboards
- Displays metrics for applications, workloads, and pods
Crossplane Integration
- Released the Alauda Build of Crossplane distribution
- Implemented application-oriented resource provisioning through XRD compositions
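A minimal sketch of an XRD using the upstream Crossplane CompositeResourceDefinition schema; the group, kind, and field names are illustrative placeholders, and a real setup would pair this with a Composition resource:

```yaml
# Upstream Crossplane XRD sketch: defines an application-facing
# "Database" claim backed by a composite XDatabase resource.
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.example.org
spec:
  group: example.org
  names:
    kind: XDatabase
    plural: xdatabases
  claimNames:
    kind: Database          # what application teams request
    plural: databases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                storageGB:
                  type: integer
```

The claim/composite split is what makes the provisioning application-oriented: teams create small `Database` claims while the Composition expands them into the underlying infrastructure resources.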
Virtualization Updates
- Upgraded to KubeVirt 1.4 for enhanced virtualization capabilities
- Optimized image handling for faster VM deployment
- Optimized VM live migration, which can now be triggered from the UI with migration status displayed
- Improved network binding with dual-stack (IPv4/IPv6) support
- Added vTPM support for stronger VM security
Ceph Storage Updates
- Metro-DR with stretch clusters provides real-time data synchronization across availability zones
- Regional-DR with pool-level mirroring strengthens data protection
TopoLVM Updates
- Added support for deploying multipath devices, improving flexibility and stability
Fixed Issues
- Previously, after publishing a new Operator version, users had to wait 10 minutes before installing it. This waiting period has been reduced to 2 minutes, allowing faster installation of new Operator versions.
- On GPU nodes with multiple cards, gpu-manager occasionally exits, causing scheduling failures for applications that use vGPU.
- When using the pGPU plugin, the default runtime class on the GPU node must be set to nvidia; otherwise, applications may fail to request GPU resources properly.
- On a single GPU card, gpu-manager cannot create multiple inference services based on vLLM or MLServer at the same time.
On AI platforms, this issue occurs when gpu-manager is used to create multiple inference services; on container platforms, it does not occur when gpu-manager is used to create multiple smart applications.
- With MPS, pods restart indefinitely when nodes are low on resources.
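The nvidia default-runtime requirement above is typically satisfied in containerd's configuration on the GPU node. A sketch of the standard nvidia-container-runtime wiring follows; verify paths and plugin keys against your containerd version, and restart containerd after editing:

```toml
# /etc/containerd/config.toml (fragment)
# Make nvidia the default runtime for all containers on this node.
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
```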
Known Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion operation, the container remains in a pseudo-running state. The containerd logs show OCI "runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Under high api-server pressure, the aggregate worker in kyverno-report-controller may occasionally fail to start, preventing proper creation of compliance reports. This results in PolicyReport resources not being created, causing the Web Console to either display no compliance violation information or only partial report data. To troubleshoot, check the kyverno-report-controller pod logs for the presence of "starting worker aggregate-report-controller/worker" messages to verify proper operation. If the worker is not running, manually restart the kyverno-report-controller as a temporary solution.
- Fixed an inconsistency where Secrets created through the web console only stored the Username and Password, and lacked the complete authentication field (auth) compared to those created via kubectl create. This issue previously caused authentication failures for build tools (e.g., buildah) that rely on the complete auth data.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If encountering this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If unresolved, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure proper creation order.
- Temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been fully collected. The old and new log files are then collected simultaneously, resulting in out-of-order log entries.