Release Notes
4.0.8
Fixed Issues
- Fixed an issue where a user account that had been automatically disabled by the system after a long period without logging in would be automatically disabled again after an administrator manually reactivated it.
- Fixed an issue where ovn-central could not automatically recover if an ovn-db file was deleted manually.
- Fixed an ovn-central race condition that resulted in two leaders and interrupted the cluster network.
Known Issues
- In certain cases, users may find that the operations automatically recorded by the platform in a ResourcePatch resource do not match the actual modifications they made to the component. As a result, when the ResourcePatch controller applies the patch, the component may undergo unexpected changes.
Workaround: Users should manually modify the ResourcePatch resource to ensure it reflects the intended changes.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state: the containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been collected. The old and new log files are then collected at the same time, resulting in out-of-order logs.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before application creation (see the sketch after this list).
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If encountering this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If unresolved, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart, deleting the template application may fail.
When the chart is uninstalled during deletion, the hook may fail to execute for some reason, which prevents the application from being deleted. Investigate the cause and give priority to resolving the hook execution failure.
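Regarding the defaultMode workaround above, here is a minimal sketch of where the field typically appears in a workload's YAML (resource and key names are illustrative); removing the marked line before creating the application avoids the validation error:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                  # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: demo-image:latest  # illustrative image
        volumeMounts:
        - name: config
          mountPath: /etc/config
      volumes:
      - name: config
        configMap:
          name: demo-config
          defaultMode: 0644       # remove this line before creating the application
```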
4.0.7
Fixed Issues
- Before this update, the Tekton Pipeline component had a Kubernetes STIG security vulnerability, where secrets were exposed through environment variables in the tekton-hub-api deployment, violating security best practices. With this update, the secret mounting logic in environment variables has been completely removed to ensure that the tekton-hub-api deployment no longer exposes any credentials, complying with Kubernetes STIG security requirements.
- Before this update, the tekton-results-retention-policy-agent container in Tekton Results included sensitive information in environment variables, posing a security risk of exposing credentials in plaintext during container operations and logging scenarios. With this update, sensitive information has been properly secured and removed from environment variables to prevent credential leakage, ensuring that the retention-policy-agent container no longer contains plaintext passwords or tokens in its configuration, thereby enhancing the overall security posture of the Tekton Results system.
- Before this update, the PostgreSQL container in tekton-results-postgres-0 contained environment variables with sensitive information such as PASSWORD, password, TOKEN, and token, which posed a security risk when these credentials were exposed in plain text. With this update, the sensitive environment variables have been properly secured and no longer contain plain text passwords or tokens, ensuring that sensitive credentials are handled securely and not exposed in container environment variables.
- Before this update, the environment variables of the tekton-results-api container contained sensitive information, posing security risks when these credentials were exposed in plain text. With this update, sensitive environment variables have been properly protected, and passwords and token information are no longer exposed in plain text, enhancing the security of the tekton-results-api component.
- Before this update, the pipeline interface experienced multiple display issues including text display problems, poor user experience with variable completion multi-line functionality, and unstable behavior when updating triggers where parameters and workspace would sometimes appear and sometimes disappear, requiring users to reselect the pipeline to make them appear (including the Pipeline list). With this update, these display issues have been resolved. The pipeline and pipelinerun pages now display correctly with improved text rendering, enhanced variable completion multi-line functionality for better user experience, and stable trigger update behavior where parameters and workspace consistently appear without requiring pipeline reselection.
- Previously, the status field of an upmachinepool resource stored the associated machine resources without any ordering. This caused the resource to be updated on every reconcile loop, resulting in excessively large audit logs. This issue has now been fixed.
- Fixed an issue where modifying the Pod Security Policy when importing a namespace into a project did not take effect.
- Fixed an issue in the console where updating a Deployment in a specific sequence could cause the container lifecycle configuration to be unintentionally removed.
Known Issues
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state: the containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been collected. The old and new log files are then collected at the same time, resulting in out-of-order logs.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- ovn-central cannot automatically recover if an ovn-db file is deleted manually. This issue has been fixed.
- An ovn-central race condition can result in two leaders and interrupt the cluster network. This issue has been fixed.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before application creation.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If encountering this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If unresolved, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart, deleting the template application may fail.
When the chart is uninstalled during deletion, the hook may fail to execute for some reason, which prevents the application from being deleted. Investigate the cause and give priority to resolving the hook execution failure.
4.0.6
Fixed Issues
- Fixed an issue where, after upgrading the platform from v3.x to v4.x, if a business cluster was not upgraded, the metrics created in its new custom monitoring dashboard could not be used by HPA.
Known Issues
- Previously, the status field of an upmachinepool resource stored the associated machine resources without any ordering. This caused the resource to be updated on every reconcile loop, resulting in excessively large audit logs. This issue has now been fixed.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state: the containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been collected. The old and new log files are then collected at the same time, resulting in out-of-order logs.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- ovn-central cannot automatically recover if an ovn-db file is deleted manually. This issue has been fixed.
- An ovn-central race condition can result in two leaders and interrupt the cluster network. This issue has been fixed.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before application creation.
- Fixed an issue in the console where updating a Deployment in a specific sequence could cause the container lifecycle configuration to be unintentionally removed.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If encountering this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If unresolved, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart, deleting the template application may fail.
When the chart is uninstalled during deletion, the hook may fail to execute for some reason, which prevents the application from being deleted. Investigate the cause and give priority to resolving the hook execution failure.
4.0.5
Fixed Issues
- Previously, after uninstalling an Operator, the Operator status was incorrectly displayed as Absent, even though the Operator was actually Ready. Users had to manually re-upload the Operator using violet upload. This issue has now been resolved, and the Operator correctly appears as Ready after uninstallation.
- In some cases, installing a new Operator version after uploading it via violet upload would fail unexpectedly. This intermittent issue has been fixed.
- When an Operator or Cluster Plugin included multiple frontend extensions, the left-side navigation of these extensions could become unresponsive. The temporary workaround required users to add the annotation cpaas.io/auto-sync: "false" to the extension’s ConfigMap. This behavior has now been permanently fixed in the code.
Known Issues
- When upgrading a Redis Sentinel instance from v5 to v7, occasional split-brain incidents may occur, potentially leading to data loss.
Solution: Back up the Redis instance data before performing a cross-version upgrade.
- When cluster network anomalies occur, a failure to update the primary node label of a PostgreSQL instance may result in abnormal instance status, potentially causing some new connections to fail.
- Before this update, the pipeline interface experienced multiple display issues including text display problems, poor user experience with variable completion multi-line functionality, and unstable behavior when updating triggers where parameters and workspace would sometimes appear and sometimes disappear, requiring users to reselect the pipeline to make them appear (including the Pipeline list). With this update, these display issues have been resolved. The pipeline and pipelinerun pages now display correctly with improved text rendering, enhanced variable completion multi-line functionality for better user experience, and stable trigger update behavior where parameters and workspace consistently appear without requiring pipeline reselection.
- Previously, the status field of an upmachinepool resource stored the associated machine resources without any ordering. This caused the resource to be updated on every reconcile loop, resulting in excessively large audit logs. This issue has now been fixed.
- Previously, if a cluster contained nodes with an empty Display Name, users were unable to filter nodes by typing in the node selector dropdown on the node details page. This issue has been fixed in ACP 4.2.0.
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state: the containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been collected. The old and new log files are then collected at the same time, resulting in out-of-order logs.
- When a StatefulSet's Pod is stopped and then restarted, the platform takes the earliest runtime of the Pod's daily operation as the start time and the latest runtime as the end time, ignoring any intermediate periods when it was not running. This results in the Pod being metered for more time than its actual operational hours.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- ovn-central cannot automatically recover if an ovn-db file is deleted manually. This issue has been fixed.
- An ovn-central race condition can result in two leaders and interrupt the cluster network. This issue has been fixed.
- Fixed an issue where, after upgrading the platform from v3.x to v4.x, if a business cluster was not upgraded, the metrics created in its new custom monitoring dashboard could not be used by HPA.
- Application creation failure triggered by the defaultMode field in YAML.
Affected Path: Alauda Container Platform → Application Management → Application List → Create from YAML. Submitting YAML containing the defaultMode field (typically used for ConfigMap/Secret volume mount permissions) triggers validation errors and causes deployment failure.
Workaround: Manually remove all defaultMode declarations before application creation.
- Fixed an issue in the console where updating a Deployment in a specific sequence could cause the container lifecycle configuration to be unintentionally removed.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If encountering this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If unresolved, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure proper creation order.
- When a pre-delete or post-delete hook is set in a Helm chart, deleting the template application may fail.
When the chart is uninstalled during deletion, the hook may fail to execute for some reason, which prevents the application from being deleted. Investigate the cause and give priority to resolving the hook execution failure.
4.0.4
Fixed Issues
- Previously, upgrading the cluster would leave behind CRI (Container Runtime Interface) Pods, which blocked further upgrades to version 4.1. This issue has been fixed in version 4.0.4.
Known Issues
No issues in this release.
4.0.3
Fixed Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
Known Issues
- Previously, upgrading the cluster would leave behind CRI (Container Runtime Interface) Pods, which blocked further upgrades to version 4.1. This issue has been fixed in version 4.0.4.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state: the containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
4.0.2
Fixed Issues
- Fixed an issue where performing a node drain on a public cloud Kubernetes cluster (such as ACK) managed by the platform failed with a 404 error.
Known Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state: the containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
4.0.1
Fixed Issues
- Under high api-server pressure, the aggregate worker in kyverno-report-controller may occasionally fail to start, preventing proper creation of compliance reports. This results in PolicyReport resources not being created, causing the Web Console to either display no compliance violation information or only partial report data. To troubleshoot, check the kyverno-report-controller pod logs for the presence of "starting worker aggregate-report-controller/worker" messages to verify proper operation. If the worker is not running, manually restart the kyverno-report-controller as a temporary solution.
Known Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
- Fixed an issue where performing a node drain on a public cloud Kubernetes cluster (such as ACK) managed by the platform failed with a 404 error.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state: the containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
4.0.0
New Features and Enhancements
Installation and Upgrade: Modular Architecture
We have completely redesigned the platform architecture to deliver unprecedented flexibility, faster upgrades, and lower operational costs.
Streamlined Installation
The platform is now deployed from a compact base package containing only the essential components. After installing this foundation, customers can select exactly the Operators or cluster plugins they need, such as DevOps, Service Mesh, or other specialized capabilities, and download and install them individually.
Targeted Patches
- Patch releases include only the components that actually require fixes.
- Components without fixes remain unchanged, guaranteeing that the rest of the platform stays intact.
- Customers apply patches through the platform's built-in, standardized upgrade mechanism rather than updating individual components manually, which greatly simplifies maintenance and tracking.
Intelligent Upgrades
- During an upgrade, only components with new code are replaced and restarted.
- Unchanged components keep their current versions and uptime.
- This minimizes downtime and shortens the maintenance window for a smoother upgrade.
Independent Component Versioning
- Most Operators follow their own release schedules, independent of the platform core.
- New features and fixes become available as soon as they are ready, without waiting for a full platform upgrade.
- This approach speeds up delivery and lets customers receive improvements sooner.
Clusters: Declarative Cluster Lifecycle Management with Cluster API
On-premises clusters now use the Kubernetes Cluster API for fully declarative operations, including:
- Cluster creation
- Node scaling and joining
This seamless Cluster API integration plugs directly into your IaC pipelines, enabling end-to-end programmatic management of the cluster lifecycle.
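For reference, a minimal sketch of a declarative cluster definition using the upstream Cluster API types (the control-plane and infrastructure provider kinds below are generic examples, not necessarily the ones the platform uses):
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: demo-cluster               # illustrative name
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["10.244.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:                 # provider-specific control plane (example kind)
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: demo-cluster-control-plane
  infrastructureRef:               # provider-specific infrastructure (example kind)
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerCluster
    name: demo-cluster
```
Storing and applying such manifests from version control is what allows cluster creation and node scaling to be driven from IaC pipelines.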
Operator & Extension: Full Capability Visibility
Complete Operator Catalog
OperatorHub now displays all supported Operators, regardless of whether their packages have been uploaded to the platform. This improvement:
- Provides full visibility into platform capabilities, even in air-gapped environments
- Closes the information gap between what is available and what users are aware of
- Reduces friction when exploring platform capabilities
Version Selection Flexibility
Users can now select specific Operator versions at installation time rather than being limited to the latest version, giving greater control over component compatibility and upgrade paths.
Web Console Extensions
Operators now support anchor-based Web Console extensions, allowing feature-specific frontend images to be packaged with Operators and integrated seamlessly into the platform Web Console.
Cluster Plugin Improvements
All of the Operator visibility, version selection, and Web Console extension improvements also apply to cluster plugins, ensuring a consistent user experience across all platform extensions.
Log Query Logic Optimization
The log query page has been optimized to address the usability and performance issues users encountered when searching logs:
- The original radio buttons have been replaced with an advanced search component, so you can now search logs the same way you search in Git.
- Independent query conditions for log content
- The time-range criteria have been repositioned; changing the time range no longer resets your log filters.
- The log query API has been optimized to improve overall query performance
ElasticSearch Upgraded to 8.17
We upgraded ElasticSearch to version 8.17 to take advantage of community features and improvements.
ALB Authentication
ALB now supports various authentication mechanisms, allowing authentication to be handled at the Ingress level instead of being implemented in every backend application.
ALB Support for ingress-nginx Annotations
This release adds support for common ingress-nginx annotations in ALB, including keepalive settings, timeout configurations, and HTTP redirects, improving compatibility with community ingress-nginx.
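A minimal sketch of the kind of community ingress-nginx annotations referred to above, applied to a standard Ingress (the annotation names are standard ingress-nginx annotations; which exact annotations and values ALB honors should be verified against the ALB documentation):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress                                      # illustrative name
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "10"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"      # HTTP-to-HTTPS redirect
spec:
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo-service                            # illustrative backend
            port:
              number: 80
```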
KubeVirt Live Migration Optimization
During live migration, network interruption time has been reduced to less than 0.5 seconds, and existing TCP connections are not dropped.
This optimization significantly improves the stability and reliability of virtual machine migrations in production environments.
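For reference, a live migration can also be triggered declaratively through the standard KubeVirt API; a minimal sketch (the VMI name is illustrative, and the platform additionally exposes this operation in the UI as noted under Virtualization Updates below):
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migrate-demo-vm      # illustrative name
  namespace: default
spec:
  vmiName: demo-vm           # the running VirtualMachineInstance to migrate
```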
LDAP/OIDC Integration Optimization
The LDAP/OIDC integration forms have been adjusted, including removal of unnecessary or duplicate fields and improved field descriptions.
LDAP/OIDC integration now supports configuration via YAML, allowing user attribute mapping to be defined within the YAML file.
Source to Image (S2I) Support
- Added the Alauda Container Platform Builds operator for automated image builds from source code
- Supports Java/Go/Node.js/Python language stacks
- Simplifies application deployment from source code repositories
On-Premises Registry Solution
- ACP Registry provides a lightweight Docker Registry with enterprise features
- Delivers out-of-the-box image management capabilities
- Simplifies application delivery
GitOps Module Refactoring
- ACP GitOps has been split out into a separate cluster plugin architecture
- Upgraded Argo CD to v2.14.x
- Improved GitOps-based application lifecycle management
Namespace-Level Monitoring
- Introduced dynamic namespace-level monitoring dashboards
- Provides metric visualization for applications, workloads, and pods
Crossplane Integration
- Released the Alauda Build of Crossplane distribution
- Enables application-centric resource provisioning through XRD compositions
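A minimal sketch of the kind of CompositeResourceDefinition (XRD) that such compositions build on, using the upstream Crossplane API (the group, kind, and schema are illustrative, not the XRDs shipped with the platform):
```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.example.org        # must be <plural>.<group>
spec:
  group: example.org
  names:
    kind: XDatabase
    plural: xdatabases
  claimNames:                         # namespaced claim that applications can request
    kind: Database
    plural: databases
  versions:
  - name: v1alpha1
    served: true
    referenceable: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              storageGB:
                type: integer
            required:
            - storageGB
```
A matching Composition then maps each XDatabase (or its Database claim) onto concrete managed resources.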
Virtualization Updates
- Upgraded to KubeVirt 1.4 for enhanced virtualization capabilities
- Optimized image handling for faster VM provisioning
- Optimized VM live migration, which can now be launched from the UI and shows migration status
- Improved network binding with dual-stack (IPv4/IPv6) support
- Added vTPM support to strengthen VM security
Ceph Storage Updates
- Metro-DR with stretch clusters provides real-time data synchronization across availability zones
- Regional-DR with pool-level mirroring improves data protection
TopoLVM Updates
- Added support for deploying on multipath devices, improving flexibility and stability
Fixed Issues
- Previously, after publishing a new Operator version, users had to wait 10 minutes before installing it. This waiting period has been reduced to 2 minutes, allowing faster installation of new Operator versions.
- On GPU nodes with multiple cards, gpu-manager occasionally exits unexpectedly, which leads to scheduling failures for applications that use vGPU.
- When using the pgpu plugin, the default runtime class on the GPU node must be set to nvidia; otherwise, applications may be unable to request GPU resources properly (see the sketch after this list).
- On a single GPU card, gpu-manager cannot create multiple inference services based on vLLM or MLServer at the same time.
On AI platforms, this issue occurs when gpu-manager is used to create multiple inference services; on container platforms, it does not occur when gpu-manager is used to create multiple smart applications.
- With MPS, pods restart indefinitely when nodes are low on resources.
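On the runtime class point above: the note refers to making nvidia the default runtime on the GPU node, which is a containerd-level setting. As a standard Kubernetes alternative, a workload can reference an nvidia RuntimeClass explicitly; a minimal sketch, assuming an nvidia handler is already configured in containerd on that node:
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia              # must match the runtime handler configured in containerd
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo             # illustrative name
spec:
  runtimeClassName: nvidia
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
```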
Known Issues
- Fixed an issue where master nodes in HA clusters using Calico could not be deleted.
- When upgrading from 3.18.0 to 4.0.1, running the upgrade script may fail with a timeout if the global cluster uses the built-in image registry with the protect-secret-files feature enabled. This issue has been fixed in ACP 4.1.0.
- Occasionally, a pod may become stuck in the Terminating state and cannot be deleted by containerd. Although containerd attempts the deletion, the container remains in a pseudo-running state: the containerd logs show "OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown" while the container status appears as Running. This issue occurs very rarely in containerd 1.7.23 (observed only once) and affects only individual pods when triggered. If encountered, restart containerd as a temporary workaround. This is a known issue in the containerd community, tracked at https://github.com/containerd/containerd/issues/6080.
- When upgrading clusters to Kubernetes 1.31, all pods in the cluster will restart. This behavior is caused by changes to the Pod spec fields in Kubernetes 1.31 and cannot be avoided. For more details, please refer to the Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/129385
- Under high api-server pressure, the aggregate worker in kyverno-report-controller may occasionally fail to start, preventing proper creation of compliance reports. This results in PolicyReport resources not being created, causing the Web Console to either display no compliance violation information or only partial report data. To troubleshoot, check the kyverno-report-controller pod logs for the presence of "starting worker aggregate-report-controller/worker" messages to verify proper operation. If the worker is not running, manually restart the kyverno-report-controller as a temporary solution.
- Fixed an inconsistency where Secrets created through the web console only stored the Username and Password, and lacked the complete authentication field (auth) compared to those created via kubectl create. This issue previously caused authentication failures for build tools (e.g., buildah) that rely on the complete auth data.
- The default pool .mgr created by ceph-mgr uses the default Crush Rule, which may fail to properly select OSDs in a stretched cluster. To resolve this, the .mgr pool must be created using CephBlockPool. However, due to timing uncertainties, ceph-mgr might attempt to create the .mgr pool before the Rook Operator completes its setup, leading to conflicts.
If encountering this issue, restart the rook-ceph-mgr Pod to trigger reinitialization.
If unresolved, manually clean up the conflicting .mgr pool and redeploy the cluster to ensure proper creation order.
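If the .mgr pool does need to be created through CephBlockPool as described above, a minimal sketch of what that object typically looks like in Rook (the object name, namespace, replica count, and failure domain are assumptions and must match the actual stretch-cluster topology):
```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: builtin-mgr          # illustrative object name
  namespace: rook-ceph       # assumed Rook namespace
spec:
  name: .mgr                 # the built-in pool this object manages
  failureDomain: zone        # pick a failure domain appropriate for the stretch cluster
  replicated:
    size: 3
    requireSafeReplicaSize: true
```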
- The temporary files were not deleted after log archiving, preventing disk space from being reclaimed. This issue has been fixed.
- When a single container produces a very large volume of logs (standard output or file logs), a log file may reach the rotation threshold and be rotated before its contents have been collected. The old and new log files are then collected at the same time, resulting in out-of-order logs.