[hami][WIP] Draft HAMI project integration #1844

Draft
tabu-a wants to merge 2 commits into cozystack:main from tabu-a:main
@tabu-a tabu-a commented Jan 9, 2026

No description provided.

coderabbitai bot commented Jan 9, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.


@gemini-code-assist

Summary of Changes

Hello @tabu-a, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the HAMi project into the Kubernetes application to manage and schedule heterogeneous AI computing resources. It introduces a new addon that deploys and configures HAMi's core components: the scheduler extender, the device plugin, and the admission webhook. The integration lets the platform handle a range of GPU and XPU hardware, with configurable resource allocation and workload placement for AI and high-performance computing environments.

Highlights

  • New HAMi Addon Integration: Introduces the HAMi (Heterogeneous AI Computing Virtualization Middleware) project as a new addon within the Kubernetes application, enabling GPU/XPU resource management.
  • Helm Chart: A new Helm chart named cozy-hami has been added, containing the Kubernetes resources needed to deploy the HAMi scheduler extender, device plugin, and admission webhook.
  • Device Support: The HAMi integration provides configuration and scheduling for heterogeneous computing resources, including NVIDIA, Cambricon MLU, Hygon DCU, Iluvatar, Metax, Enflame, Kunlunxin, and Huawei Ascend devices.
  • Configurable Scheduling Policies: The addon exposes scheduling policies, device splitting, memory and core scaling, and NVIDIA MIG (Multi-Instance GPU) strategies through its values.yaml.
  • GPU Operator Dependency: The HAMi addon requires the NVIDIA GPU Operator to be enabled, so that a base layer for GPU management is in place.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request integrates the HAMi project for GPU virtualization by adding a new HelmRelease, updating configurations, and vendoring the upstream HAMi Helm chart. The changes are a good starting point for the integration. My review highlights a critical issue with a hardcoded node name in the vendored chart which will prevent it from working correctly. I've also identified a high-severity issue regarding the use of infinite retries in the HelmRelease, which could impact cluster stability. Additionally, I've provided some medium-severity suggestions to improve code style and consistency.

Comment on lines +12 to +26
    {
        "nodeconfig": [
            {
                "name": "m5-cloudinfra-online02",
                "operatingmode": "hami-core",
                "devicememoryscaling": 1.8,
                "devicesplitcount": 10,
                "migstrategy": "none",
                "filterdevices": {
                    "uuid": [],
                    "index": []
                }
            }
        ]
    }

critical

This ConfigMap contains a hardcoded node name m5-cloudinfra-online02. This will cause the device plugin configuration to be incorrect on any other node, preventing it from functioning correctly. This appears to be from an example configuration that was vendored. This hardcoded value should be removed. If no node-specific configuration is needed, an empty nodeconfig array would be a safer default.

    {
        "nodeconfig": []
    }
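
The concern above can be checked mechanically. The following is a minimal Python sketch, not part of the PR: the helper name `unknown_nodes` and the node names `worker-1` and `worker-2` are hypothetical, and it assumes the device config is available as a JSON string shaped like the snippet under review.

```python
import json


def unknown_nodes(device_config: str, cluster_nodes: set) -> list:
    """Return nodeconfig entry names that match no node in the cluster.

    Hypothetical helper for illustration only; it is not part of HAMi.
    """
    config = json.loads(device_config)
    missing = []
    for entry in config.get("nodeconfig", []):
        name = entry.get("name")
        if name not in cluster_nodes:
            missing.append(name)
    return missing


# Trimmed copy of the vendored config; the cluster node names are assumptions.
vendored = (
    '{"nodeconfig": [{"name": "m5-cloudinfra-online02",'
    ' "operatingmode": "hami-core"}]}'
)
print(unknown_nodes(vendored, {"worker-1", "worker-2"}))
print(unknown_nodes('{"nodeconfig": []}', {"worker-1", "worker-2"}))
```

An empty `nodeconfig` array never produces a mismatch, which is why the review suggests it as the safer default.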

Contributor Author

looks irrelevant - this is a predefined value from the HAMi maintainers

  install:
    createNamespace: false
    remediation:
      retries: -1

high

The remediation strategy for installation is set to infinite retries (retries: -1). This can mask underlying persistent issues and cause the Flux controller to enter a perpetual reconciliation loop, consuming significant resources. It is recommended to use a finite number of retries (e.g., 5) to handle transient errors while ensuring that persistent failures are reported.

      retries: 5

  upgrade:
    force: true
    remediation:
      retries: -1

high

The remediation strategy for upgrades is set to infinite retries (retries: -1). This can mask underlying persistent issues and cause the Flux controller to enter a perpetual reconciliation loop, consuming significant resources. It is recommended to use a finite number of retries (e.g., 5) to handle transient errors while ensuring that persistent failures are reported.

      retries: 5
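
Both remediation comments come down to the same rule: a negative `retries` value means unlimited retries. A small Python sketch of a lint check for that rule follows. The function name `unbounded_retries` is hypothetical, and the HelmRelease is represented as an already-parsed dict (an assumption, so that no YAML library is needed).

```python
def unbounded_retries(helm_release: dict) -> list:
    """List remediation settings in a HelmRelease that retry forever.

    Hypothetical lint helper; a negative retries value is treated as unlimited.
    """
    spec = helm_release.get("spec", {})
    flagged = []
    for action in ("install", "upgrade"):
        remediation = spec.get(action, {}).get("remediation", {})
        retries = remediation.get("retries", 0)
        if retries < 0:
            flagged.append(f"spec.{action}.remediation.retries={retries}")
    return flagged


# Example: install still uses -1, upgrade already uses the suggested value.
release = {
    "spec": {
        "install": {"createNamespace": False, "remediation": {"retries": -1}},
        "upgrade": {"force": True, "remediation": {"retries": 5}},
    }
}
print(unbounded_retries(release))
```

A check like this could run in CI so that unbounded retries are flagged before a release manifest is merged.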

Comment on lines +28 to +29



medium

There are trailing empty lines with whitespace at the end of the file. These should be removed to improve code cleanliness and consistency.

- name: {{ . }}
{{- end }}
{{- end }}
{{- end -}}
\ No newline at end of file

medium

This file is missing a newline character at the end. It's a common convention and good practice to end files with a newline to prevent issues with certain tools and to adhere to POSIX standards. Several other new files in this chart have the same issue.

{{- end -}}
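
The two end-of-file comments (trailing blank lines and a missing final newline) could be caught by one check. This is a minimal Python sketch under the assumption that a file should end with exactly one newline after its last non-blank line; `eof_problem` is a hypothetical name, not an existing tool.

```python
def eof_problem(data: bytes):
    """Describe an end-of-file style problem, or return None if there is none.

    Hypothetical helper: expects exactly one newline after the last
    non-blank line of the file.
    """
    if not data:
        return None  # an empty file has nothing to check
    if not data.endswith(b"\n"):
        return "missing final newline"
    # Drop the final newline, then inspect the last remaining line.
    last_line = data[:-1].split(b"\n")[-1]
    if last_line.strip() == b"":
        return "trailing blank line"
    return None


print(eof_problem(b"{{- end -}}"))
print(eof_problem(b"{{- end -}}\n"))
print(eof_problem(b"retries: 5\n  \n\n"))
```

Reading the file in binary mode (`open(path, "rb").read()`) keeps the check independent of platform newline translation.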

@lexfrei lexfrei self-requested a review January 20, 2026 10:36
Signed-off-by: Tamerlan Abu <tamerlanabu@gmail.com>
Signed-off-by: Tamerlan Abu <tamerlanabu@gmail.com>