{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "intro",
   "metadata": {},
   "source": [
    "# Qwen3.5 msModelSlim Quantization Verification\n",
    "\n",
    "This notebook follows the official Ascend `msmodelslim` Qwen3.5 example and verifies that the image contains everything needed for model quantization.\n",
    "The image is validated against the working stack used for a successful `Qwen3.5-27B` run: `msmodelslim 26.0.0a2`, `transformers 5.2.0`, `torchvision 0.24.0`, `mistral-common 1.11.0`, `easydict 1.13`, and `wcmatch 10.1`.\n",
    "\n",
    "Official reference:\n",
    "- https://raw.gitcode.com/Ascend/msmodelslim/raw/master/example/Qwen3_5/README.md\n",
    "\n",
    "This notebook does not start a large quantization job by default. It checks the runtime, prepares the official `msmodelslim quant` command, and runs it only when `RUN_QUANT = True`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "official-notes",
   "metadata": {},
   "source": [
    "## Official Notes\n",
    "\n",
    "The official Qwen3.5 guide states:\n",
    "- `msmodelslim` must be installed.\n",
    "- `transformers==5.2.0` is required.\n",
    "- For the verified `Qwen3.5-27B` multimodal path in this image, `torchvision==0.24.0`, `mistral-common==1.11.0`, `easydict==1.13`, and `wcmatch==10.1` are also required at runtime.\n",
    "- Supported devices are Atlas A2 and Atlas A3 training/inference products.\n",
    "- Example command format:\n",
    "\n",
    "```bash\n",
    "msmodelslim quant --model_path ${MODEL_PATH} --save_path ${SAVE_PATH} --device npu --model_type Qwen3.5-27B --quant_type w8a8 --trust_remote_code True\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "config",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# Update these paths before running a real quantization job.\n",
    "MODEL_PATH = Path('/opt/app-root/src/models/Qwen3.5-27B')\n",
    "SAVE_PATH = Path('/opt/app-root/src/output/qwen35-27b-w8a8')\n",
    "\n",
    "# MODEL_TYPE must be a key of SUPPORTED_MODEL_TYPES below.\n",
    "MODEL_TYPE = 'Qwen3.5-27B'\n",
    "QUANT_TYPE = 'w8a8'\n",
    "DEVICE = 'npu'\n",
    "TRUST_REMOTE_CODE = True\n",
    "\n",
    "# Safety switch: keep this False for environment verification.\n",
    "RUN_QUANT = False\n",
    "\n",
    "CANN_ENV = '/usr/local/Ascend/cann/set_env.sh'\n",
    "ATB_ENV = '/usr/local/Ascend/nnal/atb/set_env.sh'\n",
    "\n",
    "# Model types and quant types documented in the official Qwen3.5 README.\n",
    "SUPPORTED_MODEL_TYPES = {\n",
    "    'Qwen3.5-397B-A17B': {'w8a8', 'w4a8'},\n",
    "    'Qwen3.5-122B-A10B': {'w8a8'},\n",
    "    'Qwen3.5-35B-A3B': {'w8a8'},\n",
    "    'Qwen3.5-27B': {'w8a8'},\n",
    "}\n",
    "\n",
    "print('MODEL_PATH =', MODEL_PATH)\n",
    "print('SAVE_PATH =', SAVE_PATH)\n",
    "print('MODEL_TYPE =', MODEL_TYPE)\n",
    "print('QUANT_TYPE =', QUANT_TYPE)\n",
    "print('RUN_QUANT =', RUN_QUANT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "helpers",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import shlex\n",
    "import stat\n",
    "import subprocess\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def run_cmd(cmd: str, check: bool = True) -> subprocess.CompletedProcess:\n",
    "    \"\"\"Run cmd in a login bash shell after sourcing the CANN and ATB environments.\"\"\"\n",
    "    env_prefix = f'source {CANN_ENV} && source {ATB_ENV}'\n",
    "    full_cmd = f'{env_prefix} && {cmd}'\n",
    "    print(f'$ {cmd}')\n",
    "    result = subprocess.run(\n",
    "        ['bash', '-lc', full_cmd],\n",
    "        text=True,\n",
    "        capture_output=True,\n",
    "        env=os.environ.copy(),\n",
    "    )\n",
    "    if result.stdout:\n",
    "        print(result.stdout)\n",
    "    if result.stderr:\n",
    "        print(result.stderr)\n",
    "    if check and result.returncode != 0:\n",
    "        raise RuntimeError(f'command failed with exit code {result.returncode}')\n",
    "    return result\n",
    "\n",
    "\n",
    "def shell_quote(value: str) -> str:\n",
    "    return shlex.quote(value)\n",
    "\n",
    "\n",
    "def find_writable_paths(root: Path, recursive: bool, limit: int = 20) -> list[str]:\n",
    "    \"\"\"Return up to `limit` paths under root that are group- or other-writable.\"\"\"\n",
    "    if not root.exists():\n",
    "        return []\n",
    "\n",
    "    offenders = []\n",
    "\n",
    "    def inspect(candidate: Path) -> bool:\n",
    "        # Record the candidate if it is writable; return True once the limit is reached.\n",
    "        try:\n",
    "            mode = candidate.stat().st_mode\n",
    "        except OSError:\n",
    "            # Skip entries that vanish or cannot be stat'ed.\n",
    "            return False\n",
    "        if mode & (stat.S_IWGRP | stat.S_IWOTH):\n",
    "            offenders.append(f'{candidate} mode={oct(mode & 0o777)}')\n",
    "            return len(offenders) >= limit\n",
    "        return False\n",
    "\n",
    "    if inspect(root):\n",
    "        return offenders\n",
    "    if recursive and root.is_dir():\n",
    "        for candidate in root.rglob('*'):\n",
    "            if inspect(candidate):\n",
    "                break\n",
    "    return offenders\n",
    "\n",
    "\n",
    "def prepare_msmodelslim_permissions(model_path: Path, save_path: Path) -> None:\n",
    "    \"\"\"Create save_path and clear group/other write bits on the model and save trees.\"\"\"\n",
    "    run_cmd(f'mkdir -p {shell_quote(str(save_path))}')\n",
    "    for path in (model_path.parent, save_path.parent):\n",
    "        if path.exists():\n",
    "            run_cmd(f'chmod go-w {shell_quote(str(path))}', check=False)\n",
    "    for path in (model_path, save_path):\n",
    "        if path.exists():\n",
    "            run_cmd(f'chmod -R go-w {shell_quote(str(path))}', check=False)\n",
    "\n",
    "\n",
    "def print_msmodelslim_permission_report(model_path: Path, save_path: Path) -> bool:\n",
    "    \"\"\"Print writable paths for all four locations; return True if the model paths are clean.\"\"\"\n",
    "    checks = [\n",
    "        ('model parent', find_writable_paths(model_path.parent, recursive=False)),\n",
    "        ('model tree', find_writable_paths(model_path, recursive=True)),\n",
    "        ('save parent', find_writable_paths(save_path.parent, recursive=False)),\n",
    "        ('save tree', find_writable_paths(save_path, recursive=True)),\n",
    "    ]\n",
    "\n",
    "    print('Permission preflight:')\n",
    "    for label, offenders in checks:\n",
    "        if offenders:\n",
    "            print(f'  [{label}] writable path(s) still present, showing the first {len(offenders)}:')\n",
    "            for offender in offenders:\n",
    "                print(f'    {offender}')\n",
    "        else:\n",
    "            print(f'  [{label}] OK')\n",
    "\n",
    "    # Only the model paths gate the result; save-path findings are informational.\n",
    "    return not any(offenders for label, offenders in checks if label.startswith('model'))\n",
    "\n",
    "\n",
    "print('Helper functions loaded')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "verify-env",
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Python version:')\n",
    "run_cmd('python3 --version')\n",
    "\n",
    "print('Checking verified runtime stack:')\n",
    "run_cmd(\"python3 - <<'PY'\\nimport importlib.metadata as m\\nfor name in ['bracex', 'easydict', 'msmodelslim', 'transformers', 'huggingface-hub', 'torchvision', 'mistral-common', 'wcmatch']:\\n    print(name, m.version(name))\\nfrom huggingface_hub import is_offline_mode\\nprint('huggingface-hub is_offline_mode ok', is_offline_mode())\\nimport bracex\\nimport easydict\\nimport mistral_common\\nimport msmodelslim\\nimport torchvision\\nimport transformers\\nimport wcmatch\\nprint('runtime imports ok')\\nPY\")\n",
    "\n",
    "print('Checking transformers version:')\n",
    "run_cmd(\"python3 -c \\\"import transformers; print(transformers.__version__)\\\"\")\n",
    "\n",
    "print('Checking msmodelslim CLI:')\n",
    "run_cmd('msmodelslim --help | head -n 20')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "prepare-command",
   "metadata": {},
   "outputs": [],
   "source": [
    "if MODEL_TYPE not in SUPPORTED_MODEL_TYPES:\n",
    "    raise ValueError(f'Unsupported MODEL_TYPE: {MODEL_TYPE}')\n",
    "\n",
    "if QUANT_TYPE not in SUPPORTED_MODEL_TYPES[MODEL_TYPE]:\n",
    "    raise ValueError(f'{MODEL_TYPE} does not support quant type {QUANT_TYPE} in the official README')\n",
    "\n",
    "SAVE_PATH.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "quant_cmd = ' '.join([\n",
    "    'msmodelslim', 'quant',\n",
    "    '--model_path', shell_quote(str(MODEL_PATH)),\n",
    "    '--save_path', shell_quote(str(SAVE_PATH)),\n",
    "    '--device', DEVICE,\n",
    "    '--model_type', MODEL_TYPE,\n",
    "    '--quant_type', QUANT_TYPE,\n",
    "    '--trust_remote_code', str(TRUST_REMOTE_CODE),\n",
    "])\n",
    "\n",
    "print('Official quant command:')\n",
    "print(quant_cmd)\n",
    "# Listed in the order prepare_msmodelslim_permissions runs them.\n",
    "print('Permission prep that will run when RUN_QUANT = True:')\n",
    "print(f'  mkdir -p {SAVE_PATH}')\n",
    "print(f'  chmod go-w {MODEL_PATH.parent}')\n",
    "print(f'  chmod go-w {SAVE_PATH.parent}')\n",
    "print(f'  chmod -R go-w {MODEL_PATH}')\n",
    "print(f'  chmod -R go-w {SAVE_PATH}')\n",
    "\n",
    "if MODEL_PATH.exists():\n",
    "    print(f'Model path exists: {MODEL_PATH}')\n",
    "    if not print_msmodelslim_permission_report(MODEL_PATH, SAVE_PATH):\n",
    "        print('MODEL_PATH is not ready yet. RUN_QUANT = True will try to fix permissions and re-check before quantization.')\n",
    "else:\n",
    "    print(f'Model path does not exist yet: {MODEL_PATH}')\n",
    "    print('Update MODEL_PATH before setting RUN_QUANT = True.')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "supported-matrix",
   "metadata": {},
   "source": [
    "## Supported Official Qwen3.5 Variants\n",
    "\n",
    "- `Qwen3.5-397B-A17B`: `w8a8`, `w4a8`\n",
    "- `Qwen3.5-122B-A10B`: `w8a8`\n",
    "- `Qwen3.5-35B-A3B`: `w8a8`\n",
    "- `Qwen3.5-27B`: `w8a8`\n",
    "\n",
    "To verify a different official model, change `MODEL_TYPE`, `MODEL_PATH`, `SAVE_PATH`, and `QUANT_TYPE` in the configuration cell above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "run-quant",
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_QUANT:\n",
    "    if not MODEL_PATH.exists():\n",
    "        raise FileNotFoundError(f'MODEL_PATH does not exist: {MODEL_PATH}')\n",
    "    prepare_msmodelslim_permissions(MODEL_PATH, SAVE_PATH)\n",
    "    if not print_msmodelslim_permission_report(MODEL_PATH, SAVE_PATH):\n",
    "        raise RuntimeError(\n",
    "            'MODEL_PATH still has group/other writable bits after permission prep. '\n",
    "            'This usually means the mounted model directory was created with permissive modes '\n",
    "            'or the storage backend is not honoring chmod.'\n",
    "        )\n",
    "    run_cmd(quant_cmd)\n",
    "else:\n",
    "    print('RUN_QUANT is False; skipping the real quantization job.')\n",
    "    print('Set RUN_QUANT = True after you prepare the model weights on disk.')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}