\n",
+ "\n",
+ "以下是一些 Google 線上機器學習課程的筆記本。詳情請參閱
完整的課程網站。\n",
+ "- [Pandas DataFrame 簡介](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/pandas_dataframe_ultraquick_tutorial.ipynb)\n",
+ "- [以 tf.keras 使用合成資料進行線性迴歸](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/linear_regression_with_synthetic_data.ipynb)\n",
+ "\n",
+ "
\n",
+ "\n",
+ "
\n",
+ "### 使用加速硬體\n",
+ "
\n",
+ "\n",
+ "- [搭配 GPU 使用 TensorFlow](/notebooks/gpu.ipynb)\n",
+ "- [使用 TPU 的 TensorFlow](/notebooks/tpu.ipynb)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "P-H6Lw1vyNNd"
+ },
+ "source": [
+ "
\n",
+ "\n",
+ "
\n",
+ "\n",
+ "### 主要範例\n",
+ "\n",
+ "
\n",
+ "\n",
+ "-
NeMo Voice Swap:使用 Nvidia 的 NeMo 對話式 AI 工具組將音訊片段中的語音換成電腦產生的語音。\n",
+ "\n",
+ "-
重新訓練圖片分類工具:以預先訓練的圖片分類工具為基礎,建立一個分辨花朵的 Keras 模型。\n",
+ "-
文字分類:將 IMDB 電影評論分類為
正面或
負面。\n",
+ "-
風格轉換:運用深度學習轉換圖片的風格。\n",
+ "-
支援多種語言的 Universal Sentence Encoder 問與答:使用機器學習模型來回答 SQuAD 資料集的問題。\n",
+ "-
影片畫面內插:預測影片在第一個與最後一個畫面之間的內容。\n"
+ ]
+ },
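+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The cells below install llama-cpp-python from a prebuilt CUDA wheel, download a Q2_K-quantized CodeLlama-7B-Instruct model in GGUF format, run a short completion locally, and then serve the model over an OpenAI-compatible HTTP API through Colab's port proxy.\n"
+ ]
+ },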
+ {
+ "cell_type": "code",
+ "source": [
+ "!pip install https://github.com/Isotr0py/SakuraLLM-Notebooks/releases/download/wheels/llama_cpp_python-0.2.39-cp310-cp310-manylinux_2_17_x86_64.whl"
+ ],
+ "metadata": {
+ "id": "GxVzvQ-03kXF",
+ "outputId": "28977c48-3ef2-4d16-fa8e-52674c821f09",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "execution_count": 1,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Collecting llama-cpp-python==0.2.39\n",
+ " Downloading https://github.com/Isotr0py/SakuraLLM-Notebooks/releases/download/wheels/llama_cpp_python-0.2.39-cp310-cp310-manylinux_2_17_x86_64.whl (15.1 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m15.1/15.1 MB\u001b[0m \u001b[31m28.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-cpp-python==0.2.39) (4.10.0)\n",
+ "Requirement already satisfied: numpy>=1.20.0 in /usr/local/lib/python3.10/dist-packages (from llama-cpp-python==0.2.39) (1.25.2)\n",
+ "Collecting diskcache>=5.6.1 (from llama-cpp-python==0.2.39)\n",
+ " Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m45.5/45.5 kB\u001b[0m \u001b[31m1.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: jinja2>=2.11.3 in /usr/local/lib/python3.10/dist-packages (from llama-cpp-python==0.2.39) (3.1.3)\n",
+ "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2>=2.11.3->llama-cpp-python==0.2.39) (2.1.5)\n",
+ "Installing collected packages: diskcache, llama-cpp-python\n",
+ "Successfully installed diskcache-5.6.3 llama-cpp-python-0.2.39\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "repo_id = \"TheBloke/CodeLlama-7B-Instruct-GGUF\"\n",
+ "file_name=\"codellama-7b-instruct.Q2_K.gguf\"\n",
+ "\n",
+ "from huggingface_hub import hf_hub_download\n",
+ "hf_hub_download(repo_id=repo_id, filename=file_name,local_dir=\"/kaggle/working\")"
+ ],
+ "metadata": {
+ "id": "L8oDP-oP3o18",
+ "outputId": "308d055b-c2ee-4f74-9496-7ac8c4f35ff8",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 67,
+ "referenced_widgets": [
+ "170aaac40a084f4f8794fc409c4094c7",
+ "aca3ba766bf843888ce6abb335707f07",
+ "e9572463294049dfa0428bc51851af44",
+ "cb27c1ef215c407da8e9c90959d17560",
+ "4a2f923d36a241928958c552d5535140",
+ "af001d0fb6594ccea5029fda79124c51",
+ "2cc5d1eeb3dc4c67a19c20afa4997fa0",
+ "4f6e435fa1cd4a209af3132635b5414f",
+ "f8e4fa05285c41e0b8ee3ba27d746ce8",
+ "8fd4d6dfd5c64af09b1a729df75232d3",
+ "bd656ec2ee95401d8798e8365e8da283"
+ ]
+ }
+ },
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "codellama-7b-instruct.Q2_K.gguf: 0%| | 0.00/2.83G [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "170aaac40a084f4f8794fc409c4094c7"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'/kaggle/working/codellama-7b-instruct.Q2_K.gguf'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 2
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from llama_cpp import Llama\n",
+ "model_path=\"/kaggle/working/codellama-7b-instruct.Q2_K.gguf\"\n",
+ "llm = Llama(\n",
+ " model_path=model_path,n_gpu_layers=100, n_threads=6, n_ctx=3584, n_batch=521, verbose=True\n",
+ ")\n",
+ "output = llm(\n",
+ " \"Q: python如何用time计算开始结束时间 A: \", # Prompt\n",
+ " max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window\n",
+ " stop=[\"Q:\", \"\\n\"], # Stop generating just before the model would generate a new question\n",
+ " echo=True # Echo the prompt back in the output\n",
+ ") # Generate a completion, can also call create_completion\n",
+ "print(output[\"choices\"][0][\"text\"])"
+ ],
+ "metadata": {
+ "id": "g5Ycrz5V3rcP",
+ "outputId": "861afece-7493-497e-e78c-c93402b80c60",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /kaggle/working/codellama-7b-instruct.Q2_K.gguf (version GGUF V2)\n",
+ "llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n",
+ "llama_model_loader: - kv 0: general.architecture str = llama\n",
+ "llama_model_loader: - kv 1: general.name str = codellama_codellama-7b-instruct-hf\n",
+ "llama_model_loader: - kv 2: llama.context_length u32 = 16384\n",
+ "llama_model_loader: - kv 3: llama.embedding_length u32 = 4096\n",
+ "llama_model_loader: - kv 4: llama.block_count u32 = 32\n",
+ "llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008\n",
+ "llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128\n",
+ "llama_model_loader: - kv 7: llama.attention.head_count u32 = 32\n",
+ "llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32\n",
+ "llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010\n",
+ "llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000\n",
+ "llama_model_loader: - kv 11: general.file_type u32 = 10\n",
+ "llama_model_loader: - kv 12: tokenizer.ggml.model str = llama\n",
+ "llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32016] = [\"
\", \"\", \"\", \"<0x00>\", \"<...\n",
+ "llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32016] = [0.000000, 0.000000, 0.000000, 0.0000...\n",
+ "llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32016] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...\n",
+ "llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1\n",
+ "llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2\n",
+ "llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0\n",
+ "llama_model_loader: - kv 19: general.quantization_version u32 = 2\n",
+ "llama_model_loader: - type f32: 65 tensors\n",
+ "llama_model_loader: - type q2_K: 65 tensors\n",
+ "llama_model_loader: - type q3_K: 160 tensors\n",
+ "llama_model_loader: - type q6_K: 1 tensors\n",
+ "llm_load_vocab: mismatch in special tokens definition ( 264/32016 vs 259/32016 ).\n",
+ "llm_load_print_meta: format = GGUF V2\n",
+ "llm_load_print_meta: arch = llama\n",
+ "llm_load_print_meta: vocab type = SPM\n",
+ "llm_load_print_meta: n_vocab = 32016\n",
+ "llm_load_print_meta: n_merges = 0\n",
+ "llm_load_print_meta: n_ctx_train = 16384\n",
+ "llm_load_print_meta: n_embd = 4096\n",
+ "llm_load_print_meta: n_head = 32\n",
+ "llm_load_print_meta: n_head_kv = 32\n",
+ "llm_load_print_meta: n_layer = 32\n",
+ "llm_load_print_meta: n_rot = 128\n",
+ "llm_load_print_meta: n_embd_head_k = 128\n",
+ "llm_load_print_meta: n_embd_head_v = 128\n",
+ "llm_load_print_meta: n_gqa = 1\n",
+ "llm_load_print_meta: n_embd_k_gqa = 4096\n",
+ "llm_load_print_meta: n_embd_v_gqa = 4096\n",
+ "llm_load_print_meta: f_norm_eps = 0.0e+00\n",
+ "llm_load_print_meta: f_norm_rms_eps = 1.0e-05\n",
+ "llm_load_print_meta: f_clamp_kqv = 0.0e+00\n",
+ "llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n",
+ "llm_load_print_meta: n_ff = 11008\n",
+ "llm_load_print_meta: n_expert = 0\n",
+ "llm_load_print_meta: n_expert_used = 0\n",
+ "llm_load_print_meta: rope scaling = linear\n",
+ "llm_load_print_meta: freq_base_train = 1000000.0\n",
+ "llm_load_print_meta: freq_scale_train = 1\n",
+ "llm_load_print_meta: n_yarn_orig_ctx = 16384\n",
+ "llm_load_print_meta: rope_finetuned = unknown\n",
+ "llm_load_print_meta: model type = 7B\n",
+ "llm_load_print_meta: model ftype = Q2_K - Medium\n",
+ "llm_load_print_meta: model params = 6.74 B\n",
+ "llm_load_print_meta: model size = 2.63 GiB (3.35 BPW) \n",
+ "llm_load_print_meta: general.name = codellama_codellama-7b-instruct-hf\n",
+ "llm_load_print_meta: BOS token = 1 ''\n",
+ "llm_load_print_meta: EOS token = 2 ''\n",
+ "llm_load_print_meta: UNK token = 0 ''\n",
+ "llm_load_print_meta: LF token = 13 '<0x0A>'\n",
+ "llm_load_tensors: ggml ctx size = 0.22 MiB\n",
+ "llm_load_tensors: offloading 32 repeating layers to GPU\n",
+ "llm_load_tensors: offloading non-repeating layers to GPU\n",
+ "llm_load_tensors: offloaded 33/33 layers to GPU\n",
+ "llm_load_tensors: CPU buffer size = 41.04 MiB\n",
+ "llm_load_tensors: CUDA0 buffer size = 2653.36 MiB\n",
+ ".................................................................................................\n",
+ "llama_new_context_with_model: n_ctx = 3584\n",
+ "llama_new_context_with_model: freq_base = 1000000.0\n",
+ "llama_new_context_with_model: freq_scale = 1\n",
+ "llama_kv_cache_init: CUDA0 KV buffer size = 1792.00 MiB\n",
+ "llama_new_context_with_model: KV self size = 1792.00 MiB, K (f16): 896.00 MiB, V (f16): 896.00 MiB\n",
+ "llama_new_context_with_model: CUDA_Host input buffer size = 15.28 MiB\n",
+ "llama_new_context_with_model: CUDA0 compute buffer size = 285.43 MiB\n",
+ "llama_new_context_with_model: CUDA_Host compute buffer size = 8.95 MiB\n",
+ "llama_new_context_with_model: graph splits (measure): 3\n",
+ "AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | \n",
+ "Model metadata: {'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '1000000.000000', 'llama.context_length': '16384', 'general.name': 'codellama_codellama-7b-instruct-hf', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '11008', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '32', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '10'}\n",
+ "\n",
+ "llama_print_timings: load time = 457.00 ms\n",
+ "llama_print_timings: sample time = 20.80 ms / 32 runs ( 0.65 ms per token, 1538.46 tokens per second)\n",
+ "llama_print_timings: prompt eval time = 456.45 ms / 21 tokens ( 21.74 ms per token, 46.01 tokens per second)\n",
+ "llama_print_timings: eval time = 904.17 ms / 31 runs ( 29.17 ms per token, 34.29 tokens per second)\n",
+ "llama_print_timings: total time = 1511.20 ms / 52 tokens\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Q: python如何用time计算开始结束时间 A: 可以使用time模块,调用clock_gettime获取开始时间,然后在函数中调用clock\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!pip install 'llama-cpp-python[server]'\n"
+ ],
+ "metadata": {
+ "id": "e0hpTiqX32aU",
+ "outputId": "2b9e2b0d-4fa7-4771-accf-64b392929429",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: llama-cpp-python[server] in /usr/local/lib/python3.10/dist-packages (0.2.39)\n",
+ "Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-cpp-python[server]) (4.10.0)\n",
+ "Requirement already satisfied: numpy>=1.20.0 in /usr/local/lib/python3.10/dist-packages (from llama-cpp-python[server]) (1.25.2)\n",
+ "Requirement already satisfied: diskcache>=5.6.1 in /usr/local/lib/python3.10/dist-packages (from llama-cpp-python[server]) (5.6.3)\n",
+ "Requirement already satisfied: jinja2>=2.11.3 in /usr/local/lib/python3.10/dist-packages (from llama-cpp-python[server]) (3.1.3)\n",
+ "Collecting uvicorn>=0.22.0 (from llama-cpp-python[server])\n",
+ " Downloading uvicorn-0.28.0-py3-none-any.whl (60 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m60.6/60.6 kB\u001b[0m \u001b[31m1.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting fastapi>=0.100.0 (from llama-cpp-python[server])\n",
+ " Downloading fastapi-0.110.0-py3-none-any.whl (92 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m92.1/92.1 kB\u001b[0m \u001b[31m5.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting pydantic-settings>=2.0.1 (from llama-cpp-python[server])\n",
+ " Downloading pydantic_settings-2.2.1-py3-none-any.whl (13 kB)\n",
+ "Collecting sse-starlette>=1.6.1 (from llama-cpp-python[server])\n",
+ " Downloading sse_starlette-2.0.0-py3-none-any.whl (9.0 kB)\n",
+ "Collecting starlette-context<0.4,>=0.3.6 (from llama-cpp-python[server])\n",
+ " Downloading starlette_context-0.3.6-py3-none-any.whl (12 kB)\n",
+ "Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from fastapi>=0.100.0->llama-cpp-python[server]) (2.6.4)\n",
+ "Collecting starlette<0.37.0,>=0.36.3 (from fastapi>=0.100.0->llama-cpp-python[server])\n",
+ " Downloading starlette-0.36.3-py3-none-any.whl (71 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m71.5/71.5 kB\u001b[0m \u001b[31m10.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2>=2.11.3->llama-cpp-python[server]) (2.1.5)\n",
+ "Collecting python-dotenv>=0.21.0 (from pydantic-settings>=2.0.1->llama-cpp-python[server])\n",
+ " Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n",
+ "Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from sse-starlette>=1.6.1->llama-cpp-python[server]) (3.7.1)\n",
+ "Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.10/dist-packages (from uvicorn>=0.22.0->llama-cpp-python[server]) (8.1.7)\n",
+ "Collecting h11>=0.8 (from uvicorn>=0.22.0->llama-cpp-python[server])\n",
+ " Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m8.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi>=0.100.0->llama-cpp-python[server]) (0.6.0)\n",
+ "Requirement already satisfied: pydantic-core==2.16.3 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi>=0.100.0->llama-cpp-python[server]) (2.16.3)\n",
+ "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio->sse-starlette>=1.6.1->llama-cpp-python[server]) (3.6)\n",
+ "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.10/dist-packages (from anyio->sse-starlette>=1.6.1->llama-cpp-python[server]) (1.3.1)\n",
+ "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->sse-starlette>=1.6.1->llama-cpp-python[server]) (1.2.0)\n",
+ "Installing collected packages: python-dotenv, h11, uvicorn, starlette, starlette-context, sse-starlette, pydantic-settings, fastapi\n",
+ "Successfully installed fastapi-0.110.0 h11-0.14.0 pydantic-settings-2.2.1 python-dotenv-1.0.1 sse-starlette-2.0.0 starlette-0.36.3 starlette-context-0.3.6 uvicorn-0.28.0\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from google.colab.output import eval_js\n",
+ "print(eval_js(\"google.colab.kernel.proxyPort(5000)\"))\n",
+ "!python3 -m llama_cpp.server --host 0.0.0.0 --port 5000 --model /kaggle/working/codellama-7b-instruct.Q2_K.gguf"
+ ],
+ "metadata": {
+ "id": "Kx881xqt35Vw",
+ "outputId": "42938059-487e-45e9-c8f9-ae381bf70cc3",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ }
+ },
+ "execution_count": 10,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "https://id91m21m2t-496ff2e9c6d22116-5000-colab.googleusercontent.com/\n",
+ "ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no\n",
+ "ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes\n",
+ "ggml_init_cublas: found 1 CUDA devices:\n",
+ " Device 0: Tesla T4, compute capability 7.5, VMM: yes\n",
+ "llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /kaggle/working/codellama-7b-instruct.Q2_K.gguf (version GGUF V2)\n",
+ "llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n",
+ "llama_model_loader: - kv 0: general.architecture str = llama\n",
+ "llama_model_loader: - kv 1: general.name str = codellama_codellama-7b-instruct-hf\n",
+ "llama_model_loader: - kv 2: llama.context_length u32 = 16384\n",
+ "llama_model_loader: - kv 3: llama.embedding_length u32 = 4096\n",
+ "llama_model_loader: - kv 4: llama.block_count u32 = 32\n",
+ "llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008\n",
+ "llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128\n",
+ "llama_model_loader: - kv 7: llama.attention.head_count u32 = 32\n",
+ "llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32\n",
+ "llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010\n",
+ "llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000\n",
+ "llama_model_loader: - kv 11: general.file_type u32 = 10\n",
+ "llama_model_loader: - kv 12: tokenizer.ggml.model str = llama\n",
+ "llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32016] = [\"\", \"\", \"\", \"<0x00>\", \"<...\n",
+ "llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32016] = [0.000000, 0.000000, 0.000000, 0.0000...\n",
+ "llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32016] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...\n",
+ "llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1\n",
+ "llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2\n",
+ "llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0\n",
+ "llama_model_loader: - kv 19: general.quantization_version u32 = 2\n",
+ "llama_model_loader: - type f32: 65 tensors\n",
+ "llama_model_loader: - type q2_K: 65 tensors\n",
+ "llama_model_loader: - type q3_K: 160 tensors\n",
+ "llama_model_loader: - type q6_K: 1 tensors\n",
+ "llm_load_vocab: mismatch in special tokens definition ( 264/32016 vs 259/32016 ).\n",
+ "llm_load_print_meta: format = GGUF V2\n",
+ "llm_load_print_meta: arch = llama\n",
+ "llm_load_print_meta: vocab type = SPM\n",
+ "llm_load_print_meta: n_vocab = 32016\n",
+ "llm_load_print_meta: n_merges = 0\n",
+ "llm_load_print_meta: n_ctx_train = 16384\n",
+ "llm_load_print_meta: n_embd = 4096\n",
+ "llm_load_print_meta: n_head = 32\n",
+ "llm_load_print_meta: n_head_kv = 32\n",
+ "llm_load_print_meta: n_layer = 32\n",
+ "llm_load_print_meta: n_rot = 128\n",
+ "llm_load_print_meta: n_embd_head_k = 128\n",
+ "llm_load_print_meta: n_embd_head_v = 128\n",
+ "llm_load_print_meta: n_gqa = 1\n",
+ "llm_load_print_meta: n_embd_k_gqa = 4096\n",
+ "llm_load_print_meta: n_embd_v_gqa = 4096\n",
+ "llm_load_print_meta: f_norm_eps = 0.0e+00\n",
+ "llm_load_print_meta: f_norm_rms_eps = 1.0e-05\n",
+ "llm_load_print_meta: f_clamp_kqv = 0.0e+00\n",
+ "llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n",
+ "llm_load_print_meta: n_ff = 11008\n",
+ "llm_load_print_meta: n_expert = 0\n",
+ "llm_load_print_meta: n_expert_used = 0\n",
+ "llm_load_print_meta: rope scaling = linear\n",
+ "llm_load_print_meta: freq_base_train = 1000000.0\n",
+ "llm_load_print_meta: freq_scale_train = 1\n",
+ "llm_load_print_meta: n_yarn_orig_ctx = 16384\n",
+ "llm_load_print_meta: rope_finetuned = unknown\n",
+ "llm_load_print_meta: model type = 7B\n",
+ "llm_load_print_meta: model ftype = Q2_K - Medium\n",
+ "llm_load_print_meta: model params = 6.74 B\n",
+ "llm_load_print_meta: model size = 2.63 GiB (3.35 BPW) \n",
+ "llm_load_print_meta: general.name = codellama_codellama-7b-instruct-hf\n",
+ "llm_load_print_meta: BOS token = 1 ''\n",
+ "llm_load_print_meta: EOS token = 2 ''\n",
+ "llm_load_print_meta: UNK token = 0 ''\n",
+ "llm_load_print_meta: LF token = 13 '<0x0A>'\n",
+ "llm_load_tensors: ggml ctx size = 0.11 MiB\n",
+ "llm_load_tensors: offloading 0 repeating layers to GPU\n",
+ "llm_load_tensors: offloaded 0/33 layers to GPU\n",
+ "llm_load_tensors: CPU buffer size = 2694.39 MiB\n",
+ "warning: failed to mlock 43773952-byte buffer (after previously locking 0 bytes): Cannot allocate memory\n",
+ "Try increasing RLIMIT_MEMLOCK ('ulimit -l' as root).\n",
+ ".................................................................................................\n",
+ "llama_new_context_with_model: n_ctx = 2048\n",
+ "llama_new_context_with_model: freq_base = 1000000.0\n",
+ "llama_new_context_with_model: freq_scale = 1\n",
+ "llama_kv_cache_init: CUDA_Host KV buffer size = 1024.00 MiB\n",
+ "llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB\n",
+ "llama_new_context_with_model: CUDA_Host input buffer size = 12.01 MiB\n",
+ "llama_new_context_with_model: CUDA_Host compute buffer size = 167.20 MiB\n",
+ "llama_new_context_with_model: graph splits (measure): 1\n",
+ "AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | \n",
+ "Model metadata: {'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '1000000.000000', 'llama.context_length': '16384', 'general.name': 'codellama_codellama-7b-instruct-hf', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '11008', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '32', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '10'}\n",
+ "\u001b[32mINFO\u001b[0m: Started server process [\u001b[36m12857\u001b[0m]\n",
+ "\u001b[32mINFO\u001b[0m: Waiting for application startup.\n",
+ "\u001b[32mINFO\u001b[0m: Application startup complete.\n",
+ "\u001b[32mINFO\u001b[0m: Uvicorn running on \u001b[1mhttp://0.0.0.0:5000\u001b[0m (Press CTRL+C to quit)\n",
+ "\u001b[32mINFO\u001b[0m: 127.0.0.1:51698 - \"\u001b[1mGET /docs HTTP/1.1\u001b[0m\" \u001b[32m200 OK\u001b[0m\n",
+ "\u001b[32mINFO\u001b[0m: 127.0.0.1:52900 - \"\u001b[1mGET /openapi.json HTTP/1.1\u001b[0m\" \u001b[32m200 OK\u001b[0m\n",
+ "\u001b[32mINFO\u001b[0m: 127.0.0.1:44572 - \"\u001b[1mGET /v1/models HTTP/1.1\u001b[0m\" \u001b[32m200 OK\u001b[0m\n",
+ "\u001b[32mINFO\u001b[0m: 127.0.0.1:44584 - \"\u001b[1mGET /favicon.ico HTTP/1.1\u001b[0m\" \u001b[31m404 Not Found\u001b[0m\n",
+ "\u001b[32mINFO\u001b[0m: Shutting down\n",
+ "\u001b[32mINFO\u001b[0m: Finished server process [\u001b[36m12857\u001b[0m]\n",
+ "\u001b[31mERROR\u001b[0m: Traceback (most recent call last):\n",
+ " File \"/usr/local/lib/python3.10/dist-packages/starlette/routing.py\", line 743, in lifespan\n",
+ " await receive()\n",
+ " File \"/usr/local/lib/python3.10/dist-packages/uvicorn/lifespan/on.py\", line 137, in receive\n",
+ " return await self.receive_queue.get()\n",
+ " File \"/usr/lib/python3.10/asyncio/queues.py\", line 159, in get\n",
+ " await getter\n",
+ "asyncio.exceptions.CancelledError\n",
+ "\n"
+ ]
+ }
+ ]
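+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal client sketch for the server above: llama_cpp.server exposes an OpenAI-compatible REST API, so a plain `requests` call against `/v1/completions` works. This assumes the server cell is still running (e.g. launched in the background with a trailing `&`) and that the port matches the `--port 5000` flag used above.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "# Minimal client sketch (assumption: the server cell above is still running on\n",
+ "# localhost:5000). llama_cpp.server speaks the OpenAI completions protocol.\n",
+ "import requests\n",
+ "\n",
+ "resp = requests.post(\n",
+ "    \"http://localhost:5000/v1/completions\",\n",
+ "    json={\n",
+ "        \"prompt\": \"Q: How do I measure elapsed time in Python? A:\",\n",
+ "        \"max_tokens\": 32,\n",
+ "        \"stop\": [\"Q:\", \"\\n\"],\n",
+ "    },\n",
+ "    timeout=120,\n",
+ ")\n",
+ "print(resp.json()[\"choices\"][0][\"text\"])"
+ ]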
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "name": "歡迎使用 Colaboratory",
+ "provenance": [],
+ "gpuType": "T4",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "accelerator": "GPU",
+ "widgets": {
+ "application/vnd.jupyter.widget-state+json": {
+ "170aaac40a084f4f8794fc409c4094c7": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HBoxModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HBoxModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HBoxView",
+ "box_style": "",
+ "children": [
+ "IPY_MODEL_aca3ba766bf843888ce6abb335707f07",
+ "IPY_MODEL_e9572463294049dfa0428bc51851af44",
+ "IPY_MODEL_cb27c1ef215c407da8e9c90959d17560"
+ ],
+ "layout": "IPY_MODEL_4a2f923d36a241928958c552d5535140"
+ }
+ },
+ "aca3ba766bf843888ce6abb335707f07": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_af001d0fb6594ccea5029fda79124c51",
+ "placeholder": "",
+ "style": "IPY_MODEL_2cc5d1eeb3dc4c67a19c20afa4997fa0",
+ "value": "codellama-7b-instruct.Q2_K.gguf: 100%"
+ }
+ },
+ "e9572463294049dfa0428bc51851af44": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "FloatProgressModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "FloatProgressModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "ProgressView",
+ "bar_style": "success",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_4f6e435fa1cd4a209af3132635b5414f",
+ "max": 2826016448,
+ "min": 0,
+ "orientation": "horizontal",
+ "style": "IPY_MODEL_f8e4fa05285c41e0b8ee3ba27d746ce8",
+ "value": 2826016448
+ }
+ },
+ "cb27c1ef215c407da8e9c90959d17560": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_8fd4d6dfd5c64af09b1a729df75232d3",
+ "placeholder": "",
+ "style": "IPY_MODEL_bd656ec2ee95401d8798e8365e8da283",
+ "value": " 2.83G/2.83G [00:19<00:00, 131MB/s]"
+ }
+ },
+ "4a2f923d36a241928958c552d5535140": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "af001d0fb6594ccea5029fda79124c51": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "2cc5d1eeb3dc4c67a19c20afa4997fa0": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "4f6e435fa1cd4a209af3132635b5414f": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "f8e4fa05285c41e0b8ee3ba27d746ce8": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ProgressStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ProgressStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "bar_color": null,
+ "description_width": ""
+ }
+ },
+ "8fd4d6dfd5c64af09b1a729df75232d3": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "bd656ec2ee95401d8798e8365e8da283": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ }
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file