Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ssbuild/aigc_serving

Open more actions menu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

285 Commits
285 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

简介

aigc_serving lightweight and efficient Language service model reasoning

llm.png

图片来自于论文: [A Survey of Large Language Models](https://arxiv.org/pdf/2303.18223.pdf)

update information

    2024-04-22 0.3.0
    2023-12-07 support sus-chat and deepseek-coder
    2023-12-04 support auto-gptq , such as 4bits 8bits
    2023-12-03 limit model_max_length for inputs
    2023-12-02 support qwen 1.8b 7b 14b 72b and 
          support chat chat_stream is for openai_chat ,  
          generate generate_stream is for openai , 
          support batch for generate method
    2023-11-28 support autoawq , such as 4bits 8bits
    2023-11-27 yi model_type change to llama
    2023-11-22 support sentence_transformers , such as bge,m3e and so on
    2023-11-20 support seed for generator sample and support bianque2 , lingowhale
    2023-11-06 fix pydantic 2 and support api_keys in config
    2023-11-04 support yi aigc-zoo>=0.2.7.post2 , 支持 pydantic >= 2 
    2023-11-01 support bluelm aigc-zoo>=0.2.7.post1
    2023-10-31 support chatglm3,CausalLM,skywork , aigc-zoo>=0.2.7.post0
    2023-10-11 support t5
    2023-09-13 支持模型别名
    2023-09-11 增加支持且仅支持 ptv2 for chatglm,chatglm2
    2023-09-06 support baichuan2
    2023-09-03 增加tiger , openbuddy 模板 , test openbuddy-70b passed
    2023-08-26 fix same group stream order
    2023-08-25 aigc-zoo 0.2.0.post1 support xverse-13b-chat , 已有模型 实现 stop 功能
    2023-08-20 support embedding
    2023-08-17 add tiger-chat-13b
    2023-08-16 推理可选使用 Rope NtkScale , 不训练扩展推理长度
    2023-08-14 支持lora model 基础模型和lora头切换
    2023-08-12 增加通义千问模型工具调用示例,支持 **`function call`** 特性,调用方式参考 [邮件发送助手](./tests/email_sender.py)、[定积分计算器](./tests/quad_calculator.py)、[SQL查询](./tests/sql_querier.py)
    2023-08-11 qwen官方配置文件更新,请使用aigc-zoo 0.1.17.post0 , 并更新官方config.json , generation_config.json 等信息
    2023-08-10 0.1.17 release , fix new bugs
    2023-08-08 support xverse-13b , 版本要求 deep_training 0.1.15.rc2
    2023-08-07 support llama llama2 量化推理 , 版本要求 deep_training 0.1.15.rc1
    2023-08-05 aigc_zoo 最低版本0.1.14 
    2023-08-03 support qwen
    2023-08-02 support muti lora infer , 手动升级 aigc_zoo , pip install -U git+https://github.com/ssbuild/deep_training.zoo.git --force-reinstall --no-deps
    2023-07-27 support openai client
    2023-07-26 support streaming
    2023-07-24 support chat
    2023-07-23 support deepspeed , accelerate

注意

  • 推荐环境: linux python >=3.10 torch >= 2.0.1

install

pip install -r requirements.txt

支持的模型

支持且不限于以下模型 ,原则上支持transformer 全系列

模型 16bit 4bit ptv2 deepspeed accelerate hf
baichuan-7b ×
baichuan-13b ×
baichuan2-7b ×
baichuan2-13b ×
bloom × ×
casuallm ×
chatglm
chatglm2
chatglm3
internlm ×
llama ×
moss ×
openbuddy ×
opt × ×
qwen ×
rwkv × ×
t5 × ×
tiger × ×
xverse ×
bluelm ×
yi ×
bianque2 ×
lingowhale ×
sus_chat ×
deepseek ×
deepseek_coder ×

docker

build

cd aigc_serving
docker build -f docker/Dockerfile -t aigc_serving ..

docker run

docker run -it --runtime=nvidia --name aigc_serving aigc_serving:latest /bin/bash

模型配置

config.yaml 更多模型配置参见 assets/template

依赖

服务启动和停止

# 启动
cd script
bash start.sh
# 停止
cd script
bash stop.sh

加密服务启动和停止

第一步加密工程

pip install -U se_imports
cd serving/cc
python cc.py

第二步加密工程部署

pip install -U se_imports
# 启动
cd script_se
bash start.sh
# 停止
cd script_se
bash stop.sh

推荐模型指标评估

推荐界面 ChatGPT-Next-Web 或者 dify

界面 界面 界面 界面

客户端测试 tests

openai 接口

chat demo tests/test_openai_chat.py

import openai

openai.api_key = "EMPTY"
openai.api_base = "http://192.168.2.180:8081/v1"
model = "chatglm2-6b-int4"
model = "qwen-7b-chat-int4"

# # Test list models API
# models = openai.Model.list()
# print("Models:", models)

# Test completion API
stream = False

data = {
    "model": model,
    "adapter_name": None, # lora头
    "prompt": ["你是谁?"],
    "top_p": 0.8,
    "temperature": 1.0,
    "frequency_penalty": 1.01,
    "stream": stream,
    "nchar": 1,# stream 字符
    "n": 1, # 返回 n 个choices
    # "stop": ["Observation:","Observation:\n"]
}


completion = openai.Completion.create(**data)
if stream:
    text = ''
    for choices in completion:
        c = choices.choices[0]
        text += c.text
        print(c.text)
    print(text)
else:
    for choice in completion.choices:
        print("result:", choice.text)

embedding tests/test_openai_embedding.py

import openai
# 新版本
openai.api_key = "EMPTY"
openai.api_base = "http://192.168.2.180:8081/v1"

model = "chatglm2-6b-int4"
model = "qwen-7b-chat-int4"

# # Test list models API
# models = openai.Model.list()
# print("Models:", models)

# Test completion API
stream = False

data = {
    "model": model,
    "adapter_name": None, # lora头
    "input": ["你是谁",],
}


completion = openai.Embedding.create(**data)

for d in completion.data:
    print(d)

纯粹而干净的代码

注意事项

  • 1、 如果deepspeed , 确保 num_attention_heads % len(device_id) == 0
  • 2、 模型键值,必须以模型名开始 , 不区分大小写

Star History

Star History Chart

About

aigc_serving lightweight and efficient Language service model reasoning

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

Morty Proxy This is a proxified and sanitized view of the page, visit original site.