The figure below is reproduced from the WeChat official account AI工程化.
- Inference execution engines
  - server
    - vLLM
      - Deploying Qwen with vLLM: https://ezcode.blog.csdn.net/article/details/135947607
    - HF Pipeline
    - TGI
    - DeepSpeed-MII
    - TensorRT-LLM
  - pc/edge
    - ggml
    - ollama: https://ezcode.blog.csdn.net/article/details/136482825
    - mlc-llm
- server
  - Web services
    - Flask
    - Gradio
    - FastAPI: https://so.csdn.net/so/search?q=FastAPI&u=lovechris00
    - Starlette
  - Inference services
    - OpenLLM
    - RayLLM
    - Triton Server
  - Chat services
    - FastChat
    - OpenChat
    - HuggingChat: https://ezcode.blog.csdn.net/article/details/136911282
    - GPT4ALL
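
One way to read the grouping above is as a stack: an execution engine generates text, an inference service exposes it behind a request/response interface, and a chat service adds conversation state on top. The sketch below illustrates that reading in plain Python. It is an assumption about how the categories relate, not something the figure states, and every name in it (`toy_engine`, `inference_service`, `ChatSession`) is hypothetical; none of them is an API of vLLM, FastAPI, FastChat, or any other tool listed.

```python
from dataclasses import dataclass, field


def toy_engine(prompt: str, max_tokens: int = 8) -> str:
    """Hypothetical stand-in for an execution engine (the role vLLM or ggml plays).

    It does no real inference: it returns the last `max_tokens`
    whitespace-separated words of the prompt, uppercased.
    """
    words = prompt.split()
    start = max(0, len(words) - max_tokens)
    return " ".join(word.upper() for word in words[start:])


def inference_service(request: dict) -> dict:
    """Hypothetical inference-service layer.

    Validates a JSON-like request, calls the engine, and wraps the
    result in a response dict with an HTTP-style status code.
    """
    prompt = request.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        return {"status": 400, "error": "prompt must be a non-empty string"}
    max_tokens = request.get("max_tokens", 8)
    return {"status": 200, "text": toy_engine(prompt, max_tokens=max_tokens)}


@dataclass
class ChatSession:
    """Hypothetical chat-service layer: keeps history and builds the prompt."""

    history: list = field(default_factory=list)

    def send(self, user_message: str) -> str:
        # Record the user turn, then flatten the whole history into one prompt.
        self.history.append(("user", user_message))
        prompt = "\n".join(f"{role}: {text}" for role, text in self.history)
        response = inference_service({"prompt": prompt})
        reply = response.get("text", "")
        # Record the assistant turn so the next call sees it as context.
        self.history.append(("assistant", reply))
        return reply


if __name__ == "__main__":
    session = ChatSession()
    print(session.send("hello"))
    print(len(session.history))
```

In a real deployment each layer would be one of the listed tools rather than a local function, for example a Web framework in front of an engine, with the chat layer calling the service over HTTP.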