Commit 5f9e16aa, authored by AmadeusChan, committed by GitHub

added two new serving papers: DistServe and MuxServe

parent 51dbbab9
@@ -41,6 +41,8 @@ This is a list of (non-comprehensive) LLM system papers maintained by [ALCHEM La
- FASTDECODE: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines (arXiv'24) [link to paper](https://arxiv.org/pdf/2403.11421.pdf)
- FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning (arXiv'24) [link to paper](https://arxiv.org/pdf/2402.18789.pdf)
- Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads (arXiv'24) [link to paper](https://arxiv.org/pdf/2401.11181.pdf)
- MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving (arXiv'24) [link to paper](https://arxiv.org/pdf/2404.02015.pdf)
- DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving (arXiv'24) [link to paper](https://arxiv.org/pdf/2401.09670.pdf)
## LLM Training Systems