🚀 Serving LLM-jp-4 32B Thinking on mdx.jp A100 x2 with vLLM and Using It via an OpenAI-Compatible API
Notes on running the official LLM-jp-4-32b-a3b-thinking model on an mdx.jp server with two A100 40GB GPUs, and on replacing a Transformers setup that ran out of GPU memory (OOM) with a vLLM deployment.
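The setup described above can be sketched as follows: vLLM's OpenAI-compatible server is launched with tensor parallelism across the two GPUs, then queried with a standard chat-completions request. This is a minimal sketch, assuming a recent vLLM release with the `vllm serve` CLI; the Hugging Face repo id `llm-jp/LLM-jp-4-32b-a3b-thinking`, the port, the context length, and the memory-utilization value are assumptions, not values confirmed in this post.

```shell
# Launch vLLM's OpenAI-compatible server, sharding the model across
# both A100 40GB GPUs with tensor parallelism.
# Repo id, port, and limits are assumptions; check the actual model card.
vllm serve llm-jp/LLM-jp-4-32b-a3b-thinking \
  --tensor-parallel-size 2 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --port 8000

# Query it with a standard OpenAI chat-completions payload.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llm-jp/LLM-jp-4-32b-a3b-thinking",
        "messages": [{"role": "user", "content": "こんにちは"}]
      }'
```

Because the server speaks the OpenAI API, any OpenAI SDK client pointed at `http://localhost:8000/v1` (with a dummy API key) should work the same way as the `curl` call above.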