Using Ollama with Spring AI

2026-05-08 17:20:58 Technical documentation

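With Ollama you can run a variety of large language models (LLMs) locally and generate text from them. Spring AI supports Ollama's chat completion capabilities through the OllamaChatModel API.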

Ollama also exposes an OpenAI API-compatible endpoint. The OpenAI API Compatibility section explains how to connect to an Ollama server with the Spring AI OpenAI module.

Prerequisites

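You first need access to an Ollama instance. There are a few options:

Download and install Ollama on your local machine
Configure and run Ollama via Testcontainers
Bind to an Ollama instance via Kubernetes Service Bindings

You can pull the models your application needs from the Ollama model library: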

ollama pull <model-name>

You can also pull any of the thousands of free GGUF models on Hugging Face:

ollama pull hf.co/<username>/<model-repository>

Alternatively, you can let Spring AI download the required models automatically; see the Auto-pulling Models section.

Auto-configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Refer to the upgrade notes for details.

Spring AI provides Spring Boot auto-configuration for the Ollama chat integration. To enable it, add the following dependency to your project's Maven pom.xml or Gradle build.gradle file:

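Maven dependency (Gradle builds declare the same coordinates, org.springframework.ai:spring-ai-starter-model-ollama):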

<dependency>
   <groupId>org.springframework.ai</groupId>
   <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

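Base Properties

The prefix spring.ai.ollama is the property prefix for configuring the connection to Ollama:

Property	Description	Default
spring.ai.ollama.base-url	Base URL of the Ollama API server	http://localhost:11434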
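
To check that the configured base URL actually points at a running Ollama server, one option is to call Ollama's GET /api/tags endpoint, which lists the models available locally. The following is a minimal sketch using only the JDK HTTP client; the class and method names are hypothetical and are not part of Spring AI.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper, not part of Spring AI: builds a request against
// Ollama's GET /api/tags endpoint, which lists the locally available models.
final class OllamaEndpointSketch {

    // Mirrors the documented default of spring.ai.ollama.base-url.
    static final String DEFAULT_BASE_URL = "http://localhost:11434";

    static HttpRequest listModelsRequest(String baseUrl) {
        // Strip one trailing slash so that the joined URI has no double slash.
        String root = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(root + "/api/tags"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
    }
}
```

Sending the request with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and receiving an HTTP 200 response suggests the server is reachable at that address.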

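The following properties control initialization of the Ollama integration and automatic model pulling:

Property	Description	Default
spring.ai.ollama.init.pull-model-strategy	Whether to pull models at startup, and with which strategy	never
spring.ai.ollama.init.timeout	How long to wait for a model to be pulled	5m
spring.ai.ollama.init.max-retries	Maximum number of retries for a model pull	0
spring.ai.ollama.init.chat.include	Whether chat models are included in the initialization task	true
spring.ai.ollama.init.chat.additional-models	Additional models to initialize besides those set through the default properties	[]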

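Chat Properties

Enabling and disabling the chat auto-configuration is now done through the top-level property spring.ai.model.chat:

To enable: spring.ai.model.chat=ollama (enabled by default)
To disable: spring.ai.model.chat=none (or any value other than ollama)

This change allows multiple models to be configured.

The prefix spring.ai.ollama.chat.options is the property prefix for configuring the Ollama chat model. It covers advanced request parameters such as model, keep-alive and format.

Advanced request parameters for the Ollama chat model:

Property	Description	Default
spring.ai.ollama.chat.enabled (removed)	Enables the Ollama chat model	true
spring.ai.model.chat	Enables the Ollama chat model	ollama
spring.ai.ollama.chat.options.model	Name of the model to use	mistral
spring.ai.ollama.chat.options.format	Format of the response; accepts json or a JSON Schema	-
spring.ai.ollama.chat.options.keep_alive	How long the model stays loaded in memory after a request	5m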

The remaining options are based on Ollama's valid parameters and types. Their defaults follow the defaults of the corresponding Ollama types:

Property	Description	Default
spring.ai.ollama.chat.options.numa	Whether to use NUMA	false
spring.ai.ollama.chat.options.num-ctx	Size of the context window	2048
spring.ai.ollama.chat.options.num-batch	Maximum batch size for prompt processing	512
spring.ai.ollama.chat.options.num-gpu	Number of layers computed on the GPU	-1
spring.ai.ollama.chat.options.main-gpu	Index of the main GPU	0
spring.ai.ollama.chat.options.low-vram	Low-VRAM mode	false
spring.ai.ollama.chat.options.f16-kv	-	true
spring.ai.ollama.chat.options.logits-all	Return logits for all tokens	-
spring.ai.ollama.chat.options.vocab-only	Load only the vocabulary, not the weights	-
spring.ai.ollama.chat.options.use-mmap	Whether to use memory mapping	null
spring.ai.ollama.chat.options.use-mlock	Lock the model in memory	false
spring.ai.ollama.chat.options.num-thread	Number of threads used for computation	0
spring.ai.ollama.chat.options.num-keep	-	4
spring.ai.ollama.chat.options.seed	Random seed	-1
spring.ai.ollama.chat.options.num-predict	Maximum number of tokens to generate	-1
spring.ai.ollama.chat.options.top-k	Number of candidate tokens considered when sampling	40
spring.ai.ollama.chat.options.top-p	Nucleus sampling probability	0.9
spring.ai.ollama.chat.options.min-p	Minimum sampling probability	0.0
spring.ai.ollama.chat.options.tfs-z	Tail-free sampling factor	1.0
spring.ai.ollama.chat.options.typical-p	-	1.0
spring.ai.ollama.chat.options.repeat-last-n	How far back the model looks to prevent repetition	64
spring.ai.ollama.chat.options.temperature	Sampling temperature (higher values give more creative output)	0.8
spring.ai.ollama.chat.options.repeat-penalty	Repetition penalty factor	1.1
spring.ai.ollama.chat.options.presence-penalty	-	0.0
spring.ai.ollama.chat.options.frequency-penalty	-	0.0
spring.ai.ollama.chat.options.mirostat	Mirostat sampling mode	0
spring.ai.ollama.chat.options.mirostat-tau	Mirostat target value (tau)	5.0
spring.ai.ollama.chat.options.mirostat-eta	Mirostat learning rate	0.1
spring.ai.ollama.chat.options.penalize-newline	-	true
spring.ai.ollama.chat.options.stop	Stop sequences that end generation	-
spring.ai.ollama.chat.options.tool-names	List of tool names enabled for function calling	-
spring.ai.ollama.chat.options.tool-callbacks	Tool callbacks registered with the chat model	-
spring.ai.ollama.chat.options.internal-tool-execution-enabled	Whether tool calls are handled internally	true

Every property under the spring.ai.ollama.chat.options prefix can be overridden at runtime by adding request-specific options to the Prompt call.
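
The precedence rule can be pictured as a map merge in which request-level values win and the configured defaults fill in the rest. The sketch below is only an illustration of that rule under the assumption that options are plain key/value pairs; OptionMergeSketch is a hypothetical class and this is not how Spring AI implements the merge.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: this is not Spring AI's merge implementation.
final class OptionMergeSketch {

    // Request-level values win; defaults fill in everything else.
    static Map<String, Object> merge(Map<String, Object> defaults,
                                     Map<String, Object> requestOverrides) {
        Map<String, Object> effective = new LinkedHashMap<>(defaults);
        requestOverrides.forEach((key, value) -> {
            if (value != null) { // an unset override leaves the default in place
                effective.put(key, value);
            }
        });
        return effective;
    }
}
```

For example, merging the defaults model=mistral and temperature=0.8 with a request override of temperature=0.4 keeps the model and replaces only the temperature.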

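Runtime Options

The OllamaChatOptions class provides model configuration such as the model to use, the temperature and the thinking mode.

OllamaOptions is deprecated. Use OllamaChatOptions for chat models and OllamaEmbeddingOptions for embedding models; the new classes provide type-safe, model-specific configuration.

At startup, the default options can be set with the OllamaChatModel(api, options) constructor or the spring.ai.ollama.chat.options.* properties.

At runtime, you can override the defaults by adding options to the Prompt call. For example: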

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OllamaChatOptions.builder()
            .model(OllamaModel.LLAMA3_1)
            .temperature(0.4)
            .build()
    ));

Besides the model-specific OllamaChatOptions, you can also use a portable ChatOptions instance.

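Auto-pulling Models

Spring AI Ollama can pull models automatically when they are not available in your Ollama instance. This is useful for development, testing and deployment to new environments.

Free GGUF models from Hugging Face can also be pulled by name.

There are three pull strategies:

always: always pull the model, so that the latest version is used
when_missing: pull only if the model is missing, which may leave an older version in use
never: never pull the model automatically

Automatic pulling is not recommended in production; pre-download the models there instead.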
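
The three strategies reduce to one decision: given whether the model is already present locally, should a pull be issued? The sketch below models that decision with a hypothetical enum whose constants mirror the documented property values; it is an illustration, not Spring AI's own class.

```java
import java.util.Locale;

// Hypothetical names; the enum only mirrors the three documented strategy values.
enum PullStrategySketch {
    ALWAYS, WHEN_MISSING, NEVER;

    // Parses the property value, e.g. "when_missing" becomes WHEN_MISSING.
    static PullStrategySketch fromProperty(String value) {
        return PullStrategySketch.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Decides whether a pull is needed, given whether the model is already local.
    boolean shouldPull(boolean modelPresentLocally) {
        if (this == ALWAYS) {
            return true;
        }
        if (this == WHEN_MISSING) {
            return !modelPresentLocally;
        }
        return false; // NEVER
    }
}
```

With when_missing, a model that is already present is left untouched, which is why that strategy can leave an older version in use.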

Configuring the pull strategy, timeout and retries:

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: always
        timeout: 60s
        max-retries: 1

Initializing additional models at startup:

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: always
        chat:
          additional-models:
            - llama3.2
            - qwen2.5

Excluding chat models from automatic pulling:

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: always
        chat:
          include: false

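Function Calling

You can register custom Java functions with the OllamaChatModel. The model then decides when to emit the JSON arguments for a call to one of the registered functions, which connects the LLM to external tools and APIs.

Function calling requires Ollama 0.2.8 or newer; using it in streaming mode requires Ollama 0.4.6 or newer.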
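
Conceptually, the model returns a tool name plus JSON arguments, and the application looks up the matching function and runs it. The sketch below shows that dispatch step in isolation with a hypothetical registry class; it assumes the arguments have already been parsed into a map and does not use Spring AI's tool-calling types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative registry; Spring AI's own tool-calling classes are not used here.
final class ToolDispatchSketch {

    private final Map<String, Function<Map<String, Object>, String>> tools = new HashMap<>();

    void register(String name, Function<Map<String, Object>, String> tool) {
        tools.put(name, tool);
    }

    // Runs the tool that the model asked for, using the arguments it produced.
    String dispatch(String toolName, Map<String, Object> arguments) {
        Function<Map<String, Object>, String> tool = tools.get(toolName);
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        return tool.apply(arguments);
    }
}
```

In a real application the tool result would be sent back to the model as a follow-up message so that it can produce the final answer.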

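Thinking Mode (Reasoning)

Ollama supports a thinking mode for reasoning models, which show their internal reasoning before giving the final answer. Supported models include Qwen3 and DeepSeek-v3.1.

Thinking mode can improve response quality on complex problems.

Default behavior (Ollama 0.12+): thinking is on by default for models that support it and off for standard models. Use enableThinking() or disableThinking() to control it explicitly.

Enabling thinking mode: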

ChatResponse response = chatModel.call(
    new Prompt(
        "How many letter 'r' are in the word 'strawberry'?",
        OllamaChatOptions.builder()
            .model("qwen3")
            .enableThinking()
            .build()
    ));

// Retrieve the reasoning process
String thinking = response.getResult().getMetadata().get("thinking");
String answer = response.getResult().getOutput().getText();

Disabling thinking mode:

ChatResponse response = chatModel.call(
    new Prompt(
        "What is 2+2?",
        OllamaChatOptions.builder()
            .model("deepseek-r1")
            .disableThinking()
            .build()
    ));

Thinking levels (GPT-OSS only):

// Low thinking level
ChatResponse lowResponse = chatModel.call(
    new Prompt(
        "Generate a short headline",
        OllamaChatOptions.builder()
            .model("gpt-oss")
            .thinkLow()
            .build()
    ));

// Medium thinking level
ChatResponse mediumResponse = chatModel.call(
    new Prompt(
        "Analyze this dataset",
        OllamaChatOptions.builder()
            .model("gpt-oss")
            .thinkMedium()
            .build()
    ));

// High thinking level
ChatResponse highResponse = chatModel.call(
    new Prompt(
        "Solve this complex problem",
        OllamaChatOptions.builder()
            .model("gpt-oss")
            .thinkHigh()
            .build()
    ));

Thinking mode with streaming:

Flux<ChatResponse> stream = chatModel.stream(
    new Prompt(
        "Explain quantum entanglement",
        OllamaChatOptions.builder()
            .model("qwen3")
            .enableThinking()
            .build()
    ));

stream.subscribe(response -> {
    String thinking = response.getResult().getMetadata().get("thinking");
    String content = response.getResult().getOutput().getText();

    if (thinking != null && !thinking.isEmpty()) {
        System.out.println("[Thinking] " + thinking);
    }
    if (content != null && !content.isEmpty()) {
        System.out.println("[Response] " + content);
    }
});

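Multimodal

Multimodality is a model's ability to understand and process information from several sources at once, such as text, images and audio.

Ollama models with multimodal support include LLaVA and BakLLaVA.

Ollama's message API accepts an images parameter that holds a list of base64-encoded images. Spring AI exposes multimodal input through the Media type.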
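
To make the wire format concrete, the sketch below base64-encodes raw image bytes and places them in the images array of a single user message. It uses only the JDK; ImageMessageSketch is a hypothetical helper, and the prompt text is assumed to contain no characters that need JSON escaping. With Spring AI the Media type takes care of this step.

```java
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper showing the wire format only; with Spring AI the
// Media type takes care of this encoding.
final class ImageMessageSketch {

    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    // Builds one user message for Ollama's chat API with an "images" array.
    // The prompt text is assumed to contain no characters that need JSON escaping.
    static String userMessageJson(String text, List<byte[]> images) {
        String encoded = images.stream()
                .map(bytes -> "\"" + encode(bytes) + "\"")
                .collect(Collectors.joining(","));
        return "{\"role\":\"user\",\"content\":\"" + text + "\",\"images\":[" + encoded + "]}";
    }
}
```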

Example code:

var imageResource = new ClassPathResource("/multimodal.test.png");

var userMessage = new UserMessage("Explain what do you see on this picture?",
        new Media(MimeTypeUtils.IMAGE_PNG, imageResource));

ChatResponse response = chatModel.call(new Prompt(userMessage,
        OllamaChatOptions.builder().model(OllamaModel.LLAVA).build()));

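Structured Outputs

Ollama provides a structured outputs API that ensures responses strictly conform to a JSON Schema you supply.

There are two structured output modes:

Simple json format: returns any valid JSON
JSON Schema format: returns JSON that matches the given schema (recommended for production)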
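
At the level of Ollama's HTTP API, both modes use the same format field of the request: it holds either the string "json" or a JSON Schema object. The sketch below assembles the two request bodies as plain strings to show the difference. FormatFieldSketch is a hypothetical helper, the messages array is left out for brevity, and the model name and schema are assumed to be valid JSON fragments.

```java
// Hypothetical helper; it only assembles the JSON text of an Ollama chat request.
// The "messages" array is omitted to keep the focus on the "format" field.
final class FormatFieldSketch {

    // Mode 1: "format" is the string "json", so any valid JSON may come back.
    static String jsonModeBody(String model) {
        return "{\"model\":\"" + model + "\",\"format\":\"json\",\"stream\":false}";
    }

    // Mode 2: "format" is a JSON Schema object that the reply has to match.
    static String schemaModeBody(String model, String jsonSchema) {
        return "{\"model\":\"" + model + "\",\"format\":" + jsonSchema + ",\"stream\":false}";
    }
}
```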

Simple JSON format:

ChatResponse response = chatModel.call(
    new Prompt(
        "List 3 countries in Europe",
        OllamaChatOptions.builder()
            .model("llama3.2")
            .format("json")
            .build()
    ));

JSON Schema format:

String jsonSchema = """
{
    "type": "object",
    "properties": {
        "countries": {
            "type": "array",
            "items": { "type": "string" }
        }
    },
    "required": ["countries"]
}
""";

ChatResponse response = chatModel.call(
    new Prompt(
        "List 3 countries in Europe",
        OllamaChatOptions.builder()
            .model("llama3.2")
            .outputSchema(jsonSchema)
            .build()
    ));

Combined with BeanOutputConverter:

record MathReasoning(
    @JsonProperty(required = true, value = "steps") Steps steps,
    @JsonProperty(required = true, value = "final_answer") String finalAnswer) {

    record Steps(
        @JsonProperty(required = true, value = "items") Items[] items) {

        record Items(
            @JsonProperty(required = true, value = "explanation") String explanation,
            @JsonProperty(required = true, value = "output") String output) {
        }
    }
}

var outputConverter = new BeanOutputConverter<>(MathReasoning.class);

Prompt prompt = new Prompt("how can I solve 8x + 7 = -23",
        OllamaChatOptions.builder()
            .model(OllamaModel.LLAMA3_2.getName())
            .outputSchema(outputConverter.getJsonSchema())
            .build());

ChatResponse response = ollamaChatModel.call(prompt);
String content = response.getResult().getOutput().getText();

MathReasoning mathReasoning = outputConverter.convert(content);

OpenAI API Compatibility

Ollama is compatible with the OpenAI API, so you can connect to it with the Spring AI OpenAI client. Configuration:

spring.ai.openai.chat.base-url=http://localhost:11434
spring.ai.openai.chat.options.model=mistral

Ollama-specific parameters can be passed through extraBody.

Retrieving the reasoning content through the OpenAI-compatible endpoint:

@Configuration
class OllamaConfig {
    @Bean
    OpenAiChatModel ollamaChatModel() {
        var openAiApi = new OpenAiApi("http://localhost:11434", "ollama");
        return new OpenAiChatModel(openAiApi,
            OpenAiChatOptions.builder()
                .model("deepseek-r1")
                .build());
    }
}

// Usage
ChatResponse response = chatModel.call(
    new Prompt("How many letter 'r' are in the word 'strawberry'?"));

String reasoning = response.getResult().getMetadata().get("reasoningContent");

Hugging Face Models

Ollama natively supports all Hugging Face GGUF chat models. Pull one with:

ollama pull hf.co/<username>/<model-repository>

Configuring automatic pulling:

spring.ai.ollama.chat.options.model=hf.co/bartowski/gemma-2-2b-it-GGUF
spring.ai.ollama.init.pull-model-strategy=always

Sample Controller

Create a Spring Boot project, add the dependency, and configure application.yaml:

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: mistral
          temperature: 0.7

Controller code:

@RestController
public class ChatController {

    private final OllamaChatModel chatModel;

    @Autowired
    public ChatController(OllamaChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map<String,String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }

}

Manual Configuration

If you are not using auto-configuration, add this dependency:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama</artifactId>
</dependency>

Creating an OllamaChatModel manually:

var ollamaApi = OllamaApi.builder().build();

var chatModel = OllamaChatModel.builder()
                    .ollamaApi(ollamaApi)
                    .defaultOptions(
                        OllamaChatOptions.builder()
                            .model(OllamaModel.MISTRAL)
                            .temperature(0.9)
                            .build())
                    .build();

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

// Streaming response
Flux<ChatResponse> streamResponse = chatModel.stream(
    new Prompt("Generate the names of 5 famous pirates."));

Low-level OllamaApi Client

OllamaApi is a lightweight Java client for Ollama. It is a low-level API and is not recommended for direct use; prefer OllamaChatModel.

Usage example:

OllamaApi ollamaApi = new OllamaApi("YOUR_HOST:YOUR_PORT");

// Synchronous request
var request = ChatRequest.builder("orca-mini")
    .stream(false)
    .messages(List.of(
            Message.builder(Role.SYSTEM)
                .content("You are a geography teacher. You are talking to a student.")
                .build(),
            Message.builder(Role.USER)
                .content("What is the capital of Bulgaria and what is the size? What is the national anthem?")
                .build()))
    .options(OllamaChatOptions.builder().temperature(0.9).build())
    .build();

ChatResponse response = ollamaApi.chat(request);

// Streaming request
var request2 = ChatRequest.builder("orca-mini")
    .stream(true)
    .messages(List.of(Message.builder(Role.USER)
        .content("What is the capital of Bulgaria and what is the size? What is the national anthem?")
        .build()))
    .options(OllamaChatOptions.builder().temperature(0.9).build().toMap())
    .build();

Flux<ChatResponse> streamingResponse = ollamaApi.streamingChat(request2);
