Monday, January 27, 2025

🚀 How to Integrate Ollama with DeepSeek-R1 in Spring Boot

Are you looking to leverage the power of Ollama and DeepSeek-R1 in your Spring Boot application? This post will walk you through the entire process, from understanding what Ollama is to implementing a seamless integration. 




What is Ollama?

Ollama is a powerful tool designed to simplify the deployment and management of large language models (LLMs) locally. It provides an easy-to-use API for interacting with models like DeepSeek-R1, making it an excellent choice for developers who want to integrate AI capabilities into their applications without relying on external cloud services.


With Ollama, you can:

  • Run LLMs locally on your machine.
  • Switch between different model versions effortlessly.
  • Integrate AI capabilities into your applications via a simple API.


Why Integrate Ollama with DeepSeek-R1?

DeepSeek-R1 is a state-of-the-art language model that offers high performance and flexibility. By integrating it with Ollama in your Spring Boot application, you can:

  • Build AI-powered features like chatbots, content generators, and more.
  • Keep your AI logic local, ensuring data privacy and reducing latency.
  • Easily switch between different versions of DeepSeek-R1 based on your application’s needs.


Step 1: Install Ollama

To get started, you’ll need to install Ollama on your system. Run the following command in your terminal:

curl -fsSL https://ollama.com/install.sh | sh

Successful Installation Output:


>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
>>> Creating ollama user...
>>> Adding ollama user to groups...
>>> Creating ollama systemd service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service
>>> Nvidia GPU detected
>>> API available at 127.0.0.1:11434

Once installed, Ollama will be ready to use, and the API will be available at http://localhost:11434.


To verify it's working, start the server (if the systemd service isn't already running), list the installed models, and pull the DeepSeek-R1 model if it isn't listed yet:

ollama serve
ollama list
ollama pull deepseek-r1:1.5b


Test the model with curl:

curl -X POST http://localhost:11434/api/generate \
  -d '{"model": "deepseek-r1:1.5b", "prompt": "Hello", "stream": false}'

Step 2: Application Configuration

Next, configure your Spring Boot application by updating the application.yml file:

spring:
  application:
    name: demo-deepseek-r1.ollama

# Server configuration
server:
  port: 8080
  error:
    include-message: always

# Ollama configuration
ollama:
  endpoint: http://localhost:11434/api/generate
  model: deepseek-r1:1.5b
  timeout:
    connect: 30000
    read: 60000

This configuration sets up the Ollama endpoint, model, and timeout settings for your application.
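
The ollama block in this YAML maps to a small OllamaProperties class that the service in Step 3 reads through getEndpoint(), getModel(), and getTimeout(). A minimal sketch of that class, assuming standard @ConfigurationProperties binding, could look like this:

// OllamaProperties.java
@Component
@ConfigurationProperties(prefix = "ollama")
public class OllamaProperties {

    private String endpoint;
    private String model;
    private Timeout timeout = new Timeout();

    public String getEndpoint() { return endpoint; }
    public void setEndpoint(String endpoint) { this.endpoint = endpoint; }

    public String getModel() { return model; }
    public void setModel(String model) { this.model = model; }

    public Timeout getTimeout() { return timeout; }
    public void setTimeout(Timeout timeout) { this.timeout = timeout; }

    // Nested holder for the connect/read timeouts (in milliseconds)
    public static class Timeout {
        private long connect;
        private long read;

        public long getConnect() { return connect; }
        public void setConnect(long connect) { this.connect = connect; }

        public long getRead() { return read; }
        public void setRead(long read) { this.read = read; }
    }
}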


Step 3: Core Implementation


Create the following records to handle requests and responses:


// OllamaRequest.java
@JsonInclude(JsonInclude.Include.NON_NULL)
public record OllamaRequest(
    String model,
    String prompt,
    boolean stream
) {}

// OllamaResponse.java
@JsonIgnoreProperties(ignoreUnknown = true)
public record OllamaResponse(
    String model,
    String response,
    String created_at,
    boolean done
) {}
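
As a quick sanity check of the wire format, you can serialize the request record with Jackson (pulled in transitively by spring-boot-starter-web) and inspect the JSON it produces. This throwaway main method is purely illustrative:

// PayloadPreview.java (illustrative only, not part of the application)
import com.fasterxml.jackson.databind.ObjectMapper;

public class PayloadPreview {

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Serialize the record the same way the service will send it to Ollama
        String json = mapper.writeValueAsString(
                new OllamaRequest("deepseek-r1:1.5b", "Hello", false));

        // Prints something like:
        // {"model":"deepseek-r1:1.5b","prompt":"Hello","stream":false}
        System.out.println(json);
    }
}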

Service Layer


Implement the OllamaService to interact with the Ollama API. The timeout handling below uses Apache HttpClient 5 (RequestConfig, HttpClients, Timeout), so make sure the httpclient5 dependency is on your classpath:

@Service
public class OllamaService {

    private final RestTemplate restTemplate;
    private final OllamaProperties properties;

    public OllamaService(OllamaProperties properties) {
        this.properties = properties;

        // Configure connect/read timeouts from application.yml (OllamaProperties)
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(Timeout.ofMilliseconds(properties.getTimeout().getConnect()))
                .setResponseTimeout(Timeout.ofMilliseconds(properties.getTimeout().getRead()))
                .build();

        CloseableHttpClient httpClient = HttpClients.custom()
                .setDefaultRequestConfig(config)
                .build();

        // Back the RestTemplate with Apache HttpClient 5 so the timeouts take effect
        HttpComponentsClientHttpRequestFactory requestFactory = new HttpComponentsClientHttpRequestFactory(httpClient);
        this.restTemplate = new RestTemplate(requestFactory);
    }

    public String generateResponse(String prompt) {
        try {
            OllamaRequest request = new OllamaRequest(properties.getModel(), prompt, false);
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_JSON);

            ResponseEntity<OllamaResponse> response = restTemplate.exchange(
                    properties.getEndpoint(),
                    HttpMethod.POST,
                    new HttpEntity<>(request, headers),
                    OllamaResponse.class
            );

            if (response.getStatusCode().is2xxSuccessful() && response.getBody() != null) {
                return response.getBody().response() != null
                        ? response.getBody().response()
                        : "Received empty response from model";
            }
            return "Ollama API returned status: " + response.getStatusCode();
        } catch (RestClientException e) {
            return "Error communicating with Ollama: " + e.getMessage();
        }
    }
}
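
If you want to see the service in action without the REST layer, you can fire a single prompt at startup with a CommandLineRunner. This is an optional sketch; DemoApplication and the prompt text are just placeholders for your own main class:

@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    @Bean
    CommandLineRunner ollamaDemo(OllamaService ollamaService) {
        // Sends one prompt when the application starts and prints the reply
        return args -> System.out.println(
                ollamaService.generateResponse("Say hello in one short sentence."));
    }
}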

REST Controller

Create a REST controller to expose the chat endpoint:

@RestController
@RequestMapping("/api/chat")
public class ChatController {

    private final OllamaService ollamaService;

    public ChatController(OllamaService ollamaService) {
        this.ollamaService = ollamaService;
    }

    @PostMapping
    public ResponseEntity<String> chat(@RequestBody String prompt) {
        if (prompt == null || prompt.isBlank()) {
            return ResponseEntity.badRequest().body("Prompt cannot be empty");
        }
        String response = ollamaService.generateResponse(prompt);
        return ResponseEntity.ok(response);
    }
}
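
If you have spring-boot-starter-test in your build, a web-slice test gives quick feedback on the controller without starting Ollama. The sketch below mocks the service; class and prompt names are illustrative:

// ChatControllerTest.java (sketch; requires spring-boot-starter-test)
import static org.mockito.BDDMockito.given;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.content;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

@WebMvcTest(ChatController.class)
class ChatControllerTest {

    @Autowired
    private MockMvc mockMvc;

    // @MockBean is deprecated in favor of @MockitoBean on recent Spring Boot versions
    @MockBean
    private OllamaService ollamaService;

    @Test
    void returnsModelResponseForValidPrompt() throws Exception {
        given(ollamaService.generateResponse("Hello")).willReturn("Hi there!");

        mockMvc.perform(post("/api/chat")
                        .contentType(MediaType.TEXT_PLAIN)
                        .content("Hello"))
                .andExpect(status().isOk())
                .andExpect(content().string("Hi there!"));
    }

    @Test
    void rejectsBlankPrompt() throws Exception {
        mockMvc.perform(post("/api/chat")
                        .contentType(MediaType.TEXT_PLAIN)
                        .content(" "))
                .andExpect(status().isBadRequest());
    }
}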


Model Version Compatibility

DeepSeek-R1 is published on Ollama in several sizes, from the lightweight deepseek-r1:1.5b tag used in this post up to much larger variants, and the hardware you need (RAM/VRAM) grows with the parameter count. Check official model availability and tag names in the Ollama Model Library.


Testing the Integration

To test the integration, use the following curl command or Postman:

curl -X POST -H "Content-Type: text/plain" -d "Explain AI in simple terms" http://localhost:8080/api/chat

Output

The endpoint returns the model's plain-text answer to your prompt. With DeepSeek-R1, the reply typically starts with the model's reasoning wrapped in <think>...</think> tags, followed by the final answer.

🪟 Bonus: Using Ollama in WSL on Windows

If you're on Windows using WSL, follow these steps to expose the Ollama service to Windows:

🔒 WSL Side: Open the Port

sudo ufw enable
sudo ufw allow 11434
sudo systemctl stop ollama
sudo lsof -i :11434
export OLLAMA_HOST=0.0.0.0
ollama serve

Verify Ollama is listening:

sudo ss -tulnp | grep 11434

🪟 Windows Side: Port Forwarding (run PowerShell as Administrator)

$wsl_ip = (wsl hostname -I).Split()[0]

netsh interface portproxy add v4tov4 `
  listenport=11434 listenaddress=0.0.0.0 `
  connectport=11434 connectaddress=$wsl_ip

New-NetFirewallRule -DisplayName "Ollama-WSL" `
  -Direction Inbound -Protocol TCP -LocalPort 11434 -Action Allow

🔁 Optional: Persistence Across Reboots

WSL Side

echo 'export OLLAMA_HOST=0.0.0.0' >> ~/.bashrc
echo 'pkill ollama; ollama serve > /tmp/ollama.log 2>&1 &' >> ~/.bashrc

Windows Side

$action = New-ScheduledTaskAction -Execute "wsl" -Argument "-e bash -c 'ollama serve'"
$trigger = New-ScheduledTaskTrigger -AtStartup
Register-ScheduledTask -TaskName "Ollama-WSL" -Action $action -Trigger $trigger -RunLevel Highest


Source Code

Here on GitHub.


🙌 Final Thoughts

Running LLMs locally has never been easier. With Ollama, DeepSeek-R1, and Spring Boot, you can build blazing-fast AI-powered apps while keeping full control over your data.

