# Caring AI Server

Companion backend for the CareHub browser extension. Runs a local LLM via llama.cpp with an OpenAI-compatible API.

## Quick Start

```bash
# Install dependencies
npm install

# Download llama-server binary + default model (~2.4 GB)
npm run setup

# Start the server
npm start
```

## API (localhost:11435)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/v1/status` | GET | Full server status |
| `/v1/models` | GET | List available models |
| `/v1/models/:id/download` | POST | Download a model |
| `/v1/models/:id/download/status` | GET | Check download progress |
| `/v1/server/start` | POST | Start llama-server (body: `{model: "id"}`) |
| `/v1/server/stop` | POST | Stop llama-server |
| `/v1/server/switch` | POST | Switch model (body: `{model: "id"}`) |
| `/v1/chat/completions` | POST | OpenAI-compatible chat completions |
| `/v1/setup` | POST | Download binary + default model |
| `/v1/config` | GET/PATCH | View/update config |

## Models

| ID | Size | Description |
|----|------|-------------|
| `phi-3.5-mini` | 2.4 GB | Fast — summaries, drafts, Q&A |
| `llama-3.1-8b` | 4.9 GB | Powerful — chat, reasoning, agentic tasks |

## Data Directory

`~/.caring-ai/`
- `config.json` — server configuration
- `bin/llama-server` — llama.cpp binary
- `models/` — downloaded GGUF model files

## Architecture

```
CareHub Extension ──► Caring AI Server (port 11435) ──► llama-server (port 11436)
                         Express proxy                    llama.cpp process
```

The server manages the llama-server lifecycle (start/stop/switch models) and proxies OpenAI-compatible requests to it. The extension connects via standard `fetch()` calls with CORS support for `chrome-extension://` origins.
