A pure-Python Ollama-compatible LLM server with built-in HTTPS, auto-unload, and 40% faster embedding performance.
TL;DR
Hippo is a local LLM server written in pure Python. It is fully compatible with the Ollama API and features native HTTPS support, automatic model unloading, and 40% faster embedding performance than Ollama.
GitHub | v0.1.0 Release
Why Hippo?
Core difference: Ollama is the production Lamborghini; Hippo is the hackable VW Bus. Both get you there: one is built for speed, the other for customization.
v0.1.0 Highlights
1. Built-in HTTPS Support
Self-signed (development):
```shell
mkdir -p ~/.hippo/ssl && cd ~/.hippo/ssl
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes -subj "/CN=localhost"
hippo serve --ssl --cert ~/.hippo/ssl/cert.pem --key ~/.hippo/ssl/key.pem
```
Let's Encrypt (production):
```shell
sudo certbot certonly --standalone -d hippo.example.com
hippo serve --ssl --cert /etc/letsencrypt/live/hippo.example.com/fullchain.pem --key /etc/letsencrypt/live/hippo.example.com/privkey.pem
```
2. 40% Faster Embedding
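The 40% figure will depend on your hardware, model, and batch sizes, so it is worth measuring on your own setup. A minimal timing harness you can point at any embeddings endpoint (wrap the actual HTTP call in the function you pass in; the harness itself is generic):

```python
import time

def mean_latency(fn, warmup=3, runs=20):
    """Mean wall-clock latency of `fn` over `runs` calls, after `warmup` calls."""
    for _ in range(warmup):
        fn()  # discard warmup calls (cache fills, JIT, connection setup)
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs
```

Run it once against Hippo and once against Ollama with the same model and the same prompt, then compare the two means.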
3. Auto-Unload
```yaml
# ~/.hippo/config.yaml
idle_timeout: 300  # Auto-unload after 5 minutes
```
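The idea behind `idle_timeout` is simple: a background watcher evicts the model once no request has touched it for the configured number of seconds, freeing RAM/VRAM. A minimal sketch of that pattern (my own illustration, not Hippo's actual implementation):

```python
import threading
import time

class IdleUnloader:
    """Unload a model after it has been idle for `idle_timeout` seconds.

    `unload_fn` is a stand-in for whatever frees the model's memory.
    """

    def __init__(self, unload_fn, idle_timeout=300.0, poll_interval=1.0):
        self._unload_fn = unload_fn
        self._idle_timeout = idle_timeout
        self._poll = poll_interval
        self._last_used = time.monotonic()
        self._loaded = True
        self._lock = threading.Lock()
        threading.Thread(target=self._watch, daemon=True).start()

    def touch(self):
        """Call on every request that uses the model to reset the idle clock."""
        with self._lock:
            self._last_used = time.monotonic()

    def _watch(self):
        while True:
            time.sleep(self._poll)
            with self._lock:
                if self._loaded and time.monotonic() - self._last_used > self._idle_timeout:
                    self._unload_fn()
                    self._loaded = False
                    return
```

Every request handler calls `touch()`; the daemon thread does the rest.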
4. Pure Python
```python
class ModelManager:
    def get(self, name: str) -> Llama:
        with self._locks[name]:
            return self._load(name)
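The snippet hints at per-model locking: each model name gets its own lock, so loading one model never blocks requests for another. A fuller, self-contained sketch of how that pattern might look (the caching and the `load_fn` parameter are my assumptions for illustration, not Hippo's actual code):

```python
import threading
from collections import defaultdict

class ModelManager:
    """Per-model-locked, lazily-loading model cache (illustrative sketch)."""

    def __init__(self, load_fn):
        # load_fn is a stand-in for the real model constructor,
        # e.g. llama-cpp-python's Llama(...)
        self._load_fn = load_fn
        self._locks = defaultdict(threading.Lock)
        self._models = {}

    def get(self, name: str):
        # One lock per model name: loading llama-3.2-3b
        # never blocks a request for nomic-embed-text.
        with self._locks[name]:
            if name not in self._models:
                self._models[name] = self._load_fn(name)
            return self._models[name]
```

Because it is plain Python, swapping the cache policy or the loader is an edit away, which is the whole pitch.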
Quick Start
```shell
git clone https://github.com/lawcontinue/hippo.git
cd hippo
pip install -e .
hippo pull bartowski/Llama-3.2-3B-Instruct-GGUF
hippo serve
hippo run llama-3.2-3b "Hello World"
```
API Compatibility
Ollama-compatible:
```shell
curl http://localhost:8321/api/chat -d '{"model": "llama-3.2-3b", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Endpoints:
- `POST /api/chat` - Chat completions
- `POST /api/embeddings` - Embedding vectors
- `GET /api/tags` - List models
- `GET /v1/models` - OpenAI-compatible model list
Use Cases
RAG Applications
```python
import requests

response = requests.post(
    "http://localhost:8321/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "What is the capital of France?"},
)
embedding = response.json()["embedding"]
```
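Once you have embedding vectors back, ranking documents against a query reduces to cosine similarity. A minimal pure-Python sketch (in a real RAG pipeline you would typically use NumPy or a vector store instead):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Embed the query and each document chunk with `/api/embeddings`, then return the chunks with the highest similarity as context for the chat prompt.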
Local Chatbots
```python
import openai

openai.api_base = "http://localhost:8321/v1"
openai.api_key = "anything"

completion = openai.ChatCompletion.create(
    model="llama-3.2-3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

(Note: this uses the legacy pre-1.0 `openai` SDK interface; with `openai>=1.0`, pass `base_url` and `api_key` to the `OpenAI` client instead.)
Feature Comparison
Docker
```dockerfile
FROM python:3.14-slim AS builder
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -e .

FROM python:3.14-slim
RUN useradd -m -u 1000 hippo
USER hippo
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.14/site-packages /usr/local/lib/python3.14/site-packages
COPY --from=builder /app /app
EXPOSE 8321
CMD ["hippo", "serve", "--host", "0.0.0.0", "--port", "8321"]
```
Roadmap
- [ ] v0.2.0 - Multi-GPU support
- [ ] v0.2.0 - LoRA adapters
- [ ] v0.3.0 - Batch inference
- [ ] v0.3.0 - Prometheus metrics
Join Us
- GitHub: https://github.com/lawcontinue/hippo
- Issues: https://github.com/lawcontinue/hippo/issues
- Discussions: https://github.com/lawcontinue/hippo/discussions
Bottom line: Use Ollama for production speed. Use Hippo for development happiness, HTTPS support, and embedding-heavy workloads.