Files
hermes/website/docs/user-guide/skills/optional/mlops/mlops-qdrant.md
Teknium 252d68fd45 docs: deep audit — fix stale config keys, missing commands, and registry drift (#22784)
* docs: deep audit — fix stale config keys, missing commands, and registry drift

Cross-checked ~80 high-impact docs pages (getting-started, reference, top-level
user-guide, user-guide/features) against the live registries:

  hermes_cli/commands.py    COMMAND_REGISTRY (slash commands)
  hermes_cli/auth.py        PROVIDER_REGISTRY (providers)
  hermes_cli/config.py      DEFAULT_CONFIG (config keys)
  toolsets.py               TOOLSETS (toolsets)
  tools/registry.py         get_all_tool_names() (tools)
  python -m hermes_cli.main <subcmd> --help (CLI args)

reference/
- cli-commands.md: drop duplicate hermes fallback row + duplicate section,
  add stepfun/lmstudio to --provider enum, expand auth/mcp/curator subcommand
  lists to match --help output (status/logout/spotify, login, archive/prune/
  list-archived).
- slash-commands.md: add missing /sessions and /reload-skills entries +
  correct the cross-platform Notes line.
- tools-reference.md: drop bogus '68 tools' headline, drop fictional
  'browser-cdp toolset' (these tools live in 'browser' and are runtime-gated),
  add missing 'kanban' and 'video' toolset sections, fix MCP example to use
  the real mcp_<server>_<tool> prefix.
- toolsets-reference.md: list browser_cdp/browser_dialog inside the 'browser'
  row, add missing 'kanban' and 'video' toolset rows, drop the stale
  '38 tools' count for hermes-cli.
- profile-commands.md: add missing install/update/info subcommands, document
  fish completion.
- environment-variables.md: dedupe GMI_API_KEY/GMI_BASE_URL rows (kept the
  one with the correct gmi-serving.com default).
- faq.md: Anthropic/Google/OpenAI examples — direct providers exist (not just
  via OpenRouter), refresh the OpenAI model list.

getting-started/
- installation.md: PortableGit (not MinGit) is what the Windows installer
  fetches; document the 32-bit MinGit fallback.
- installation.md / termux.md: installer prefers .[termux-all] then falls
  back to .[termux].
- nix-setup.md: Python 3.12 (not 3.11), Node.js 22 (not 20); fix invalid
  'nix flake update --flake' invocation.
- updating.md: 'hermes backup restore --state pre-update' doesn't exist —
  point at the snapshot/quick-snapshot flow; correct config key
  'updates.pre_update_backup' (was 'update.backup').

user-guide/
- configuration.md: api_max_retries default 3 (not 2); display.runtime_footer
  is the real key (not display.runtime_metadata_footer); checkpoints defaults
  enabled=false / max_snapshots=20 (not true / 50).
- configuring-models.md: 'hermes model list' / 'hermes model set ...' don't
  exist — hermes model is interactive only.
- tui.md: busy_indicator -> tui_status_indicator with values
  kaomoji|emoji|unicode|ascii (not kawaii|minimal|dots|wings|none).
- security.md: SSH backend keys (TERMINAL_SSH_HOST/USER/KEY) live in .env,
  not config.yaml.
- windows-wsl-quickstart.md: there is no 'hermes api' subcommand — the
  OpenAI-compatible API server runs inside hermes gateway.

user-guide/features/
- computer-use.md: approvals.mode (not security.approval_level); fix broken
  ./browser-use.md link to ./browser.md.
- fallback-providers.md: top-level fallback_providers (not
  model.fallback_providers); the picker is subcommand-based, not modal.
- api-server.md: API_SERVER_* are env vars — write to per-profile .env,
  not 'hermes config set' which targets YAML.
- web-search.md: drop web_crawl as a registered tool (it isn't); deep-crawl
  modes are exposed through web_extract.
- kanban.md: failure_limit default is 2, not '~5'.
- plugins.md: drop hard-coded '33 providers' count.
- honcho.md: fix unclosed quote in echo HONCHO_API_KEY snippet; document
  that 'hermes honcho' subcommand is gated on memory.provider=honcho;
  reconcile subcommand list with actual --help output.
- memory-providers.md: legacy 'hermes honcho setup' redirect documented.

Verified via 'npm run build' — site builds cleanly; broken-link count went
from 149 to 146 (no regressions, fixed a few in passing).

* docs: round 2 audit fixes + regenerate skill catalogs

Follow-up to the previous commit on this branch:

Round 2 manual fixes:
- quickstart.md: KIMI_CODING_API_KEY mentioned alongside KIMI_API_KEY;
  voice-mode and ACP install commands rewritten — bare 'pip install ...'
  doesn't work for curl-installed setups (no pip on PATH, not in repo
  dir); replaced with 'cd ~/.hermes/hermes-agent && uv pip install -e
  ".[voice]"'. ACP already ships in [all] so the curl install includes it.
- cli.md / configuration.md: 'auxiliary.compression.model' shown as
  'google/gemini-3-flash-preview' (the doc's own claimed default);
  actual default is empty (= use main model). Reworded as 'leave empty
  (default) or pin a cheap model'.
- built-in-plugins.md: added the bundled 'kanban/dashboard' plugin row
  that was missing from the table.

Regenerated skill catalogs:
- ran website/scripts/generate-skill-docs.py to refresh all 163 per-skill
  pages and both reference catalogs (skills-catalog.md,
  optional-skills-catalog.md). This adds the entries that were genuinely
  missing — productivity/teams-meeting-pipeline (bundled),
  optional/finance/* (entire category — 7 skills:
  3-statement-model, comps-analysis, dcf-model, excel-author, lbo-model,
  merger-model, pptx-author), creative/hyperframes,
  creative/kanban-video-orchestrator, devops/watchers,
  productivity/shop-app, research/searxng-search,
  apple/macos-computer-use — and rewrites every other per-skill page from
  the current SKILL.md. Most diffs are tiny (one line of refreshed
  metadata).

Validation:
- 'npm run build' succeeded.
- Broken-link count moved 146 -> 155 — the +9 are zh-Hans translation
  shells that lag every newly-added skill page (pre-existing pattern).
  No regressions on any en/ page.
2026-05-09 13:19:51 -07:00

515 lines
14 KiB
Markdown

---
title: "Qdrant Vector Search — High-performance vector similarity search engine for RAG and semantic search"
sidebar_label: "Qdrant Vector Search"
description: "High-performance vector similarity search engine for RAG and semantic search"
---
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
# Qdrant Vector Search
High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance.
## Skill metadata
| | |
|---|---|
| Source | Optional — install with `hermes skills install official/mlops/qdrant` |
| Path | `optional-skills/mlops/qdrant` |
| Version | `1.0.0` |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | `qdrant-client>=1.12.0` |
| Platforms | linux, macos, windows |
| Tags | `RAG`, `Vector Search`, `Qdrant`, `Semantic Search`, `Embeddings`, `Similarity Search`, `HNSW`, `Production`, `Distributed` |
## Reference: full SKILL.md
:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::
# Qdrant - Vector Similarity Search Engine
High-performance vector database written in Rust for production RAG and semantic search.
## When to use Qdrant
**Use Qdrant when:**
- Building production RAG systems requiring low latency
- Need hybrid search (vectors + metadata filtering)
- Require horizontal scaling with sharding/replication
- Want on-premise deployment with full data control
- Need multi-vector storage per record (dense + sparse)
- Building real-time recommendation systems
**Key features:**
- **Rust-powered**: Memory-safe, high performance
- **Rich filtering**: Filter by any payload field during search
- **Multiple vectors**: Dense, sparse, multi-dense per point
- **Quantization**: Scalar, product, binary for memory efficiency
- **Distributed**: Raft consensus, sharding, replication
- **REST + gRPC**: Both APIs with full feature parity
**Use alternatives instead:**
- **Chroma**: Simpler setup, embedded use cases
- **FAISS**: Maximum raw speed, research/batch processing
- **Pinecone**: Fully managed, zero ops preferred
- **Weaviate**: GraphQL preference, built-in vectorizers
## Quick start
### Installation
```bash
# Python client
pip install qdrant-client
# Docker (recommended for development)
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
# Docker with persistent storage
docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant
```
### Basic usage
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
# Connect to Qdrant
client = QdrantClient(host="localhost", port=6333)
# Create collection
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
# Insert vectors with payload
client.upsert(
collection_name="documents",
points=[
PointStruct(
id=1,
vector=[0.1, 0.2, ...], # 384-dim vector
payload={"title": "Doc 1", "category": "tech"}
),
PointStruct(
id=2,
vector=[0.3, 0.4, ...],
payload={"title": "Doc 2", "category": "science"}
)
]
)
# Search with filtering
results = client.search(
collection_name="documents",
query_vector=[0.15, 0.25, ...],
query_filter={
"must": [{"key": "category", "match": {"value": "tech"}}]
},
limit=10
)
for point in results:
print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")
```
## Core concepts
### Points - Basic data unit
```python
from qdrant_client.models import PointStruct
# Point = ID + Vector(s) + Payload
point = PointStruct(
id=123, # Integer or UUID string
vector=[0.1, 0.2, 0.3, ...], # Dense vector
payload={ # Arbitrary JSON metadata
"title": "Document title",
"category": "tech",
"timestamp": 1699900000,
"tags": ["python", "ml"]
}
)
# Batch upsert (recommended)
client.upsert(
collection_name="documents",
points=[point1, point2, point3],
wait=True # Wait for indexing
)
```
### Collections - Vector containers
```python
from qdrant_client.models import VectorParams, Distance, HnswConfigDiff
# Create with HNSW configuration
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(
size=384, # Vector dimensions
distance=Distance.COSINE # COSINE, EUCLID, DOT, MANHATTAN
),
hnsw_config=HnswConfigDiff(
m=16, # Connections per node (default 16)
ef_construct=100, # Build-time accuracy (default 100)
full_scan_threshold=10000 # Switch to brute force below this
),
on_disk_payload=True # Store payload on disk
)
# Collection info
info = client.get_collection("documents")
print(f"Points: {info.points_count}, Vectors: {info.vectors_count}")
```
### Distance metrics
| Metric | Use Case | Range |
|--------|----------|-------|
| `COSINE` | Text embeddings, normalized vectors | 0 to 2 |
| `EUCLID` | Spatial data, image features | 0 to ∞ |
| `DOT` | Recommendations, unnormalized | -∞ to ∞ |
| `MANHATTAN` | Sparse features, discrete data | 0 to ∞ |
## Search operations
### Basic search
```python
# Simple nearest neighbor search
results = client.search(
collection_name="documents",
query_vector=[0.1, 0.2, ...],
limit=10,
with_payload=True,
with_vectors=False # Don't return vectors (faster)
)
```
### Filtered search
```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range
# Complex filtering
results = client.search(
collection_name="documents",
query_vector=query_embedding,
query_filter=Filter(
must=[
FieldCondition(key="category", match=MatchValue(value="tech")),
FieldCondition(key="timestamp", range=Range(gte=1699000000))
],
must_not=[
FieldCondition(key="status", match=MatchValue(value="archived"))
]
),
limit=10
)
# Shorthand filter syntax
results = client.search(
collection_name="documents",
query_vector=query_embedding,
query_filter={
"must": [
{"key": "category", "match": {"value": "tech"}},
{"key": "price", "range": {"gte": 10, "lte": 100}}
]
},
limit=10
)
```
### Batch search
```python
from qdrant_client.models import SearchRequest
# Multiple queries in one request
results = client.search_batch(
collection_name="documents",
requests=[
SearchRequest(vector=[0.1, ...], limit=5),
SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}),
SearchRequest(vector=[0.3, ...], limit=10)
]
)
```
## RAG integration
### With sentence-transformers
```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
# Initialize
encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)
# Create collection
client.create_collection(
collection_name="knowledge_base",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
# Index documents
documents = [
{"id": 1, "text": "Python is a programming language", "source": "wiki"},
{"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"},
]
points = [
PointStruct(
id=doc["id"],
vector=encoder.encode(doc["text"]).tolist(),
payload={"text": doc["text"], "source": doc["source"]}
)
for doc in documents
]
client.upsert(collection_name="knowledge_base", points=points)
# RAG retrieval
def retrieve(query: str, top_k: int = 5) -> list[dict]:
query_vector = encoder.encode(query).tolist()
results = client.search(
collection_name="knowledge_base",
query_vector=query_vector,
limit=top_k
)
return [{"text": r.payload["text"], "score": r.score} for r in results]
# Use in RAG pipeline
context = retrieve("What is Python?")
prompt = f"Context: {context}\n\nQuestion: What is Python?"
```
### With LangChain
```python
from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Qdrant.from_documents(documents, embeddings, url="http://localhost:6333", collection_name="docs")
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```
### With LlamaIndex
```python
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
vector_store = QdrantVectorStore(client=client, collection_name="llama_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
```
## Multi-vector support
### Named vectors (different embedding models)
```python
from qdrant_client.models import VectorParams, Distance
# Collection with multiple vector types
client.create_collection(
collection_name="hybrid_search",
vectors_config={
"dense": VectorParams(size=384, distance=Distance.COSINE),
"sparse": VectorParams(size=30000, distance=Distance.DOT)
}
)
# Insert with named vectors
client.upsert(
collection_name="hybrid_search",
points=[
PointStruct(
id=1,
vector={
"dense": dense_embedding,
"sparse": sparse_embedding
},
payload={"text": "document text"}
)
]
)
# Search specific vector
results = client.search(
collection_name="hybrid_search",
query_vector=("dense", query_dense), # Specify which vector
limit=10
)
```
### Sparse vectors (BM25, SPLADE)
```python
from qdrant_client.models import SparseVectorParams, SparseIndexParams, SparseVector
# Collection with sparse vectors
client.create_collection(
collection_name="sparse_search",
vectors_config={},
sparse_vectors_config={"text": SparseVectorParams(index=SparseIndexParams(on_disk=False))}
)
# Insert sparse vector
client.upsert(
collection_name="sparse_search",
points=[PointStruct(id=1, vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])}, payload={"text": "document"})]
)
```
## Quantization (memory optimization)
```python
from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType
# Scalar quantization (4x memory reduction)
client.create_collection(
collection_name="quantized",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=ScalarQuantization(
scalar=ScalarQuantizationConfig(
type=ScalarType.INT8,
quantile=0.99, # Clip outliers
always_ram=True # Keep quantized in RAM
)
)
)
# Search with rescoring
results = client.search(
collection_name="quantized",
query_vector=query,
search_params={"quantization": {"rescore": True}}, # Rescore top results
limit=10
)
```
## Payload indexing
```python
from qdrant_client.models import PayloadSchemaType
# Create payload index for faster filtering
client.create_payload_index(
collection_name="documents",
field_name="category",
field_schema=PayloadSchemaType.KEYWORD
)
client.create_payload_index(
collection_name="documents",
field_name="timestamp",
field_schema=PayloadSchemaType.INTEGER
)
# Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL
```
## Production deployment
### Qdrant Cloud
```python
from qdrant_client import QdrantClient
# Connect to Qdrant Cloud
client = QdrantClient(
url="https://your-cluster.cloud.qdrant.io",
api_key="your-api-key"
)
```
### Performance tuning
```python
# Optimize for search speed (higher recall)
client.update_collection(
collection_name="documents",
hnsw_config=HnswConfigDiff(ef_construct=200, m=32)
)
# Optimize for indexing speed (bulk loads)
client.update_collection(
collection_name="documents",
optimizer_config={"indexing_threshold": 20000}
)
```
## Best practices
1. **Batch operations** - Use batch upsert/search for efficiency
2. **Payload indexing** - Index fields used in filters
3. **Quantization** - Enable for large collections (>1M vectors)
4. **Sharding** - Use for collections >10M vectors
5. **On-disk storage** - Enable `on_disk_payload` for large payloads
6. **Connection pooling** - Reuse client instances
## Common issues
**Slow search with filters:**
```python
# Create payload index for filtered fields
client.create_payload_index(
collection_name="docs",
field_name="category",
field_schema=PayloadSchemaType.KEYWORD
)
```
**Out of memory:**
```python
# Enable quantization and on-disk storage
client.create_collection(
collection_name="large_collection",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=ScalarQuantization(...),
on_disk_payload=True
)
```
**Connection issues:**
```python
# Use timeout and retry
client = QdrantClient(
host="localhost",
port=6333,
timeout=30,
prefer_grpc=True # gRPC for better performance
)
```
## References
- **[Advanced Usage](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/qdrant/references/advanced-usage.md)** - Distributed mode, hybrid search, recommendations
- **[Troubleshooting](https://github.com/NousResearch/hermes-agent/blob/main/optional-skills/mlops/qdrant/references/troubleshooting.md)** - Common issues, debugging, performance tuning
## Resources
- **GitHub**: https://github.com/qdrant/qdrant (22k+ stars)
- **Docs**: https://qdrant.tech/documentation/
- **Python Client**: https://github.com/qdrant/qdrant-client
- **Cloud**: https://cloud.qdrant.io
- **Version**: 1.12.0+
- **License**: Apache 2.0