transcription/AGENTS.md
keboss-m eee8f4c8a4 Replace LightRAG with native Python RAG engine + add deploy tooling
- New: src/rag/engine/ — in-process hybrid search (FTS5 BM25 + sqlite-vec + LLM rerank)
- New: src/rag/qmd/ — compatibility layer (qmd_query, qmd_chat, qmd_chat_stream, qmd_index_*)
- New: src/ingest/stub_writer.py — .md stubs for binary files (videos, archives)
- New: scripts/deploy.sh + scripts/pull_models.sh + Makefile + .env.example
- Removed: LightRAG, sentence-transformers embedding via separate package, rag_standalone/
- Removed: @nousresearch/qmd npm dep (package not published); Node.js from Dockerfile
- Updated: tests/ (46 passed), docker-compose, .dockerignore, config.yaml, README

Engine: in-process Python (no daemon, no npm), sentence-transformers 384-dim,
RRF fusion (k=60), BM25 + vector with numpy fallback. WebSocket API unchanged.

Deploy: 'git clone' + 'make init' + 'make pull-models MODELS_SOURCE=...' + 'make up'.
Models (5.83 GB) live outside git; pulled via rsync from dev host.
2026-06-10 14:24:01 +03:00


Agent Guidelines

Git Workflow

  • Commit frequently: After completing a meaningful unit of work (feature, fix, or file update), stage changes with git add and create a commit with a clear, concise message in the imperative mood (e.g., "Add parser", "Fix timeout").
  • Push to remote: Once the local commit(s) are ready, push them to the remote repository. Use git push -u origin main if the upstream branch is not yet tracked; otherwise use git push.
  • No uncommitted changes left behind: Before finishing a task, ensure all intended changes are committed and pushed to avoid losing work.
  • No empty commits: Avoid creating empty or placeholder commits.

Native RAG Engine

The project uses a native Python RAG engine (no external daemons, no Node.js): hybrid BM25 (SQLite FTS5) + vector (sqlite-vec with numpy fallback) + LLM rerank through OpenCode.
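Reciprocal Rank Fusion (the k=60 fusion in hybrid.py) merges the BM25 and vector result lists by rank rather than by raw score, so the two score scales never have to be calibrated against each other. The following is a minimal sketch. It assumes each retriever returns an ordered list of chunk IDs, and the name `rrf_fuse` is hypothetical, not taken from hybrid.py:

```python
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion (hypothetical helper).

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. A larger k flattens the gap between top ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]
    vector_hits = ["c", "a", "d"]
    for doc_id, score in rrf_fuse([bm25_hits, vector_hits]):
        print(f"{doc_id}: {score:.5f}")
```

Here "a" and "c" are found by both retrievers, so they outrank "b" and "d", which only one retriever returned.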

Layout

  • src/rag/engine/ — the engine itself:
    • db.py — Database (SQLite + sqlite-vec + FTS5 schema, fallback detection).
    • chunker.py — markdown-aware recursive splitter (~900 chars, 15% overlap).
    • embeddings.py — singleton sentence-transformers model (lazy load).
    • bm25.py — FTS5 BM25 with rank_bm25 fallback.
    • vector.py — sqlite-vec with numpy cosine fallback.
    • hybrid.py — RRF fusion (k=60).
    • rerank.py — LLM rerank through OpenCode.
    • engine.py — public facade: index_file, index_text, search, vsearch, query, get, status, warmup.
  • src/rag/qmd/ — compatibility layer preserving the old qmd_* API: qmd_query, qmd_chat, qmd_chat_stream, qmd_index_meeting, qmd_index_document. main.py / queue.py / ingest_worker.py use these.
  • src/ingest/stub_writer.py — .md stubs for binary files (videos, archives).
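chunker.py is described as a markdown-aware recursive splitter (~900 chars, 15% overlap). The sketch below shows one way such a splitter can work. It first splits on the coarsest separator and recurses into pieces that are still too long. It then packs the pieces into chunks of at most `max_chars` and seeds each new chunk with about 15% of the previous chunk's tail. The separator list and the names `split_markdown` and `_split` are assumptions, not the contents of chunker.py. The sketch also does not, for example, avoid splitting inside fenced code blocks.

```python
import re

# Coarse-to-fine split points (assumed, not chunker.py's actual list):
# before a markdown heading, blank line, line break, sentence end, space.
SEPARATORS = [r"\n(?=#{1,6} )", r"\n\s*\n", r"\n", r"(?<=[.!?]) ", r" "]


def _split(text: str, max_chars: int, level: int = 0) -> list[str]:
    """Recursively split text until every piece is at most max_chars long."""
    if len(text) <= max_chars:
        return [text]
    if level == len(SEPARATORS):
        # No separators left: fall back to hard cuts.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    pieces = [p for p in re.split(SEPARATORS[level], text) if p.strip()]
    out: list[str] = []
    for piece in pieces:
        out.extend(_split(piece, max_chars, level + 1))
    return out


def split_markdown(text: str, max_chars: int = 900, overlap: float = 0.15) -> list[str]:
    """Pack pieces into chunks of <= max_chars, repeating ~overlap of the previous chunk.

    Pieces are re-joined with newlines, so original whitespace is not kept exactly.
    """
    overlap_chars = int(max_chars * overlap)
    chunks: list[str] = []
    current = ""
    for piece in _split(text, max_chars):
        candidate = f"{current}\n{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        # current is full: emit it and seed the next chunk with its tail.
        chunks.append(current)
        tail = current[-overlap_chars:] if overlap_chars else ""
        seeded = f"{tail}\n{piece}" if tail else piece
        # Drop the overlap if it would push the new chunk over the limit.
        current = seeded if len(seeded) <= max_chars else piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on structure first keeps headings and paragraphs intact whenever they fit. The overlap gives each chunk some of the preceding context for retrieval.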

Conventions

  • Collection = processed/<org>/qmd_collections/<project_slug>/ (or _global/); it contains index.sqlite.
  • Before changing src/rag/engine/, read openspec/changes/native-rag-engine/design.md.
  • When adding a new retrieval mode, update LEGACY_MODE_MAP in src/rag/qmd/query.py.
  • When adding a new LLM call, update CHAT_MODES in src/rag/qmd/query.py.
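The collection convention can be captured in a small path helper. In this hypothetical sketch, `collection_index_path` and its arguments are illustrative names, and `processed_root` stands for the processed/ directory. A missing `project_slug` is assumed to mean the shared _global collection, and the real code may resolve paths differently.

```python
from pathlib import Path
from typing import Optional


def collection_index_path(processed_root: Path, org: str, project_slug: Optional[str] = None) -> Path:
    """Return <processed_root>/<org>/qmd_collections/<project_slug or _global>/index.sqlite.

    Hypothetical helper: a missing slug maps to the shared _global collection.
    """
    collection = project_slug or "_global"
    # Reject components that would escape the qmd_collections directory.
    for part in (org, collection):
        if not part or "/" in part or "\\" in part or part in {".", ".."}:
            raise ValueError(f"invalid path component: {part!r}")
    return processed_root / org / "qmd_collections" / collection / "index.sqlite"
```

For example, `collection_index_path(Path("processed"), "acme", "alpha")` yields `processed/acme/qmd_collections/alpha/index.sqlite` on POSIX systems.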

Tests

  • Every new module in src/rag/engine/ must have a unit test in tests/test_native_engine.py.
  • Realistic data: 35 .md files in a tempfile.TemporaryDirectory().
  • Run: python -m pytest tests/ -q (46 passed at the time of writing).
  • E2E: tests/test_native_engine_e2e.py covers ingest → search → chat-stream, with OpenCode stubbed out.
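A test in this style builds its corpus inside a `tempfile.TemporaryDirectory()` so nothing leaks between runs. It also replaces the LLM call with a fake so the run stays offline and deterministic. The following is a structural sketch only. It writes synthetic placeholder files instead of the 35 real documents, and a naive substring search stands in for the engine, whose call signatures are not specified here. `write_corpus`, `keyword_search` and the injected `rerank` callable are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable
from unittest.mock import Mock

# Hypothetical reranker signature standing in for the OpenCode-backed rerank.
RerankFn = Callable[[str, list[str]], list[str]]


def write_corpus(root: Path, n_files: int = 35) -> None:
    """Write n_files synthetic markdown files; every fifth one mentions 'budget'."""
    for i in range(n_files):
        topic = "budget" if i % 5 == 0 else "roadmap"
        (root / f"doc_{i:02d}.md").write_text(
            f"# Meeting {i}\n\nTopic: {topic}.\n", encoding="utf-8"
        )


def keyword_search(root: Path, query: str, rerank: RerankFn) -> list[str]:
    """Naive stand-in for ingest -> search: substring match, then an injected reranker."""
    hits = sorted(
        p.name for p in root.glob("*.md") if query in p.read_text(encoding="utf-8")
    )
    return rerank(query, hits)


def test_search_with_fake_rerank() -> None:
    # The fake replaces the LLM call; here it simply reverses the candidates.
    fake_rerank = Mock(side_effect=lambda query, docs: list(reversed(docs)))
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_corpus(root)
        results = keyword_search(root, "budget", fake_rerank)
    assert results == [f"doc_{i:02d}.md" for i in (30, 25, 20, 15, 10, 5, 0)]
    fake_rerank.assert_called_once()


if __name__ == "__main__":
    test_search_with_fake_rerank()
    print("ok")
```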

Fallback strategies

  • FTS5 unavailable → in-memory rank_bm25.
  • sqlite-vec unavailable → in-memory numpy cosine.
  • Embedding model failed to load → BM25-only mode.
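The first two fallbacks depend on probing the SQLite build at startup. The following is a minimal sketch using the standard-library `sqlite3` module. It checks for FTS5 by creating a virtual table in a throwaway in-memory database and checks for sqlite-vec by trying to load the extension. It also shows the numpy cosine ranking that can stand in for sqlite-vec. `has_fts5`, `has_sqlite_vec` and `cosine_top_k` are hypothetical names, and db.py's actual detection logic may differ.

```python
import sqlite3

import numpy as np


def has_fts5() -> bool:
    """Return True if this SQLite build can create FTS5 virtual tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
        return True
    except sqlite3.OperationalError:  # e.g. "no such module: fts5"
        return False
    finally:
        conn.close()


def has_sqlite_vec() -> bool:
    """Return True if the sqlite-vec extension can be loaded into a connection."""
    try:
        import sqlite_vec  # third-party package; absent means fallback
    except ImportError:
        return False
    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)  # missing on some Python builds
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
        return True
    except (AttributeError, sqlite3.OperationalError):
        return False
    finally:
        conn.close()


def cosine_top_k(query: np.ndarray, matrix: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Rank rows of matrix by cosine similarity to query (in-memory vector fallback)."""
    q = query / (np.linalg.norm(query) or 1.0)
    norms = np.linalg.norm(matrix, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    sims = (matrix / norms[:, None]) @ q
    top = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in top]
```

A brute-force scan like `cosine_top_k` is linear in the number of chunks. That is usually acceptable for a per-project collection of 384-dim embeddings.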