VaultSort 3.0.0
smart-categorize, ai, advanced-organize, privacy, major-release
VaultSort 3 introduces Smart Categorize — a private, on-device AI engine that reads your files and proposes categories derived from your real library. No cloud. No API key. Nothing leaves your Mac.
What's New
Smart Categorize (headline feature)
- On-device semantic understanding. A small (~470 MB, downloaded once) Apple-Silicon-native embedding model reads PDFs, ePubs, Word, plain text, Markdown, code, and more — entirely on your Mac via the Apple Neural Engine.
- Categories from your content, not a generic list. Smart Categorize clusters your real files and proposes names derived from the patterns it actually finds — Audits, Invoices, Officeholders, Presets — instead of forcing your library into pre-baked buckets.
- Live wizard. Pick a folder, watch the analysis stream in (300-file reservoir sample, ~2 seconds on M-series), then accept, edit, merge, or drop each proposed category before any file moves.
- Saved Category Sets. Curate once, reuse across Advanced Organize jobs.
- Composable with the full Rule Builder. The new `semanticCategory` condition combines with filename, size, date, tag, and every other Advanced Organize predicate. Example: "if `semanticCategory` is Receipts AND `size > 500 KB`, move to /Receipts/Large."
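
To make the example rule above concrete, here is a minimal Python sketch of how a semantic condition could compose with an ordinary size predicate. It is an illustration only: VaultSort's actual rule format is not described in these notes, and `FileInfo`, `Rule`, `semantic_category_is`, and `size_greater_than_kb` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical view of a file as a rule engine might see it."""
    name: str
    size_bytes: int
    semantic_category: str  # assumed to be assigned earlier by Smart Categorize


Predicate = Callable[[FileInfo], bool]


def semantic_category_is(category: str) -> Predicate:
    # True when the file's assigned category matches exactly.
    return lambda f: f.semantic_category == category


def size_greater_than_kb(kb: int) -> Predicate:
    # Assumption: 1 KB = 1000 bytes; the app's own unit convention is not stated.
    return lambda f: f.size_bytes > kb * 1000


@dataclass(frozen=True)
class Rule:
    conditions: List[Predicate]
    destination: str

    def target_for(self, f: FileInfo) -> Optional[str]:
        # All conditions must hold (logical AND); otherwise the rule does not apply.
        return self.destination if all(c(f) for c in self.conditions) else None


large_receipts = Rule(
    conditions=[semantic_category_is("Receipts"), size_greater_than_kb(500)],
    destination="/Receipts/Large",
)
```

Modelling each condition as a plain predicate keeps the semantic check interchangeable with filename, date, or tag checks, which is the property the bullet above describes.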
Quality of life
- Order-independent results. Re-running on the same folder set in any selection order now produces byte-identical proposals.
- Tighter junk-drawer behaviour. Prose-heavy clusters require ≥ 0.7 cohesion to surface; loose groupings merge into Uncategorized rather than littering the proposal list.
- Smarter naming. Original casing is preserved for acronyms (`USAH` stays `USAH`, not `Usahs`); irregular plurals (`Children`, `Knives`, `Mice`) are handled correctly; and calendar tokens, copy markers, and filename modifiers (`last`, `latest`, `prev`) are excluded from cluster names.
- Filename evidence wins on disagreement. When body content and filenames genuinely disagree about what a cluster is "about", the filename signal now wins, matching how users actually read their own folders.
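
One common way to get order-independent output from a sampled analysis is to canonicalize the input order and seed the random generator deterministically. The sketch below pairs that idea with a standard reservoir sample (Algorithm R) of 300 items. Whether VaultSort works this way is not stated in these notes; `stable_sample`, the sort key, and the fixed seed are assumptions made for illustration.

```python
import random
from typing import Iterable, List


def reservoir_sample(items: Iterable[str], k: int, rng: random.Random) -> List[str]:
    """Algorithm R: uniform sample of up to k items in a single pass."""
    reservoir: List[str] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def stable_sample(paths: Iterable[str], k: int = 300, seed: int = 0) -> List[str]:
    # Sorting (and de-duplicating) removes any dependence on selection order;
    # the fixed seed makes the random choices repeatable across runs.
    canonical = sorted(set(paths))
    return reservoir_sample(canonical, k, random.Random(seed))
```

Sorting requires collecting every path first, so this trades the streaming benefit of reservoir sampling for reproducibility.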
Privacy & telemetry
- Embedding model integrity verified via SHA-256 checksum at install.
- Embedding cache scoped per-user, never synced, never transmitted.
- New `smart_categorize_derive_completed` and `_prepass_completed` events surface health metrics (silhouette, cache hit rate, sanity counts); all are aggregate, with no file paths or content.
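
For readers unfamiliar with checksum verification, the following Python sketch shows the general technique: hash the downloaded file in chunks and compare the digest with a known-good value. It is not VaultSort's installer code; `sha256_of_file`, `verify_model`, and the chunk size are hypothetical choices.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a large model never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: Path, expected_hex: str) -> bool:
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

A caller would typically refuse to load the model and re-download it when `verify_model` returns `False`.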
Improvements
- Fully redesigned Advanced Organize interface to support the Smart Categorize wizard alongside the existing job and rule system.
- Three-button save-collision dialog when accepting a Category Set whose name already exists (Replace / Save as copy / Cancel).
- Schema-version stamp on every saved Category Set, with a forward-compatible migration shell, so older sets continue to load and newer sets are gracefully skipped on older builds.
- L2-norm sanity guards in both deriver and prepass paths — degenerate embeddings can no longer crash a derive run.
- Embedding-cache dimension validation on open and on each write.
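
The guards described above can be pictured roughly as follows. This Python sketch validates dimension, drops non-finite or zero-norm vectors, and renormalizes the rest to unit length, counting each outcome. The counter names `dropped_non_finite` and `renormalized` come from the telemetry fields mentioned in these notes; everything else (`guard_embeddings`, `dropped_bad_dim`, `dropped_degenerate`, the tolerance) is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def guard_embeddings(
    vectors: Sequence[Sequence[float]],
    expected_dim: int,
    tolerance: float = 1e-3,
) -> Tuple[List[List[float]], Dict[str, int]]:
    stats = {
        "dropped_bad_dim": 0,     # hypothetical counter
        "dropped_non_finite": 0,  # name taken from the release notes
        "dropped_degenerate": 0,  # hypothetical counter
        "renormalized": 0,        # name taken from the release notes
    }
    kept: List[List[float]] = []
    for vec in vectors:
        if len(vec) != expected_dim:
            stats["dropped_bad_dim"] += 1
            continue
        if not all(math.isfinite(x) for x in vec):
            stats["dropped_non_finite"] += 1
            continue
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0 or not math.isfinite(norm):
            # A zero or overflowing norm cannot be scaled to unit length.
            stats["dropped_degenerate"] += 1
            continue
        if abs(norm - 1.0) > tolerance:
            vec = [x / norm for x in vec]
            stats["renormalized"] += 1
        kept.append(list(vec))
    return kept, stats
```

Dropping and counting bad vectors, rather than raising, is what lets a run finish while still reporting that something upstream went wrong.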
Bug Fixes
- Exemplar deduplication: clusters no longer surface multiple files that share the same basename.
- Fewer "Cluster N" fallbacks; singleton clusters now derive names from their own filenames when TF-IDF has no signal.
- Prose-heavy junk drawers (loose `.md`/`.txt`/`.docx` groupings) no longer surface as named proposals at low cohesion.
- Telemetry events now include `dropped_non_finite` and `renormalized` counts so embedding-pipeline health is monitorable.
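
As an illustration of the exemplar fix, the sketch below keeps only the first file seen for each basename when choosing a cluster's example files. It assumes exemplars arrive already ranked and that basenames are compared case-insensitively; `dedupe_exemplars` and the limit of three are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def dedupe_exemplars(ranked_paths: Iterable[str], limit: int = 3) -> List[str]:
    """Return up to `limit` paths, skipping any whose basename was already used."""
    seen = set()
    exemplars: List[str] = []
    for path in ranked_paths:
        # Assumption: compare basenames case-insensitively.
        key = PurePosixPath(path).name.casefold()
        if key in seen:
            continue
        seen.add(key)
        exemplars.append(path)
        if len(exemplars) == limit:
            break
    return exemplars
```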
Compatibility
- macOS 12 or later. Smart Categorize requires Apple Silicon; the rest of VaultSort 3 runs on Intel as before.
- Existing rules, jobs, and Category Sets from VaultSort 2.x load unchanged.
Pricing
- VaultSort 3 ships with a price increase on new licenses. Existing license-holders receive the upgrade as part of their plan.
