DeweySearch

A static search index for .NET.

DeweySearch builds a sharded, Pagefind-style inverted index at build time and queries it in the browser, fetching only the shards a search touches. No server, no runtime service, no dependencies.

Version: 0.1.x
Target: net10.0
License: MIT
Status: Preview
005.1

Installation

DeweySearch ships as a single, dependency-free NuGet package. The build-time indexer targets net10.0; the browser client is plain JavaScript with no build step.

bash
# Add DeweySearch to your project
dotnet add package DeweySearch
powershell
# Visual Studio Package Manager Console
Install-Package DeweySearch
xml
<ItemGroup>
  <PackageReference Include="DeweySearch" Version="0.1.*" />
</ItemGroup>

Hosting in ASP.NET Core? The DeweySearch.Web package serves the JavaScript client as a static web asset so the NuGet package and the source can never drift.

005.13

Quick start

Describe your documents, build the index, and query it from the browser.

1. Describe your documents

A SearchDocument is a flat record — URL, title, optional description, heading text, and plain-text body. DeweySearch knows nothing about pages, locales, or sections; the host produces these from whatever content model it has.

csharp
using DeweySearch;
  
var documents = new List<SearchDocument>
{
    new(Url: "/guide/install",
        Title: "Installation",
        Description: "Add DeweySearch to your project.",
        Headings: "Requirements Setup",
        Body: "DeweySearch ships as a single dependency-free NuGet package."),
};
2. Build the index

IndexBuilder.Build produces an in-memory SearchIndex; ToFiles() serializes it to the static JSON artifacts the client fetches (index.json, t-{prefix}.json, f-{docId}.json).

csharp
using DeweySearch;
  
var index = new IndexBuilder(new IndexOptions { ShardPrefixLength = 2 })
    .Build(documents);
  
Directory.CreateDirectory("wwwroot/search-index");
foreach (var (name, bytes) in index.ToFiles())
    File.WriteAllBytes(Path.Combine("wwwroot/search-index", name), bytes);
3. Query in the browser

Point DeweySearchEngine at the directory holding the artifacts and call search(). It loads the manifest once, then only the shards and fragments a query actually needs.

javascript
// dewey-search.js exposes DeweySearchEngine on the global scope — no build step.
const engine = new DeweySearchEngine('/search-index');
const results = await engine.search('install');
  
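for (const hit of results) {
    // Results carry { docId, score, fields }; the URL and title live in the manifest record.
    const entry = engine.docEntry(hit.docId); // u = url, t = title
    console.log(entry.u, entry.t, hit.score);
}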

Checkpoint

The index is data, not code. Rebuild it whenever your content changes and commit the JSON alongside your site — there is nothing to deploy and nothing to run.
005.74

Core concepts

Six ideas cover most of DeweySearch's surface — a build-time C# library that emits static JSON, and a tiny client-side engine that queries it.

005.74 · A

Inverted index

A sharded, Pagefind-style inverted index, built once at build time. The whole index is never shipped to the browser.

005.74 · B

Tokenizer & stemmer

An accent-folding tokenizer and a plurals-only stemmer, implemented identically in C# and JavaScript and pinned by shared conformance fixtures.

005.74 · C

Prefix shards

Postings are split into per-term-prefix shards. A query downloads only the shards its terms touch, not the whole corpus.

005.74 · D

BM25 in the browser

All scoring runs client-side: BM25 with field boosts, prefix completion, bounded fuzzy matching, and synonym expansion.

005.74 · E

Facets

An open facet dictionary — any dimension on a document is interned and shipped in the manifest for client-side filtering.

005.74 · F

Fragments

Per-document excerpt fragments are fetched on demand, only for the results actually shown.

How it works

One sharded inverted index, built once at build time and queried in the browser; only the shards a search actually touches are ever downloaded. The steps below follow a single document the whole way: tokenizer, postings, shards, then BM25 scoring in the browser.

005.1

A document is just flat fields.

Each page becomes a record with title, headings, description, and body. Its position in the input list is its id — that integer is used everywhere downstream.

005.13

Tokenize.

Fold accents, split on punctuation, then cleave camelCase, acronyms, and digit boundaries. When a run splits, the whole run is kept too — so HeadingStyle is findable as heading, style, and headingstyle.
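The three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not DeweySearch's tokenizer: the name `sketchTokenize`, the use of NFD normalization for accent folding, the ASCII-only character classes, and the exact boundary regex are all choices made for this example.

```javascript
// Illustrative only: approximates the described steps, not the library's real tokenizer.
function sketchTokenize(text) {
    // 1. Fold accents: decompose, then drop combining marks (é becomes e).
    const folded = text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

    const tokens = [];
    // 2. Split on anything that is not an ASCII letter or digit (assumption: ASCII for brevity).
    for (const run of folded.split(/[^A-Za-z0-9]+/)) {
        if (run === '') continue;

        // 3. Cleave the run at camelCase, acronym, and digit boundaries.
        const parts = run.match(/[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+/g) ?? [];
        for (const part of parts) tokens.push(part.toLowerCase());

        // When the run split, keep the whole run as a token too.
        if (parts.length > 1) tokens.push(run.toLowerCase());
    }
    return tokens;
}
```

Under these assumptions, `sketchTokenize('HeadingStyle')` yields `heading`, `style`, and `headingstyle`, matching the behavior described above.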

005.13

Stem — plurals only.

A tiny ordered stemmer; first rule wins. Gerunds are left alone on purpose: string stays string, heading stays heading. That protects technical vocabulary.
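An ordered, first-match-wins stemmer of this kind might look like the sketch below. The rule table is hypothetical: the source only states that the stemmer handles plurals and leaves gerunds alone, so the specific suffixes, replacements, and minimum lengths here are assumptions for illustration.

```javascript
// Hypothetical rule table; the real DeweySearch rules may differ.
// Rules are tried top to bottom and the first match wins.
const PLURAL_RULES = [
    { suffix: 'ies',  replace: 'y',  minLength: 5 }, // queries -> query
    { suffix: 'sses', replace: 'ss', minLength: 5 }, // classes -> class
    { suffix: 'ss',   replace: 'ss', minLength: 0 }, // class stays class
    { suffix: 'us',   replace: 'us', minLength: 0 }, // status stays status
    { suffix: 's',    replace: '',   minLength: 4 }, // headings -> heading
];

function sketchStem(word) {
    for (const rule of PLURAL_RULES) {
        if (word.length >= rule.minLength && word.endsWith(rule.suffix)) {
            return word.slice(0, word.length - rule.suffix.length) + rule.replace;
        }
    }
    // No rule matched: gerunds such as "heading" pass through untouched.
    return word;
}
```

The guard rules for `ss` and `us` sit above the bare `s` rule on purpose: because the first match wins, they stop words like "class" from being truncated.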

005.74

Inverted index.

One posting per (term, document): [docId, fieldFlags, tf]. Field bits OR together — heading appears in title (1), heading (2), and body (8): 1 | 2 | 8 = 11.
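A minimal sketch of how such postings could be accumulated. The bit values come from the FieldFlags entry in the API reference; the input shape (one object of token arrays per document) and the choice to count term frequency across all fields are assumptions made for this example.

```javascript
// Field bits as documented for DeweySearchEngine.FieldFlags.
const FIELD = { Title: 1, Heading: 2, Description: 4, Body: 8 };

// Build term -> [[docId, fieldFlags, tf], ...] from pre-tokenized documents.
// Assumption: docs[i] looks like { Title: [...], Heading: [...], Body: [...] }.
function buildPostings(docs) {
    const index = new Map();
    docs.forEach((doc, docId) => {
        const perTerm = new Map(); // term -> [fieldFlags, tf] for this document
        for (const [fieldName, bit] of Object.entries(FIELD)) {
            for (const term of doc[fieldName] ?? []) {
                const entry = perTerm.get(term) ?? [0, 0];
                entry[0] |= bit; // OR the field bit into the flags
                entry[1] += 1;   // count every occurrence (assumed tf definition)
                perTerm.set(term, entry);
            }
        }
        for (const [term, [flags, tf]] of perTerm) {
            if (!index.has(term)) index.set(term, []);
            index.get(term).push([docId, flags, tf]);
        }
    });
    return index;
}
```

A document whose title, headings, and body all contain `heading` ends up with flags `1 | 2 | 8 = 11` in its posting, as in the worked example above.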

005.74

Shard & emit JSON.

Each term is filed into a shard keyed by the first two characters of its stem (the default ShardPrefixLength). The whole index becomes data, not code: static JSON you commit beside your site.
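Grouping by prefix can be sketched as below. The file naming follows the t-{prefix}.json pattern from the source; the JSON layout inside each shard (a plain term-to-postings object) and the function name are assumptions for illustration.

```javascript
// Group postings into shards keyed by the first `prefixLength` characters of each stem.
// Assumption: a shard is a plain object mapping term -> postings.
function shardPostings(index, prefixLength = 2) {
    const shards = new Map(); // file name -> { term: postings }
    for (const [term, postings] of index) {
        const fileName = `t-${term.slice(0, prefixLength)}.json`;
        if (!shards.has(fileName)) shards.set(fileName, {});
        shards.get(fileName)[term] = postings;
    }
    return shards;
}
```

With the default prefix length of 2, `heading` and `headingstyle` land together in t-he.json, while `install` goes to t-in.json.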

005.133

One tokenizer, two runtimes.

The query runs through byte-for-byte identical tokenize + stem logic. That contract is why "headings" lands on the index key heading.

005.133

Fetch only what it touches.

From the stem heading, derive prefix he and request t-he.json — and nothing else. The other five shards stay on the server.
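Deriving the shard requests from a query's stems might look like this sketch. It assumes the shard URL is simply the base path joined with the leaf file name; `shardUrlsFor` is a hypothetical helper, not part of the client's documented API.

```javascript
// Given already-stemmed query terms, list the distinct shard URLs to request.
// Assumption: URL = basePath + '/' + leaf name.
function shardUrlsFor(stems, basePath, prefixLength = 2) {
    const urls = new Set(); // a Set de-duplicates terms that share a prefix
    for (const stem of stems) {
        urls.add(`${basePath}/t-${stem.slice(0, prefixLength)}.json`);
    }
    return [...urls];
}
```

Two query terms that share a prefix, such as `heading` and `help`, cost a single request.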

005.133

Score in the browser.

For each candidate term, BM25 × field boost × match quality. heading (exact, boost 7) scores ≈ 7.12; headingstyle (prefix completion, boost 2) ≈ 0.64.
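The product above can be sketched with the textbook BM25 formula. The parameters `k1 = 1.2` and `b = 0.75` are the conventional defaults, not values confirmed by the source, and the boost and match-quality inputs are left to the caller because their actual values are defined by the client.

```javascript
// Textbook BM25 for one term in one document.
// Assumption: k1 and b use the conventional defaults; DeweySearch's values are not stated.
function bm25(tf, docLength, avgDocLength, docCount, docsWithTerm, k1 = 1.2, b = 0.75) {
    const idf = Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    const lengthNorm = 1 - b + b * (docLength / avgDocLength);
    return idf * (tf * (k1 + 1)) / (tf + k1 * lengthNorm);
}

// Combine the three factors named above into one per-term score.
function termScore(bm25Score, fieldBoost, matchQuality) {
    return bm25Score * fieldBoost * matchQuality;
}
```

An exact match would pass a match quality of 1, while a prefix completion or fuzzy match would pass something smaller, so the same BM25 value contributes less.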

005.133

Result.

Fetch f-1.json only for results actually shown; render the snippet with the match highlighted.
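A sketch of that last step using the client methods listed in the API reference (`search`, `docEntry`, `loadFragment`). The helper name `topHits` and the returned object shape are assumptions; the fragment is passed through untouched because its internal layout is not documented here.

```javascript
// Resolve the top `limit` hits to display records, fetching fragments only for those.
// `engine` is expected to behave like a DeweySearchEngine.
async function topHits(engine, query, limit = 5) {
    const results = await engine.search(query); // ranked, best first
    const shown = results.slice(0, limit);      // only these get a fragment request
    return Promise.all(shown.map(async (hit) => {
        const entry = engine.docEntry(hit.docId);              // u = url, t = title
        const fragment = await engine.loadFragment(hit.docId); // f-{docId}.json
        return { url: entry?.u, title: entry?.t, score: hit.score, fragment };
    }));
}
```

Slicing before fetching is the point of the design: results below the cut never cost a fragment request.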

005.133

API reference

The public surface is small and splits in two: a build-time C# library that emits the index, and a browser client that queries it. The C# shapes are pulled live from source; the JavaScript client has no Roslyn to mine, so it is documented by hand.

Build-time · C#

SearchDocument

record
One input document fed to Build. The host is responsible for producing these from whatever content model it has — DeweySearch knows nothing about pages, locales, or sections.
Url: string
Canonical URL of the document, surfaced in results and fragments.
Title: string
Title shown in results; weighted highest when ranking.
Description: string? = null
Optional summary; weighted above body.
Headings: string
Space-joined heading text; weighted above body.
Body: string
Plain-text body used for full-text matching and excerpt fragments.
Priority: int = 1
Relative ranking weight; higher wins. Default: 1.
Facets: IReadOnlyDictionary<string, string[]>? = null
Open facet map (dimension name to its values) for client-side filtering, e.g. { "section": ["Guides"], "tag": ["cli", "beginner"] }. Dimensions are arbitrary and caller-defined; values are interned and assigned stable ids at build time. Null for none.
Crumbs: IReadOnlyList<string>? = null
Optional ancestor breadcrumb trail (root-to-parent labels, excluding this document's own Title) for hierarchical results — e.g. a heading record carrying ["Page Title", "Parent Heading"]. Stored verbatim in the manifest for the client to display and group by; DeweySearch does not interpret it. Null or empty for a top-level result.

IndexOptions

class
Configuration for building a SearchIndex.
ShardPrefixLength: int = 2
Number of leading characters of a stemmed term used as its shard key. Lower values produce fewer, larger shards; higher values produce more, smaller shards. Default: 2.
MaxEditDistance: int = 2
Upper bound on the edit distance the client applies for typo-tolerant matching. The client also scales the budget down for short terms; this caps it. Set to 0 to require exact matches. Default: 2.
Synonyms: Dictionary<string, string[]> = []
Query-time synonyms. Each entry maps a term to the additional terms it should also match. Keys and values are stemmed at build time and shipped in the index manifest, so callers write natural words (e.g. "config" => ["configuration"]). Default: empty.
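To make the MaxEditDistance cap concrete, here is a sketch of a length-scaled edit budget paired with a standard Levenshtein distance. The scaling rule (one edit per four characters) is purely hypothetical; the source says only that the client scales the budget down for short terms and that MaxEditDistance caps it.

```javascript
// Hypothetical scaling rule: one edit per four characters, capped by maxEditDistance.
function editBudget(term, maxEditDistance = 2) {
    return Math.min(maxEditDistance, Math.floor(term.length / 4));
}

// Classic Levenshtein distance using two rolling rows.
function levenshtein(a, b) {
    let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const curr = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// A query term matches an index term when the distance fits the term's budget.
function fuzzyMatches(queryTerm, indexTerm, maxEditDistance = 2) {
    return levenshtein(queryTerm, indexTerm) <= editBudget(queryTerm, maxEditDistance);
}
```

Under this assumed rule, a three-letter query gets no typo tolerance at all, and setting the cap to 0 forces exact matches for every term, consistent with the option's description.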

IndexBuilder calls

ctor
new IndexBuilder(IndexOptions options)

Create a builder. Use the parameterless overload for defaults (2-character shard prefixes, edit distance 2).

method
SearchIndex Build(IReadOnlyList<SearchDocument> documents)

Build the inverted index, BM25 stats, document table, facets, and excerpt fragments. A document's id is its position in the list.

method
IReadOnlyDictionary<string, byte[]> ToFiles()

Called on the SearchIndex that Build returns. Serializes the index to its static artifacts, keyed by leaf file name: index.json, one t-{prefix}.json per shard, one f-{docId}.json per document.

In the browser · JavaScript

DeweySearchEngine

javascript
Self-contained client for the static index — no dependencies, no CDN. Fetches the manifest once, then only the term-prefix shards a query touches and the per-document fragments for results actually shown.
new DeweySearchEngine(basePath)
Construct against the directory holding the artifacts from ToFiles(): index.json plus the t-{prefix}.json and f-{docId}.json files. The leaf names are the contract; the host owns the directory.
search(query) → Promise<Result[]>
Tokenize the query, then score every touched document with BM25, field boosts, prefix completion, bounded fuzzy matching and synonyms, and resolve to a ranked list of { docId, score, fields } — best first. Loads the manifest on first call, then fetches only the shards the query's terms touch.
loadManifest() → Promise<Manifest>
Fetch index.json once and cache it. search() calls this implicitly; call it yourself to warm the cache or to read availableFacets() and docEntry() before the first query.
docEntry(docId) → object | null
The manifest record behind a result — u (url), t (title), c (breadcrumb trail), f (facet ids). Use it to render a hit and to group by page via docEntry(id).u.split('#')[0].
loadFragment(docId) → Promise<object | null>
Fetch one document's excerpt fragment (f-{docId}.json) on demand — only for the handful of results you actually render.
availableFacets() → object
The manifest's facet dictionary (dimension → values, interned to ids) for building filter UI. Empty until the manifest is loaded.
matchesFacets(docId, activeFacets) → boolean
True when a document satisfies every active facet selection, so chips can re-filter results without re-fetching shards. activeFacets maps each dimension to a Set of selected ids.
DeweySearchEngine.FieldFlags → static
Frozen bit flags OR-ed into each result's fields: { Title: 1, Heading: 2, Description: 4, Body: 8 }. A host can use them to tell a title or heading hit from a body-only one and skip a redundant snippet.
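Decoding a result's `fields` value could look like the sketch below. The flag values are copied from the FieldFlags entry so the example runs on its own; `matchedFields` and `isBodyOnly` are hypothetical helpers, not part of the client.

```javascript
// Mirrors the documented values of DeweySearchEngine.FieldFlags so the sketch runs standalone.
const FieldFlags = Object.freeze({ Title: 1, Heading: 2, Description: 4, Body: 8 });

// Names of the fields a result matched in.
function matchedFields(fields) {
    return Object.entries(FieldFlags)
        .filter(([, bit]) => (fields & bit) !== 0)
        .map(([name]) => name);
}

// True when the only match is in the body, i.e. when a snippet adds information.
function isBodyOnly(fields) {
    return fields === FieldFlags.Body;
}
```

For a result with `fields` equal to 11, `matchedFields` returns Title, Heading, and Body, so a host could skip the excerpt because the title already shows the match.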

The module also exports tokenize(text) and stem(word) — the cross-language contract primitives, byte-for-byte mirrors of the C# Tokenizer and Stemmer and pinned by shared conformance fixtures.
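A conformance check against shared fixtures might be structured like this sketch. The fixture shape (`{ input, expected }`) and the `checkFixtures` helper are hypothetical; the source states only that both implementations are pinned by shared fixtures, not how those fixtures are stored.

```javascript
// Run a tokenize-like function over fixtures and collect mismatches.
// Hypothetical fixture shape: { input: string, expected: string[] }.
function checkFixtures(tokenizeFn, fixtures) {
    const failures = [];
    for (const { input, expected } of fixtures) {
        const actual = tokenizeFn(input);
        if (JSON.stringify(actual) !== JSON.stringify(expected)) {
            failures.push({ input, expected, actual });
        }
    }
    return failures; // empty when the implementation conforms
}
```

Running the same fixture file through both the C# and JavaScript implementations is what would keep index keys and query terms in agreement.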