Inverted index
A sharded, Pagefind-style inverted index, built once at build time. The whole index is never shipped to the browser.
DeweySearch
DeweySearch builds a sharded, Pagefind-style inverted index at build time and queries it in the browser, fetching only the shards a search touches. No server, no runtime service, no dependencies.
DeweySearch ships as a single, dependency-free NuGet package. The build-time indexer
targets net10.0; the browser client is plain JavaScript with no build step.
# Add DeweySearch to your project
dotnet add package DeweySearch
# Visual Studio Package Manager Console
Install-Package DeweySearch
<ItemGroup>
<PackageReference Include="DeweySearch" Version="0.1.*" />
</ItemGroup>
Hosting in ASP.NET Core? The DeweySearch.Web package serves the JavaScript
client as a static web asset so the NuGet package and the source can never drift.
Describe your documents, build the index, and query it from the browser.
A SearchDocument is a flat record — URL, title, optional description,
heading text, and plain-text body. DeweySearch knows nothing about pages, locales,
or sections; the host produces these from whatever content model it has.
using DeweySearch;
var documents = new List<SearchDocument>
{
new(Url: "/guide/install",
Title: "Installation",
Description: "Add DeweySearch to your project.",
Headings: "Requirements Setup",
Body: "DeweySearch ships as a single dependency-free NuGet package."),
};
IndexBuilder.Build produces an in-memory SearchIndex;
ToFiles() serializes it to the static JSON artifacts the client
fetches (index.json, t-{prefix}.json, f-{docId}.json).
using DeweySearch;
var index = new IndexBuilder(new IndexOptions { ShardPrefixLength = 2 })
.Build(documents);
Directory.CreateDirectory("wwwroot/search-index");
foreach (var (name, bytes) in index.ToFiles())
File.WriteAllBytes(Path.Combine("wwwroot/search-index", name), bytes);
Point DeweySearchEngine at the directory holding the artifacts and call
search(). It loads the manifest once, then only the shards and fragments
a query actually needs.
// dewey-search.js exposes DeweySearchEngine on the global scope — no build step.
const engine = new DeweySearchEngine('/search-index');
const results = await engine.search('install');
for (const hit of results) {
console.log(hit.url, hit.title, hit.score);
}
Checkpoint
The index is data, not code. Rebuild it whenever your content changes and commit the JSON alongside your site — there is nothing to deploy and nothing to run.Six ideas cover most of DeweySearch's surface — a build-time C# library that emits static JSON, and a tiny client-side engine that queries it.
A sharded, Pagefind-style inverted index, built once at build time. The whole index is never shipped to the browser.
An accent-folding tokenizer and a plurals-only stemmer, implemented identically in C# and JavaScript and pinned by shared conformance fixtures.
Postings are split into per-term-prefix shards. A query downloads only the shards its terms touch, not the whole corpus.
All scoring runs client-side: BM25 with field boosts, prefix completion, bounded fuzzy matching, and synonym expansion.
An open facet dictionary — any dimension on a document is interned and shipped in the manifest for client-side filtering.
Per-document excerpt fragments are fetched on demand, only for the results actually shown.
How it works
One sharded inverted index, built once at build time and queried in the browser — only the shards a search actually touches are ever downloaded. Scroll to follow a single document the whole way: tokenizer, postings, shards, then BM25 scoring in the browser.
Each page becomes a record with title, headings, description, and body. Its position in the input list is its id — that integer is used everywhere downstream.
Fold accents, split on punctuation, then cleave camelCase, acronyms, and digit boundaries. When a run splits, the whole run is kept too — so HeadingStyle is findable as heading, style, and headingstyle.
A tiny ordered stemmer; first rule wins. Gerunds are left alone on purpose: string stays string, heading stays heading. That protects technical vocabulary.
One posting per (term, document): [docId, fieldFlags, tf]. Field bits OR together — heading appears in title (1), heading (2), and body (8): 1 | 2 | 8 = 11.
Terms file into a shard keyed by the first two characters of the stem. The whole index becomes data, not code — static JSON you commit beside your site.
The query runs through byte-for-byte identical tokenize + stem logic. That contract is why "headings" lands on the index key heading.
From the stem heading, derive prefix he and request t-he.json — and nothing else. The other five shards stay on the server.
For each candidate term, BM25 × field boost × match quality. heading (exact, boost 7) scores ≈ 7.12; headingstyle (prefix completion, boost 2) ≈ 0.64.
Fetch f-1.json only for results actually shown; render the snippet with the match highlighted.
The public surface is small and splits in two: a build-time C# library that emits the index, and a browser client that queries it. The C# shapes are pulled live from source; the JavaScript client has no Roslyn to mine, so it is documented by hand.
Build-time · C#
SearchDocumentBuild. The host is responsible for producing these from whatever content model it has — DeweySearch knows nothing about pages, locales, or sections.{ "section": ["Guides"], "tag": ["cli", "beginner"] }. Dimensions are arbitrary and caller-defined; values are interned and assigned stable ids at build time. Null for none.Title) for hierarchical results — e.g. a heading record carrying ["Page Title", "Parent Heading"]. Stored verbatim in the manifest for the client to display and group by; DeweySearch does not interpret it. Null or empty for a top-level result.IndexOptionsSearchIndex."config" => ["configuration"]). Default: empty.Create a builder. Use the parameterless overload for defaults (2-character shard prefixes, edit distance 2).
Build the inverted index, BM25 stats, document table, facets, and excerpt fragments. A document's id is its position in the list.
Serialize the index to its static artifacts keyed by leaf file name: index.json, one t-{prefix}.json per shard, one f-{docId}.json per document.
In the browser · JavaScript
DeweySearchEngineToFiles() — index.json plus the t-{prefix}.json and f-{docId}.json files. The leaf names are the contract; the host owns the directory.{ docId, score, fields } — best first. Loads the manifest on first call, then fetches only the shards the query's terms touch.index.json once and cache it. search() calls this implicitly; call it yourself to warm the cache or to read availableFacets() and docEntry() before the first query.u (url), t (title), c (breadcrumb trail), f (facet ids). Use it to render a hit and to group by page via docEntry(id).u.split('#')[0].f-{docId}.json) on demand — only for the handful of results you actually render.activeFacets maps each dimension to a Set of selected ids.fields — { Title: 1, Heading: 2, Description: 4, Body: 8 } — so a host can tell a title or heading hit from a body-only one and skip a redundant snippet.
The module also exports tokenize(text) and stem(word) — the
cross-language contract primitives, byte-for-byte mirrors of the C# Tokenizer
and Stemmer and pinned by shared conformance fixtures.