The content pipeline and union types

Why does Pennington model its content flow as a union of four records instead of a single ContentItem class with a status field or a polymorphic base class?

Context

A content engine has to move each page through at least four distinct phases — discovery, parsing, rendering, and emission — and each phase legitimately knows different things about the item. A discovered file has a source location and a route, but no parsed front matter or markdown body. A parsed item has structured metadata and text, but no HTML. A rendered item has HTML and an outline, ready to write to disk. These are not the same data at different points in time; they are genuinely different shapes.

The conventional escape routes both have a cost. A single ContentItem class with nullable Metadata, Html, and Error fields invites "is it safe to touch this yet?" checks at every call site, and the compiler has no way to enforce which combination of fields is populated at which stage. A traditional inheritance hierarchy — ContentItem → ParsedItem → RenderedItem — puts discovery-stage code and render-stage code in a subtyping relationship that does not reflect how they are actually used, and forces the failure case into either a parallel branch that breaks is-checks downstream or a nullable error field on every subclass. C# 15 discriminated unions offer a third path: the compiler tracks which case you hold, pattern matching is exhaustive, and each case carries exactly the fields that exist at that stage.

How it works

The union shape

ContentItem is a union of four record cases: DiscoveredItem, ParsedItem, RenderedItem, and FailedItem. Each case is a plain record holding only the fields that make sense at its stage: there is no base class, no status enum, and no nullable placeholder for data that has not arrived yet. The union itself exposes a single Route property that all four cases share, because "every content item, even a failed one, belongs to a route" is a genuine invariant. Call sites that only need the route never have to pattern-match. Call sites that need the rendered HTML must match on RenderedItem, and a switch over the union that leaves a case unhandled is flagged by the compiler.

namespace Pennington.Pipeline;
  
using FrontMatter;
using Routing;
  
// Case types — each is a standalone record
  
/// <summary>A content item discovered by a content service but not yet parsed.</summary>
/// <param name="Route">Canonical route for the item.</param>
/// <param name="Source">Origin describing how the item's content is produced.</param>
public record DiscoveredItem(ContentRoute Route, ContentSource Source);
  
/// <summary>A content item whose front matter and raw markdown body have been parsed.</summary>
/// <param name="Route">Canonical route for the item.</param>
/// <param name="Metadata">Parsed front matter metadata.</param>
/// <param name="RawMarkdown">Markdown body text, with front matter stripped.</param>
public record ParsedItem(ContentRoute Route, IFrontMatter Metadata, string RawMarkdown);
  
/// <summary>A content item whose body has been rendered to HTML.</summary>
/// <param name="Route">Canonical route for the item.</param>
/// <param name="Metadata">Parsed front matter metadata.</param>
/// <param name="Content">Rendered content and extracted page data.</param>
public record RenderedItem(ContentRoute Route, IFrontMatter Metadata, RenderedContent Content);
  
/// <summary>A content item that failed during parsing or rendering.</summary>
/// <param name="Route">Canonical route for the item that failed.</param>
/// <param name="Error">Error describing the failure.</param>
public record FailedItem(ContentRoute Route, ContentError Error);
  
/// <summary>Union of all content item states flowing through the pipeline.</summary>
// The union — compiler enforces exhaustive matching over exactly these four types.
// The #else branch is a transitional net10.0 shim while C# 15 unions require net11.0.
#if NET11_0_OR_GREATER
public union ContentItem(DiscoveredItem, ParsedItem, RenderedItem, FailedItem)
{
    /// <summary>The route for the current item regardless of state.</summary>
    public ContentRoute Route => this switch
    {
        DiscoveredItem d => d.Route,
        ParsedItem p => p.Route,
        RenderedItem r => r.Route,
        FailedItem f => f.Route,
        null => throw new InvalidOperationException("Uninitialized ContentItem")
    };
}
#else
[System.Runtime.CompilerServices.Union]
public readonly struct ContentItem : System.Runtime.CompilerServices.IUnion
{
    /// <summary>Wrapped case instance; inspect via pattern matching on the case types.</summary>
    public object? Value { get; }
    /// <summary>Wraps a <see cref="DiscoveredItem"/>.</summary>
    public ContentItem(DiscoveredItem value) { Value = value; }
    /// <summary>Wraps a <see cref="ParsedItem"/>.</summary>
    public ContentItem(ParsedItem value) { Value = value; }
    /// <summary>Wraps a <see cref="RenderedItem"/>.</summary>
    public ContentItem(RenderedItem value) { Value = value; }
    /// <summary>Wraps a <see cref="FailedItem"/>.</summary>
    public ContentItem(FailedItem value) { Value = value; }
    /// <summary>Implicit conversion from <see cref="DiscoveredItem"/>.</summary>
    public static implicit operator ContentItem(DiscoveredItem value) => new(value);
    /// <summary>Implicit conversion from <see cref="ParsedItem"/>.</summary>
    public static implicit operator ContentItem(ParsedItem value) => new(value);
    /// <summary>Implicit conversion from <see cref="RenderedItem"/>.</summary>
    public static implicit operator ContentItem(RenderedItem value) => new(value);
    /// <summary>Implicit conversion from <see cref="FailedItem"/>.</summary>
    public static implicit operator ContentItem(FailedItem value) => new(value);
  
    /// <summary>The route for the current item regardless of state.</summary>
    public ContentRoute Route => Value switch
    {
        DiscoveredItem d => d.Route,
        ParsedItem p => p.Route,
        RenderedItem r => r.Route,
        FailedItem f => f.Route,
        _ => throw new InvalidOperationException("Uninitialized ContentItem")
    };
}
#endif

The four case records are siblings in the same union — none inherits from another, and there is no Stage or Status field anywhere. Route is the one projection lifted onto the union, and that narrowness is deliberate: every additional lifted property would need a sensible value for all four cases, which is exactly the nullable-field trap the union is meant to avoid.
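To try the shape without a preview SDK, the sketch below mirrors the #else shim: a readonly struct over `object? Value` with one implicit conversion per case, plus a consumer that switches on `Value`. It is a sketch under stated assumptions, not Pennington's code. ContentRoute, FrontMatter (standing in for IFrontMatter), RenderedContent, and ContentError are hypothetical minimal stand-ins; DiscoveredItem carries a plain SourcePath string instead of ContentSource; ItemReport.Describe is an invented consumer; and the [Union] attribute and IUnion interface are omitted. Because this version is an ordinary struct over `object`, the compiler does not check exhaustiveness, which is why both switches need a discard arm. The `union` declaration is what lets the compiler do that checking.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for Pennington's ContentRoute, IFrontMatter, RenderedContent and ContentError.
public record ContentRoute(string Url);
public record FrontMatter(string Title);
public record RenderedContent(string Html);
public record ContentError(string Message, Exception? Exception = null);

// Sibling case records; SourcePath stands in for the real ContentSource union.
public record DiscoveredItem(ContentRoute Route, string SourcePath);
public record ParsedItem(ContentRoute Route, FrontMatter Metadata, string RawMarkdown);
public record RenderedItem(ContentRoute Route, FrontMatter Metadata, RenderedContent Content);
public record FailedItem(ContentRoute Route, ContentError Error);

// Plain-struct stand-in shaped like the net10.0 shim, minus [Union] and IUnion.
public readonly struct ContentItem
{
    public object? Value { get; }
    private ContentItem(object value) => Value = value;

    public static implicit operator ContentItem(DiscoveredItem value) => new(value);
    public static implicit operator ContentItem(ParsedItem value) => new(value);
    public static implicit operator ContentItem(RenderedItem value) => new(value);
    public static implicit operator ContentItem(FailedItem value) => new(value);

    // The single lifted projection: every case has a route.
    public ContentRoute Route => Value switch
    {
        DiscoveredItem d => d.Route,
        ParsedItem p => p.Route,
        RenderedItem r => r.Route,
        FailedItem f => f.Route,
        _ => throw new InvalidOperationException("Uninitialized ContentItem")
    };
}

public static class ItemReport
{
    // Stage-specific fields are reachable only inside the arm that matched their case.
    public static string Describe(ContentItem item) => item.Value switch
    {
        DiscoveredItem d => $"{d.Route.Url}: discovered from {d.SourcePath}",
        ParsedItem p => $"{p.Route.Url}: parsed \"{p.Metadata.Title}\"",
        RenderedItem r => $"{r.Route.Url}: rendered {r.Content.Html.Length} chars of HTML",
        FailedItem f => $"{f.Route.Url}: failed ({f.Error.Message})",
        _ => throw new InvalidOperationException("Uninitialized ContentItem")
    };
}
```

Assigning a `new ParsedItem(...)` to a ContentItem variable goes through the implicit conversion; reading `item.Route` needs no match, while anything stage-specific is reached only by matching the case that carries it.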

`ContentSource` discriminates where an item came from

A DiscoveredItem pairs a ContentRoute with a second union, ContentSource, which records where the item came from: a markdown file on disk, a Razor @page, a redirect definition, a programmatic generator, or a live HTTP endpoint. It is a separate union rather than a set of optional fields on DiscoveredItem because discovery is itself a pluggable step: different sources need to carry different data (a file path, a Razor component type, a target URL, a generator) without forcing later stages to care. Once an item has been parsed, its source has done its job; the renderer and everything after it work entirely against the resolved front matter and content text, and ContentSource drops out of the picture.

namespace Pennington.Pipeline;
  
using Routing;
  
/// <summary>Content sourced from a markdown file on disk.</summary>
/// <param name="Path">Absolute path to the markdown file.</param>
public record MarkdownFileSource(FilePath Path);
  
/// <summary>Content rendered by a Razor page/component.</summary>
/// <param name="ComponentType">Fully qualified name of the component type.</param>
public record RazorPageSource(string ComponentType);
  
/// <summary>A route that redirects to another URL.</summary>
/// <param name="TargetUrl">Destination URL for the redirect.</param>
public record RedirectSource(UrlPath TargetUrl);
  
/// <summary>Content produced programmatically by a generator.</summary>
/// <param name="Generator">Generator that produces the content on demand.</param>
public record ProgrammaticSource(IProgrammaticContentGenerator Generator);
  
/// <summary>
/// Marker source for routes whose content is produced by a live HTTP endpoint
/// (e.g., the SPA data endpoint). These items exist so the build crawler
/// discovers the URL and fetches it through the live pipeline — they do not
/// participate in parse/render, are not redirects, and do not appear in the
/// sitemap.
/// </summary>
public record EndpointSource();
  
/// <summary>Union of all ways content can be sourced for a route.</summary>
#if NET11_0_OR_GREATER
public union ContentSource(MarkdownFileSource, RazorPageSource, RedirectSource, ProgrammaticSource, EndpointSource);
#else
[System.Runtime.CompilerServices.Union]
public readonly struct ContentSource : System.Runtime.CompilerServices.IUnion
{
    /// <summary>Wrapped case instance; inspect via pattern matching on the case types.</summary>
    public object? Value { get; }
    /// <summary>Wraps a <see cref="MarkdownFileSource"/>.</summary>
    public ContentSource(MarkdownFileSource value) { Value = value; }
    /// <summary>Wraps a <see cref="RazorPageSource"/>.</summary>
    public ContentSource(RazorPageSource value) { Value = value; }
    /// <summary>Wraps a <see cref="RedirectSource"/>.</summary>
    public ContentSource(RedirectSource value) { Value = value; }
    /// <summary>Wraps a <see cref="ProgrammaticSource"/>.</summary>
    public ContentSource(ProgrammaticSource value) { Value = value; }
    /// <summary>Wraps an <see cref="EndpointSource"/>.</summary>
    public ContentSource(EndpointSource value) { Value = value; }
    /// <summary>Implicit conversion from <see cref="MarkdownFileSource"/>.</summary>
    public static implicit operator ContentSource(MarkdownFileSource value) => new(value);
    /// <summary>Implicit conversion from <see cref="RazorPageSource"/>.</summary>
    public static implicit operator ContentSource(RazorPageSource value) => new(value);
    /// <summary>Implicit conversion from <see cref="RedirectSource"/>.</summary>
    public static implicit operator ContentSource(RedirectSource value) => new(value);
    /// <summary>Implicit conversion from <see cref="ProgrammaticSource"/>.</summary>
    public static implicit operator ContentSource(ProgrammaticSource value) => new(value);
    /// <summary>Implicit conversion from <see cref="EndpointSource"/>.</summary>
    public static implicit operator ContentSource(EndpointSource value) => new(value);
}
#endif

The five cases cover every origin Pennington ships. The parse stage is the last place ContentSource is consulted: it skips RedirectSource and EndpointSource items before calling the parser. Renderers and the output writer never pattern-match on ContentSource; by the time they run, the source has been replaced by the parsed shape. For the construction and consumption shapes in detail, including the .Value pattern that works across both target frameworks, see ContentSource: constructing and pattern-matching the union.
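A minimal sketch of both halves, construction through the implicit conversions and consumption through `.Value`, using the same plain-struct stand-in as the shim. Assumptions: plain strings replace FilePath and UrlPath, IProgrammaticContentGenerator is an empty placeholder interface, and SourceRules with its ReachesParser and Origin helpers is hypothetical. ReachesParser restates the parse-stage guard (skip RedirectSource and EndpointSource); treating an uninitialized default value as not parseable is this sketch's own choice.

```csharp
#nullable enable
using System;

// Hypothetical placeholder; Pennington's real interface is not shown in this document.
public interface IProgrammaticContentGenerator { }

// Source cases; plain strings stand in for Pennington's FilePath and UrlPath types.
public record MarkdownFileSource(string Path);
public record RazorPageSource(string ComponentType);
public record RedirectSource(string TargetUrl);
public record ProgrammaticSource(IProgrammaticContentGenerator Generator);
public record EndpointSource();

// Plain-struct stand-in shaped like the net10.0 shim.
public readonly struct ContentSource
{
    public object? Value { get; }
    private ContentSource(object value) => Value = value;

    public static implicit operator ContentSource(MarkdownFileSource value) => new(value);
    public static implicit operator ContentSource(RazorPageSource value) => new(value);
    public static implicit operator ContentSource(RedirectSource value) => new(value);
    public static implicit operator ContentSource(ProgrammaticSource value) => new(value);
    public static implicit operator ContentSource(EndpointSource value) => new(value);
}

public static class SourceRules
{
    // Mirrors the parse-stage guard: redirects and endpoint markers have no body to
    // parse. An uninitialized (default) source is treated as not parseable either.
    public static bool ReachesParser(ContentSource source) =>
        source.Value is not (RedirectSource or EndpointSource or null);

    // Matching on .Value reads the same on either target framework.
    public static string Origin(ContentSource source) => source.Value switch
    {
        MarkdownFileSource file => $"file {file.Path}",
        RazorPageSource page => $"Razor component {page.ComponentType}",
        RedirectSource target => $"redirect to {target.TargetUrl}",
        ProgrammaticSource => "programmatic generator",
        EndpointSource => "live endpoint",
        _ => throw new InvalidOperationException("Uninitialized ContentSource")
    };
}
```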

Stage transitions replace the item

Each stage in the pipeline works by replacing the incoming union case with the next one. ParseAsync pulls a stream of ContentItem values and, for each DiscoveredItem, hands its content to the registered IContentParser. When the parser succeeds, the DiscoveredItem is replaced by a ParsedItem carrying the resolved front matter and text. RenderAsync does the same thing one level further: each ParsedItem is handed to the IContentRenderer, and on success a RenderedItem takes its place, now carrying the HTML output and a navigation outline. The final stage, GenerateAsync, pattern-matches on the full union to write output files and accumulate the build report.

The replacement invariant is what gives the pipeline its composability. A RenderedItem flowing into ParseAsync is already past that stage, so ParseAsync passes it through unchanged. A ParsedItem flowing into RenderAsync gets rendered; a RenderedItem in the same stream passes through. This means you can hand the pipeline a partially processed stream, one that mixes discovered and already-parsed items, and it will do the right thing for each. There is no need to coordinate which stage ran last; the case type carries that information.

// Excerpt from the parse stage (ParseAsync); items is the incoming stream of ContentItem values.
await foreach (var item in items)
{
    // FailedItems pass through unchanged
    if (item.Value is FailedItem)
    {
        yield return item;
        continue;
    }
  
    if (item.Value is DiscoveredItem discovered)
    {
        // RedirectSource items are handled by PenningtonRedirectMiddleware at
        // request time (dev) and captured as 301 responses by the build crawler;
        // they don't participate in parse/render and must not reach the parser.
        // EndpointSource items (e.g., /_spa-data/*.json) are produced by a live
        // HTTP endpoint — there's no file to parse, same skip applies.
        if (discovered.Source.Value is RedirectSource or EndpointSource) continue;
  
        ContentItem result;
        try
        {
            result = await _parser.ParseAsync(discovered);
        }
        catch (Exception ex)
        {
            result = new FailedItem(discovered.Route,
                new ContentError($"Parse failed: {ex.Message}", ex));
        }
        yield return result;
    }
    else
    {
        // Already parsed or rendered — pass through
        yield return item;
    }
}

The implementation has three explicit branches: a FailedItem passes through without touching the parser; a DiscoveredItem is handed to IContentParser.ParseAsync inside a try/catch that demotes any exception to a FailedItem; and a ParsedItem or RenderedItem passes through unchanged because the work for that stage is already done. Ahead of the parser sits a guard on the item's source: a DiscoveredItem whose source is a RedirectSource or an EndpointSource is skipped with continue, so it is neither parsed nor yielded to later stages. A redirect has no body to parse and is served by middleware at request time (and captured as a 301 response by the build crawler) rather than written as an HTML file; an endpoint item's content is produced by a live HTTP endpoint, so there is no file to parse either.
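RenderAsync applies the same replace-or-pass-through rule one stage later. The sketch below is a hypothetical render stage, not Pennington's RenderAsync: a render delegate stands in for IContentRenderer, the case records are trimmed to strings, and error handling is left out so that the replacement rule is the only thing shown. Only ParsedItem is transformed; DiscoveredItem, RenderedItem, and FailedItem are yielded unchanged, so the input stream can mix stages freely. Stream and ToListAsync are small helpers for trying it out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Trimmed, hypothetical case records: strings stand in for routes, front matter and HTML.
public record DiscoveredItem(string Route);
public record ParsedItem(string Route, string RawMarkdown);
public record RenderedItem(string Route, string Html);
public record FailedItem(string Route, string Error);

public readonly struct ContentItem
{
    public object? Value { get; }
    private ContentItem(object value) => Value = value;
    public static implicit operator ContentItem(DiscoveredItem value) => new(value);
    public static implicit operator ContentItem(ParsedItem value) => new(value);
    public static implicit operator ContentItem(RenderedItem value) => new(value);
    public static implicit operator ContentItem(FailedItem value) => new(value);
}

public static class RenderStage
{
    // Replaces each ParsedItem with a RenderedItem and passes every other case through
    // untouched, so a stream that mixes stages needs no "which stage ran last" bookkeeping.
    public static async IAsyncEnumerable<ContentItem> RenderAsync(
        IAsyncEnumerable<ContentItem> items, Func<ParsedItem, Task<string>> render)
    {
        await foreach (var item in items)
        {
            if (item.Value is ParsedItem parsed)
                yield return new RenderedItem(parsed.Route, await render(parsed));
            else
                yield return item;
        }
    }

    // Helpers for trying the stage: wrap values as an async stream, and drain one into a list.
    public static async IAsyncEnumerable<ContentItem> Stream(params ContentItem[] items)
    {
        foreach (var item in items)
        {
            await Task.Yield();
            yield return item;
        }
    }

    public static async Task<List<ContentItem>> ToListAsync(IAsyncEnumerable<ContentItem> items)
    {
        var list = new List<ContentItem>();
        await foreach (var item in items) list.Add(item);
        return list;
    }
}
```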

`FailedItem` as a peer case, not an exception

Exceptions thrown inside IContentParser.ParseAsync or IContentRenderer.RenderAsync are caught at the pipeline boundary and rewritten as a FailedItem carrying the route and a ContentError that describes what went wrong. From that point on, the failed item rides the same async stream as the successful ones. Downstream stages check is FailedItem and short-circuit without touching the error or trying to make sense of absent fields.
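A minimal sketch of such a boundary, written as a hypothetical helper (RenderWithBoundary) around a render-like stage delegate, with trimmed stand-in records. It keeps the structure of the parse excerpt above, and for a reason: C# rejects `yield return` inside a `try` block that has a `catch` clause (error CS1626), so the outcome is captured in `result` from either the `try` or the `catch` and yielded afterwards. The "Render failed:" message prefix is modeled on the excerpt's "Parse failed:" text and is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Trimmed, hypothetical records; ContentError keeps the original exception.
public record ContentError(string Message, Exception? Exception = null);
public record ParsedItem(string Route, string RawMarkdown);
public record RenderedItem(string Route, string Html);
public record FailedItem(string Route, ContentError Error);

public readonly struct ContentItem
{
    public object? Value { get; }
    private ContentItem(object value) => Value = value;
    public static implicit operator ContentItem(ParsedItem value) => new(value);
    public static implicit operator ContentItem(RenderedItem value) => new(value);
    public static implicit operator ContentItem(FailedItem value) => new(value);
}

public static class FailureBoundary
{
    // Runs `stage` on each ParsedItem. An exception becomes a FailedItem for that route
    // instead of escaping and ending the stream; every other case passes through.
    public static async IAsyncEnumerable<ContentItem> RenderWithBoundary(
        IAsyncEnumerable<ContentItem> items, Func<ParsedItem, Task<ContentItem>> stage)
    {
        await foreach (var item in items)
        {
            if (item.Value is not ParsedItem parsed)
            {
                yield return item; // FailedItem and finished items short-circuit here
                continue;
            }

            ContentItem result;
            try
            {
                result = await stage(parsed);
            }
            catch (Exception ex)
            {
                result = new FailedItem(parsed.Route,
                    new ContentError($"Render failed: {ex.Message}", ex));
            }
            yield return result; // outside the try: yield is not allowed inside it
        }
    }

    public static async IAsyncEnumerable<ContentItem> Stream(params ContentItem[] items)
    {
        foreach (var item in items) { await Task.Yield(); yield return item; }
    }

    public static async Task<List<ContentItem>> ToListAsync(IAsyncEnumerable<ContentItem> items)
    {
        var list = new List<ContentItem>();
        await foreach (var item in items) list.Add(item);
        return list;
    }
}
```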

The payoff arrives in GenerateAsync. Because FailedItem is a peer case in the union, the exhaustive pattern match there routes it to BuildReportBuilder.AddError — every parse or render exception ends up as a named entry in the build report rather than an unhandled exception that aborts the crawl. One broken markdown file does not prevent the other four hundred from rendering. This is the concrete benefit of treating failure as data: the "sad path" and the "happy path" have identical shape in the stream, which means the final aggregation step sees them through the same lens.

/// <summary>A content item that failed during parsing or rendering.</summary>
/// <param name="Route">Canonical route for the item that failed.</param>
/// <param name="Error">Error describing the failure.</param>
public record FailedItem(ContentRoute Route, ContentError Error);

Treating failure as a data case is what makes the exhaustive match in GenerateAsync meaningful rather than ceremonial. If FailedItem were an exception that escaped the pipeline, there would be no case to match — and no way for the compiler to confirm that every outcome was handled.
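The aggregation end can be sketched as one switch over the union. Everything named here is hypothetical: BuildReport stands in for BuildReportBuilder (whose real API is not shown beyond AddError), outputs are collected in a dictionary instead of written to disk, and counting a DiscoveredItem or ParsedItem that reaches the end as "unfinished" is an assumption for illustration, not documented Pennington behavior. What it does show is the property described above: a failure in the middle of the stream becomes a report entry, and the items after it are still written.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Trimmed, hypothetical records.
public record ContentError(string Message, Exception? Exception = null);
public record DiscoveredItem(string Route);
public record ParsedItem(string Route, string RawMarkdown);
public record RenderedItem(string Route, string Html);
public record FailedItem(string Route, ContentError Error);

public readonly struct ContentItem
{
    public object? Value { get; }
    private ContentItem(object value) => Value = value;
    public static implicit operator ContentItem(DiscoveredItem value) => new(value);
    public static implicit operator ContentItem(ParsedItem value) => new(value);
    public static implicit operator ContentItem(RenderedItem value) => new(value);
    public static implicit operator ContentItem(FailedItem value) => new(value);
}

// Hypothetical stand-in for BuildReportBuilder.
public sealed class BuildReport
{
    public Dictionary<string, string> Written { get; } = new();
    public List<(string Route, string Message)> Errors { get; } = new();
    public int Unfinished { get; private set; }

    public void AddOutput(string route, string html) => Written[route] = html;
    public void AddError(string route, ContentError error) => Errors.Add((route, error.Message));
    public void AddUnfinished() => Unfinished++;
}

public static class GenerateStage
{
    // One match routes every outcome. A FailedItem becomes a report entry, not an
    // exception, so the items after it are still processed.
    public static async Task<BuildReport> GenerateAsync(IAsyncEnumerable<ContentItem> items)
    {
        var report = new BuildReport();
        await foreach (var item in items)
        {
            switch (item.Value)
            {
                case RenderedItem rendered:
                    report.AddOutput(rendered.Route, rendered.Html); // stands in for writing a file
                    break;
                case FailedItem failed:
                    report.AddError(failed.Route, failed.Error);
                    break;
                case DiscoveredItem or ParsedItem:
                    report.AddUnfinished(); // assumption: counted, not written
                    break;
                default:
                    throw new InvalidOperationException("Uninitialized ContentItem");
            }
        }
        return report;
    }

    public static async IAsyncEnumerable<ContentItem> Stream(params ContentItem[] items)
    {
        foreach (var item in items) { await Task.Yield(); yield return item; }
    }
}
```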

Trade-offs

Cost: The union is a change-amplification point. Adding a fifth stage or a new terminal case means every switch on ContentItem across the codebase must be updated; the compiler surfaces all the gaps, which is the feature, but it is still a broad change. The union keyword is also a C# 15 feature that requires net11.0, which is why each union currently ships with a transitional #else shim for net10.0. LSP tooling in some editors still flags false errors on the declaration; the compiler handles it correctly, but the red squiggles remain until tooling catches up.
Alternative considered — discriminator property: A ContentItem class with a Stage enum and nullable Metadata, Html, and Error fields was rejected because every consumer ends up writing the same null-check sequence, and the compiler has no way to prove which combination of fields is actually populated at any given call site. The safety that pattern matching provides would have to be replicated manually everywhere.
Alternative considered — inheritance hierarchy: An abstract ContentItem with ParsedItem : ContentItem and RenderedItem : ParsedItem was rejected because it creates a subtyping relationship that does not reflect how the cases are actually used, and because it forces FailedItem into either a parallel branch that breaks is-checks downstream or a nullable error field on every subclass — which leads straight back to the nullable-field problem.
Consequence: Downstream code that only needs the ContentRoute would otherwise have to pattern-match over all four cases; the Route property lifted onto the union spares it that. The narrowness of that single projection is intentional: each additional shared property needs a sensible value for all four cases, and "sensible for all four cases" is the definition of the nullable-field trap. Adding another lifted property is a deliberate design decision, not a convenience shortcut.
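For contrast, a minimal sketch of the discriminator-property alternative rejected above. The Stage enum, the FlatContentItem record, and the RequireHtml check are hypothetical, and strings stand in for richer types. Nothing in the type ties Stage to which nullable fields are populated, so every consumer has to repeat a check like RequireHtml, and an impossible combination such as Stage.Rendered with no HTML compiles without complaint and only fails at runtime.

```csharp
#nullable enable
using System;

public enum Stage { Discovered, Parsed, Rendered, Failed }

// The rejected shape: one type for every stage, with fields that may not exist yet left null.
public sealed record FlatContentItem(
    string Route, Stage Stage, string? Metadata = null, string? Html = null, string? Error = null);

public static class FlatConsumers
{
    // Nothing ties Stage to Html, so each consumer re-checks both by hand,
    // and a wrong combination only surfaces at runtime.
    public static string RequireHtml(FlatContentItem item) =>
        item.Stage == Stage.Rendered && item.Html is not null
            ? item.Html
            : throw new InvalidOperationException($"{item.Route} has no HTML at stage {item.Stage}");
}
```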