The Journey to Native Markdown: Frontmatter, Slugs, and Code Interceptors

A deep-dive article detailing the transition from expensive string post-processing to zero-allocation native features in Beskar.Markdown.

May 19, 2026 · 5 min read · Marvin Drude
C# · .NET · Markdown · Performance · Architecture · API Design

The limits of "just parsing"

When I first wrote Beskar.Markdown, the focus was strictly on performance and low allocations. By storing the document hierarchy as indices into contiguous value buffers and slicing the source with ReadOnlySpan<char> instead of allocating strings, the parser could parse and render markdown documents with a fraction of the memory footprint of older, commonplace libraries.
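To illustrate what "indices into contiguous value buffers" can look like, here is a toy sketch. The node layout (kind, start, length, first child, next sibling) is an assumption made for this example and says nothing about Beskar.Markdown's real internals.

```csharp
using System;
using System.Collections.Generic;

var source = "# Title\nSome text";

// Each node is a value tuple stored inline in the list's backing array:
// a slice of the source plus the indices of its first child and next
// sibling (-1 means "none"). No node owns a copy of the text.
// (Kind is a string only to keep the sketch short; an enum would be typical.)
var nodes = new List<(string Kind, int Start, int Length, int FirstChild, int NextSibling)>
{
    ("document",  0, source.Length,  1, -1),   // index 0
    ("heading",   2, 5,             -1,  2),   // index 1 -> "Title"
    ("paragraph", 8, 9,             -1, -1),   // index 2 -> "Some text"
};

// Walk the root's children by following indices instead of object references.
for (var i = nodes[0].FirstChild; i != -1; i = nodes[i].NextSibling)
{
    var node = nodes[i];
    ReadOnlySpan<char> text = source.AsSpan(node.Start, node.Length);
    Console.WriteLine($"{node.Kind}: {text.ToString()}");
}
// prints:
// heading: Title
// paragraph: Some text
```

The only string created per node here is the one built for printing; the tree itself is just integers pointing back into the original source.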

But when I went to use it in a real production app (this portfolio website), I hit a harsh reality.

A modern publishing platform does not just turn a markdown body into HTML. It requires:

  1. Frontmatter extraction: Separating YAML headers (titles, publication dates, tags, etc.) from the content body.
  2. Table of Contents generation: Generating a list of headings to build interactive sidebar navigation.
  3. Sluggable Headers: Injecting unique id attributes into HTML header tags (e.g., <h2 id="the-limits-of-just-parsing">...) to support deep linking.
  4. Syntax Highlighting: Finding code blocks, identifying the language, and formatting them with highlighted code and line numbers.

In my initial implementation, I solved this by writing a series of post-processing helpers inside my website's BlogArticleService. I had regexes to split the frontmatter, regexes to scan for headings, HTML parsing scripts to inject id attributes, and another round of regex search-and-replace to intercept code blocks and route them to ColorCode.

It worked, but it was tragic.

Every single regex match allocated new strings. The HTML tag injections forced the GC to clean up hundreds of kilobytes of intermediate fragments. In other words, all the zero-allocation, ultra-high-performance work of the parser was completely negated by the web app's post-processing pipeline.

I knew there had to be a better way. The features needed to become native.


May 18, 2026: The Frontmatter and Slugs Pipeline

I set to work in a branch called feature/frontmatter. My goal was to move both metadata extraction and Table of Contents (slug) generation directly into the parser core, so that both happen in a single pass before rendering.

1. Zero-Allocation YAML Frontmatter

Instead of splitting strings outside the parser, I updated MarkdownParser to scan for YAML markers (---) at the very beginning of the document.

By detecting this block during the first block-scanning pass, the parser can extract the raw keys and values directly without copying the document body. To enable it, you simply register the option on the builder:

C#
var options = MarkdownOptionBuilder.Create()
    .WithFrontMatter()
    .Build();

var result = BeMarkdown.Parse(markdown, options);
var title = result.Context.FrontMatter["title"];

Because this is done at the parser level, the YAML block is cleanly stripped from the rendering pipeline, meaning you don't get frontmatter text accidentally rendered in your HTML body.
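As a rough illustration of the idea, the following sketch splits a document into a frontmatter slice and a body slice using spans only. It is not Beskar.Markdown's actual code: the helper name TrySplitFrontMatter is hypothetical, and it assumes line-based --- delimiters and flat key: value pairs rather than full YAML.

```csharp
using System;

// Hypothetical helper, not part of Beskar.Markdown: returns slices of the
// original buffer, so no characters are copied.
static bool TrySplitFrontMatter(
    ReadOnlySpan<char> source,
    out ReadOnlySpan<char> frontMatter,
    out ReadOnlySpan<char> body)
{
    frontMatter = default;
    body = source;

    // The opening marker must be the very first line of the document.
    if (!source.StartsWith("---\n", StringComparison.Ordinal) &&
        !source.StartsWith("---\r\n", StringComparison.Ordinal))
        return false;

    var afterOpen = source.Slice(source.IndexOf('\n') + 1);

    // Walk line by line until a line consisting solely of "---" is found.
    var offset = 0;
    while (offset < afterOpen.Length)
    {
        var rest = afterOpen.Slice(offset);
        var newline = rest.IndexOf('\n');
        var line = newline < 0 ? rest : rest.Slice(0, newline);

        if (line.TrimEnd('\r').SequenceEqual("---".AsSpan()))
        {
            frontMatter = afterOpen.Slice(0, offset);
            body = newline < 0 ? ReadOnlySpan<char>.Empty : rest.Slice(newline + 1);
            return true;
        }

        if (newline < 0) break;   // end of input without a closing marker
        offset += newline + 1;
    }

    return false;
}

var document = "---\ntitle: Native Markdown\ntags: csharp\n---\n# Body\n";

if (TrySplitFrontMatter(document, out var meta, out var content))
{
    // Flat "key: value" lines only; real YAML needs a proper parser.
    foreach (var line in meta.EnumerateLines())
    {
        var colon = line.IndexOf(':');
        if (colon < 0) continue;
        Console.WriteLine($"{line[..colon].Trim().ToString()} = {line[(colon + 1)..].Trim().ToString()}");
    }
    Console.WriteLine($"body length: {content.Length}");
}
```

Because both outputs are slices of the input, the body handed to the rest of the pipeline never contains the YAML block, and nothing is allocated until a caller decides to materialize a key or value.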

2. Sluggable Headers and Anchor Uniqueness

Building a robust sluggable header system is surprisingly tricky. A slugification routine has to:

  • Strip out special symbols, punctuation, and emojis.
  • Replace spaces and consecutive special characters with a single, clean hyphen (-).
  • Convert characters to lowercase.
  • Ensure HTML standards compliance: What happens if the writer uses the header ## Summary three times in the same article? Naively emitting <h2 id="summary"> three times produces duplicate id values, which is invalid HTML and makes deep links ambiguous.

I solved this by tracking generated slugs within the MarkdownContext. When a duplicate slug is detected, the renderer automatically appends an incrementing suffix (e.g., summary-1, summary-2), so every id stays unique within the document.
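The rules above can be sketched in a few lines. This is a simplified stand-in rather than the library's implementation: Slugify and MakeUnique are hypothetical helpers, and unlike the native version they allocate one string per slug.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: keeps letters and digits (lowercased) and collapses
// every run of other characters into a single hyphen.
static string Slugify(ReadOnlySpan<char> heading)
{
    var builder = new StringBuilder(heading.Length);
    var pendingHyphen = false;

    foreach (var c in heading)
    {
        if (char.IsLetterOrDigit(c))
        {
            // Emit the separator lazily so slugs never start or end with '-'.
            if (pendingHyphen && builder.Length > 0) builder.Append('-');
            builder.Append(char.ToLowerInvariant(c));
            pendingHyphen = false;
        }
        else
        {
            pendingHyphen = true;   // punctuation, whitespace, emoji surrogates
        }
    }

    return builder.ToString();
}

// Hypothetical helper: returns the slug unchanged on first use and appends
// -1, -2, ... afterwards, skipping suffixed forms that are already taken.
static string MakeUnique(string slug, Dictionary<string, int> seen)
{
    if (seen.TryAdd(slug, 0)) return slug;

    string candidate;
    do
    {
        seen[slug]++;
        candidate = $"{slug}-{seen[slug]}";
    }
    while (!seen.TryAdd(candidate, 0));

    return candidate;
}

var usedSlugs = new Dictionary<string, int>();
foreach (var heading in new[] { "Summary", "Summary", "The limits of \"just parsing\"" })
{
    Console.WriteLine(MakeUnique(Slugify(heading), usedSlugs));
}
// prints:
// summary
// summary-1
// the-limits-of-just-parsing
```

Registering each suffixed candidate in the same dictionary covers the corner case where an author writes a literal "Summary 1" heading that would otherwise collide with a generated summary-1.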

Furthermore, I made this data queryable directly from the result:

C#
var options = MarkdownOptionBuilder.Create()
    .WithSluggableHeaders()
    .Build();

var result = BeMarkdown.Parse(markdown, options);

foreach (var header in result.Context.Headers)
{
    Console.WriteLine($"{header.Level}: {header.PlainText} -> #{header.Slug}");
}

The system automatically generates the <h2 id="my-header"> HTML attributes while exposing the structured Headers collection, allowing me to build my site's Table of Contents sidebar without re-scanning the rendered HTML or pulling in an HTML parsing library.


May 19, 2026: Zero-Allocation Code Block Interception

With frontmatter and headers running natively, the last remaining memory hog was the syntax highlighter.

Normally, a syntax highlighter requires post-processing the final HTML: parsing out the <pre><code> blocks, extracting the text, highlighting it, and stitching the HTML back together. Each of those steps copies large parts of the document again, which makes the approach slow and memory-intensive.

To solve this, I designed a brand-new interface in feature/fixes: ICodeBlockRenderer.

C#
public interface ICodeBlockRenderer
{
   public bool TryRender<TData>(
      MarkdownContext<TData> context,
      ref TextWriterIndentSlim writer,
      ReadOnlySpan<char> code,
      ReadOnlySpan<char> language);
}

Instead of building HTML and parsing it later, the renderer now delegates directly to this interface during the primary rendering loop.

When a code block (fenced or indented) is encountered, the custom renderer is invoked with the raw code span and the language span. If the renderer returns true, the default block rendering is bypassed entirely, and the custom output is written directly to the high-performance TextWriterIndentSlim output buffer.
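The control flow on the renderer side can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's code: the hook is modeled as a plain static function instead of ICodeBlockRenderer, a StringBuilder stands in for TextWriterIndentSlim, and both function names are hypothetical.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical stand-in for a custom hook: handles only "csharp" blocks and
// reports through its return value whether it wrote anything.
static bool TryRenderCustom(StringBuilder writer, ReadOnlySpan<char> code, ReadOnlySpan<char> language)
{
    if (!language.Equals("csharp", StringComparison.OrdinalIgnoreCase))
        return false;   // decline: the caller falls back to the default output

    // ToString() allocates; a span-aware encoder would avoid that copy.
    writer.Append("<figure data-language=\"csharp\"><pre>");
    writer.Append(WebUtility.HtmlEncode(code.ToString()));
    writer.Append("</pre></figure>");
    return true;
}

// Renderer-side dispatch: the hook gets the first chance, otherwise the
// plain <pre><code> markup is emitted.
static void RenderCodeBlock(StringBuilder writer, ReadOnlySpan<char> code, ReadOnlySpan<char> language)
{
    if (TryRenderCustom(writer, code, language))
        return;

    writer.Append("<pre><code");
    if (!language.IsEmpty)
    {
        writer.Append(" class=\"language-")
              .Append(WebUtility.HtmlEncode(language.ToString()))
              .Append('"');
    }
    writer.Append('>')
          .Append(WebUtility.HtmlEncode(code.ToString()))
          .Append("</code></pre>");
}

var output = new StringBuilder();
RenderCodeBlock(output, "var x = 1 < 2;", "csharp");   // handled by the hook
output.AppendLine();
RenderCodeBlock(output, "echo hi", "sh");               // falls back to the default
Console.WriteLine(output.ToString());
```

The boolean return is what makes the hook cheap to adopt: a renderer can take over only the languages it cares about and leave everything else to the built-in path.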

Here is how I implemented it on my website to stream syntax highlighting (via ColorCode) and line-number listings:

C#
private sealed class BlogCodeBlockRenderer : ICodeBlockRenderer
{
   public bool TryRender<TData>(
      MarkdownContext<TData> context,
      ref TextWriterIndentSlim writer,
      ReadOnlySpan<char> codeSpan,
      ReadOnlySpan<char> languageSpan)
   {
      var languageId = languageSpan.ToString();
      var code = codeSpan.TrimEnd().ToString();
      
      // Setup the custom blog code container frame
      writer.Write("<figure class=\"blog-code-frame\" data-language=\"");
      writer.WriteHtmlDecodedAndEncoded(languageId);
      writer.Write("\">");
      
      // Write custom line numbers and highlight the body
      writer.Write(BuildLineNumbers(code));
      writer.Write("<div class=\"blog-code-scroll\">");
      
      var highlightedHtml = HighlightCode(code, languageId);
      writer.Write(highlightedHtml);
      
      writer.Write("</div></figure>");
      return true;
   }
}

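By hooking this up via WithCodeBlockRenderer(new BlogCodeBlockRenderer()), the web application streams custom code layouts and highlighted code blocks directly into the output buffer during the primary rendering loop. The rendered HTML is never re-parsed and there are no post-processing passes; the only intermediate strings left are the ones this renderer creates to hand the code to the highlighter.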


The Payoff

Integrating these features into the portfolio website felt like magic.

Because Beskar.Markdown now handles frontmatter, sluggable headers, Table of Contents metadata, and code block interception natively during parsing, I was able to delete over 350 lines of complex, regex-heavy post-processing code from BlogArticleService.cs.

The service has shrunk from a sprawling, allocation-prone mess into a concise, focused implementation:

C#
private BlogArticle LoadArticle(string filePath)
{
   var source = File.ReadAllText(filePath);
   var result = BeMarkdown.Parse(source, MarkdownOpts);
   
   var metadata = ParseMetadata(result.Context.FrontMatter);
   var toc = result.Context.Headers
      .Select(h => new BlogTocItem(h.Slug.ToString(), h.PlainText.ToString(), h.Level))
      .ToList();

   return new BlogArticle(metadata, toc, result.Html, ...);
}

This journey taught me an important lesson in performance engineering. You can optimize a parser to be as fast as possible, but if your API forces consumers to perform expensive string operations and post-processing on the output, those optimizations are wasted.

By designing rich, extensible native options like frontmatter, unique slugs, and code interception, Beskar.Markdown v1.0.3 keeps both the parser core and the consumer application running as fast and close to the metal as possible.
