The goal was not just "parse markdown"
Markdown parsers are a perfect playground for performance work because the input looks simple until it does not.
You have headings, paragraphs, lists, blockquotes, tables, code fences, links, images, inline emphasis, raw HTML, escaping rules, nesting, continuations, and all the tiny edge cases that make CommonMark interesting.
With Beskar.Markdown (available on GitHub), I wanted a parser that stays small in memory without becoming awkward to use:
using Beskar.Markdown;
var html = BeMarkdown.ToHtml(markdown);
That API is intentionally boring. The interesting part is behind it: keep the public surface tiny, then make the hot path careful.
You can install it from NuGet:
dotnet add package Beskar.Markdown
The central rule: do not copy text unless you must
The most important design choice is that parsed nodes do not need to own their text.
Instead of every node carrying a new string, a node can point back into the original input:
public readonly record struct TextSpan(int Start, int Length)
{
    public ReadOnlySpan<char> Slice(ReadOnlySpan<char> source)
    {
        return source.Slice(Start, Length);
    }
}
That sounds small, but it changes the whole memory model.
A heading like this:
## Low allocation markdown
does not need a new string for "Low allocation markdown". The parser can store:
new MarkdownNode
{
    Type = NodeType.Heading,
    // Start skips the "## " marker; Length covers the 23 characters of the title.
    TextSpan = new TextSpan(Start: 3, Length: 23)
};
The renderer later receives the original text and slices only what it needs:
var text = node.TextSpan.Slice(markdown);
writer.WriteEscaped(text);
That is the core trick: parsing creates structure, not copies.
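To make the span idea concrete, here is a minimal, self-contained sketch. TextSpan mirrors the struct shown earlier; HeadingSketch and FindHeadingText are hypothetical names used only for illustration and are not part of Beskar.Markdown.

```csharp
using System;

// Locate the heading text without allocating a substring.
const string markdown = "## Low allocation markdown\nBody text.";
TextSpan span = HeadingSketch.FindHeadingText(markdown);

// Slicing produces a view over the original string; nothing is copied here.
ReadOnlySpan<char> text = span.Slice(markdown);
Console.WriteLine($"{span.Start}, {span.Length}: {text.ToString()}");
// prints "3, 23: Low allocation markdown"

// Mirrors the TextSpan from the article: an offset and a length into the source.
public readonly record struct TextSpan(int Start, int Length)
{
    public ReadOnlySpan<char> Slice(ReadOnlySpan<char> source) => source.Slice(Start, Length);
}

// Hypothetical helper for illustration only.
public static class HeadingSketch
{
    // Assumes the source begins with an ATX heading such as "## Title".
    public static TextSpan FindHeadingText(ReadOnlySpan<char> source)
    {
        int lineEnd = source.IndexOf('\n');
        if (lineEnd < 0) lineEnd = source.Length;

        int start = 0;
        while (start < lineEnd && source[start] == '#') start++; // skip the marker
        while (start < lineEnd && source[start] == ' ') start++; // skip the spaces after it

        return new TextSpan(start, lineEnd - start);
    }
}
```

The only string allocated in this sketch is the one created by ToString for printing; the parsing step itself produces two integers.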
Compact nodes beat object graphs
A common parser shape is an object tree:
abstract class MarkdownBlock
{
    public List<MarkdownBlock> Children { get; } = [];
}
That is pleasant to model, but it is expensive when a document has many small nodes. Every node becomes an object. Every child list becomes another allocation. Traversal follows references all over the heap.
Beskar.Markdown uses a flatter shape: nodes are values in contiguous buffers, and relationships are represented with indexes.
The shape is closer to this:
public struct MarkdownNode
{
    public NodeType Type;
    public TextSpan TextSpan;
    public int FirstChildIndex;
    public int LastChildIndex;
    public int NextSiblingIndex;
}
Instead of this:
node.Children.Add(child);
the parser can link nodes by index:
parent.FirstChildIndex = childIndex;
parent.LastChildIndex = childIndex;
That keeps the parse tree compact and cache-friendly. It also makes ownership simple: the buffer owns the nodes, and the nodes point into the input.
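The index-linking idea can be sketched as a small standalone buffer. This is an assumption-laden illustration, not the library's code: Node is a simplified stand-in for MarkdownNode, NodeBuffer is a hypothetical class, and -1 is used here to mean "no link".

```csharp
using System;

// Build a tiny tree: one paragraph (kind 1) with three text children (kind 2).
var buffer = new NodeBuffer(initialCapacity: 4);
int paragraph = buffer.Add(kind: 1);
buffer.AppendChild(paragraph, buffer.Add(kind: 2));
buffer.AppendChild(paragraph, buffer.Add(kind: 2));
buffer.AppendChild(paragraph, buffer.Add(kind: 2));

Console.WriteLine(buffer.CountChildren(paragraph)); // prints 3

// Simplified stand-in for MarkdownNode; -1 means "no link" in this sketch.
public struct Node
{
    public int Kind;
    public int FirstChildIndex;
    public int LastChildIndex;
    public int NextSiblingIndex;
}

// Hypothetical buffer: all nodes live in one array and refer to each other by index.
public sealed class NodeBuffer
{
    private Node[] _nodes;
    private int _count;

    public NodeBuffer(int initialCapacity) => _nodes = new Node[Math.Max(1, initialCapacity)];

    public int Add(int kind)
    {
        if (_count == _nodes.Length) Array.Resize(ref _nodes, _nodes.Length * 2);
        _nodes[_count] = new Node
        {
            Kind = kind, FirstChildIndex = -1, LastChildIndex = -1, NextSiblingIndex = -1
        };
        return _count++;
    }

    public void AppendChild(int parentIndex, int childIndex)
    {
        ref Node parent = ref _nodes[parentIndex];
        if (parent.FirstChildIndex < 0)
            parent.FirstChildIndex = childIndex;                          // first child
        else
            _nodes[parent.LastChildIndex].NextSiblingIndex = childIndex;  // chain after the last child
        parent.LastChildIndex = childIndex;
    }

    public int CountChildren(int parentIndex)
    {
        int count = 0;
        for (int i = _nodes[parentIndex].FirstChildIndex; i >= 0; i = _nodes[i].NextSiblingIndex)
            count++;
        return count;
    }
}
```

Growing the buffer is a single array resize, and the indexes stay valid afterwards, which is something object references give you for free but pointers into a moving array would not.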
Use spans for the scanner
The parser spends most of its time asking small questions:
- is this line blank?
- how many spaces are in front?
- does this line start a fence?
- is this character a trigger for inline parsing?
- where does this delimiter run end?
Those questions should not allocate.
The scanner can stay in ReadOnlySpan<char>:
public ref struct LineState
{
    public ReadOnlySpan<char> RawLine { get; private set; }
    public int GlobalOffset { get; private set; }
    public int LeadingSpaces { get; private set; }

    public bool StartsWith(ReadOnlySpan<char> value)
    {
        return RawLine.TrimStart().StartsWith(value, StringComparison.Ordinal);
    }
}
That lets block parsers make decisions without creating temporary strings:
if (line.RawLine.StartsWith("```", StringComparison.Ordinal))
{
    return ParseFence(ref line, ref nodes);
}
The same idea works for inline parsing. Trigger characters keep the parser from running every parser against every character.
public interface IInlineParser
{
    char TriggerChar { get; }

    bool TryMatch(
        ref InlineState state,
        ref BufferWriter<MarkdownNode> writer,
        ParserOptions options);
}
If the current character is not a trigger, the parser can keep walking.
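One way to "keep walking" cheaply is to let IndexOfAny skip whole runs of ordinary characters. The sketch below is a guess at the general shape, not the library's scanner: the trigger set and the CountTriggerStops helper are assumptions made for the example.

```csharp
using System;

// Assumed trigger set for this sketch only; a real parser would derive it
// from the inline parsers it has registered.
const string Triggers = "*_`[!<\\";

int stops = CountTriggerStops("plain text with *one* marker and a [link](url)", Triggers);
Console.WriteLine(stops); // prints 3: two '*' and one '['

// Counts how many positions the scanner actually has to inspect.
static int CountTriggerStops(ReadOnlySpan<char> text, ReadOnlySpan<char> triggers)
{
    int count = 0;
    int position = 0;
    while (position < text.Length)
    {
        // Jump over the run of non-trigger characters in a single call.
        int offset = text.Slice(position).IndexOfAny(triggers);
        if (offset < 0) break; // no more triggers: the rest is plain text

        count++;               // a real parser would call TryMatch here
        position += offset + 1;
    }
    return count;
}
```

For a 46-character input, the loop body runs three times instead of once per character.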
Make buffers reusable, not magical
Low allocation code usually comes down to boring ownership.
If the parser needs a temporary list of nodes, it should be clear who owns it and when it is released. That is where buffer writers and pools are useful.
The shape is:
var writer = new BufferWriter<MarkdownNode>(initialCapacity: 256);
try
{
    ParseBlocks(markdown, ref writer);
    Render(markdown, writer.WrittenSpan, ref html);
}
finally
{
    writer.Dispose();
}
The important detail is not the exact implementation. The important detail is that temporary memory has a lifetime.
When memory has a lifetime, you can reason about it:
- stack memory for tiny bounded work
- pooled arrays for larger temporary buffers
- spans to view data without owning it
- final strings only when the API boundary needs them
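The first three items in that list combine into a common .NET pattern: a stack buffer for small inputs, a pooled array for large ones, and a span over whichever was chosen. The sketch below uses a hypothetical LabelHash helper (case-insensitive hashing of a label) purely as an excuse to need scratch space; the 128-character threshold is an arbitrary assumption.

```csharp
using System;
using System.Buffers;

// Labels that differ only by case produce the same hash.
Console.WriteLine(LabelHash("Beskar") == LabelHash("BESKAR"));                            // True
Console.WriteLine(LabelHash(new string('A', 1000)) == LabelHash(new string('a', 1000))); // True

// Hypothetical helper: lowercases into scratch memory, then hashes the result.
static int LabelHash(ReadOnlySpan<char> label)
{
    const int StackLimit = 128; // arbitrary threshold for this sketch
    char[]? rented = null;

    // Tiny bounded work uses the stack; anything larger borrows from the shared pool.
    Span<char> scratch = label.Length <= StackLimit
        ? stackalloc char[StackLimit]
        : (rented = ArrayPool<char>.Shared.Rent(label.Length));

    try
    {
        int written = label.ToLowerInvariant(scratch);
        return string.GetHashCode(scratch.Slice(0, written));
    }
    finally
    {
        // The rented array has an explicit end of life, even if hashing throws.
        if (rented is not null) ArrayPool<char>.Shared.Return(rented);
    }
}
```

Neither path allocates a new array per call in the steady state, and the finally block makes the ownership rule visible in the code rather than in a comment.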
Rendering should stream, not assemble fragments
Rendering can accidentally undo all the parser work.
This is the slow shape:
var html = "";
foreach (var node in nodes)
{
    html += RenderNode(node);
}
Every += builds a new string and copies everything accumulated so far, so the total work grows quadratically with the output. Every RenderNode result is another fragment to allocate.
A better shape is to write directly to a writer:
public interface INodeRenderer
{
    void Render<TData>(
        MarkdownContext<TData> context,
        ReadOnlySpan<char> rawText,
        ref TextWriterIndentSlim writer,
        in MarkdownNode current,
        ReadOnlySpan<MarkdownNode> nodes,
        RenderOptions options);
}
Now a paragraph renderer can be direct:
writer.Write("<p>");
current.RenderChildren(context, rawText, nodes, ref writer, options);
writer.Write("</p>");
No intermediate paragraph string is required.
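The same streaming shape can be tried with nothing but the standard library. This sketch swaps in System.IO.TextWriter for the library's own writer type and uses plain (Start, Length) tuples in place of parsed nodes; both are stand-ins chosen for the example, and escaping is deliberately left out.

```csharp
using System;
using System.IO;

const string markdown = "Hello\nWorld";

// Stand-ins for parsed paragraph nodes: offsets into the source, not copies of it.
var paragraphs = new (int Start, int Length)[] { (0, 5), (6, 5) };

using var output = new StringWriter();
foreach (var (start, length) in paragraphs)
{
    WriteParagraph(output, markdown.AsSpan(start, length));
}
Console.WriteLine(output.ToString()); // prints "<p>Hello</p><p>World</p>"

// Writes one paragraph straight to the writer; no per-paragraph string is built.
static void WriteParagraph(TextWriter writer, ReadOnlySpan<char> text)
{
    writer.Write("<p>");
    writer.Write(text); // TextWriter.Write(ReadOnlySpan<char>) copies directly into the writer
    writer.Write("</p>");
}
```

The only string produced is the final one at the API boundary.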
Escape only at the output boundary
Markdown text can contain characters that are meaningful in HTML:
5 < 10 and 10 > 5
The parser should not eagerly allocate escaped copies. It should preserve spans and let the renderer escape when writing:
public static void WriteEscaped(ref TextWriterIndentSlim writer, ReadOnlySpan<char> text)
{
    foreach (var character in text)
    {
        switch (character)
        {
            case '<': writer.Write("&lt;"); break;
            case '>': writer.Write("&gt;"); break;
            case '&': writer.Write("&amp;"); break;
            case '"': writer.Write("&quot;"); break;
            default: writer.Write(character); break;
        }
    }
}
The real implementation can be more optimized than this example, but the principle is the same: do not transform text early just because it might need escaping later.
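One plausible optimization, shown here as a sketch rather than as what Beskar.Markdown actually does, is to write unescaped runs in bulk and only stop at characters that need an entity. StringBuilder stands in for the library's writer, and AppendEscaped is a hypothetical name.

```csharp
using System;
using System.Text;

var builder = new StringBuilder();
AppendEscaped(builder, "5 < 10 & \"10\" > 5");
Console.WriteLine(builder.ToString()); // prints: 5 &lt; 10 &amp; &quot;10&quot; &gt; 5

// Copies runs of safe characters in one call and replaces only the special ones.
static void AppendEscaped(StringBuilder output, ReadOnlySpan<char> text)
{
    ReadOnlySpan<char> special = "<>&\"".AsSpan();

    while (!text.IsEmpty)
    {
        int index = text.IndexOfAny(special);
        if (index < 0)
        {
            output.Append(text); // nothing left to escape: append the remainder as-is
            return;
        }

        output.Append(text.Slice(0, index)); // the safe run before the special character
        output.Append(text[index] switch
        {
            '<' => "&lt;",
            '>' => "&gt;",
            '&' => "&amp;",
            _ => "&quot;", // the only remaining member of the special set
        });
        text = text.Slice(index + 1);
    }
}
```

Text without special characters, which is most text, takes the fast path: one search and one bulk append.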
Extensions without blowing up the core
A parser gets messy when every feature becomes hard-coded.
Beskar.Markdown supports extension points so custom inline and block behavior can be plugged in without turning the core parser into a giant conditional.
The pattern looks like this:
var options = MarkdownOptionBuilder.Create()
    .WithExtension(new CalloutBlockExtension())
    .Build();
var html = BeMarkdown.ToHtml(markdown, options);
An extension owns its parser and renderer:
public sealed class CalloutBlockExtension : BaseBlockExtension
{
    private const int TargetType = BeMarkdown.BuiltInNodeTypeValueOffset + 20;

    public CalloutBlockExtension()
    {
        Parsers = [new CalloutBlockParser(TargetType)];
        Renderers = [new HtmlCalloutRenderer(TargetType)];
    }
}
That keeps the built-in pipeline focused while still allowing project-specific markdown.
Contextual rendering is the quiet feature
Sometimes the renderer needs data that is not in the markdown.
Maybe you want internal links to resolve against a site map. Maybe image paths should go through an asset pipeline. Maybe code blocks need project-specific framing.
That is where contextual rendering is useful:
public sealed record DocsRenderContext(
    string BaseUrl,
    IReadOnlyDictionary<string, string> AssetMap);
Then renderers can read context without global state:
var options = MarkdownOptionBuilder.Create().Build();
var context = new DocsRenderContext(
    BaseUrl: "https://marvindrude.com",
    AssetMap: assets);
var result = BeMarkdown.ParseContextual(markdown, options, context);
var html = result.Html;
The performance angle is subtle: passing context through the pipeline avoids random service lookups, static state, and string post-processing after rendering.
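To illustrate what "reading context" might look like inside a renderer, here is a standalone sketch. DocsRenderContext matches the record above; ImageSketch, WriteImage, the example URL, and the hashed asset name are all made up for the example, and attribute escaping is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

var context = new DocsRenderContext(
    BaseUrl: "https://example.com",
    AssetMap: new Dictionary<string, string> { ["logo.png"] = "/assets/logo.3f9a.png" });

using var output = new StringWriter();
ImageSketch.WriteImage(output, context, "logo.png");
Console.WriteLine(output.ToString());
// prints: <img src="https://example.com/assets/logo.3f9a.png">

public sealed record DocsRenderContext(
    string BaseUrl,
    IReadOnlyDictionary<string, string> AssetMap);

// Hypothetical renderer helper; not part of Beskar.Markdown.
public static class ImageSketch
{
    public static void WriteImage(TextWriter writer, DocsRenderContext context, string path)
    {
        // Fall back to the original path when the asset map has no entry for it.
        string resolved = context.AssetMap.TryGetValue(path, out var mapped) ? mapped : path;

        // The URL is resolved while writing, so no post-processing pass over the HTML is needed.
        writer.Write("<img src=\"");
        writer.Write(context.BaseUrl);
        writer.Write(resolved);
        writer.Write("\">");
    }
}
```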
The benchmark result that matters
Benchmarks are not a verdict. They are a measurement of one workload on one machine with one configuration.
Still, they are useful because they show the shape of a design.
The README benchmark table shows Beskar.Markdown allocating far less than several common alternatives in the tested cases. For example, the "Small" case lists Beskar.Markdown at 504 B allocated, while the same table lists Markdig at 1144 B, CommonMark.Net at 11488 B, and MarkdownSharp at 2752 B.
The larger cases show the same point more clearly: the parse and render pipeline is not allocation-free, but the amount of memory stays much more controlled.
That is what I cared about most: not a heroic single number, but a pipeline where memory growth stays explainable.
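For a quick sanity check of allocation behavior outside a full benchmark run, the runtime exposes a per-thread allocation counter. The sketch below is not how the README numbers were produced; MeasureAllocatedBytes is a hypothetical helper, and the StringBuilder workload is only a placeholder for whatever you want to measure.

```csharp
using System;
using System.Text;

long bytes = MeasureAllocatedBytes(() =>
{
    // Placeholder workload; substitute the parse-and-render call you care about.
    var builder = new StringBuilder();
    for (int i = 0; i < 100; i++) builder.Append("<p>x</p>");
    _ = builder.ToString();
});
Console.WriteLine($"{bytes} B allocated");

// Returns the bytes allocated on the current thread while running the action once.
static long MeasureAllocatedBytes(Action action)
{
    action(); // warm-up so JIT compilation and one-time initialization are not counted

    long before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}
```

For numbers worth publishing, a harness such as BenchmarkDotNet with its MemoryDiagnoser is the more rigorous choice, since it handles warm-up, iteration counts, and statistical noise.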
Why it is still fast
Low memory and speed are related, but they are not the same.
Beskar.Markdown stays fast because the parser tries to keep the CPU doing useful work:
- scan spans instead of building temporary strings
- use trigger characters for inline parsers
- keep nodes compact and contiguous
- avoid object-heavy trees
- write HTML directly to a writer
- make extensions explicit instead of reflection-heavy
- delay escaping and transformation until output
The system is not fast because of one trick. It is fast because many small sources of waste are removed from the hot path.
The honest tradeoffs
This kind of implementation has tradeoffs.
An object tree is often easier to debug. A compact node buffer takes more discipline. Index-based relationships are less forgiving than Children.Add. Span-heavy code is stricter about lifetimes. Pooling makes ownership more important.
That means the code has to be written with guardrails:
- tests for CommonMark edge cases
- fuzzing for parser resilience
- clear extension contracts
- simple public API
- careful security documentation
Also, Beskar.Markdown does not sanitize HTML by default. If the markdown comes from untrusted users, sanitize the rendered HTML before showing it in a browser:
var rawHtml = BeMarkdown.ToHtml(userInput);
var safeHtml = sanitizer.Sanitize(rawHtml);
That is not optional for user-generated content. A markdown renderer and an HTML sanitizer are different tools.
What I like about the result
The best part is that the API does not expose the complexity:
var html = BeMarkdown.ToHtml(markdown);
But under that line, the implementation is doing the work I care about:
- borrow text through spans
- store structure in compact nodes
- reuse temporary buffers
- render directly
- keep extension points explicit
- measure instead of guessing
That is the style of performance work I like most. The user gets a simple API. The runtime gets less garbage. The implementation stays close enough to the metal to be interesting, but not so clever that it becomes unusable.
If you want to inspect the code or try it in a project: