Extending goldmark renderer for advanced syntax highlighting

Since I started this project a few months ago, this site's only purpose has been to help me learn Go more deeply by implementing solutions to real-world problems with zero dependencies¹. Dealing with dependencies daily at work, in practically every JavaScript-based project, is enough for me to see great value in having as few of them as possible: it reduces both the exposure to vulnerabilities and the overhead of updating, especially if a project is not monitored regularly.

Considering that most content for this site comes from Markdown files, the only real requirement was to find a good parser to do the job. I had never written one myself and didn't have a good reason to start, so my instinct was to use the most suitable Go package. After some quick research I picked goldmark, which seemed to have everything I would probably ever need and, in case I missed something, can be easily extended. It is also CommonMark compliant, so the output is predictable.

Basic configuration for parsing Markdown

During the very first iteration, I followed the official configuration to keep everything dead simple:

var html strings.Builder
if err := goldmark.Convert([]byte(markdown), &html); err != nil {
	return "", err
}

With the above code, I achieved the two most vital things for the site: parsing and rendering Markdown articles. This was incredibly easy. Of course, for a production version I needed more than that. My initial requirements were based on what I had already seen online in a similar context:

output simple and semantically correct HTML
read metadata from article's front matter
syntax highlighting for code snippets
table of contents
headings as anchor links
footnotes

The basic configuration ticked off only the first point; the rest was missing. I also kept front matter out of goldmark's hands, as parsing it there would mean adding another dependency to extend its basic functionality, and I wasn't ready for that yet. Instead, I decided to embed it as JSON and take advantage of Go's built-in support for decoding this format. With a little extra code I had a working front matter parser without introducing any new dependency, so I was happy with it, although writing JSON by hand is a bit uncomfortable.
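As an illustration of the JSON approach, here is a minimal sketch using only the standard library. The "---" delimiter line, the fields of frontMatter, and the splitFrontMatter helper are assumptions made for this example, not details taken from this site's code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// frontMatter holds article metadata. The fields are hypothetical.
type frontMatter struct {
	Title string   `json:"title"`
	Date  string   `json:"date"`
	Tags  []string `json:"tags"`
}

// splitFrontMatter expects a JSON object at the top of the document,
// terminated by a line containing only "---" (an assumed convention).
// It returns the decoded metadata and the remaining Markdown body.
func splitFrontMatter(doc string) (frontMatter, string, error) {
	var fm frontMatter
	head, body, found := strings.Cut(doc, "\n---\n")
	if !found {
		return fm, "", errors.New("front matter delimiter not found")
	}
	if err := json.Unmarshal([]byte(head), &fm); err != nil {
		return fm, "", fmt.Errorf("decoding front matter: %w", err)
	}
	return fm, body, nil
}

func main() {
	doc := "{\"title\": \"Hello\", \"tags\": [\"go\"]}\n---\n# Hello\n"
	fm, body, err := splitFrontMatter(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("title=%q tags=%v body=%q\n", fm.Title, fm.Tags, body)
}
```

Only the body returned here would then be handed to goldmark, so the parser never sees the metadata.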

The next logical step was to actually use available configuration options that goldmark gives us for free. For this we need to adjust the code a bit:

md := goldmark.New(
	goldmark.WithExtensions(
		extension.GFM,
		extension.Footnote,
	),
)

var html strings.Builder
if err := md.Convert([]byte(markdown), &html); err != nil {
	return "", err
}

Newly enabled extensions give us a few things out of the box: footnotes and a bunch of goodies specific to GitHub-flavored Markdown like tables, strikethrough, auto links, and task lists. It ain't much, but it already adds convenience when writing and rendering Markdown.

Spice things up by enabling syntax highlighting

Syntax highlighting is the heart and soul of every programming blog. To enable this feature in goldmark, its author has written a separate plugin called goldmark-highlighting. Under the hood, it takes advantage of Chroma, which has even more options to extend basic configuration. But for starters, we use the most straightforward approach:

md := goldmark.New(
	goldmark.WithExtensions(
		extension.GFM,
		extension.Footnote,
		highlighting.NewHighlighting(
			highlighting.WithStyle("catppuccin-mocha"),
			highlighting.WithFormatOptions(
				chromahtml.WithLineNumbers(true),
			),
		),
	),
)

// The rest of the code stays the same
// ...

With all that, I could call it a day and forget about Markdown configuration for good as the rendered HTML includes everything for my use case and the code required to achieve this is both simple and concise.

Changing code block theme based on color scheme

The very last step to enable a perfect experience was to display everything according to the user's color scheme. For now, this site takes a simplistic approach to light and dark themes by relying on the prefers-color-scheme media query. It's simple, works without the extra overhead of changing, storing, and loading the selected mode with JavaScript, and it also avoids a "flash of incorrect theme" (FOIT) entirely.

Using this method, a single CSS file with the styles for both the dark and the light Chroma theme is needed, with one of the two enclosed in the above media query. The idea is simple, but actually creating such a CSS file is a tiny bit more involved.

To retrieve the CSS of a given theme, Chroma lets you look up a style by name, styles.Get("catppuccin-mocha"), and its HTML formatter can write that style out as CSS, which can then be saved into a file. Combined with go generate, the entire process of setting a theme and generating a dedicated stylesheet becomes a quite trivial but effective task.

Building a custom renderer

The default implementation of goldmark-highlighting has one big limitation: it only lets you specify lines to highlight once, while setting up the constructor. Not very helpful for a programming blog, where just like in this article I could have a bunch of code blocks with multiple highlighted lines to indicate changes between snippets. On top of that, when using goldmark-highlighting, the information about the language used in a given code block is not rendered in the HTML anymore, so there is no possibility to tell the end user what language they are looking at. Perhaps it's not a big deal, but I like to have this kind of detail. Especially if the default configuration of goldmark includes it in the output. The only option to retain this information is by having a custom renderer.

Conveniently, Hugo, the static site generator, is written in Go, uses goldmark to parse Markdown files, and lets users embed config inside those files to change the way code blocks are displayed. goldmark deliberately does not allow this by default, so I looked into Hugo's code base for inspiration and a roadmap, because I struggled a bit to understand from the documentation of Chroma and goldmark how exactly a custom renderer works and what data it needs.

After digging a little bit, I could find all relevant information regarding what parts, how, and where to connect things together:

Chroma's quick package explains how and what data should be passed into Chroma
goldmark's HTML renderer shows how a custom renderer shapes the output structure
Hugo's inline config shows how additional information is passed to a parser

With all that info I could proceed to write my own custom renderer. Let's break everything down to understand how exactly it works.

Chroma's highlighting

From the quick package we know the following function:

func Highlight(w io.Writer, source, lexer, formatter, style string) error {
	// Determine lexer.
	l := lexers.Get(lexer)
	if l == nil {
		l = lexers.Analyse(source)
	}
	if l == nil {
		l = lexers.Fallback
	}
	l = chroma.Coalesce(l)

	// Determine formatter.
	f := formatters.Get(formatter)
	if f == nil {
		f = formatters.Fallback
	}

	// Determine style.
	s := styles.Get(style)
	if s == nil {
		s = styles.Fallback
	}

	it, err := l.Tokenise(nil, source)
	if err != nil {
		return err
	}
	return f.Format(w, s, it)
}

The workflow looks like this: source code -> lexer -> token iterator -> formatter -> HTML

The first step is to set up the lexer: lexers.Get returns one of Chroma's lexers matched by name, alias, or file extension. If nothing matches, Chroma tries to guess the language by analysing the source code, and if that fails too, a fallback lexer is used instead. Next, chroma.Coalesce wraps the lexer so that adjacent tokens of identical type are merged; the result has fewer tokens, which can speed up formatting and cuts redundant elements from the output HTML. So instead of:

Text(" ")
Text(" ")
Text(" ")

we end up with a single Text token holding the combined text. Coalesce also always returns a valid lexer that is safe to use. Finally, calling Tokenise on the lexer transforms the raw text into a lazy stream of typed lexical tokens, ready for formatting.
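To make the effect of coalescing concrete, here is a toy model using only the standard library. The token type and the coalesce function are hypothetical and deliberately simplified; Chroma's own implementation works on its lexer and token iterator types, not on a slice like this.

```go
package main

import "fmt"

// token is a simplified stand-in for a lexer token: a type name and the
// text it covers.
type token struct {
	Type  string
	Value string
}

// coalesce merges runs of adjacent tokens that share the same type by
// concatenating their values.
func coalesce(in []token) []token {
	var out []token
	for _, t := range in {
		if n := len(out); n > 0 && out[n-1].Type == t.Type {
			out[n-1].Value += t.Value
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	tokens := []token{
		{"Keyword", "func"},
		{"Text", " "},
		{"Text", " "},
		{"Text", " "},
		{"Name", "main"},
	}
	for _, t := range coalesce(tokens) {
		fmt.Printf("%s(%q)\n", t.Type, t.Value)
	}
}
```

The five input tokens come out as three: the run of Text tokens collapses into one whose value holds all three spaces, which in HTML terms means one element instead of three.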

Additionally calling styles.Get will set a given theme for the code block. Nothing magical here. The last step is to pass all this into the f.Format function, which is a render function turning every token into proper HTML written into a buffer. That's the core functionality of the syntax highlighting offered by Chroma.

With that in mind let's start to write our custom renderer:

type codeBlockRenderer struct {
	baseOptions []chromahtml.Option
}

func newCodeBlockRenderer() *codeBlockRenderer {
	options := []chromahtml.Option{
		chromahtml.WithLineNumbers(true),
		chromahtml.WithClasses(true),
	}

	return &codeBlockRenderer{
		baseOptions: options,
	}
}

func (r *codeBlockRenderer) render(w util.BufWriter, lang, code string) error {
	lexer := lexers.Get(lang)
	if lexer == nil {
		lexer = lexers.Fallback
	}

	lexer = chroma.Coalesce(lexer)
	it, err := lexer.Tokenise(nil, code)
	if err != nil {
		return err
	}

	formatter := chromahtml.New(r.baseOptions...)

	// Format requires a style; the actual colors come from CSS.
	return formatter.Format(w, styles.Fallback, it)
}

Goldmark renderer

Now let's jump into goldmark's implementation of the default renderer for a code block:

func (r *Renderer) renderFencedCodeBlock(
	w util.BufWriter, source []byte, node ast.Node, entering bool) (ast.WalkStatus, error) {
	n := node.(*ast.FencedCodeBlock)
	if entering {
		_, _ = w.WriteString("<pre><code")
		language := n.Language(source)
		if language != nil {
			_, _ = w.WriteString(" class=\"language-")
			r.Writer.Write(w, language)
			_, _ = w.WriteString("\"")
		}
		_ = w.WriteByte('>')
		r.writeLines(w, source, n)
	} else {
		_, _ = w.WriteString("</code></pre>\n")
	}
	return ast.WalkContinue, nil
}

Quite a few things are happening here. For us, the important bits are what gets written into the buffer and when. The entering flag signals whether the walker is entering or leaving a node, which lets us control the opening and closing markup. The content itself is handled by the writeLines function, which extracts the text from the source. Here the source is the entire Markdown document, including front matter and any other additional data, whereas the content extracted via the node's Lines() is only the text written between the ``` fences.
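The entering flag is easier to reason about with a toy tree walk. This is a simplified model written with the standard library, not goldmark's code; the node type and the walk and render functions are hypothetical. It only illustrates the calling pattern: the callback runs once when a node is entered and once more when it is left, after its children.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a minimal stand-in for an AST node.
type node struct {
	Kind     string
	Children []*node
}

// walk calls fn with entering=true before visiting a node's children and
// with entering=false afterwards, so every node is seen twice.
func walk(n *node, fn func(n *node, entering bool)) {
	fn(n, true)
	for _, c := range n.Children {
		walk(c, fn)
	}
	fn(n, false)
}

// render produces nested pseudo-markup for the tree to show where opening
// and closing output belongs.
func render(root *node) string {
	var sb strings.Builder
	walk(root, func(n *node, entering bool) {
		if entering {
			sb.WriteString("<" + n.Kind + ">")
		} else {
			sb.WriteString("</" + n.Kind + ">")
		}
	})
	return sb.String()
}

func main() {
	doc := &node{Kind: "document", Children: []*node{
		{Kind: "paragraph"},
		{Kind: "code"},
	}}
	fmt.Println(render(doc))
	// Prints: <document><paragraph></paragraph><code></code></document>
}
```

A renderer that writes its whole output in one go therefore has to do so on only one of the two calls, typically when entering is true, and return early on the other.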

In the custom renderer, we need to take care of everything the default renderer was already doing, because goldmark doesn't compose multiple renderers for the same node kind; rather, one replaces the other. Using a custom one thus means overriding or skipping the default one. It's an important thing to keep in mind.

With the above knowledge, we can further refactor the skeleton of the custom renderer to actually walk the node tree and write content into a buffer. First, we need to implement RegisterFuncs required by goldmark:

func (r *codeBlockRenderer) RegisterFuncs(reg renderer.NodeRendererFuncRegisterer) {
	reg.Register(ast.KindFencedCodeBlock, r.render)
}

Next, rewrite the render method so that it matches goldmark's renderer.NodeRendererFunc signature:

func (r *codeBlockRenderer) render(w util.BufWriter,
	src []byte,
	node ast.Node,
	entering bool,
) (ast.WalkStatus, error) {
	// goldmark calls this function twice per node (entering and leaving).
	// Everything is written on entering, so the second call is a no-op.
	if !entering {
		return ast.WalkContinue, nil
	}

	block := node.(*ast.FencedCodeBlock)
	lang := string(block.Language(src))
	code := string(block.Lines().Value(src))

	// Lexer setup from earlier
	// ...
	it, err := lexer.Tokenise(nil, code)
	if err != nil {
		return ast.WalkStop, err
	}

	_, _ = w.WriteString(`<div class="code-block">`)
	if lang != "" {
		// html.EscapeString (standard "html" package) keeps the label
		// from injecting markup into the page.
		_, _ = fmt.Fprintf(w, `<span class="code-lang">%s</span>`, html.EscapeString(lang))
	}

	formatter := chromahtml.New(r.baseOptions...)
	// Needed for the formatter, actual styles come from CSS.
	style := styles.Fallback
	err = formatter.Format(w, style, it)
	if err != nil {
		return ast.WalkStop, err
	}

	_, _ = w.WriteString(`</div>`)

	return ast.WalkContinue, nil
}

With all the changes we extend the default rendering with a few new things. Personally, I think the most important one was to display the language used inside a specific code block. To keep everything free of JavaScript, I decided to render it as a separate HTML element, which I can freely style with CSS. The last missing piece is dynamic line highlighting based on optional inline config placed inside a given code block.

To implement that, let's start by adding the following code handling this particular task to the render method:

func (r *codeBlockRenderer) render(w util.BufWriter,
	src []byte,
	node ast.Node,
	entering bool,
) (ast.WalkStatus, error) {
	block := node.(*ast.FencedCodeBlock)
	code := string(block.Lines().Value(src))
	// ...

	// block.Info is nil when the fence has no info line, so guard
	// against it instead of dereferencing it unconditionally.
	var info []byte
	if block.Info != nil {
		info = block.Info.Segment.Value(src)
	}
	opts := slices.Clone(r.baseOptions)

	options := newBlockOptions()
	if ok := options.parseInfo(info); ok {
		opts = append(opts, chromahtml.HighlightLines(options.highlightedLines))
	}

	lexer := lexers.Get(lang)
	// ...

	formatter := chromahtml.New(opts...)
	// ...

With this piece of code, we are now telling the renderer to read content from the info line of a code block (the very first line, where the language is defined: ```go ...). Then, inside the parseInfo function, all the relevant data is parsed and added to the existing options we're passing into Chroma. Below you can see a detailed implementation of this functionality, covering only hl_lines for now:
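The info line itself can be split with plain string handling. Here is a minimal sketch, assuming the language is the first word and the options sit inside one pair of braces; splitInfo and firstField are hypothetical helpers, not part of goldmark or Hugo.

```go
package main

import (
	"fmt"
	"strings"
)

// splitInfo separates an info string such as "go {hl_lines=1,3-5}" into
// the language and the raw option text between the braces. attrs is empty
// when there is no well-formed {...} section.
func splitInfo(info string) (lang, attrs string) {
	info = strings.TrimSpace(info)
	start := strings.IndexByte(info, '{')
	if start == -1 {
		return firstField(info), ""
	}
	lang = firstField(info[:start])
	end := strings.IndexByte(info[start:], '}')
	if end == -1 {
		return lang, ""
	}
	return lang, strings.TrimSpace(info[start+1 : start+end])
}

// firstField returns the first whitespace-separated word of s, or "".
func firstField(s string) string {
	if fields := strings.Fields(s); len(fields) > 0 {
		return fields[0]
	}
	return ""
}

func main() {
	for _, info := range []string{"go {hl_lines=1,3-5}", "txt", ""} {
		lang, attrs := splitInfo(info)
		fmt.Printf("%q -> lang=%q attrs=%q\n", info, lang, attrs)
	}
}
```

Searching for the closing brace only after the opening one means a stray "}" earlier in the line cannot produce an invalid slice range.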

const (
	mdBlockHighlightedLines = "hl_lines"
)

type codeBlockOptions struct {
	highlightedLines [][2]int // this type is required by Chroma
}

func newBlockOptions() codeBlockOptions {
	return codeBlockOptions{
		highlightedLines: nil,
	}
}

// parseInfo reads the key=value pairs between '{' and '}' on the info line.
// It reports whether an option section was found.
func (cbo *codeBlockOptions) parseInfo(options []byte) bool {
	start := slices.Index(options, '{')
	end := slices.Index(options, '}')
	// end < start also covers a missing '}' and a '}' placed before '{',
	// which would otherwise make the slice expression below panic.
	if start == -1 || end < start {
		return false
	}

	content := options[start+1 : end]
	for line := range strings.FieldsSeq(string(content)) {
		key, value, _ := strings.Cut(line, "=")
		if found := cbo.applyOption(key, value); !found {
			fmt.Printf("invalid code block info: %q\n", key)
		}
	}

	return true
}

func (cbo *codeBlockOptions) applyOption(key, value string) bool {
	switch key {
	case mdBlockHighlightedLines:
		cbo.highlightedLines = append(cbo.highlightedLines, addLineHighlighting(value)...)
		return true
	default:
		return false
	}
}

// addLineHighlighting converts "1,3-5" into [[1 1] [3 5]].
// Entries that are not valid line numbers or ranges are skipped.
func addLineHighlighting(ranges string) [][2]int {
	var hl [][2]int
	for n := range strings.SplitSeq(ranges, ",") {
		// A single line number of any length, e.g. "7" or "12".
		if lineNo, err := strconv.Atoi(n); err == nil && lineNo > 0 {
			hl = append(hl, [2]int{lineNo, lineNo})
			continue
		}

		from, to, found := strings.Cut(n, "-")
		if !found {
			continue
		}

		// Skip the range entirely if either bound is not a number,
		// instead of silently highlighting from or to line 0.
		first, errFrom := strconv.Atoi(from)
		last, errTo := strconv.Atoi(to)
		if errFrom != nil || errTo != nil {
			continue
		}

		hl = append(hl, [2]int{first, last})
	}

	return hl
}

This inline config parser is based on what Hugo already does. For now, it's simple and handles only one option, but the way it's built makes it easy to extend: if I decide in the future that I would like to display a file name or any other piece of information, I will be able to do it without much work. For example: ```go {hl_lines=1,3-5 filename=main.go}.
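To show how such an extension might look, here is a self-contained sketch that handles hl_lines plus a hypothetical filename key. The blockOptions type and the parseOptions and parseRanges functions are names invented for this example, and strings.Fields and strings.Split are used so the code also compiles on Go versions older than 1.24.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// blockOptions collects the settings read from a code block's info line.
type blockOptions struct {
	HighlightedLines [][2]int // inclusive [from, to] pairs
	Filename         string   // hypothetical extra option
}

// parseOptions reads space-separated key=value pairs such as
// "hl_lines=1,3-5 filename=main.go". Unknown keys are returned so the
// caller can decide whether to log or reject them.
func parseOptions(attrs string) (opts blockOptions, unknown []string) {
	for _, field := range strings.Fields(attrs) {
		key, value, _ := strings.Cut(field, "=")
		switch key {
		case "hl_lines":
			opts.HighlightedLines = append(opts.HighlightedLines, parseRanges(value)...)
		case "filename":
			opts.Filename = value
		default:
			unknown = append(unknown, key)
		}
	}
	return opts, unknown
}

// parseRanges turns "1,3-5" into [[1 1] [3 5]], skipping malformed entries.
func parseRanges(s string) [][2]int {
	var out [][2]int
	for _, part := range strings.Split(s, ",") {
		from, to, isRange := strings.Cut(part, "-")
		if !isRange {
			to = from
		}
		a, errA := strconv.Atoi(from)
		b, errB := strconv.Atoi(to)
		if errA != nil || errB != nil || a < 1 || b < a {
			continue
		}
		out = append(out, [2]int{a, b})
	}
	return out
}

func main() {
	opts, unknown := parseOptions("hl_lines=1,3-5,12 filename=main.go title=x")
	fmt.Println(opts.HighlightedLines, opts.Filename, unknown)
	// Prints: [[1 1] [3 5] [12 12]] main.go [title]
}
```

Since the pairs are separated by whitespace, a value containing spaces would need quoting support, which this sketch leaves out on purpose.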

With RegisterFuncs and render in place, codeBlockRenderer satisfies goldmark's NodeRenderer interface. After that, we're good to go and can pass it into the goldmark config:

goldmark.New(
	// Other options
	// ...
	goldmark.WithRendererOptions(
		renderer.WithNodeRenderers(
			util.Prioritized(
				newCodeBlockRenderer(),
				// Priority of the new renderer. Lower values take precedence, and
				// goldmark registers its default HTML renderer at 1000. If a plugin
				// registers its own renderer for the same node kind with a smaller
				// value, it would override the one defined here.
				200,
			),
		),
	),
)

After all that effort we can finally use the custom renderer with arbitrary metadata such as what lines should be highlighted. Generally, this approach opens a good range of opportunities for how to handle and enhance a simple Markdown document. Additionally, I think for most Markdown parsers supporting custom renderers this approach should work just fine after some adjustment. At the same time, I think it should be used mindfully as all that metadata we put inside the document is arbitrary and has little to no meaning without our custom renderer. It also is rather cryptic to people not knowing about our internal implementation, so in case there's a marketing team responsible for writing such documents, documentation explaining available options would be rather necessary.

Writing a custom renderer was a great lesson for me in diving deeper into an unfamiliar code base and retrieving the information I needed to move forward. At first, I was a bit intimidated, as the documentation of both Chroma and especially goldmark is rather sparse when it comes to extending basic functionality, and many functions or methods have the any type somewhere in their signatures, which does not help in understanding the underlying mechanisms. But after discovering and analyzing the code snippets mentioned above, I could start shaping my own renderer, tailored directly to my needs. And the final effect is really good.

¹ A Markdown parser was planned as the only dependency required to create this site. In the future it might change, but for now, I would like to keep the project simple.