A technical walkthrough of the four-stage pipeline that turns a YAML file into a typeset PDF: parse, validate, template, compile.

RenderCV takes a YAML file describing a CV and produces a typeset PDF. The interesting part is not the output. It is that the entire transformation is a pipeline of four well-chosen libraries, each replacing a hand-rolled approach that would have collapsed under its own weight. This post walks through that pipeline and explains why each stage exists.

The whole thing is one short chain:

YAML  ->  Pydantic  ->  Jinja2  ->  Typst  ->  PDF
(parse)  (validate)   (template)  (compile)

Read a YAML file, validate it into a typed model, render that model into a Typst source file, compile the Typst to PDF. Markdown support, watch mode, PNG export, and HTML output all build on top of this core. Let's take it one stage at a time.

Stage 1: Parsing YAML into a dict

A user hands RenderCV something like this:

cv:
  name: John Doe
  location: San Francisco, CA
  sections:
    education:
      - institution: MIT
        degree: PhD
        start_date: 2020-09
        end_date: 2024-05

Python has no built-in YAML parser, so you reach for a library. RenderCV uses ruamel.yaml, one of the most capable parsers available (it round-trips comments and formatting, which matters when a tool needs to write YAML back out, not just read it).

The job at this stage is narrow: turn YAML text into a plain Python dictionary.

from pathlib import Path

from ruamel.yaml import YAML

yaml = YAML()
data = yaml.load(Path("cv.yaml"))  # load() accepts a path, so no file handle is left open

# data now behaves like an ordinary nested dict:
data["cv"]["name"]                                     # "John Doe"
data["cv"]["sections"]["education"][0]["institution"]  # "MIT"

That is the entire contribution of stage one. No semantics, no type guarantees, just structure. The parser will happily hand you back a dict where name is missing, start_date is the string "banana", or education is accidentally a mapping instead of a list. Catching that is the next stage's problem. The call lives in yaml_reader.py.

Stage 2: Validating into a typed model

A raw dict is untrustworthy. Before generating anything, RenderCV needs to know that required fields are present, types are correct, and cross-field constraints hold (a start_date should not come after an end_date).

The naive version of this is a wall of imperative checks:

if "name" not in data["cv"]:
    raise ValueError("Missing 'name' field")

if not isinstance(data["cv"]["name"], str):
    raise ValueError("name must be a string")

if "sections" in data["cv"]:
    for section_name, entries in data["cv"]["sections"].items():
        for entry in entries:
            if "start_date" in entry and "end_date" in entry:
                # parse both dates, compare them, handle "present"...
                # already hundreds of lines, and we have barely started
                ...

This grows without bound. Every field multiplies the branches, error messages drift out of sync with reality, and nested structures force nested loops. It is exactly the kind of code that rots.

Pydantic inverts the problem. You declare the shape once as typed classes, and validation falls out of the declaration:

import pydantic
from pydantic import BaseModel
from datetime import date as Date

class Education(BaseModel):
    institution: str
    start_date: Date
    end_date: Date

    @pydantic.model_validator(mode="after")
    def check_dates(self):
        if self.start_date > self.end_date:
            raise ValueError("start_date cannot be after end_date")
        return self

class Cv(BaseModel):
    name: str
    location: str | None = None
    education: list[Education]

The type annotations do the structural work: name: str is the presence-and-type check, location: str | None = None is the optional field, and list[Education] is the nested-collection check. The model_validator(mode="after") handles the cross-field logic that types alone cannot express: the kind of invariant (start_date <= end_date) you would otherwise scatter across imperative code.

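Feeding it a dict produces a clean, typed object. (This sketch is simplified relative to the YAML above: education sits directly under cv rather than under sections, and the plain date fields expect full dates such as 2020-09-01, not 2020-09.)

cv = Cv.model_validate(data["cv"])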

cv.name                          # "John Doe"
cv.education[0].institution      # "MIT"
cv.education[0].start_date       # a real date, guaranteed valid
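
The failure path is just as informative as the success path. The sketch below assumes Pydantic v2 and reuses the simplified Education model from above; the sample inputs and the collected list are illustrative, not taken from RenderCV.

```python
from datetime import date

from pydantic import BaseModel, ValidationError, model_validator


class Education(BaseModel):
    institution: str
    start_date: date
    end_date: date

    @model_validator(mode="after")
    def check_dates(self):
        if self.start_date > self.end_date:
            raise ValueError("start_date cannot be after end_date")
        return self


bad_entries = [
    # Missing institution and an unparseable start_date.
    {"start_date": "banana", "end_date": "2024-05-01"},
    # Every field is valid on its own, but the range is reversed.
    {"institution": "MIT", "start_date": "2024-05-01", "end_date": "2020-09-01"},
]

collected = []  # (location, error type) pairs across both inputs
for raw in bad_entries:
    try:
        Education.model_validate(raw)
    except ValidationError as exc:
        for err in exc.errors():
            collected.append((err["loc"], err["type"]))
            print(err["loc"], err["type"], err["msg"])
```

The first input reports both field problems in a single exception rather than stopping at the first one. The second passes every field check and is rejected only by the model-level validator, which runs after the fields have been validated.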

The real win is that this composes. RenderCV's data model is Pydantic all the way down:

class RenderCVModel(BaseModel):
    cv: Cv              # a Pydantic model
    design: Design      # a Pydantic model
    locale: Locale      # a Pydantic model
    settings: Settings  # a Pydantic model

Each field is itself a model, and each of those contains more (Cv holds EducationEntry, ExperienceEntry, PublicationEntry, and so on). Validating the top-level RenderCVModel recursively validates the entire tree in a single model_validate() call. As a bonus, Pydantic emits a JSON Schema from the same definitions, which is what gives you autocomplete and inline validation in your editor while writing the YAML. The typed model is also what makes a CV behave like resume as code: the data has a contract, so errors surface before rendering instead of inside a broken PDF.

Stage 3: Generating Typst with templates

Now there is a validated RenderCVModel, and the goal is a Typst source file:

= John Doe
San Francisco, CA

== Education
#strong[MIT] #h(1fr) 2020 – 2024
PhD in Computer Science

The obvious approach is string concatenation, and it is a trap:

typst = f"= {cv.name}\n"
if cv.location:
    typst += f"{cv.location}\n"
for section_title, entries in cv.sections.items():
    typst += f"== {section_title}\n"
    for entry in entries:
        typst += f"#strong[{entry.institution}]"
        # optional fields? spacing? line breaks?
        # multiple themes with different layouts? unmaintainable.

Mixing layout decisions into Python control flow means every whitespace tweak is a code change, and supporting several themes means duplicating the whole mess per theme. This is the precise problem templating engines were built to solve: separating the text you want to produce from the data that fills it in.

RenderCV uses Jinja2. A template is a .typ file with placeholders:

= {{ cv.name }}
{% if cv.location %}
{{ cv.location }}
{% endif %}

{% if cv.email %}
#link("mailto:{{ cv.email }}")
{% endif %}

Given a configured jinja2.Environment (jinja2_env here), rendering it is two lines:

template = jinja2_env.get_template("Header.j2.typ")
output = template.render(cv=cv)

Layout lives in the templates, data lives in the model, and the two never tangle. Because the templates are just files, users can override them to customize their CV without touching a line of Python. The Typst templates live under templates/typst/ and the rendering happens in templater.py.
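
One practical detail when templating a whitespace-sensitive language is what happens to the newlines around block tags. The sketch below renders the header template twice, once with Jinja2's defaults and once with trim_blocks and lstrip_blocks enabled. It assumes Jinja2 is installed; the in-memory DictLoader, the SimpleNamespace stand-in for the model, and the choice of options are assumptions for illustration, not a description of RenderCV's actual environment setup.

```python
from types import SimpleNamespace

from jinja2 import DictLoader, Environment

HEADER = """\
= {{ cv.name }}
{% if cv.location %}
{{ cv.location }}
{% endif %}
{% if cv.email %}
#link("mailto:{{ cv.email }}")
{% endif %}
"""

# Hypothetical stand-in for the validated model.
cv = SimpleNamespace(name="John Doe", location="San Francisco, CA", email=None)
loader = DictLoader({"Header.j2.typ": HEADER})

# Default settings keep the newline that follows every {% ... %} tag.
loose = Environment(loader=loader).get_template("Header.j2.typ").render(cv=cv)

# trim_blocks drops that newline; lstrip_blocks strips indentation before a tag.
tidy_env = Environment(loader=loader, trim_blocks=True, lstrip_blocks=True)
tidy = tidy_env.get_template("Header.j2.typ").render(cv=cv)

print(repr(loose))
print(repr(tidy))
```

In Typst a blank line starts a new paragraph, so stray newlines left behind by block tags can change the layout. Deciding whitespace handling once, at the environment level, keeps individual templates readable.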

The Markdown detour

There is one wrinkle. Users want to write Markdown inside their YAML values:

highlights:
  - "**Published** [3 papers](https://example.com) on neural networks"
  - "Collaborated with *Professor Smith*"

Typst does not understand **bold** or [links](url). It wants #strong[bold] and #link("url")[text]. Rather than reach for fragile regular expressions, RenderCV uses the Python Markdown library to parse each value into an XML element tree, then walks that tree and emits the Typst equivalent for each node:

match element.tag:
    case "strong":
        return f"#strong[{content}]"
    case "em":
        return f"#emph[{content}]"
    case "a":
        href = element.get("href")
        return f'#link("{href}")[{content}]'

So **Published** [3 papers](https://example.com) becomes #strong[Published] #link("https://example.com")[3 papers]. Walking a parsed tree rather than pattern-matching raw text means nesting (a link inside bold inside a list item) just works, because the structure is already resolved before conversion. The logic lives in markdown_to_typst() inside markdown_parser.py.

Stage 4: Compiling Typst to PDF

The pipeline now holds a complete .typ file. The last step is turning it into a PDF, and that is the one stage RenderCV does not implement itself. Typst is a typesetting language with its own compiler, and there are Python bindings, typst-py, that expose it directly:

import typst  # imported as a module so the built-in compile() is not shadowed

typst.compile("cv.typ", output="cv.pdf")

That is the whole stage. The bindings wrap the Rust compiler, so there is no shelling out to an external binary and no multi-gigabyte toolchain to install. Typst was chosen over LaTeX for concrete reasons: a document the size of a CV typically compiles in milliseconds rather than seconds, the compiler ships as a single small dependency, and for CV-style layouts the output is hard to tell apart from LaTeX. If you want the full comparison, see Typst vs LaTeX for CVs, and for the bindings specifically, generating PDFs in Python with Typst. The compile call lives in pdf_png.py.

The pipeline, end to end

When you run rendercv render cv.yaml, four libraries hand off in sequence:

Parse: ruamel.yaml reads the YAML into a Python dict.
Validate: Pydantic validates the dict into a typed RenderCVModel.
Generate: Jinja2 renders templates against the model into a Typst file (with Python Markdown handling inline formatting along the way).
Compile: typst-py compiles the Typst into a PDF.

The recurring theme is that every stage replaces a hand-written version that does not scale. Hand-rolled YAML parsing, imperative validation, string-concatenated templating, and a bundled typesetting engine would each be a maintenance sink on its own. Composed correctly, the four libraries reduce to a pipeline you can hold in your head, and the full path through it is readable in run_rendercv.py.
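
The shape of that hand-off can be shown in a few lines. Everything in the sketch below is a stand-in chosen to keep it standard-library only: JSON replaces YAML, a single hand-written check replaces Pydantic, an f-string replaces Jinja2, and the "compiler" just encodes the source as bytes. None of it reflects the contents of run_rendercv.py; it only illustrates four functions, each consuming the previous one's output.

```python
import json
from functools import reduce


def parse(text: str) -> dict:
    # Stand-in for ruamel.yaml: JSON keeps this sketch stdlib-only.
    return json.loads(text)


def validate(data: dict) -> dict:
    # Stand-in for Pydantic: one hand-written check instead of a model.
    if not isinstance(data.get("cv", {}).get("name"), str):
        raise ValueError("cv.name is required and must be a string")
    return data


def template(model: dict) -> str:
    # Stand-in for Jinja2: a single hard-coded line of Typst.
    return f"= {model['cv']['name']}\n"


def compile_(source: str) -> bytes:
    # Stand-in for typst-py: returns the source bytes, not a real PDF.
    return source.encode("utf-8")


STAGES = [parse, validate, template, compile_]


def run(text: str) -> bytes:
    # Feed each stage's output into the next, left to right.
    return reduce(lambda value, stage: stage(value), STAGES, text)


print(run('{"cv": {"name": "John Doe"}}'))
```

Because validation sits second in the chain, a document with no name raises before any templating or compiling happens, which is the "errors surface before rendering" property described earlier.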

If you want to see the output without reading any source, build a CV at rendercv.com. And if you do want to read the source, all of it is MIT-licensed and open on GitHub at github.com/rendercv/rendercv.

How RenderCV Works: From a YAML File to a Typeset PDF