by

in

From Python-CLI to Pure Prove: Our FFI Migration Story

*How we moved from a Python bootstrap compiler to native Prove binaries without losing access to the Python ecosystem*

Presenting proof V1.1.0! This is the start of our road to a self-hosted compiler!

curl -sSf https://code.botwork.se/Botwork/prove/raw/branch/main/scripts/install.sh | sh

When we started Prove, we faced a classic bootstrapping challenge: we needed to build a compiler in a language that didn’t yet exist. Our solution was pragmatic — write the compiler in Python, ship it as a CLI tool, and target C as the first production-ready backend.

This worked. For a while.

But the Python CLI had real costs:

  • Deployment friction — Users needed Python 3.11+ installed and in their PATH
  • Cold start overhead — Every `prove check` paid the Python interpreter startup tax
  • The philosophical mismatch — A language called Prove shouldn’t need Python to prove things

We wanted native binaries. We wanted to ship prove as a single static executable. But we also needed to keep the Python ecosystem accessible — our LSP uses python-lsp-server, our NLP tooling uses spaCy and NLTK, and we didn’t want (it’s really not realistic) to reimplement these in C.

The answer: foreign function interfaces (FFI).

FFI How is its geared in prove?

The Solution: Embedding Python, Not Depending on It

Instead of calling Python as a subprocess or requiring it as a runtime dependency, we embed the Python interpreter inside the Prove binary. This gives us:

  • Single-file distribution — No Python installation required for end users
  • Fast startup — The Prove runtime initializes Python once at startup, not per-command
  • Controlled environment — We bundle exactly the Python packages we need
  • A ton of possibilities – Python is one of the most flexible languages!

1. Declare Foreign Functions in Prove

Prove’s foreign blocks let you bind C functions directly:

module Lsp
  foreign "libpython3"
    py_initialize() Unit
    py_finalize() Unit
    py_run_string(code String) Integer

The foreign "libpython3" block tells the compiler to link against libpython3 and include Python.h. The functions inside follow snake_case naming — Prove enforces this even for C bindings.

2. Wrap Raw FFI in Typed Verbs

Raw foreign calls have no type safety. We wrap them in Prove verbs:

outputs pylsp() Integer
from
    py_initialize()
    result as Integer = py_run_string(PYTHON_LSP)
    py_finalize()
    result

Now callers get compile-time checked, failable functions with contracts — almost indistinguishable from native Prove code.

3. Bundle Python Packages Automatically

The tricky part: embedding Python packages without shipping the entire Python standard library.

Our _python_bundle.py compiler module handles this:

  • Detects when a module uses foreign "libpython3" with .py comptime dependencies
  • Traces imports transitively via AST analysis
  • Filters to only third-party site-packages (skips stdlib)
  • Zips the reachable files into a C byte array
  • Injects the zip into sys.path at runtime via PyRun_SimpleString
void py_initialize(void) {
#ifdef PYTHON_HOME
  /* Tell the embedded interpreter where its standard library lives.
   * Without this, Python searches relative to the proof binary and
   * fails to find os, sys, etc. PYTHON_HOME is set via c_flags in prove.toml.
   */
  static wchar_t _home[] = L"" PYTHON_HOME;
  Py_SetPythonHome(_home);
#endif
  Py_Initialize();
  /* Write the bundled prove package to a temp zip and prepend to sys.path
   * so `import prove` works regardless of the working directory. */
  char tmp[] = "/tmp/prove_bundle_XXXXXX.zip";
  int fd = mkstemps(tmp, 4);
  if (fd >= 0) {
    write(fd, prove_bundle_zip, prove_bundle_zip_len);
    close(fd);
    PyObject *sys_mod = PyImport_ImportModule("sys");
    PyObject *path = PyObject_GetAttrString(sys_mod, "path");
    PyObject *zip_str = PyUnicode_FromString(tmp);
    PyList_Insert(path, 0, zip_str);
    Py_DECREF(zip_str);
    Py_DECREF(path);
    Py_DECREF(sys_mod);
  }
}

The result: when you build prove with the LSP module, you get a binary that includes python-lsp-server and all its dependencies — no pip install required.

Setting Python Variables from Prove

We also expose Python’s namespace for interop:

void py_set_string(Prove_String *name, Prove_String *value);
void py_set_integer(Prove_String *name, int64_t value);
void py_set_bool(Prove_String *name, bool value);

This lets Prove code inject configuration into Python’s __main__ module before running scripts — useful for passing compiler options or configuration to bundled Python tools.

Thin C Wrappers: Bridging Naming Conventions

Python’s C API uses PascalCase (Py_Initialize, PyRun_SimpleString), but Prove’s foreign blocks require snake_case names. We solve this with thin C wrappers:

// py_wrappers.c — Thin C wrappers for libpython3
void py_initialize(void) { Py_Initialize(); }
void py_finalize(void) { Py_Finalize(); }
int64_t py_run_string(Prove_String *code) {
    return (int64_t)PyRun_SimpleString(code->data);
}

The Prove source stays clean with snake_case while the C wrappers handle the naming translation. This pattern also lets us safely wrap Prove_String* pointers.

The SystemExit Trap

One gotcha: Python’s sys.exit() calls exit() directly, which crashes the Prove runtime. Our py_run_string handles this:

int64_t py_run_string(Prove_String *code) {
    int64_t rc = (int64_t)PyRun_SimpleString(code->data);
    if (rc != 0 && PyErr_Occurred()) {
        PyObject *exc_type, *exc_value, *exc_tb;
        PyErr_Fetch(&exc_type, &exc_value, &exc_tb);
        if (PyErr_GivenExceptionMatches(exc_type, PyExc_SystemExit)) {
            // Extract exit code and return without calling exit()
            int64_t exit_code = 1;
            // ... extract code attribute ...
            return exit_code;
        }
        PyErr_Restore(exc_type, exc_value, exc_tb);
        PyErr_Print();
    }
    return rc;
}

The Build Configuration

The prove.toml controls how Python gets embedded:

[build]
target = "native"
c_sources = ["src/python/py_wrappers.c”] # Add Python includes via c_flags in prove.toml

[optimize]
enabled = true
pgo = true
strip = true
gc_sections = true

With gc_sections = true, the linker strips any unused Python symbols — no bloat from unused stdlib modules.

The Architecture

Prove Runtime (C)Embedded Python (libpython3)
Memory managementPython interpreter
FFI bindingsBundled packages (.zip)
Prove stdlibNLP tools (spaCy, NLTK)
LSP server

Modules that don’t use Python compile to bare-metal C. Only the tools that need Python embed it — and only the packages those tools import get bundled.

What We Gained

Before (Python CLI)After (Pure Prove)
pip install proveDownload one binary
Python 3.11+ requiredZero runtime deps
~200ms cold start< 5ms startup
Python interpreter in PATHSelf-contained
pip upgrade for updatesReplace binary

The transition was gradual. We started by keeping Python as an optional accelerator — if Python was available, use the CLI; if not, fall back to the native binary. Now that the native binary has feature parity, we’ve made Python an implementation detail, not a requirement.

What We Learned

FFI is a Bridge, Not a Destination

Foreign blocks are escape hatches, not the idiomatic path. The goal is to wrap C/Python interop in typed Prove verbs so callers never see the raw FFI. Every py_run_string() call should be hidden behind a function with contracts, error handling, and documentation.

Use thin C wrappers to bridge naming conventions — Prove uses snake_case for everything, including foreign bindings, while Python’s C API uses PascalCase. Keep the C layer minimal and dumb; all the smarts belong in Prove.

Of course our end goal is still pure Prove, but this will be a good middle-path to move without duplication maintenance, and we do not need to rush things because of risk of getting out of sync.

Bundle Only What You Need

The naive approach is to embed everything. We learned to be surgical — transitive import tracing means a 50MB binary instead of 200MB. The rule: if a Python package isn’t transitively reachable from a comptime .py file, it doesn’t get bundled.

Linker gc_sections (-Wl,--gc-sections) handles the rest — unused Python symbols get stripped automatically.

Static vs. Dynamic Tradeoffs

Embedding libpython3 statically (with --enable-shared and -Wl,--whole-archive) gives true standalone binaries but requires matching Python version configurations. Dynamic linking is easier but reintroduces a Python dependency. We chose static for the core tools, dynamic for development builds.

The Current State

Today (V1.1.0), prove check, prove build, prove test, and prove format are all pure Prove binaries. The Python CLI (prove-py) still exists as a development bootstrap — you need Python to build the compiler initially. But end users download a single native binary.

Our LSP, NLP store setup, and advanced tooling use embedded Python via FFI. They’re indistinguishable from native tools from the user’s perspective.

What’s Next

The goal is to eliminate the bootstrap Python dependency entirely. We’re working on a self-hosting Prove compiler — a Prove compiler written in Prove, compiled to C, then to a native binary. When that lands, prove will truly compile itself.

Until then, FFI is our bridge: from the pragmatic Python implementation to the principled Prove runtime, without abandoning the ecosystem we’ve built.

Could we add you to our mailing list!

We don’t spam! Read our privacy policy for more info.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

The Botwork Security Logo
Privacy Overview

This website uses cookies so that we can provide you with the best user experience possible. Cookie information is stored in your browser and performs functions such as recognising you when you return to our website and helping our team to understand which sections of the website you find most interesting and useful.

Please notice that this is only for https://botwork.se not our tolling systems.