a medium level embedding of Python into C

2013.Apr.26

The Python documentation discusses a very high level embedding of Python into a C executable.

We can actually embed Python a little bit lower with the following trick.

I came up with the following examples after seeing a very interesting presentation on PyPy that made the erroneous claim that Python is not compilable given the dynamic nature of its type system.

The following is fairly trivial, but it shows that CPython is fundamentally expressed in C, and that all of our dynamic type tricks are just as easily expressed in compiled form as in interpreted form. Our biggest constraint in compiling Python is that the language is built upon a runtime, and we need to include this runtime to simulate all behaviours. As a consequence, our embeddings of Python, such as the ones below, tend to be either trivial or arduous (and generally fairly unpleasant).
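
To make the "dynamic type tricks" a bit more concrete, here is a rough Python sketch of how the + in our example gets resolved at run time. The name dynamic_add is made up for illustration, and this is a simplification: the real rules have extra cases (for instance, a right-hand operand whose type is a subclass of the left-hand type gets first refusal), and CPython does the equivalent work in C through type slots rather than attribute lookups.

```python
def dynamic_add(a, b):
    """Simplified model of how `a + b` is resolved at run time."""
    # first ask the type of the left operand
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result

    # the reflected method is only consulted for operands of different types
    if type(a) is not type(b):
        radd = getattr(type(b), "__radd__", None)
        if radd is not None:
            result = radd(b, a)
            if result is not NotImplemented:
                return result

    raise TypeError("unsupported operand type(s) for +: %r and %r"
                    % (type(a).__name__, type(b).__name__))
```

Nothing here needs an interpreter loop: it is a handful of lookups and calls, which is exactly the kind of thing that can be written out ahead of time in C, as long as the runtime that owns the types is linked in.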

We want a programme that displays the following behaviour.

When passed four arguments on the command line (a type and a value, then another type and another value), construct the two values with the specified types and add them with +.

$ prog str 20 str 13
2013
$ prog int 20 int 13
33

In Python, this would just be:

def foo(type1, val1, type2, val2):
	print getattr(__builtins__,type1)(val1) + getattr(__builtins__,type2)(val2)

if __name__ == '__main__':
	from sys import argv
	foo(*argv[1:])
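
One caveat with the listing above: __builtins__ is a CPython implementation detail, and it is the builtins module when the code runs as a script but a plain dict inside an ordinary imported module. A minimal sketch of a more portable lookup follows, assuming we are happy to return the sum rather than print it; the name add_as is my own, not part of anything above.

```python
try:
    import builtins                   # Python 3
except ImportError:
    import __builtin__ as builtins    # Python 2

def add_as(type1, val1, type2, val2):
    """Construct two values from builtin type names and return their sum."""
    ctor1 = getattr(builtins, type1)  # e.g. 'int' -> int
    ctor2 = getattr(builtins, type2)
    return ctor1(val1) + ctor2(val2)
```

Called as add_as('str', '20', 'str', '13') this gives the concatenated string, and with 'int' in place of 'str' it gives the integer sum.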

This is similarly trivial in our very high-level embedding:

#include <Python.h> /* must come before the standard headers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char* argv[]) {
	char program_text[255];

	/* expect exactly: type1 val1 type2 val2 */
	if(argc != 5)
		exit(EXIT_FAILURE);

	/* for simplicity's sake: 17 characters of template, plus the NUL */
	size_t program_len = strlen(argv[1]) + strlen(argv[2])
	                   + strlen(argv[3]) + strlen(argv[4]) + 18;
	if(program_len > sizeof program_text)
		exit(EXIT_FAILURE);

	sprintf(program_text, "print %s('%s') + %s('%s')", argv[1], argv[2],
	                                                   argv[3], argv[4]);

	Py_SetProgramName(argv[0]);
	Py_Initialize();
	PyRun_SimpleString(program_text); /* compile and run the text */
	Py_Finalize();
	return EXIT_SUCCESS;
}

Compile with the following, using your python-config to figure out flags:

$ gcc $(python-config --includes) -o prog prog.c $(python-config --libs)

Try it out:

$ ./prog str 20 str 13
2013
$ ./prog int 20 int 13
33
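
In Python terms, the very high-level embedding amounts to building source text and handing it to the interpreter. Here is a small sketch of the same idea; build_program and run_program are names I made up, and I evaluate an expression and return the result instead of running a print statement, so it is not a line-for-line translation of the C.

```python
def build_program(type1, val1, type2, val2):
    """Return the expression text that the C programme formats with sprintf."""
    return "%s('%s') + %s('%s')" % (type1, val1, type2, val2)

def run_program(type1, val1, type2, val2):
    """Compile and evaluate the generated text, as the embedded runtime would."""
    # eval will run whatever ends up in the text, so this is for trusted input only
    return eval(build_program(type1, val1, type2, val2))
```

This also makes the weakness of the approach easy to see: a value containing a single quote produces text that no longer parses, and anything else the caller smuggles into the arguments is executed as code.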

Now, let’s use Cython to try a slightly deeper embedding.

Start with a new (but familiar looking) module:

def foo(type1, val1, type2, val2):
	print getattr(__builtins__,type1)(val1) + getattr(__builtins__,type2)(val2)

Let’s use Cython to compile it to C, giving us mod.c from mod.py.

$ cython mod.py

Let’s compile this into a shared object first, and test it with our Python interpreter.

Again, we’ll use python-config to make things a bit easier.

$ gcc $(python-config --cflags) -fPIC -shared -o mod.so mod.c

Let’s check in our interpreter:

Python 2.7.3 (default, Apr 10 2013, 05:13:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mod

>>> # make sure that we are looking at the .so
... #  note that Python defaults to importing an .so over a same-named .py
...

>>> mod.__file__ 
'mod.so'

>>> # test our module
>>> mod.foo('str', '20', 'str', '13')
2013
>>> mod.foo('int', '20', 'int', '13')
33

Okay, now for some don’t-use-this-code brand magic!

Go into the mod.c file and remove visibility modifiers for the function foo.

It will have been named something like __pyx_pf_3mod_foo.

Remove the modifiers for the prototype and the function itself, so that when we link the .so, we can see and call this function externally.

An example diff:

--- mod.c 2013-04-26 01:30:37.374492209 -0400
+++ mod.c 2013-04-26 01:30:58.794491452 -0400
@@ -506,7 +506,7 @@ static int __Pyx_InitStrings(__Pyx_Strin
 int __pyx_module_is_main_mod = 0;
 
 /* Implementation of 'mod' */
-static PyObject *__pyx_pf_3mod_foo(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_type1, PyObject *__pyx_v_val1, PyObject *__pyx_v_type2, PyObject *__pyx_v_val2); /* proto */
+PyObject *__pyx_pf_3mod_foo(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_type1, PyObject *__pyx_v_val1, PyObject *__pyx_v_type2, PyObject *__pyx_v_val2); /* proto */
 static char __pyx_k_3[] = "/for/the/love/of/guido/dont/use/this/code/mod.py";
 static char __pyx_k__foo[] = "foo";
 static char __pyx_k__mod[] = "mod";
@@ -624,7 +624,7 @@ static PyObject *__pyx_pw_3mod_1foo(PyOb
  * 
  */
 
-static PyObject *__pyx_pf_3mod_foo(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_type1, PyObject *__pyx_v_val1, PyObject *__pyx_v_type2, PyObject *__pyx_v_val2) {
+PyObject *__pyx_pf_3mod_foo(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_type1, PyObject *__pyx_v_val1, PyObject *__pyx_v_type2, PyObject *__pyx_v_val2) {
   PyObject *__pyx_r = NULL;
   __Pyx_RefNannyDeclarations
   PyObject *__pyx_t_1 = NULL;

Now, let’s create our main application which will directly call the Python function foo that has been translated to C for us by Cython.

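#include <Python.h>
#include <stdlib.h>

/* copy the prototype, since Cython doesn't generate a header for us */
/* remove the CYTHON_UNUSED symbol */
PyObject *__pyx_pf_3mod_foo(PyObject *__pyx_self, PyObject *__pyx_v_type1, PyObject *__pyx_v_val1, PyObject *__pyx_v_type2, PyObject *__pyx_v_val2);

/* the module's init function (Python 2 naming) needs a declaration too */
PyMODINIT_FUNC initmod(void);

int main(int argc, char* argv[]) {
	/* expect exactly: type1 val1 type2 val2 */
	if(argc != 5)
		exit(EXIT_FAILURE);

	Py_Initialize(); /* initialise a Python runtime */
	initmod(); /* initialise our module */

	/* call the module with arguments */
	PyObject* w = PyString_FromString(argv[1]);
	PyObject* x = PyString_FromString(argv[2]);
	PyObject* y = PyString_FromString(argv[3]);
	PyObject* z = PyString_FromString(argv[4]);
	PyObject* r = __pyx_pf_3mod_foo(NULL, w, x, y, z);
	if(r == NULL)
		PyErr_Print(); /* foo raised, e.g. on an unknown type name */

	/* release our references before shutting the runtime down */
	Py_XDECREF(r);
	Py_XDECREF(w);
	Py_XDECREF(x);
	Py_XDECREF(y);
	Py_XDECREF(z);

	Py_Finalize();
	return EXIT_SUCCESS;
}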

Let’s recompile our module and compile our main.

$ gcc $(python-config --cflags) -fPIC -shared -o mod.so mod.c
$ gcc $(python-config --includes) -o prog prog.c mod.so $(python-config --libs)

And test it:

$ # we need to make sure we can find the .so we linked against:
$ export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"
$ ./prog str 20 str 13
2013
$ ./prog int 20 int 13
33

There are a couple of ways we could increase the depth of this embedding, which I’ll blog about next time.

Next time:

* initialising the interpreter and initialising modules manually
* constructing PyObjects directly (without relying on the garbage collector)
* places where we cannot avoid the runtime