.. Copyright 2015 David Malcolm Copyright 2015 Red Hat, Inc. This is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Implementing a "brainf" compiler -------------------------------- In this example we use libgccjit to construct a compiler for an esoteric programming language that we shall refer to as "brainf". The compiler can run the generated code in-process (JIT compilation), or write the generated code as a machine code executable (classic ahead-of-time compilation). The "brainf" language ********************* brainf scripts operate on an array of bytes, with a notional data pointer within the array. brainf is hard for humans to read, but it's trivial to write a parser for it, as there is no lexing; just a stream of bytes. The operations are: ====================== ============================= Character Meaning ====================== ============================= ``>`` ``idx += 1`` ``<`` ``idx -= 1`` ``+`` ``data[idx] += 1`` ``-`` ``data[idx] -= 1`` ``.`` ``output (data[idx])`` ``,`` ``data[idx] = input ()`` ``[`` loop until ``data[idx] == 0`` ``]`` end of loop Anything else ignored ====================== ============================= Unlike the previous example, we'll implement an ahead-of-time compiler, which reads ``.bf`` scripts and outputs executables (though it would be trivial to have it run them JIT-compiled in-process). Here's what a simple ``.bf`` script looks like: .. literalinclude:: ../../examples/emit-alphabet.bf :lines: 1- .. note:: This example makes use of whitespace and comments for legibility, but could have been written as:: ++++++++++++++++++++++++++ >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++< [>.+<-] It's not a particularly useful language, except for providing compiler-writers with a test case that's easy to parse. Converting a brainf script to libgccjit IR ****************************************** We write simple code to populate a :py:class:`gccjit.Context`. .. literalinclude:: ../../examples/bf.py :start-after: import gccjit :end-before: def compile_to_file(self, output_path): :language: python Compiling a context to a file ***************************** In previous examples, we compiled and ran the generated machine code in-process. We can do that: .. literalinclude:: ../../examples/bf.py :start-after: # Running the generated code in-process :end-before: # Entrypoint :language: python but this time we'll also provide a way to compile the context directly to an executable, using :py:meth:`gccjit.Context.compile_to_file`. To do so, we need to export a ``main`` function. A helper function for doing so is provided by the JIT API: .. literalinclude:: ../../gccjit/__init__.py :start-after: # Make it easy to make a "main" function: :language: python which we can use (as ``gccjit.make_main``) to compile the function to an executable: .. literalinclude:: ../../examples/bf.py :start-after: # Compiling to an executable :end-before: # Running the generated code in-process :language: python Finally, here's the top-level of the program: .. literalinclude:: ../../examples/bf.py :start-after: # Entrypoint :language: python The overall script `examples/bf.py` is thus a bf-to-machine-code compiler, which we can use to compile .bf files, either to run in-process, .. code-block:: console $ PYTHONPATH=. python examples/bf.py \ emit-alphabet.bf ABCDEFGHIJKLMNOPQRSTUVWXYZ or to compile into machine code executables: .. code-block:: console $ PYTHONPATH=. python examples/bf.py \ emit-alphabet.bf \ -o a.out which we can run independently: .. code-block:: console $ ./a.out ABCDEFGHIJKLMNOPQRSTUVWXYZ Success! We can also inspect the generated executable using standard tools: .. code-block:: console $ objdump -d a.out |less which shows that libgccjit has managed to optimize the function somewhat (for example, the runs of 26 and 65 increment operations have become integer constants 0x1a and 0x41): .. code-block:: console 0000000000400620
: 400620: 80 3d 39 0a 20 00 00 cmpb $0x0,0x200a39(%rip) # 601060 40062b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 400630: 48 83 ec 08 sub $0x8,%rsp 400634: 0f b6 05 26 0a 20 00 movzbl 0x200a26(%rip),%eax # 601061 40063b: c6 05 1e 0a 20 00 1a movb $0x1a,0x200a1e(%rip) # 601060 400642: 8d 78 41 lea 0x41(%rax),%edi 400645: 40 88 3d 15 0a 20 00 mov %dil,0x200a15(%rip) # 601061 40064c: 0f 1f 40 00 nopl 0x0(%rax) 400650: 40 0f b6 ff movzbl %dil,%edi 400654: e8 87 fe ff ff callq 4004e0 400659: 0f b6 05 01 0a 20 00 movzbl 0x200a01(%rip),%eax # 601061 400660: 80 2d f9 09 20 00 01 subb $0x1,0x2009f9(%rip) # 601060 400667: 8d 78 01 lea 0x1(%rax),%edi 40066a: 40 88 3d f0 09 20 00 mov %dil,0x2009f0(%rip) # 601061 400671: 75 dd jne 400650 400673: 31 c0 xor %eax,%eax 400675: 48 83 c4 08 add $0x8,%rsp 400679: c3 retq 40067a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) We also set up debugging information (via :py:meth:`gccjit.Context.new_location` and :py:data:`gccjit.BoolOption.DEBUGINFO`), so it's possible to use ``gdb`` to singlestep through the generated binary and inspect the internal state ``idx`` and ``data_cells``: .. code-block:: console (gdb) break main Breakpoint 1 at 0x400790 (gdb) run Starting program: a.out Breakpoint 1, 0x0000000000400790 in main (argc=1, argv=0x7fffffffe448) (gdb) stepi 0x0000000000400797 in main (argc=1, argv=0x7fffffffe448) (gdb) stepi 0x00000000004007a0 in main (argc=1, argv=0x7fffffffe448) (gdb) stepi 9 >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++< (gdb) list 4 5 cell 0 = 26 6 ++++++++++++++++++++++++++ 7 8 cell 1 = 65 9 >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++< 10 11 while cell#0 != 0 12 [ 13 > (gdb) n 6 ++++++++++++++++++++++++++ (gdb) n 9 >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++< (gdb) p idx $1 = 1 (gdb) p data_cells $2 = "\032", '\000' (gdb) p data_cells[0] $3 = 26 '\032' (gdb) p data_cells[1] $4 = 0 '\000' (gdb) list 4 5 cell 0 = 26 6 ++++++++++++++++++++++++++ 7 8 cell 1 = 65 9 >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++< 10 11 while cell#0 != 0 12 [ 13 > Other forms of ahead-of-time-compilation **************************************** The above demonstrates compiling a :py:class:`gccjit.Context` directly to an executable. It's also possible to compile it to an object file, and to a dynamic library. See the documentation of :py:meth:`gccjit.Context.compile_to_file` for more information.