Copyright © 2004-2009 by Marcin 'Qrczak' Kowalczyk (QrczakMK@gmail.com)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included here.

Kokogut, a compiler of Kogut

Kokogut is an implementation of Kogut in Kogut itself. It translates Kogut source into C source, which is then compiled by a C compiler to produce an executable. It’s a work in progress; in particular, the documentation is still being written.

Kokogut is hosted on SourceForge.net.

Here you can download the current version. It includes the contents of these web pages. It is developed under Linux, and tested under various Unix-like systems. Please contact me if you encounter building or porting problems, so I can make it more portable.

Choose one of two kinds of the package:

Building instructions:

  1. If you got Kokogut from CVS: autoconf && autoheader
  2. Optional: see ./configure --help for building options.
  3. ./configure
  4. make
  5. Become root.
  6. make install

The compiler proper and example programs are licensed under the GPL, and the libraries under the LGPL, with a “linking exception”.

The linking exception allows you to link a “work that uses the Library” with a publicly distributed version of the Library to produce an executable file containing portions of the Library, and distribute that executable file under terms of your choice, without any of the additional requirements listed in section 6 of LGPL version 2 or section 4 of LGPL version 3.

Kokogut principles

Note: these are principles of the current implementation, not of the language.

  1. There are no arbitrary limits on the nesting levels of expressions, magnitude of integers, number of parameters of a function, object size, recursion depth etc. Stack overflow is checked and the stack is resized as needed; the heap also grows as needed.
  2. Passing and receiving a known number of arguments between 0 and 8 is efficient. In other cases it’s equivalent to building and deconstructing a list of arguments.
  3. Integers which fit in a machine word are tagged in the lowest bit. All other values are represented by pointers. Objects are allocated statically when possible, otherwise they are on the heap.
  4. The garbage collector accurately traces pointers to objects, without conservatively assuming that some random memory locations (like the system stack) might point to Kogut objects. It is a copying collector with two generations and a software write barrier.
  5. Compilation doesn’t stop when an error is detected; all detected errors are reported.
  6. You can embed C code fragments directly in Kogut source to implement primitive types and operations.
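
The value representation from point 3 can be sketched in C roughly as follows. This is only an illustration, not Kokogut's actual code: the names value_t, is_small_int, fits_small_int, tag_small_int and untag_small_int are hypothetical, and the exact encoding (the integer shifted left by one bit, with the lowest bit set to 1) is an assumption chosen to agree with the statement that integers are tagged in the lowest bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; these are not Kokogut's identifiers. */
typedef intptr_t value_t;  /* one machine word: a tagged integer or a pointer */

/* Assumed encoding: small integers have the lowest bit set, so they are odd. */
static int is_small_int(value_t v) {
    return (v & 1) != 0;
}

/* One bit is lost to the tag, so only half of the word's range fits. */
static int fits_small_int(intptr_t n) {
    return n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2;
}

static value_t tag_small_int(intptr_t n) {
    assert(fits_small_int(n));
    /* Shift as unsigned so that negative n does not hit undefined behavior;
       converting back to a signed type assumes two's complement wrapping. */
    return (value_t)(((uintptr_t)n << 1) | 1u);
}

static intptr_t untag_small_int(value_t v) {
    assert(is_small_int(v));
    /* Assumes that right-shifting a negative number preserves the sign. */
    return v >> 1;
}
```

With this encoding, tag_small_int(42) yields the word 85 and untag_small_int(85) yields 42 again. An integer for which fits_small_int is false would have to be represented by a pointer to a big integer object instead.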

Current limitations, to be lifted in the future

  1. User macros are not implemented (and not fully designed).
  2. There are no companion programs like a profiler, a debugger, or an interactive interpreter.
  3. There is no portable FFI. The current integration with C relies on compilation to C and on various low-level details.
  4. Arithmetic on very large numbers can crash the program. This is caused by the GMP library, which allocates temporary objects on the stack without overflow checking.
  5. Only some builtin functions are expanded inline.

Non-portable assumptions in the generated C code

Kokogut aims to produce fairly portable C code, but assuming some reasonable properties of the environment makes it easier to generate good-quality code. Please tell me if some of these assumptions are not reasonable and it would make sense to port Kokogut to platforms where they are not satisfied.

  1. Signed integers use two’s complement arithmetic and don’t signal errors on overflow (overflow is checked after the fact); for new versions of gcc the -fwrapv flag is used to obtain this behavior. Integer division and remainder round towards zero (this is unspecified by C89 but required by C99). Right-shifting a negative number preserves the sign.
  2. All pointers have the same size. There is an integer type of the same size as a pointer. Odd integers can be cast to pointers and back. Pointers into arrays of pointers are even. Incrementing a pointer past the end of an array doesn’t cause errors.
  3. There is no unexpected padding in structures consisting of pointers and pointer-sized integers, and types other than double don’t have stricter alignment requirements than a pointer.
  4. Pointers have at least 32 bits. While it would be easy to lift this restriction, as it deals only with making integer literals and with the default stack and heap sizes, Kokogut libraries would not fit on a 16-bit platform anyway because the generated code is too large.
  5. Sometimes an object is accessed using a different type than it was created with. This mostly deals with pointer types and with structs having a similar layout. I believe this happens only in places where applying C99’s type-based aliasing rules (option -fstrict-aliasing in GCC, turned on by default) does no harm in practice.
  6. Limits of the C compiler regarding issues like the number of significant characters in identifiers, the level of nesting of blocks, the number of external definitions in a translation unit etc. may affect the limits of Kogut code. Kokogut doesn’t impose artificial limits itself but it can inherit them from the C compiler used.
  7. The generated code uses the GMP library for big integers so it must be available for the target platform. It’s assumed that GMP limbs have the same size as pointers and that values of type mpz_t may be moved (not copied) using memcpy.
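
Some of the assumptions from points 1 to 4 can be probed on a given platform with a small C function such as the one below. It is a rough sketch, not part of Kokogut: the name first_failed_assumption is hypothetical, it assumes that the optional C99 type intptr_t exists, and a passing result only suggests that the platform is suitable; it does not prove that every assumption holds.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not part of Kokogut. Returns the number of the first
   failing assumption, as numbered in the list above, or 0 if all pass. */
static int first_failed_assumption(void) {
    /* volatile keeps the compiler from folding the checks away */
    volatile int minus_seven = -7, minus_eight = -8, two = 2, one = 1;
    void *cells[2] = { 0, 0 };
    struct probe { void *p; intptr_t n; void *q; };

    /* 1. Two's complement, division rounding towards zero,
          sign-preserving right shift. */
    if ((-one & 3) != 3) return 1;
    if (minus_seven / two != -3 || minus_seven % two != -1) return 1;
    if ((minus_eight >> one) != -4) return 1;

    /* 2. Uniform pointer size, a pointer-sized integer type, odd integers
          surviving a round trip through a pointer, even pointers into
          arrays of pointers. */
    if (sizeof(char *) != sizeof(void **)) return 2;
    if (sizeof(intptr_t) != sizeof(void *)) return 2;
    if ((intptr_t)(void *)(intptr_t)5 != 5) return 2;
    if (((uintptr_t)&cells[0] & 1) != 0) return 2;
    if (((uintptr_t)&cells[1] & 1) != 0) return 2;

    /* 3. No padding in a struct of pointers and pointer-sized integers. */
    if (sizeof(struct probe) != 3 * sizeof(void *)) return 3;

    /* 4. Pointers have at least 32 bits. */
    if (sizeof(void *) * CHAR_BIT < 32) return 4;

    return 0;
}
```

Calling first_failed_assumption() from a test program and checking for a result of 0 gives a quick first impression of a new platform. Assumptions 5 to 7 concern aliasing, compiler limits and GMP, and are not covered by this sketch.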

Kogut and Kokogut history

There is no definite starting point for the development of Kogut. Its first incarnation, an interpreter in Haskell, was written in August 2001. It was very different from today’s Kogut, though.

The second incarnation, a compiler in Haskell which generates OCaml code, was written in November 2001 and touched in June 2002.

The third incarnation, an interpreter in OCaml, was abandoned in September 2002. Parts of it were reused in the fourth incarnation, another interpreter in OCaml. This interpreter started working in January 2003 and was used to bootstrap further implementations written in Kogut. It was being tweaked during 2003 to follow small changes in the language, and now implements mostly a subset of Kogut.

The fourth implementation, a compiler in Kogut which generates C code, was abandoned at the beginning of November 2003. Large parts of it were reused in the fifth incarnation, which started on November 2, 2003 and is the current Kokogut.

On January 12, 2004 Kokogut started producing executable programs. The compiler was mostly finished then, but the library was nearly non-existent.

On March 13, 2004 Kokogut was able to compile itself for the first time. Later effort concentrated on improving the libraries and the building system.

Kokogut has been hosted on SourceForge since May 26, 2004.