Copyright © 2004-2005 by Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software
Foundation; with no Invariant Sections, no Front-Cover Texts, and
no Back-Cover Texts. A copy of the license is included here.
The Kogut Programming Language
Kogut is an experimental programming language which supports
impurely functional programming and a non-traditional flavor of
object-oriented programming. Its semantics is most similar to
Scheme or Dylan, but the syntax looks more like ML or Ruby.
The name “Kogut” means “Rooster” (“Cock”) in Polish and is
pronounced [KOH-goot].
The paradigm
-
The language is dynamically typed: consistent usage of types
is not enforced statically.
-
The language is mostly functional: object contents and
name bindings are immutable by default. You can request a
mutable variable explicitly. The standard library includes
both immutable and mutable collections, with the most important
compound types (lists, strings, tuples) being immutable.
-
The language doesn’t prevent arbitrary side effects from
occurring during evaluation of an expression (it’s not
purely functional), and the evaluation order is deterministic.
-
Objects are deallocated implicitly when no longer referenced
(garbage collection).
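The points above can be loosely illustrated in Python, which is used here only as a stand-in and is not Kogut syntax. The sketch assumes a record type `Point` and a mutable cell `Var`, both hypothetical names invented for this example: data is immutable unless mutability is requested explicitly, and type misuse is not checked before the program runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A record whose fields cannot be reassigned after construction."""
    x: int
    y: int


class Var:
    """A hypothetical mutable cell: mutability has to be asked for."""

    def __init__(self, value):
        self.value = value


p = Point(1, 2)
try:
    p.x = 10          # rejected: the record is immutable
    mutated = True
except AttributeError:  # FrozenInstanceError is a subclass of AttributeError
    mutated = False

counter = Var(0)
counter.value += 1    # allowed: the cell was explicitly made mutable


def double(v):
    # No static check restricts the type of v; it only has to support `*`.
    return v * 2


results = [double(21), double("ab")]
```

Python itself makes mutability the default for most objects, so the analogy covers only the explicit request for a mutable cell, not the language defaults.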
Names, definitions and scopes
-
The language is lexically scoped: an occurrence of a name
refers to a definition determined statically from program
source, not dynamically by control flow.
-
The same syntax and semantics of definitions are used globally
in a module scope, locally in a function, and for specifying
the fields of an object.
-
There is a single namespace: each identifier in a given scope
has one meaning, independent of the context of usage.
-
Definitions are evaluated and names are defined in the order
the definitions are written. Expressions may refer to names
defined above or below, as long as names defined below are
used only inside functions which are not called before the
names are defined.
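The scoping and ordering rules above can be sketched in Python, which happens to behave similarly in this respect (this is an analogy, not Kogut code, and all names are made up for the example). A function may mention a name defined further down, provided it is not called before that definition has been evaluated, and a name resolves to the definition visible in the source text rather than to the caller's variables.

```python
def is_even(n):
    # Refers to is_odd, which is defined below this point.
    return True if n == 0 else is_odd(n - 1)


try:
    is_even(2)                 # called too early: is_odd does not exist yet
    early_call_failed = False
except NameError:
    early_call_failed = True


def is_odd(n):
    return False if n == 0 else is_even(n - 1)


late_result = is_even(10)      # fine now: both names are defined

x = "outer"


def show():
    return x                   # bound to the x visible in the source text


def caller():
    x = "local to caller"      # does not affect what show() sees
    return show()


lexical_result = caller()
```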
Errors
-
There is no undefined behavior (an error can’t trash
memory nor make the processor execute unpredictable code),
and there is little unspecified behavior. The meaning of a
program is almost deterministic. In particular, strings and
lists are immutable, so they can be freely shared without
problems with modifying literals.
-
Errors generally cause an exception to be thrown, instead of
implicitly converting an argument to another type, returning a
null or unspecified value, ignoring excess arguments, or guessing
what the programmer could possibly mean. In particular, a condition
must be either True or False, and trying to get a non-existent
element of a collection throws an exception.
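As a rough illustration of this policy, here is a Python sketch (not Kogut code). Python conditions accept any value, so the strict rule is imitated with a hypothetical helper `require_bool`; `outcome` is likewise an invented helper that records whether an action succeeded or which exception it raised.

```python
def require_bool(cond):
    """Accept only the two boolean values, as a strict condition would."""
    if cond is not True and cond is not False:
        raise TypeError(f"condition must be True or False, got {cond!r}")
    return cond


def outcome(action):
    """Run action and report either its result or the exception type."""
    try:
        return ("ok", action())
    except Exception as exc:
        return ("error", type(exc).__name__)


checks = [
    outcome(lambda: require_bool(3 > 2)),    # a real boolean is accepted
    outcome(lambda: require_bool(0)),        # no implicit conversion of 0
    outcome(lambda: [10, 20][5]),            # missing list element
    outcome(lambda: {"a": 1}["b"]),          # missing dictionary key
]
```

In each failing case the error surfaces at the point of misuse instead of producing a null or a silently converted value.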
Execution
-
An object conceptually consists of three parts:
-
behavior, which is how it reacts to application to
arguments,
-
type, a tag which helps to identify behavioral protocols
it uses,
-
and sometimes some hidden type-specific magic.
-
A function (or generally any object) takes a list of
arguments and either returns a single result or throws
an exception. Keyword parameters and multiple results are
simulated in terms of this model.
-
Tail calls are properly implemented: before execution of a
tail call, the memory which was implicitly allocated for the
caller’s execution state is deallocated.
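The practical effect of proper tail calls can be shown by contrast with a language that lacks them. The following Python sketch is an illustration only: Python keeps the caller's frame across a tail call, so deep tail recursion fails, and a trampoline (a standard workaround, not something the source describes) imitates the constant-space behavior. The function names are invented for the example.

```python
import sys


def count_down_recursive(n):
    # The recursive call is in tail position, but Python still keeps
    # every caller's frame alive until the innermost call returns.
    return "done" if n == 0 else count_down_recursive(n - 1)


def count_down_step(n):
    # Instead of making the tail call, return a thunk describing it.
    return "done" if n == 0 else (lambda: count_down_step(n - 1))


def trampoline(step):
    # Run thunks in a loop, so the stack depth stays constant.
    result = step
    while callable(result):
        result = result()
    return result


depth = sys.getrecursionlimit() + 1000

try:
    count_down_recursive(depth)
    overflowed = False
except RecursionError:
    overflowed = True

final = trampoline(lambda: count_down_step(depth))
```

With properly implemented tail calls, the first version would run in constant stack space and no trampoline would be needed.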
Miscellaneous issues with functions
-
Data objects are primarily used by applying functions to them,
rather than by sending them some messages. Objects themselves
are applied only to access their core functionality, e.g. to
access fields of a record.
-
In case of a generic interface common to several types with
different implementations, the functions specified by the
interface are realized by generic functions, which dispatch
their implementation on the types of arguments. In particular
many operators are generic functions.
-
New objects are constructed by applying functions which are
designed to return new objects, rather than by using some
distinct syntactic notion of constructors.
-
There is a single most important equality operator
==, which generally compares values of immutable
objects and identity of mutable objects, and can also be
defined manually for particular types.
-
The comparison used for sorting and for dictionary lookup, and
the corresponding hash function, are generally specified once
per type, not once per sorting operation or once per dictionary.
Instead, these operations take a transformation function which
extracts or transforms the part of the key used for comparison.
-
Locking and unlocking synchronization objects, blocking and
unblocking asynchronous signals, changing values of dynamic
variables (usually), and installing signal handlers (usually)
are done for the duration of execution of given code, not as
a permanent effect of an imperative operation.
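Three of the points above have close analogues in Python, sketched below as an illustration rather than as Kogut code. `functools.singledispatch` stands in for generic functions (it dispatches on the first argument only, whereas the text speaks of the types of arguments in general), `sorted` with `key` shows a transformation function in place of a per-call comparison, and a `with` block shows an effect limited to the duration of some code. The function `describe` and the sample data are invented for the example.

```python
import threading
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback used when no implementation is registered for the type.
    raise TypeError(f"describe is not defined for {type(value).__name__}")


@describe.register
def _(value: int):
    return f"integer {value}"


@describe.register
def _(value: str):
    return f"string {value!r}"


@describe.register
def _(value: list):
    return "list of " + ", ".join(describe(v) for v in value)


described = describe([1, "a"])

# Sorting takes a function extracting the part of each item to compare;
# the comparison itself is the one already defined for integers.
people = [("Ada", 36), ("Bob", 25), ("Cy", 41)]
by_age = sorted(people, key=lambda person: person[1])

# The lock is held only for the duration of the block.
lock = threading.Lock()
with lock:
    held_inside = lock.locked()
held_after = lock.locked()
```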
Syntax
-
Names are case-sensitive.
-
Function application is denoted by separating the function
from the arguments and the arguments from one another with
spaces, but functions are not curried; all arguments are
passed at once and the function can determine how many of
them were given.
-
Breaking of program text into lines and indentation are
insignificant. Definitions and statements are separated
by semicolons.
-
Mutable variables are first-class objects. The meaning of
accessing particular variables or accessing fields of an
object can be programmed.
-
The set of operators with their priorities is fixed, which
makes it possible to parse a module independently of the contents
of other modules. You can make your own binary operators
from ordinary names with a fixed priority (they look like
%Foo or Foo%).
-
The only keyword is the underscore. Other identifier-like
names which are used in core syntactic constructs are macros,
and these names can be redefined.
-
Parentheses () are used for grouping subexpressions and
subpatterns. Brackets [] are used for making and matching lists.
Braces {} are used for delimiting other parts of the syntax
(function bodies, if and case branches, object definitions etc.).
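The remark that functions are not curried, receive all their arguments at once, and can determine how many were given can be approximated in Python with a variadic parameter. This is a sketch in Python syntax, not Kogut's space-separated application syntax, and `area` is a hypothetical function.

```python
def area(*args):
    # All arguments arrive together; the function inspects their count.
    if len(args) == 1:
        (side,) = args
        return side * side
    if len(args) == 2:
        width, height = args
        return width * height
    raise TypeError(f"area expects 1 or 2 arguments, got {len(args)}")


square = area(3)
rectangle = area(2, 5)
```

Because the function is not curried, `area(2)` is a complete call with one argument, not a partial application waiting for a second one.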
Conventions
-
Most global names are written LikeThis, except type names,
which are LIKE_THIS (because they often coexist with a function
or constant of similar name), and names of important macros
like let and if. Local names and field names are usually
written likeThis.
-
In the author’s opinion an indentation width of
3 spaces looks nice. Since the standard tab width, 8, is
not even divisible by 3, tabs are better avoided. Using a
non-standard tab width would be evil.