A discussion of binding modules, the principles behind the tool, and a
discussion of related work can be found in a research paper located at http://www.cse.unsw.edu.au/~chak/papers/papers.html#c2hs. All
features described in the paper, except enum define hooks, are implemented in the tool, but since the publication of the paper, the tool
has been extended further. The library interface essentially consists of the
new Haskell FFI Marshalling Library. More details about this library are
provided in the next section.
The remainder of this section describes the hooks that are available in binding modules.
{#import [qualified] modid#}
An import hook is translated into the same syntactic form in Haskell, which implies that it may be followed by an explicit import list. Moreover, it implies that the module modid is also generated by C➔Haskell and instructs the tool to read the file modid.chi.
If an explicit output file name is given (--output option), this name determines the basename for the .chi file of the currently translated module.
Currently, only pointer hooks generate information that is stored in a .chi file and needs to be incorporated into any client module that makes use of these pointer types. It is, however, regarded as good style to use import hooks for any module generated by C➔Haskell.
C➔Haskell does not use qualified names. This can be a problem, for example, if two pointer hooks are defined to have the same unqualified Haskell name in two different modules, which are then imported by a third module. To partially work around this problem, it is guaranteed that the declaration of the textually later import hook dominates.
{#context [lib = lib] [prefix = prefix]#}
Context hooks define a set of global configuration options. Currently, there are two parameters, both of which are strings:
lib is a dynamic library that contains symbols needed by the present binding.
prefix is an identifier prefix that may be omitted in the lexemes of identifiers referring to C definitions in any binding hook. This is useful as C libraries often use a prefix, such as gtk_, as a form of poor man's name spaces. Any occurrence of underscore characters between a prefix and the main part of an identifier must also be dropped. Case is not relevant in a prefix. In case of a conflict of the abbreviation with an explicitly defined identifier, the explicit definition takes preference.
Both parameters are optional. An example of a context hook is the following:
{#context prefix = "gtk"#}
If a binding module contains a context hook, it must be the first hook in the module.
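The prefix rule can be modelled in a few lines of plain Haskell. This is only a sketch of the naming relation described above, not the code C➔Haskell uses internally; the helper name abbreviate is hypothetical.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- | Hypothetical helper: the short form of a C identifier once the context
-- prefix and any underscores directly following it are dropped.
-- The prefix is compared without regard to case.
abbreviate :: String -> String -> Maybe String
abbreviate prefix cid
  | map toLower prefix `isPrefixOf` map toLower cid =
      Just (dropWhile (== '_') (drop (length prefix) cid))
  | otherwise = Nothing

main :: IO ()
main = do
  print (abbreviate "gtk" "gtk_window_new")    -- Just "window_new"
  print (abbreviate "gtk" "GTK_WINDOW_POPUP")  -- Just "WINDOW_POPUP"
  print (abbreviate "gtk" "g_free")            -- Nothing
```

With the context hook shown above, a binding hook could therefore refer to gtk_window_new simply as window_new.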
{#type ident#}
A type hook maps a C type to a Haskell type. As an example, consider
type GInt = {#type gint#}
The type must be a defined type; primitive types, such as int, are not admissible.
{#sizeof ident#}
A sizeof hook maps a C type to its size in bytes. As an example, consider
gIntSize :: Int
gIntSize = {#sizeof gint#}
The type must be a defined type; primitive types, such as int, are not admissible. The size of primitive types can always be obtained using Storable.sizeOf.
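For primitive types, no hook is needed. A minimal sketch using sizeOf from Foreign.Storable on the C types exported by Foreign.C.Types follows; the sizes of int and double depend on the platform, so no values are shown for them.

```haskell
import Foreign.C.Types (CChar, CDouble, CInt)
import Foreign.Storable (sizeOf)

main :: IO ()
main = do
  -- sizeOf only inspects the type of its argument, so undefined is safe here.
  putStrLn ("char:   " ++ show (sizeOf (undefined :: CChar)))
  putStrLn ("int:    " ++ show (sizeOf (undefined :: CInt)))
  putStrLn ("double: " ++ show (sizeOf (undefined :: CDouble)))
```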
{#enum cid [as hsid] {alias1, ..., aliasn} [with prefix = pref] [deriving (clid1, ..., clidn)]#}
Rewrite the C enumeration called cid into a Haskell data type declaration, which is made an instance of Enum such that the ordinals match those of the enumeration values in C. This takes explicit enumeration values in the C definitions into account. If hsid is given, this is the name of the Haskell data type. The identifiers clid1 to clidn are added to the deriving clause of the Haskell type.
By default, the names of the C enumeration are used for the constructors in Haskell. If alias1 is underscoreToCase, the original C names are capitalised and the use of underscores is rewritten to caps. If it is upcaseFirstLetter or downcaseFirstLetter, the first letter of the original C name changes case correspondingly. It is also possible to combine underscoreToCase with one of upcaseFirstLetter or downcaseFirstLetter. Moreover, alias1 to aliasn may be aliases of the form cid as hsid, which map individual C names to Haskell names. Instead of the global prefix introduced by a context hook, a local prefix pref can optionally be specified.
As an example, consider
{#enum WindowType {underscoreToCase} deriving (Eq)#}
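To make the effect of such a hook concrete, here is a hand-written approximation of the kind of Haskell code it stands for. The C enumeration in the comment is an assumption made up for this sketch (including the explicit value 4), and the exact code emitted by C➔Haskell may differ.

```haskell
-- Hypothetical C declaration assumed for this sketch:
--   enum WindowType { WINDOW_TOPLEVEL, WINDOW_POPUP = 4, WINDOW_DIALOG };

-- Constructor names as underscoreToCase would derive them from the C names.
data WindowType = WindowToplevel
                | WindowPopup
                | WindowDialog
                deriving (Eq)

-- The Enum instance follows the C ordinals, including the explicit value.
instance Enum WindowType where
  fromEnum WindowToplevel = 0
  fromEnum WindowPopup    = 4
  fromEnum WindowDialog   = 5

  toEnum 0 = WindowToplevel
  toEnum 4 = WindowPopup
  toEnum 5 = WindowDialog
  toEnum n = error ("WindowType.toEnum: no match for " ++ show n)

main :: IO ()
main = print (map fromEnum [WindowToplevel, WindowPopup, WindowDialog])
-- prints [0,4,5]
```

A derived Enum instance would number the constructors 0, 1, 2, which is why the instance is written out explicitly.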
Many C libraries do not use enum types, but macro definitions to implement constants. c2hs provides enum define hooks, which generate a Haskell datatype from a collection of macro definitions.
{#enum define hsid {alias1, ..., aliasn} [deriving (clid1, ..., clidn)]#}
Create a Haskell datatype hsid, with nullary constructors as given by the aliases alias1 through aliasn. Each alias has to be of the form macrodef as hsid, where hsid is the name of the nullary Haskell constructor, and macrodef the C macro which the Haskell constructor should map to. The deriving part is handled as in ordinary enum hooks.
Here's an example:
#define X 0
#define Y 1
{#enum define Axis {X as Axis0, Y as Axis1} deriving (Eq,Ord) #}
{#call [pure] [unsafe] [interruptible] cid [as (hsid | ^)]#}
A call hook rewrites to a call to the C function cid and also ensures that the appropriate foreign import declaration is generated. The tags pure and unsafe specify that the external function is purely functional and cannot re-enter the Haskell runtime, respectively. The interruptible flag is intended to be used in conjunction with the InterruptibleFFI extension. If hsid is present, it is used as the identifier for the foreign declaration, which otherwise defaults to the cid. When instead of hsid, the symbol ^ is given, the cid after conversion from C's underscore notation to a capitalised identifier is used.
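The name conversion selected by ^ can be sketched as a small pure function. This is a model of the described behaviour rather than the tool's own code; the function names are hypothetical, and the sample identifier is the one used in the function hook example later in this section.

```haskell
import Data.Char (toUpper)

-- | Split a C identifier at underscores, dropping empty pieces.
splitUnderscores :: String -> [String]
splitUnderscores s = case break (== '_') s of
  (w, [])       -> [w | not (null w)]
  (w, _ : rest) -> [w | not (null w)] ++ splitUnderscores rest

-- | Upper-case the first character of a word.
capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toUpper c : cs

-- | Hypothetical model of the ^ conversion for function names: the first
-- word is kept as is, every following word is capitalised.
underscoreToCamel :: String -> String
underscoreToCamel cid = case splitUnderscores cid of
  []       -> []
  (w : ws) -> w ++ concatMap capitalise ws

main :: IO ()
main = putStrLn (underscoreToCamel "notebook_query_tab_label_packing")
-- prints notebookQueryTabLabelPacking
```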
As an example, consider
sin :: Float -> Float
sin = {#call pure sin as "_sin"#}
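For comparison, the following is a hand-written sketch of the kind of foreign import that a pure, unsafe call hook saves the programmer from writing. It is not the output of C➔Haskell; the names c_sin and sinD are hypothetical, and the sketch uses CDouble because the C function sin operates on double.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))

-- "pure": the import has no IO in its type.
-- "unsafe": the C function is assumed never to call back into Haskell.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Convert between the Haskell and C floating point types by hand.
sinD :: Double -> Double
sinD = realToFrac . c_sin . realToFrac

main :: IO ()
main = print (sinD 0)
```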
{#fun [pure] [unsafe] [interruptible] cid [as (hsid | ^)] [ctxt =>] {parm1, ..., parmn} -> parm#}
Function hooks are call hooks including parameter marshalling. Thus, the components of a function hook up to and including the as alias are the same as for call hooks. However, an as alias has a different meaning; it specifies the name of the generated Haskell function. The remaining components use literals enclosed in backward and forward single quotes (` and ') to denote Haskell code fragments (or more precisely, parts of the Haskell type signature for the bound function). The first one is the phrase ctxt preceding =>, which denotes the type context. This is followed by zero or more type and marshalling specifications parm1 to parmn for the function arguments and one parm for the function result. Each such specification parm has the form
[inmarsh [* | -]] hsty [&] [outmarsh [*] [-]]
where hsty is a Haskell code fragment denoting a Haskell type. The optional information to the left and right of this type determines the marshalling of the corresponding Haskell value to and from C; they are called the in and out marshaller, respectively.
Each marshalling specification parm corresponds to one or two arguments of the C function, in the order in which they are given. A marshalling specification in which the symbol & follows the Haskell type corresponds to two C function arguments; otherwise, it corresponds only to one argument. The parm following the arrow -> determines the marshalling of the result of the C function and may not contain the symbol &.
The *- output marshal specification is for monadic actions that must be executed but whose results are discarded. This is very useful for e.g. checking an error value and throwing an exception if needed.
Both inmarsh and outmarsh are identifiers of Haskell marshalling functions. By default they are assumed to be pure functions; if they have to be executed in the IO monad, the function name needs to be followed by a star symbol *. Alternatively, the identifier may be followed by a minus sign -, in which case the Haskell type does not appear as an argument (in marshaller) or result (out marshaller) of the generated Haskell function. In other words, the argument types of the Haskell function are determined by the set of all marshalling specifications where the in marshaller is not followed by a minus sign. Conversely, the result tuple of the Haskell function is determined by the set of all marshalling specifications where the out marshaller is not followed by a minus sign. The order of function arguments and components in the result tuple is the same as the order in which the marshalling specifications are given, with the exception that the value of the result marshaller is always the first component in the result tuple if it is included at all.
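The roles of the in and out marshallers can be illustrated by writing the marshalling by hand for a C function with an output parameter. The sketch below binds the standard C function frexp (double frexp(double x, int *exp)); it shows the pattern a function hook automates and is not code generated by C➔Haskell. The names c_frexp and frexp are chosen for this example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..), CInt (..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

foreign import ccall unsafe "math.h frexp"
  c_frexp :: CDouble -> Ptr CInt -> IO CDouble

-- The exponent has no Haskell argument: its storage is allocated locally,
-- which corresponds to an in marshaller followed by a minus sign.
frexp :: Double -> IO (Double, Int)
frexp x =
  alloca $ \expPtr -> do
    mantissa <- c_frexp (realToFrac x) expPtr  -- pure in marshaller on x
    e        <- peek expPtr                    -- out marshaller running in IO
    -- The C function's result comes first in the tuple, then the out values.
    return (realToFrac mantissa, fromIntegral e)

main :: IO ()
main = frexp 8 >>= print
-- prints (0.5,4)
```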
For a set of commonly occurring Haskell and C type combinations, default marshallers are provided by C➔Haskell if no explicit marshaller is given. The out marshaller for function arguments is by default void-. The defaults for the in marshallers for function arguments are as follows:
Bool and integral C type (including chars): cFromBool
Integral Haskell and integral C type: cIntConv
Floating Haskell and floating C type: cFloatConv
String and char*: withCString*
String and char* with explicit length: withCStringLen*
T and T*: with*
T and T* where T is an integral type: withIntConv*
T and T* where T is a floating type: withFloatConv*
Bool and T* where T is an integral type: withFromBool*
The defaults for the out marshaller of the result are the converse of the above; i.e., instead of the with functions, the corresponding peek functions are used. Moreover, when the Haskell type is (), the default marshaller is void-.
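As an illustration of the String and char* default, the following hand-written sketch passes a Haskell String to the standard C function strlen using withCString from Foreign.C.String. It mirrors what an in marshaller running in IO does, under the assumption that the result is converted with an integral conversion; the names c_strlen and strlen are chosen for this example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- withCString copies the String into a temporary NUL-terminated buffer
-- that is only valid while the inner action runs.
strlen :: String -> IO Int
strlen s = withCString s $ \p -> fromIntegral <$> c_strlen p

main :: IO ()
main = strlen "hello" >>= print
-- prints 5
```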
As an example, consider
{#fun notebook_query_tab_label_packing as ^
  `(NotebookClass nb, WidgetClass cld)' =>
  {notebook `nb'                ,
   widget   `cld'               ,
   alloca-  `Bool'     peekBool*,
   alloca-  `Bool'     peekBool*,
   alloca-  `PackType' peekEnum*} -> `()'#}
which results in the Haskell type signature
notebookQueryTabLabelPacking :: (NotebookClass nb, WidgetClass cld)
                             => nb -> cld -> IO (Bool, Bool, PackType)
which binds the following C function:
void gtk_notebook_query_tab_label_packing (GtkNotebook *notebook,
                                           GtkWidget   *child,
                                           gboolean    *expand,
                                           gboolean    *fill,
                                           GtkPackType *pack_type);
{#get apath#}
A get hook supports accessing a member value of a C structure. The hook itself yields a function that, when given the address of a structure of the right type, performs the structure access. The member that is to be extracted is specified by the access path apath.
Access paths are formed as follows (following a subset of the C expression syntax):
The root of any access path is a simple identifier, which denotes either a type name or struct tag.
An access path of the form *apath denotes dereferencing of the pointer yielded by accessing the access path apath.
An access path of the form apath.cid specifies that the value of the struct member called cid should be accessed.
Finally, an access path of the form apath->cid, as in C, specifies a combination of dereferencing and member selection.
For example, we may have
visualGetType :: Visual -> IO VisualType
visualGetType (Visual vis) = liftM cToEnum $ {#get Visual->type#} vis
{#set apath#}
Set hooks are formed in the same way as get hooks, but yield a function that assigns a value to a member of a C structure. These functions expect a pointer to the structure as the first and the value to be assigned as the second argument. For example, we may have
{#set sockaddr_in.sin_family#} addr_in (cFromEnum AF_NET)
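Conceptually, get and set hooks stand for a peek or poke at the byte offset of the selected member. The sketch below writes this out by hand for a hypothetical C type struct point { int x; int y; }; the offset of y is assumed to be the size of one int, whereas C➔Haskell derives offsets from the C header. The names Point, getY and setY are made up for this example.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff, sizeOf)

-- Empty tag type standing for the hypothetical C struct point.
data Point

intSize :: Int
intSize = sizeOf (undefined :: CInt)

-- Roughly what a get hook on point->y stands for.
getY :: Ptr Point -> IO CInt
getY p = peekByteOff p intSize

-- Roughly what a set hook on point->y stands for.
setY :: Ptr Point -> CInt -> IO ()
setY p v = pokeByteOff p intSize v

main :: IO ()
main = allocaBytes (2 * intSize) $ \p -> do
  setY p 7
  getY p >>= print
-- prints 7
```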
{#pointer [*] cid [as hsid] [foreign | stable] [newtype | -> hsid2] [nocode]#}
A pointer hook facilitates the mapping of C to Haskell pointer types. In particular, it enables the use of ForeignPtr and StablePtr types and defines type name translations for pointers to non-basic types. In general, such a hook establishes an association between the C type cid or *cid and the Haskell type hsid, where the latter defaults to cid if not explicitly given. The identifier cid will usually be a type name, but in the case of *cid may also be a struct, union, or enum tag. If both a type name and a tag of the same name are available, the type name takes precedence. Optionally, the Haskell representation of the pointer can be by a ForeignPtr or StablePtr instead of a plain Ptr. If the newtype tag is given, the Haskell type hsid is defined as a newtype rather than a transparent type synonym. In case of a newtype, the type argument to the Haskell pointer type will be hsid, which gives a cyclic definition, but the type argument is here really only used as a unique type tag. Without newtype, the default type argument is (), but another type can be specified after the symbol ->.
For example, we may have
{#pointer *GtkObject as Object newtype#}
This will generate a new type Object as follows:
newtype Object = Object (Ptr Object)
which enables exporting Object as an abstract type and facilitates type checking at call sites of imported functions using the encapsulated pointer. The latter is achieved by C➔Haskell as follows. The tool remembers the association of the C type *GtkObject with the Haskell type Object, and so, it generates for the C function
void gtk_unref_object (GtkObject *obj);
the import declaration
foreign import gtk_unref_object :: Object -> IO ()
This function can obviously only be applied to pointers of the right type, and thus, protects against the common mistake of confusing the order of pointer arguments in function calls.
However, as the Haskell FFI does not permit passing ForeignPtrs directly to function calls or returning them, the tool will use the type Ptr HsName in this case, where HsName is the Haskell name of the type. So, if we modify the above declaration to be
{#pointer *GtkObject as Object foreign newtype#}
the type Ptr Object will be used instead of a plain Object in import declarations; i.e., the previous import declaration will become
foreign import gtk_unref_object :: Ptr Object -> IO ()
To simplify the required marshalling code for such pointers, the tool automatically generates a function
withObject :: Object -> (Ptr Object -> IO a) -> IO a
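A plausible shape for the generated declarations is sketched below, assuming the newtype wraps a ForeignPtr and withObject simply delegates to withForeignPtr. The actual code emitted by the tool may differ in detail, and the allocation in main exists only to make the sketch runnable.

```haskell
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr, nullPtr)

-- Assumed result of the pointer hook with "foreign newtype".
newtype Object = Object (ForeignPtr Object)

-- Unwrap the ForeignPtr for the duration of the given action.
withObject :: Object -> (Ptr Object -> IO a) -> IO a
withObject (Object fptr) = withForeignPtr fptr

main :: IO ()
main = do
  -- Stand-in for an object obtained from C; 16 bytes is an arbitrary size.
  obj    <- Object <$> mallocForeignPtrBytes 16
  isNull <- withObject obj (\p -> return (p == nullPtr))
  print isNull
```

A wrapper around the imported gtk_unref_object would then take the form \obj -> withObject obj gtk_unref_object.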
As an example that does not represent the pointer as an abstract type, consider the C type declaration:
typedef struct {int x, y;} *point;
We can represent it in Haskell as
data Point = Point {x :: Int, y :: Int}
{#pointer point as PointPtr -> Point#}
which will translate to
data Point = Point {x :: Int, y :: Int}
type PointPtr = Ptr Point
and establish a type association between point and PointPtr.
If the keyword nocode is added to the end of a pointer hook, C➔Haskell will not emit a type declaration. This is useful when a C➔Haskell module wants to make use of an existing type declaration in a binding not generated by C➔Haskell (i.e., where there are no .chi files).
The name cid cannot be a basic C type (such as int); it must be a defined name.
{#class [hsid1 =>] hsid2 hsid3#}
Class hooks facilitate the definition of a single inheritance class hierarchy for external pointers including up and down cast functionality. This is meant to be used in cases where the objects referred to by the external pointers are ordered in such a hierarchy in the external API; such structures are encountered in C libraries that provide an object-oriented interface. Each class hook rewrites to a class declaration and one or more instance declarations.
All classes in a hierarchy, except the root, will have a superclass identified by hsid1. The new class is given by hsid2 and the corresponding external pointer is identified by hsid3. Both the superclass and the pointer type must already have been defined by binding hooks that precede the class hook.
The pointers in a hierarchy must either all be foreign pointers or all be normal pointers. Stable pointers are not allowed. Both pointers defined as newtypes and those defined by type synonyms may be used in class declarations and they may be mixed. In the case of synonyms, Haskell's usual restrictions regarding overlapping instance declarations apply.
The newly defined class has two members whose names are derived from the type name hsid3. The name of the first member is derived from hsid3 by converting the first character to lower case. This function casts from any superclass to the current class. The name of the second member is derived by prefixing hsid3 with from. It casts from the current class to any superclass. A class hook generates an instance for the pointer in the newly defined class as well as in all its superclasses.
As an example, consider
{#pointer *GtkObject newtype#}
{#class GtkObjectClass GtkObject#}
{#pointer *GtkWidget newtype#}
{#class GtkObjectClass => GtkWidgetClass GtkWidget#}
The second class hook generates an instance for GtkWidget for both the GtkWidgetClass as well as for the GtkObjectClass.
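The following hand-written sketch approximates the declarations these four hooks stand for. It assumes that the two class members have the types p -> T and T -> p and that casts between pointer types are done with castPtr; the member names follow the naming rule given above, but the code actually emitted by C➔Haskell may differ.

```haskell
import Foreign.Ptr (Ptr, castPtr, nullPtr)

-- Assumed results of the two pointer hooks.
newtype GtkObject = GtkObject (Ptr GtkObject)
newtype GtkWidget = GtkWidget (Ptr GtkWidget)

-- Root class of the hierarchy.
class GtkObjectClass p where
  gtkObject     :: p -> GtkObject
  fromGtkObject :: GtkObject -> p

instance GtkObjectClass GtkObject where
  gtkObject     = id
  fromGtkObject = id

-- Class with GtkObjectClass as its superclass.
class GtkObjectClass p => GtkWidgetClass p where
  gtkWidget     :: p -> GtkWidget
  fromGtkWidget :: GtkWidget -> p

instance GtkWidgetClass GtkWidget where
  gtkWidget     = id
  fromGtkWidget = id

-- The widget pointer is also an instance of the superclass.
instance GtkObjectClass GtkWidget where
  gtkObject (GtkWidget p)     = GtkObject (castPtr p)
  fromGtkObject (GtkObject p) = GtkWidget (castPtr p)

main :: IO ()
main = do
  let widget      = GtkWidget nullPtr   -- placeholder pointer for the sketch
      GtkObject p = gtkObject widget
  print (p == nullPtr)
```

A function declared with the constraint GtkObjectClass o can then accept a GtkWidget as well as a GtkObject.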
A Haskell binding module may include arbitrary C pre-processor directives using the standard C syntax. The directives are used in two ways: Firstly, they are included in the C header file generated by C➔Haskell in exactly the same order in which they appear in the binding module. Secondly, all conditional directives are honoured by C➔Haskell in that all Haskell binding code in alternatives that are discarded by the C pre-processor is also discarded by C➔Haskell. This latter feature is, for example, useful to maintain different bindings for multiple versions of the same C API in a single Haskell binding module.
In addition to C pre-processor directives, vanilla C code can be maintained in a Haskell binding module by bracketing this C code with the pseudo directives #c and #endc. Such inline C code is emitted into the C header generated by C➔Haskell at exactly the same position relative to CPP directives as it occurs in the binding module.
Pre-processor directives may encompass the #include directive, which can be used instead of specifying a C header file as an argument to c2hs. In particular, this enables the simultaneous use of multiple header files without the need to provide a custom header file that binds them together. If a header file lib.h is specified as an argument to c2hs, the tool will emit the directive
#include "lib.h"
into the generated C header before any other CPP directive or inline C code.
As an artificial example of these features consider the following code:
#define VERSION 2

#if (VERSION == 1)
foo :: CInt -> CInt
foo = {#call pure fooC#}
#else
foo :: CInt -> CInt -> CInt
foo = {#call pure fooC#}
#endif

#c
int fooC (int, int);
#endc
One of two versions of the Haskell function foo (having different arities) is selected in dependence on the value of the CPP macro VERSION, which in this example is defined in the same file. In realistic code, VERSION would be defined in the header file supplied with the C library that is made accessible from Haskell by a binding module. The above code fragment also includes one line of inline C code that declares a C prototype for fooC.
Inline C code cannot currently contain any code blocks; i.e., only declarations as typically found in header files may be included.
The following grammar rules define the syntax of binding hooks:
hook     -> `{#' inner `#}'
inner    -> `import' [`qualified'] ident
          | `context' ctxt
          | `type' ident
          | `sizeof' ident
          | `enum' idalias trans [`with' prefix] [deriving]
          | `call' [`pure'] [`unsafe'] [`interruptible'] idalias
          | `fun' [`pure'] [`unsafe'] [`interruptible'] idalias parms
          | `get' apath
          | `set' apath
          | `pointer' [`*'] idalias ptrkind
          | `class' [ident `=>'] ident ident
ctxt     -> [`lib' `=' string] [prefix]
idalias  -> ident [(`as' ident | `^')]
prefix   -> `prefix' `=' string
deriving -> `deriving' `(' ident_1 `,' ... `,' ident_n `)'
parms    -> [verbhs `=>'] `{' parm_1 `,' ... `,' parm_n `}' `->' parm
parm     -> [ident_1 [`*' | `-']] verbhs [`&'] [ident_2 [`*'] [`-']]
apath    -> ident
          | `*' apath
          | apath `.' ident
          | apath `->' ident
trans    -> `{' alias_1 `,' ... `,' alias_n `}'
alias    -> `underscoreToCase' | `upcaseFirstLetter' | `downcaseFirstLetter'
          | ident `as' ident
ptrkind  -> [`foreign' | `stable'] [`newtype' | `->' ident]
Identifiers ident follow the lexis of Haskell. They may be enclosed in single quotes to disambiguate them from C➔Haskell keywords.