3 The Caml-IDL mapping

This section describes how IDL types, function declarations, and interfaces are mapped to Caml types, functions and classes.

3.1 Base types

IDL type ty	Caml type [[ty]]
`byte`, `short`	`int`
`int`, `long` with `[camlint]` attribute	`int`
`int`, `long` with `[nativeint]` attribute	`nativeint`
`int`, `long` with `[int32]` attribute	`int32`
`int`, `long` with `[int64]` attribute	`int64`
`hyper`, `long long`, `__int64`	`int64`
`char`	`char`
`float`, `double`	`float`
`boolean`	`bool`

(For integer types, signed and unsigned variants of the same IDL integer type translate to the same Caml type.)

Depending on the attributes, the int and long integer types are converted to one of the Caml integer types int, nativeint, int32, or int64. Values of Caml type int32 are exactly 32 bits wide and values of type int64 are exactly 64 bits wide on all platforms. Values of type nativeint have the natural word size of the platform, and are large enough to accommodate any C int or long int without loss of precision. Values of Caml type int have the natural word size of the platform minus one bit of tag, hence the conversion from IDL types int and long loses the most significant bit on 32-bit platforms. On 64-bit platforms, the conversion from int is exact, but the conversion from long loses the most significant bit.
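
The loss of the most significant bit can be modeled in Caml itself. The following is a minimal sketch, not camlidl's actual conversion code: drop_top_bit is a hypothetical helper that mimics, on any platform, what happens to a 32-bit C value when it is stored in a 31-bit Caml int.

```ocaml
(* Hypothetical helper (not part of camlidl): emulate storing a 32-bit C
   integer in a 31-bit Caml int by discarding bit 31 and sign-extending
   from bit 30. *)
let drop_top_bit (x : int32) : int32 =
  Int32.shift_right (Int32.shift_left x 1) 1

let () =
  (* Small values survive unchanged. *)
  Printf.printf "%ld -> %ld\n" 5l (drop_top_bit 5l);
  (* Values that need all 32 bits do not. *)
  Printf.printf "%ld -> %ld\n" Int32.max_int (drop_top_bit Int32.max_int);
  (* On the running platform, a Caml int has Sys.int_size bits. *)
  Printf.printf "Caml int: %d bits, machine word: %d bits\n"
    Sys.int_size Sys.word_size
```

When the full range matters, the [int32], [int64] or [nativeint] attributes avoid this truncation at the cost of boxed integers on the Caml side.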

If no explicit integer attribute is given for an int or long type, the int_default or long_default attribute of the enclosing interface, if any, determines the kind of the integer. If no int_default or long_default attribute is in scope, the kind camlint is assumed, which maps IDL int and long types to the Caml int type.

3.2 Pointers

The mapping of IDL pointer types depends on their kinds. Writing [[ty]] for the Caml type corresponding to the IDL type ty, we have:

       [ref] ty *  ⇒  [[ty]]
    [unique] ty *  ⇒  [[ty]] option
       [ptr] ty *  ⇒  [[ty]] Com.opaque

In other terms, IDL pointers of kind ref are ignored during the mapping: [ref] ty * is mapped to the same Caml type as ty. A pointer p to a C value c = *p is translated to the Caml value corresponding to c.

IDL pointers of kind unique are mapped to an option type. The option value is None for a null pointer, and Some(v) for a non-null pointer to a C value c that translates to the ML value v.

IDL pointers of kind ptr are mapped to a Com.opaque type. This is an abstract type that encapsulates the C pointer without attempting to convert it to an ML data structure.

IDL pointers of kind ignore denote struct fields and function parameters that need not be exposed in the Caml code. Those pointers are simply set to null when converting from Caml to C, and ignored when converting from C to Caml. They cannot occur elsewhere.

If no explicit pointer kind is given, the pointer_default attribute of the enclosing interface, if any, determines the kind of the pointer. If no pointer_default attribute is in scope, the kind unique is assumed.
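
From the Caml side, a unique pointer is consumed by pattern matching on the option. The sketch below uses a hypothetical function lookup as a pure-Caml stand-in for a stub whose IDL result type is [unique] int *; no C code is involved.

```ocaml
(* Hypothetical stand-in for a camlidl stub returning `[unique] int *`:
   None models a null pointer, Some v a non-null pointer to v. *)
let lookup (key : string) : int option =
  if key = "answer" then Some 42 else None

let describe (key : string) : string =
  match lookup key with
  | None -> "null pointer"
  | Some v -> Printf.sprintf "points to %d" v

let () =
  print_endline (describe "answer");
  print_endline (describe "missing")
```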

3.3 Arrays

IDL arrays of characters that carry the [string] attribute are mapped to the Caml string type:

IDL type ty	Caml type [[ty]]
`[string] char []`	`string`
`[string] unsigned char []`	`string`
`[string] signed char []`	`string`
`[string] byte []`	`string`

Caml string values are translated to standard null-terminated C strings. Be careful about embedded null characters in the Caml string, which will be recognized as end of string by C functions.
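
To see the effect of an embedded null character, the following sketch computes the prefix of a Caml string that a C function relying on null termination would observe. The helper c_view is a hypothetical name introduced for illustration.

```ocaml
(* Hypothetical helper: the portion of a Caml string that a C function
   relying on null termination would see. *)
let c_view (s : string) : string =
  match String.index_opt s '\000' with
  | Some i -> String.sub s 0 i
  | None -> s

let () =
  let s = "ab\000cd" in
  Printf.printf "Caml length: %d, C-visible length: %d\n"
    (String.length s) (String.length (c_view s))
```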

IDL arrays carrying the [bigarray] attribute are translated to Caml "big arrays", as described in the next section.

All other IDL arrays are translated to ML arrays:

        ty []  ⇒  [[ty]] array

For instance, double [] becomes float array. Consequently, multi-dimensional arrays are translated to Caml arrays of arrays. For instance, int [][] becomes int array array.

If the unique attribute is given, the IDL array is translated to an ML option type:

        [string,unique] char []  ⇒  string option
        [unique] ty []     ⇒  [[ty]] array option

As in the case of pointers of kind unique, the option value is None for a null C pointer, and Some(v) for a non-null C pointer to a C array that translates to the ML string or array v.

Conversion between a C array and an ML array proceeds element by element. For the conversion from C to ML, the number of elements of the ML array is determined as follows (in the order presented):

By the length_is attribute, if present.
By the size_is attribute, if present.
By the bound written in the array type, if any.
By searching the first null element of the C array, if the null_terminated attribute is present.

For instance, C values of IDL type [length_is(n)] double[] are mapped to Caml float array of n elements. C values of IDL type double[10] are mapped to Caml float array of 10 elements.

The length_is and size_is attributes take as argument one or several limited expressions. Each expression applies to one dimension of the array. For instance, [size_is(*dimx, *dimy)] double d[][] specifies a matrix of double whose first dimension has size *dimx and the second has size *dimy.
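
The priority order used to determine the number of elements can be written down as a small function. This is only a model of the rule stated above, under the assumption that each source of information is either absent or yields a count; element_count and its argument names are hypothetical.

```ocaml
(* Hypothetical model of the lookup order. Each optional argument is
   supplied when the corresponding information exists for the array:
   the value of length_is, the value of size_is, the bound written in the
   type, or the index of the first null element (null_terminated). *)
let element_count ?length_is ?size_is ?bound ?first_null () : int option =
  match length_is, size_is, bound, first_null with
  | Some n, _, _, _ -> Some n
  | None, Some n, _, _ -> Some n
  | None, None, Some n, _ -> Some n
  | None, None, None, n -> n

let () =
  (* length_is takes precedence over size_is. *)
  match element_count ~length_is:3 ~size_is:10 () with
  | Some n -> Printf.printf "%d elements\n" n
  | None -> print_endline "length unknown"
```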

3.4 Big arrays

IDL arrays of integers or floats that carry the [bigarray] attribute are mapped to one of the Caml Bigarray types: Array1.t for one-dimensional arrays, Array2.t for 2-dimensional arrays, Array3.t for 3-dimensional arrays, and Genarray.t for arrays of 4 dimensions or more.

If the [fortran] attribute is given, the big array is accessed from Caml using the Fortran conventions (array indices start at 1; column-major memory layout). By default, the big array is accessed from Caml using the C conventions (array indices start at 0; row-major memory layout).

If the [managed] attribute is given on a big array type that is the result type or an out parameter type of a function, Caml assumes that the corresponding C array was allocated using malloc() and is not referenced anywhere else; the Caml garbage collector will then free the C array when the corresponding Caml big array becomes unreachable. By default, Caml assumes that result or out C arrays are statically or permanently allocated: it keeps a pointer to them during conversion to Caml big arrays, and does not free them when the Caml big arrays become unreachable.

Finally, the [unique] attribute applies to bigarrays as to arrays, that is, it maps a null C pointer to None, and a non-null C pointer p to Some(v) where v is the ML bigarray resulting from the translation of p.
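
The difference between the two access conventions is visible from plain Caml code. The sketch below allocates the arrays on the Caml side rather than receiving them from C, and assumes a recent OCaml in which Bigarray is part of the standard library.

```ocaml
(* The same 2x3 matrix shape under both conventions. *)
let c_mat = Bigarray.Array2.create Bigarray.float64 Bigarray.c_layout 2 3
let f_mat = Bigarray.Array2.create Bigarray.float64 Bigarray.fortran_layout 2 3

let () =
  Bigarray.Array2.fill c_mat 0.0;
  Bigarray.Array2.fill f_mat 0.0;
  (* Last element: indices run from 0 with the C conventions ... *)
  Bigarray.Array2.set c_mat 1 2 42.0;
  (* ... and from 1 with the Fortran conventions. *)
  Bigarray.Array2.set f_mat 2 3 42.0
```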

3.5 Structs

IDL structs are mapped to Caml record types. The names and types of the IDL struct fields determine the names and types of the Caml record type:

struct s { ... ; ty_i id_i ; ... }  becomes  type s = { ... ; id_i : [[ty_i]] ; ... }

Example: struct s { int n; double d[4]; } becomes type s = {n: int; d: float array}.

Exceptions to this rule are as follows:

Fields of the IDL struct that are pointers with the [ignore] attribute do not appear in the Caml record type. Example: struct s { double x,y; [ignore] void * data; } becomes type struct_s = {x : float; y: float}. Those ignored pointer fields are set to NULL when converting from a Caml record to a C struct.
Integer fields of the IDL struct that appear in a length_is, size_is or switch_is attribute of another field also do not appear in the Caml record type. (We call those fields dependent fields.) Example: struct s { int idx; int len; [size_is(len)] double d[]; } is translated to the Caml record type type struct_s = {idx: int; d: float array}. The value of len is recovered from the size of the Caml array d, and thus doesn't need to be represented explicitly in the Caml record.
If, after elimination of ignored pointer fields and dependent fields as described above, the IDL struct has only one field ty id, we avoid creating a one-field Caml record type and translate the IDL struct type directly to the Caml type [[ty]]. Example: struct s { int len; [size_is(len)] double d[]; } is translated to the Caml type abbreviation type struct_s = float array.
The names of labels in the Caml record type can be changed by using the mlname attribute on struct field declarations. For instance,
```
struct s { int n; [mlname(p)] int q; }
         becomes type s = { n : int; p : int }
```
The Caml type system makes it difficult to use two record types defined in the same module and having some label names in common. Thus, if CamlIDL encounters two or more structs having identically-named fields, it prefixes the Caml label names by the names of the structs in order to distinguish them. For instance:
```
struct s1 { int x; int y; }
struct s2 { double x; double t; }
struct s3 { int z; }
         becomes type s1 = { s1_x: int; s1_y: int }
                 and s2 = { s2_x: float; s2_t: float }
                 and s3 = { z: int }
```
The labels for s1 and s2 have been prefixed by s1_ and s2_ respectively, to avoid ambiguity on the x label. However, the label z for s3 is not prefixed, since it is not used elsewhere.

The prefix added in front of multiply-defined labels is taken from the struct name, if any, and otherwise from the name of the nearest enclosing struct, union or typedef. For instance:
```
typedef struct { int x; } t;
struct s4 { struct { int x; } z; };
         becomes type t = { t_x: int }
                 and s4 = { z: struct_1 }
                 and struct_1 = { s4_x: int }
```
The "minimal prefixing" strategy described above is the default behavior of camlidl. If the -prefix-all-labels option is given, all record labels are prefixed, whether they occur several times or not. If the -keep-labels option is given, no automatic prefixing takes place; the naming of record labels is left entirely under the user's control, via mlname annotations.
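
As an illustration of how the resulting record types are used, the following sketch writes out by hand the types from the s1/s2/s3 example and from the dependent-field example, then builds a few values. The value names a, b, r and recovered_len are hypothetical.

```ocaml
(* Record types as in the prefixing example: labels shared by several
   structs are prefixed, the unique label z is not. *)
type s1 = { s1_x : int; s1_y : int }
and s2 = { s2_x : float; s2_t : float }
and s3 = { z : int }

(* Dependent field removed: the C field len does not appear in Caml. *)
type struct_s = { idx : int; d : float array }

let a = { s1_x = 1; s1_y = 2 }
let b = { s2_x = 1.5; s2_t = 0.0 }
let r = { idx = 0; d = [| 1.0; 2.0; 3.0 |] }

(* The value that would be stored in len when converting r back to C. *)
let recovered_len = Array.length r.d
```

Because the prefixed labels are distinct, the record expressions for a and b need no type annotations to be disambiguated.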

3.6 Unions

IDL discriminated unions are translated to Caml sum types. Each case of the union corresponds to a constructor of the sum type. The constructor is constant if the union case has no associated field, otherwise has one argument corresponding to the union case field. If the union has a default case, an extra constructor Default_unionname is added to the Caml sum type, carrying an int argument (the value of the discriminating field), and possibly another argument corresponding to the default field. Examples:

union u1 { case A: int x; case B: case C: double d; case D: ; }
         becomes type u1 = A of int | B of float | C of float | D
union u2 { case A: int x; case B: double d; default: ; }
         becomes type u2 = A of int | B of float | Default_u2 of int
union u3 { case A: int x; default: double d; }
         becomes type u3 = A of int | Default_u3 of int * float

All IDL unions must be discriminated, either via the special syntax union name switch(int discr)..., or via the attribute switch_is(discr), where discr is a C l-value built from other parameters of the current function, or other fields of the current struct. Both the discriminant and the case labels must be of an integer type. Unless a default case is given, the value of the discriminant must be one of the cases of the union.
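
Caml code consumes such a sum type by pattern matching. The sketch below declares by hand a type shaped like the translation of a union with a field-less default case, naming the extra constructor according to the Default_unionname rule; the function describe is hypothetical.

```ocaml
(* Hand-written type following the translation rule for a union named u2
   with cases A (int), B (double) and an empty default case. *)
type u2 = A of int | B of float | Default_u2 of int

let describe (v : u2) : string =
  match v with
  | A n -> Printf.sprintf "A %d" n
  | B d -> Printf.sprintf "B %g" d
  (* The int argument is the value of the discriminant. *)
  | Default_u2 discr -> Printf.sprintf "default (discriminant %d)" discr

let () = print_endline (describe (Default_u2 7))
```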

3.7 Enums

IDL enums are translated to Caml enumerated types (sum types with only constant constructors). The names of the constructors are determined by the names of the enum labels. The values attached to the enum labels are ignored. Example: enum e { A, B = 2, C = 4 } becomes type enum_e = A | B | C.

The set attribute can be applied to a named enum to denote a bitfield obtained by logical "or" of zero, one or several labels of the enum. The corresponding ML value is a list of zero, one or several constructors of the Caml enumerated type. Consider for instance:

enum e { A = 1, B = 2, C = 4 };
typedef [set] enum e eset;

The Caml type eset is equal to enum_e list. The C integer 6 (= B | C) is translated to the ML list [B; C]. The ML list [A; C] is translated to the C integer A | C, that is 5.
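
The two directions of this conversion can be sketched in Caml as follows. This models the behavior described above and is not the code generated by camlidl; the names bit, to_c and of_c are hypothetical.

```ocaml
type enum_e = A | B | C

(* C value attached to each label in the IDL declaration. *)
let bit = function A -> 1 | B -> 2 | C -> 4

(* ML list to C bitfield: logical "or" of the labels. *)
let to_c (labels : enum_e list) : int =
  List.fold_left (fun acc l -> acc lor bit l) 0 labels

(* C bitfield to ML list: keep the labels whose bit is set. *)
let of_c (n : int) : enum_e list =
  List.filter (fun l -> n land bit l <> 0) [ A; B; C ]

let () = Printf.printf "[A; C] -> %d\n" (to_c [ A; C ])
```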

3.8 Type definitions

An IDL typedef statement is normally translated to a Caml type abbreviation. For instance, typedef [string] char * str becomes type str = string.

If the abstract attribute is given, a Caml abstract type is generated instead of a type abbreviation, thus hiding from Caml the representation of the type in question. For instance, typedef [abstract] void * handle becomes type handle. In this case, the IDL type in the typedef is ignored.

If the mltype ( " caml-type-expr " ) attribute is given, the Caml type is made equal to caml-type-expr. This is often used in conjunction with the ml2c and c2ml attributes to implement custom translation of data structures between C and ML. For instance, typedef [mltype("int list")] struct mylist_struct * mylist becomes type mylist = int list.

If the c2ml(funct-name) and ml2c(funct-name) attributes are given, the user-provided C functions given as attributes will be called to perform C to Caml and Caml to C conversions, respectively, for values of the typedef-ed type, instead of using the camlidl-generated conversion functions. This allows user-controlled translation of data structures. The prototypes of the conversion functions must be

        value c2ml(ty * input);
        void ml2c(value input, ty * output);

where ty is the name of the type defined by typedef. In other terms, the c2ml function is passed a reference to a ty and returns the corresponding Caml value, while the ml2c function is passed a Caml value as first argument and stores the corresponding C value in the ty reference passed as second argument.

If the finalize(final-fn) attribute is given in combination with the abstract attribute, the function final-fn is called when the Caml block representing a value of this typedef becomes unreachable from Caml and is reclaimed by the Caml garbage collector. Similarly, compare(compare-fn) and hash(hash-fn) attach a comparison function and a hashing function (respectively) to Caml values for this typedef. The comparison function is called when two Caml values of this typedef are compared using the generic comparisons compare, =, <, etc. The hashing function is called when Hashtbl.hash is applied to a Caml value of this typedef. The prototypes of the finalization, comparison and hashing functions are:

        void final-fn(ty * x);
        int compare-fn(ty * x, ty * y);
        long hash-fn(ty * x);

That is, their arguments are passed by reference. The comparison function must return an integer that is negative, zero, or positive depending on whether its first argument is smaller, equal or greater than its second argument. The hashing function returns a suitable hash value for its argument.

If the errorcheck(fn) attribute is provided for the typedef ty, the error checking function fn is called each time a function result of type ty is converted from C to Caml. The function can then check the ty value for values indicating an error condition, and raise the appropriate exception. If in addition the errorcode attribute is provided, the conversion from C to Caml is suppressed: values of type ty are only passed to fn for error checking, then discarded.

3.9 Functions

IDL function declarations are translated to Caml functions. The parameters and results of the Caml function are determined from those of the IDL function according to the following rules:

First, dependent parameters (parameters that are size_is, length_is or switch_is of other parameters) as well as parameters that are ignored pointers are removed.
The remaining parameters are split into Caml function inputs and Caml function outputs. Parameters with the [in] attribute are added to the inputs of the function. Parameters with the [out] attribute are added to the outputs of the function. Parameters with the [in,out] attribute are added both to the inputs and to the outputs of the function, unless they are of type string or big array, in which case they are added to the inputs of the function only. (The reason for this exception is that strings and big arrays are shared between Caml and C, thus allowing true in,out behavior on the Caml function parameter, while other data types are copied during Caml/C conversion, thus turning a C in,out parameter into a Caml copy in, copy out parameter, that is, one parameter and one result.)
The return value of the IDL function is added to the outputs of the Caml function (in first position), unless it is of type void or of a type name that carries the errorcode attribute. In the latter two cases, the return value of the IDL function is not transmitted to Caml.
The Caml function is then given type in_1 -> ... -> in_p -> out_1 * ... * out_q where in_1 ... in_p are the types of its inputs and out_1 ... out_q are the types of its outputs. If there are no inputs, a unit parameter is added. If there are no outputs, a unit result is added.
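
The rules above can be sketched as a small Caml function that computes the printed Caml signature from a simplified description of the IDL parameters. This is an illustrative model only, not camlidl's implementation: the types mode and param, and the functions caml_signature and plain, are hypothetical, and Caml types are represented as plain strings.

```ocaml
(* Hypothetical description of an IDL parameter, for illustration only. *)
type mode = In | Out | In_out

type param = {
  caml_ty : string;  (* Caml type of the parameter, e.g. "float array" *)
  mode : mode;
  dependent : bool;  (* dependent parameter or ignored pointer *)
  shared : bool;     (* string or big array, shared between Caml and C *)
}

let caml_signature ~(ret : string option) (params : param list) : string =
  (* Rule 1: drop dependent parameters and ignored pointers. *)
  let kept = List.filter (fun p -> not p.dependent) params in
  (* Rule 2: split the remaining parameters into inputs and outputs. *)
  let inputs =
    List.filter (fun p -> p.mode = In || p.mode = In_out) kept
    |> List.map (fun p -> p.caml_ty) in
  let out_params =
    List.filter
      (fun p -> p.mode = Out || (p.mode = In_out && not p.shared)) kept
    |> List.map (fun p -> p.caml_ty) in
  (* Rule 3: a transmitted return value comes first among the outputs;
     ret is None for void and for errorcode types. *)
  let outputs =
    match ret with Some t -> t :: out_params | None -> out_params in
  (* Rule 4: add unit where there are no inputs or no outputs. *)
  let inputs = if inputs = [] then [ "unit" ] else inputs in
  let result = if outputs = [] then "unit" else String.concat " * " outputs in
  String.concat " -> " (inputs @ [ result ])

(* Convenience constructor for non-dependent, non-shared parameters. *)
let plain ty mode = { caml_ty = ty; mode; dependent = false; shared = false }

let () =
  print_endline
    (caml_signature ~ret:(Some "int") [ plain "int" In; plain "float" Out ])
```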

Examples:

int f([in] double x, [in] double y)             f : float -> float -> int

Two double inputs, one int output

void g([in] int x)                              g : int -> unit

One int input, no output

int h()                                         h : unit -> int

No input, one int result

void i([in] int x, [out] double * y)            i : int -> float

One int input, one double output (as an out parameter)

int j([in] int x, [out] double * y)             j : int -> int * float

One int input, one int output (in the result), one double output (as an out parameter)

void k([in,out,ref] int * x)                    k : int -> int

The in,out parameter is both one int input and one int output.

HRESULT l([in] int x, [out] int * res1, [out] int * res2)
                                                l : int -> int * int

HRESULT is a predefined type with the errorcode attribute, hence it is ignored. There remain one int input and two int outputs (out parameters).

void m([in] int len, [in,size_is(len)] double d[])
                                                m : float array -> unit

len is a dependent parameter, hence is ignored. The only input is the double array

void n([in] int inputlen, [out] int * outputlen, 
       [in,out,size_is(inputlen),length_is(*outputlen)] double d[])
                                                n : float array -> float array

The two parameters inputlen and outputlen are dependent, hence ignored. The double array is both an input and an output.

void p([in] int dimx, [in] int dimy,
       [in,out,bigarray,size_is(dimx,dimy)] double d[][])
p : (float, Bigarray.float64_elt, Bigarray.c_layout) Bigarray.Array2.t -> unit

The two parameters dimx and dimy are dependent (determined from the dimensions of the big array argument), hence ignored. The two-dimensional array d, although marked [in,out], is a big array, hence passed as an input that will be modified in place by the C function p. The Caml function has no outputs.

Error checking:

For every output that is of a named type with the errorcheck(fn) attribute, the error checking function fn is called after the C function returns. That function is assumed to raise a Caml exception if it finds an output denoting an error.

Custom calling and deallocation sequences:

The IDL declaration for a function can optionally specify a custom calling sequence and/or a custom deallocation sequence, via quote clauses following the function declaration:

function-decl ::= attributes type-spec {*} ident ( params ) { quote( ident , string ) }

The general shape of a camlidl-generated stub function is as follows:

value caml_wrapper(value camlparam1, ..., value camlparamK)
{
  /* Convert the function parameters from Caml to C */
  param1 = ...;
  ...
  paramN = ...;
  /* Call the C function 'ident' */
  _res = ident(param1, ..., paramN);
  /* Convert the function result and out parameters to Caml values */
  camlres = ...;
  /* Return result to Caml */
  return camlres;
}

A quote(call, string ) clause causes the C statements in string to be inserted in the generated stub code instead of the default calling sequence _res = ident(param1, ..., paramN). Thus, the statements in string find the converted parameters in local variables that have the same names as the parameters in the IDL declaration, and should leave the result of the function, if any, in the local variable named _res.

A quote(dealloc, string ) clause causes the C statements in string to be inserted in the generated stub code just before the stub function returns, hence after the conversion of the C function results to Caml values. Again, the statements in string have access to the function result in the local variable named _res, and to out parameters in local variables having the same names as the parameters. Since the function results and out parameters have already been converted to Caml values, the code in string can safely deallocate the data structures they point to.

Custom calling sequences are typically used to rearrange or combine function parameters, and to perform extra error checks on the arguments and results. For instance, the Unix write system call can be specified in IDL as follows:

        int write([in] int fd,
                  [in,string,length_is(len)] char * data,
                  [in] int len,
                  [in] int ofs,
                  [in] int towrite)
          quote(call,
            " /* Validate the arguments */
              if (ofs < 0 || towrite < 0 || ofs + towrite > len) failwith(\"write\");
              /* Perform the write */
              _res = write(fd, data + ofs, towrite);
              /* Validate the result */
              if (_res == -1) failwith(\"write\"); ");

Custom deallocation sequences are useful to free data structures dynamically allocated and returned by the C function. For instance, a C function f that returns a malloc-ed string can be specified in IDL as follows:

        [string] char * f([in] int x)
          quote(dealloc, "free(_res); ");

If the string is returned as an out parameter instead, we would write:

        void f ([in] int x, [out, string*] char ** str)
          quote(dealloc, "free(*str); ");

3.10 Interfaces

IDL interfaces that do not have the object attribute are essentially ignored. That is, the declarations contained in the interface are processed as if they occurred at the top-level of the IDL file. The pointer_default, int_default and long_default attributes to the interface can be used to specify the default pointer kind and integer mappings for the declarations contained in the interface. Other attributes, as well as the name of the super-interface if any, are ignored.

IDL interfaces having the object attribute specify COM-style object interfaces. The function declarations contained in the interface specify the methods of the COM interface. Other kinds of declarations (type declarations, import statements, etc) are treated as if they occurred at the top-level of the IDL file. An optional super-interface can be given, in which case the COM interface implements the methods of the super-interface in addition to those specified in the IDL interface. Example:

[object, uuid(...)] interface IA { typedef int t; int f(int x); }
[object] interface IB : IA { import "foo.idl"; void g([string] char * s); }

This defines a type t and imports the file foo.idl as usual. In addition, two interfaces are declared: IA, containing one method f from int to int, and IB, containing two methods, f from int to int and g from string to unit.

The definition of an object interface i generates the following Caml definitions:

An abstract type i identifying the interface. COM interfaces of type i are represented in Caml with type i Com.interface.
If a super-interface s is given, a conversion function s_of_i of type i Com.interface -> s Com.interface.
If the uuid(iid) attribute is given, a value iid_i of type i Com.iid holding the given interface identifier.
A Caml class i_class, with the same methods as the COM interface.
A function use_i of type i Com.interface -> i_class, to transform a COM object into a Caml object. This allows the methods of the COM object to be invoked from Caml.
A function make_i of type #i_class -> i Com.interface, to transform a Caml object into a COM object with interface i. This allows the methods of the Caml object to be invoked from any COM client.

Example: in the IA and IB example above, the following Caml definitions are generated for IA:

type iA
val iid_iA : iA Com.iid
class iA_class : iA Com.interface -> object method f : int -> int end
val use_iA : iA Com.interface -> iA_class
val make_iA : #iA_class -> iA Com.interface

For IB, we get:

type iB
val iA_of_iB : iB Com.interface -> iA Com.interface
class iB_class :
  iB Com.interface -> object inherit iA_class method g : string -> unit end
val use_iB : iB Com.interface -> iB_class
val make_iB : #iB_class -> iB Com.interface
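
The method signatures of these classes can be tried out in plain Caml without the Com module. The sketch below is a stand-in under stated assumptions: ia_methods and ib_methods are hypothetical class types that mirror only the methods of iA_class and iB_class (the real classes take a Com.interface argument, omitted here), and component is a hypothetical Caml object of the kind that could be passed to make_iB.

```ocaml
(* Stand-ins for the method signatures of the generated classes. *)
class type ia_methods = object
  method f : int -> int
end

class type ib_methods = object
  inherit ia_methods
  method g : string -> unit
end

(* A Caml object implementing both methods. *)
let component : ib_methods = object
  method f x = x + 1
  method g s = print_endline s
end

(* Subtyping lets the same object be viewed through the super-interface. *)
let as_ia = (component :> ia_methods)

let () = component#g (string_of_int (as_ia#f 41))
```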

Error handling in interfaces:

Conventionally, methods of COM interfaces always return a result of type HRESULT that says whether the method succeeded or failed, and in the latter case returns an error code to its caller.

When calling an interface method from Caml, if the method returns an HRESULT denoting failure, the exception Com.Error is raised with a message describing the error. Successful HRESULT return values are ignored. To make them available to Caml, camlidl defines the types HRESULT_bool and HRESULT_int. If those types are used as return types instead of HRESULT, failure results are mapped to Com.Error exceptions as before, but successful results are mapped to the Caml types bool and int respectively. (For HRESULT_bool, the S_OK result is mapped to true and other successful results are mapped to false. For HRESULT_int, the low 16 bits of the result code are returned as a Caml int.)
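
The mapping of the three HRESULT types can be modeled in Caml as follows. This is a sketch of the behavior described above, not the code used by camlidl: Com_error is a hypothetical stand-in for Com.Error (whose actual arguments may differ), and HRESULT values are represented as int32, with failure indicated by the high-order (severity) bit.

```ocaml
(* Hypothetical stand-in for Com.Error. *)
exception Com_error of string

(* An HRESULT denotes failure when its high-order bit is set. *)
let failed (hr : int32) : bool = Int32.compare hr 0l < 0

let check (hr : int32) : unit =
  if failed hr then
    raise (Com_error (Printf.sprintf "HRESULT 0x%08lx" hr))

(* HRESULT_bool: S_OK (zero) is true, other successful results are false. *)
let hresult_bool (hr : int32) : bool =
  check hr;
  Int32.equal hr 0l

(* HRESULT_int: the low 16 bits of a successful result code. *)
let hresult_int (hr : int32) : int =
  check hr;
  Int32.to_int (Int32.logand hr 0xFFFFl)

(* E_FAIL, i.e. 0x80004005, built without relying on literal wrap-around. *)
let e_fail : int32 = Int32.logor Int32.min_int 0x4005l

let () =
  match hresult_int e_fail with
  | n -> Printf.printf "success, code %d\n" n
  | exception Com_error msg -> Printf.printf "failure: %s\n" msg
```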

When calling a Caml method from a COM client, any exception that escapes the Caml method is mapped back to a failure HRESULT. A textual description of the uncaught exception is saved using SetErrorInfo, and can be consulted by the COM client using GetErrorInfo (this is the standard convention for passing extended error information in COM).

If the IDL return type of the method is not one of the HRESULT types, any exception escaping the Caml method aborts the whole program after printing a description of the exception. Hence, programmers of Caml components should either use HRESULT as result type, or make very sure that all exceptions are properly caught by the method.