** Documentation for Tyger.
**
** Copyright: (c) 2017-2023 Jacco van Schaik (jacco@jaccovanschaik.net)
**
** This software is distributed under the terms of the MIT license. See
** http://www.opensource.org/licenses/mit-license.php for details.

Tyger

Introduction

Tyger is a program that generates code to manipulate data types. Its primary mission is to create serializers and deserializers so these types can be sent across a network, or saved and retrieved from binary files. Additional functions can be generated for various other tasks, such as printing them in a human-readable fashion, or creating, modifying and deleting them. Currently, Tyger can generate code in C or Python (versions 2.x and 3.x).

The generated code can be incorporated into your own project. In addition, you'll need the run-time library that comes with Tyger. This library contains the functions that are called by the generated code. These functions handle the basic types that Tyger uses to build its data structures (integers, floating point numbers, strings, etc.).

The types to generate are defined in an input file with a very simple syntax. Tyger uses the C pre-processor (cpp) to read these input files, so all facilities that cpp offers (includes, pragmas, defines) are available in Tyger input files.

There is a Git repository for Tyger at https://github.com/jaccovanschaik/Tyger.

Tyger requires the GNU C pre-processor (cpp) to be available when generating code (but not when running the generated code). Tyger also requires libjvs, both to compile Tyger itself and to compile the generated code.

Operation

When called without any parameters, Tyger displays its usage, which looks like this:

Usage: tyger <options> <input-file>

Options:
        -V, --version                   Print version and exit.
        -c, --c-src <C-source-output>   Output C source file here.
        -h, --c-hdr <C-header-output>   Output C header file here.
        -p, --python <python-output>    Output python code here.
        -i, --indent <indent-string>    Use this string as indent.

        Switches accepted by the C code generator
          --c-packsize  Generate packsize functions
          --c-pack      Generate pack functions
          --c-unpack    Generate unpack functions
          --c-clear     Generate clear functions
          --c-destroy   Generate destroy functions
          --c-print     Generate print functions

        Switches accepted by the Python code generator
          --py-pack     Generate pack functions
          --py-unpack   Generate unpack functions
          --py-recv     Generate recv functions
          --py-mx-send  Generate MX send functions
          --py-mx-bcast Generate MX broadcast functions

Normally, Tyger will be called with the name of an input file containing type descriptions, one or more options telling it what output files to generate, and a number of options telling it which functions to generate for each type and each language. Tyger tries to be smart about which header files it includes (when generating C code) or modules it imports (when generating Python code) so it will only add #include or import statements to the generated code if they are required.

Languages

Tyger can generate C or Python code. For C code it will generate:

a C header file that contains typedefs for all the defined types and prototypes for all the generated functions;
a C source file that contains the bodies of all the generated functions.

For Python code it will generate a single Python file that contains, for each defined type:

A class that defines the type (only for compound types, i.e. structures and unions);
A Packer class that knows how to pack and unpack the type.

Data types

Tyger supports the following data types:

Booleans
Integers
Floating point numbers
Strings
Enumerations
Structures
Tagged unions
Arrays
Constants

Booleans

Booleans represent a true or false value.

In C and Python, the type used for these is bool.

Booleans are serialized as a single byte, where 0 represents false and 1 represents true.

Integers

Integers are simple whole numbers. Their name is either int or uint (to indicate a signed or an unsigned integer), followed by 8, 16, 32 or 64 (to indicate the number of bits they contain).

In C, the type used for these is one of the stdint.h types (uint8_t etc). In Python the type for all of these is int.

Integers are serialized as one or more bytes, most-significant byte first.
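
To make the byte order concrete, here is a minimal sketch of this encoding using Python's standard struct module. It is not the code Tyger generates, and the names INT_FORMATS, pack_int and unpack_int are made up for this illustration. It also assumes signed values use two's complement, which the text above doesn't state.

```python
import struct

# struct format for each integer type name; ">" selects big-endian
# (most-significant byte first) without padding.
INT_FORMATS = {
    "int8": ">b", "uint8": ">B",
    "int16": ">h", "uint16": ">H",
    "int32": ">i", "uint32": ">I",
    "int64": ">q", "uint64": ">Q",
}


def pack_int(type_name, value):
    """Return the serialized bytes for value (hypothetical helper)."""
    return struct.pack(INT_FORMATS[type_name], value)


def unpack_int(type_name, data, offset=0):
    """Return (value, offset just past the value)."""
    fmt = INT_FORMATS[type_name]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)


print(pack_int("uint16", 258).hex())  # 0102
print(pack_int("int32", -2).hex())    # fffffffe
```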

Floating point numbers

Floating point numbers are standard IEEE-754 floating point numbers. Their name is float followed by either 32 (to indicate a standard, single precision float with 32 bits) or 64 (to indicate a double precision float with 64 bits).

In C, the type for these is float or double. In Python, the type for both of these is float.

Floating point numbers are serialized as a 4 or 8-byte sequence, most-significant byte first.
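
As a rough illustration, Python's struct module produces this layout with the ">f" and ">d" formats. The function names below are hypothetical and are not part of Tyger or its run-time library.

```python
import struct


def pack_float32(value):
    """Serialize as a 4-byte IEEE-754 single, big-endian (hypothetical helper)."""
    return struct.pack(">f", value)


def pack_float64(value):
    """Serialize as an 8-byte IEEE-754 double, big-endian (hypothetical helper)."""
    return struct.pack(">d", value)


print(pack_float32(1.0).hex())  # 3f800000
print(pack_float64(1.0).hex())  # 3ff0000000000000
```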

Strings

There are two types of strings: astring is a string of single-byte ASCII characters, and ustring is a string of multibyte Unicode characters.

In C, these are arrays of char or wchar_t characters. In Python 2, str objects are used for astrings and unicode objects for ustrings. In Python 3, str objects are used for both astrings and ustrings.

Strings are serialized using a 4-byte unsigned big-endian byte (not character!) count, followed by the character data (encoded as UTF-8 in the case of a ustring).
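
The difference between a byte count and a character count is easiest to see in a small example. The sketch below follows the description above. The helper names are invented for this illustration and are not Tyger's actual API.

```python
import struct


def pack_astring(text):
    """4-byte big-endian byte count, then the ASCII data."""
    data = text.encode("ascii")
    return struct.pack(">I", len(data)) + data


def pack_ustring(text):
    """4-byte big-endian byte count, then the UTF-8 data."""
    data = text.encode("utf-8")
    return struct.pack(">I", len(data)) + data


def unpack_ustring(data, offset=0):
    """Return (text, offset just past the string)."""
    (length,) = struct.unpack_from(">I", data, offset)
    start = offset + 4
    end = start + length
    return data[start:end].decode("utf-8"), end


print(pack_astring("abc").hex())     # 00000003616263
# One character (e with acute accent), but two bytes in UTF-8:
print(pack_ustring("\u00e9").hex())  # 00000002c3a9
```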

Enumerations

Enumerations are simple mappings of symbolic names to integers, as used in many programming languages.

In C these are translated to enums. In Python, they are translated to a class that contains class variables with the given names and the given value.

They are serialized as unsigned big-endian integers. The number of bytes used depends on their maximum value (one byte if the maximum value is 255 or less, two bytes if it is 65535 or less, etc.). The maximum size allowed is four bytes, for a maximum enum value of 4,294,967,295 or 2³² - 1.
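
Here is a sketch of how the width could be derived from the maximum value. The text only spells out the one-byte and two-byte cases; this example assumes the width grows one byte at a time (1, 2, 3, 4), which may not match what Tyger actually does. The function names are hypothetical.

```python
def enum_width(max_value):
    """Number of bytes used for an enum whose largest value is max_value."""
    if not 0 <= max_value <= 0xFFFFFFFF:
        raise ValueError("enum values must fit in four bytes")
    # Assumption: widths of 1, 2, 3 and 4 bytes are all possible.
    return max(1, (max_value.bit_length() + 7) // 8)


def pack_enum(value, max_value):
    """Serialize value as an unsigned big-endian integer of that width."""
    return value.to_bytes(enum_width(max_value), "big")


print(pack_enum(3, 4).hex())     # 03
print(pack_enum(3, 1000).hex())  # 0003
```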

Structures

These are abstract types containing multiple internal fields. Fields may be designated optional by prefixing the field with the keyword "opt".

In C these are translated to typedef struct definitions. For each field the struct contains, a struct member is added. If a field is optional, the member is a pointer instead. If an optional field is not present, this pointer is set to NULL.

In Python, structs are translated to classes that have a member variable for each field.

They are serialized by simply serializing each field in turn. Optional fields are preceded by a byte set to 1 if the field is present (in which case it follows immediately) or 0 if it isn't.
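
To show how the presence byte fits in, here is a sketch that serializes a made-up struct with three fields: an astring name, an optional ustring creator and a bool visible. The function names are invented for this example; the code Tyger generates will look different.

```python
import struct


def pack_string(text, encoding):
    """4-byte big-endian byte count followed by the encoded data."""
    data = text.encode(encoding)
    return struct.pack(">I", len(data)) + data


def pack_record(name, creator, visible):
    """Serialize the fields in order; creator may be None (not present)."""
    out = pack_string(name, "ascii")
    if creator is None:
        out += b"\x00"                                   # optional field absent
    else:
        out += b"\x01" + pack_string(creator, "utf-8")   # present, data follows
    out += b"\x01" if visible else b"\x00"               # bool as a single byte
    return out


print(pack_record("box", None, True).hex())
# 00000003626f780001
```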

Unions

These are abstract types that contain one of a number of possible internal fields, as indicated by a "discriminator". A discriminator must be an integer type. A field may be void, indicating that there is no additional data required for that discriminator value.

In C, these are translated to a typedef struct that contains the discriminator and a union u that contains the internal fields. In Python, it is translated to a class that contains the discriminator and a member named u that is set to the indicated internal field.

Unions are serialized by first serializing the discriminator as an unsigned big-endian 32-bit integer and then the internal field indicated by the discriminator.
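
The following sketch illustrates this layout for a made-up union where discriminator value 0 means void, 1 selects a uint16 field and 2 selects a float64 field. The PACKERS table and the pack_union function are hypothetical and only serve to show the byte layout.

```python
import struct

# Hypothetical union: maps each discriminator value to a packer for its field.
PACKERS = {
    0: None,                               # void: no data follows
    1: lambda v: struct.pack(">H", v),     # uint16 field
    2: lambda v: struct.pack(">d", v),     # float64 field
}


def pack_union(discriminator, value=None):
    """32-bit big-endian discriminator, then the selected field (if any)."""
    out = struct.pack(">I", discriminator)
    packer = PACKERS[discriminator]
    if packer is not None:
        out += packer(value)
    return out


print(pack_union(0).hex())       # 00000000
print(pack_union(1, 513).hex())  # 000000010201
```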

Arrays

Arrays consist of a number of identically-typed fields.

In C these are translated to standard arrays. In Python they are translated to lists.

They are serialized by first serializing the member count as an unsigned big-endian 32-bit integer and then the contained members.
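
A minimal sketch of this count-then-members layout is shown below. The helpers pack_array and unpack_array are made up for this example; they take the per-member packer and unpacker as arguments so they work for any member type.

```python
import struct


def pack_array(items, pack_item):
    """32-bit big-endian member count, then each member in order."""
    out = struct.pack(">I", len(items))
    for item in items:
        out += pack_item(item)
    return out


def unpack_array(data, unpack_item, offset=0):
    """Return (list of members, offset just past the array)."""
    (count,) = struct.unpack_from(">I", data, offset)
    offset += 4
    items = []
    for _ in range(count):
        item, offset = unpack_item(data, offset)
        items.append(item)
    return items, offset


def pack_int16(value):
    return struct.pack(">h", value)


def unpack_int16(data, offset):
    (value,) = struct.unpack_from(">h", data, offset)
    return value, offset + 2


print(pack_array([1, -1], pack_int16).hex())  # 000000020001ffff
```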

Constants

A constant has a name and a value. The value can be of any of the built-in types: int, float, astring and ustring.

In C a constant is defined as a const in the generated .c file and declared as an extern const in the accompanying .h file. In Python, it is set as a global variable in the generated .py file.

Constants are not serialized. They are available to both the sender and receiver (or writer and reader) in the generated code.

Input files

This section describes the input files for Tyger. These are used to specify, using a high-level description, the types to be generated.

The examples shown here are derived from the test code that comes with Tyger. If you want to see them in their "natural habitat", along with more examples and the C and Python code generated from these definitions, go to the top-level directory of the software, type "make test" and look at the generated code (Objects.c, Objects.h and Objects.py).

Aliases

New types can be aliased from existing ones with the following syntax:

<new_type_name> = <existing_type_name>

For example, to define a new type called Coordinate and make it a signed 32-bit integer you would use:

Coordinate = int32

Defining a struct type

Structs are defined as follows:

<new_type_name> = struct {
    <type_of_first_element> <name_of_first_element>
    <type_of_second_element> <name_of_second_element>
    ...
}

You can add an arbitrary number of fields to a struct.

So if you wanted to define a type called Object that contains an ASCII string name, an optional Unicode string creator, a boolean visible and a field named shape with type Shape, you'd do that as follows:

Object = struct {
    astring     name
    opt ustring creator
    bool        visible
    Shape       shape
}

Defining an enumeration

This is how an enumeration is defined:

<new_type_name> = enum {
    <first_symbolic_name> [ = <first_value> ]
    <second_symbolic_name> [ = <second_value> ]
    ...
}

Again, you can add an arbitrary number of entries, as long as the greatest value is 2³² - 1 or less.

As indicated, values are optional. If a value is not specified, it is set to the previous one incremented by 1. If the first value is not specified it is set to 0.

For example, if we wanted to create an enum called ShapeType to indicate different types of shapes, we might do it like this:

ShapeType = enum {
    ST_NONE    = 0
    ST_LINE    = 1
    ST_POLYGON = 2
    ST_PLANE   = 3
    ST_SPHERE  = 4
}
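
The rule for unspecified values can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not Tyger's parser; the function name and the (name, value) input format are made up.

```python
def assign_enum_values(entries):
    """entries is a list of (name, value) pairs; value is None if unspecified."""
    values = {}
    previous = -1  # so that an unspecified first entry becomes 0
    for name, explicit in entries:
        value = previous + 1 if explicit is None else explicit
        values[name] = value
        previous = value
    return values


print(assign_enum_values([("A", None), ("B", 5), ("C", None)]))
# {'A': 0, 'B': 5, 'C': 6}
```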

Defining a union type

A union is defined like this:

<new_type_name> = union(<discriminator_type> <discriminator_name>) {
    <discriminator_value_for_first_choice>: <type_of_first_choice> <name_of_first_choice>
    <discriminator_value_for_second_choice>: <type_of_second_choice> <name_of_second_choice>
    ...
}

Based on the value of the discriminator (i.e. the variable named discriminator_name), the associated field will be active.

For example, let's say we wanted to define a union named Shape, which would contain one of a number of internal fields based on shape_type (which is a discriminator field of type ShapeType as shown above). This is how we would do that:

Shape = union(ShapeType shape_type) {
    ST_NONE:    void
    ST_LINE:    Line    line
    ST_POLYGON: Polygon polygon
    ST_PLANE:   Plane   plane
    ST_SPHERE:  Sphere  sphere
}

Defining a constant

Constants are defined like this:

<constant_name> = const <constant_type> <constant_value>

For example, if you wanted to define a constant unsigned 32-bit int named Dimensions with the value 3, you'd do it like this:

Dimensions = const uint32 3

Options

This section describes the command line options for the tyger command. First we'll talk about the general options, then we'll look at the options specific to the C and Python code generators.

General options

This section describes Tyger's general options.

-V or --version

Print version and exit.

-c or --c-src <C-source-output>

Output C source file to the file given in <C-source-output>.

-h or --c-hdr <C-header-output>

Output C header file to the file given in <C-header-output>.

-p or --python <python-output>

Output Python code to the file given in <python-output>.

-i or --indent <indent-string>

Use the string given in <indent-string> as one level of indent.

C code generator options

This section explains the switches accepted by the C code generator.

--c-packsize

Generate packsize functions.

--c-pack

Generate pack functions.

--c-unpack

Generate unpack functions.

--c-wrap

Generate wrap functions.

--c-unwrap

Generate unwrap functions.

--c-read-fd

Generate functions to read from an fd.

--c-write-fd

Generate functions to write to an fd.

--c-read-fp

Generate functions to read from an FP.

--c-write-fp

Generate functions to write to an FP.

--c-print-fp

Generate print functions.

--c-create

Generate create functions.

--c-set

Generate set functions.

--c-copy

Generate copy functions.

--c-clear

Generate clear functions.

--c-destroy

Generate destroy functions.

--c-mx-send

Generate MX send functions.

--c-mx-bcast

Generate MX broadcast functions.

Python code generator options

This section explains the switches accepted by the Python code generator. In Python it is much easier to get added functionality by combining generated functions and existing language features, so we need far fewer generated functions.

--py-pack

Generate pack functions.

--py-unpack

Generate unpack functions.

--py-recv

Generate recv functions.

--py-mx-send

Generate MX send functions.

--py-mx-bcast

Generate MX broadcast functions.