Documentation for Tyger.

Copyright: (c) 2017-2023 Jacco van Schaik (jacco@jaccovanschaik.net)

This software is distributed under the terms of the MIT license. See http://www.opensource.org/licenses/mit-license.php for details.
Tyger is a program that generates code to manipulate data types. Its primary mission is to create serializers and deserializers so these types can be sent across a network, or saved and retrieved from binary files. Additional functions can be generated for various other tasks, such as printing them in a human-readable fashion, or creating, modifying and deleting them. Currently, Tyger can generate code in C or Python (versions 2.x and 3.x).
The generated code can be incorporated into your own project. In addition, you'll need to use a run-time library that comes with Tyger. This library contains functions that are called by the generated code. These functions handle the basic types that Tyger uses to build its data structures (integers, floating point numbers, strings etc.).
The types to generate are defined in an input file with a very simple syntax. Tyger uses the C pre-processor (cpp) to read these input files, so all facilities that cpp offers (includes, pragmas, defines) are available in Tyger input files.
There is a Git repository for Tyger at https://github.com/jaccovanschaik/Tyger.
Tyger requires the GNU C pre-processor (cpp) to be available when generating code (but not when running the generated code). Tyger also requires libjvs, both to compile Tyger itself and to compile the generated code.
When called without any parameters, Tyger displays its usage, which looks like this:
Usage: tyger <options> <input-file>

Options:
    -V, --version                    Print version and exit.
    -c, --c-src <C-source-output>    Output C source file here.
    -h, --c-hdr <C-header-output>    Output C header file here.
    -p, --python <python-output>     Output python code here.
    -i, --indent <indent-string>     Use this string as indent.

Switches accepted by the C code generator
    --c-packsize                     Generate packsize functions
    --c-pack                         Generate pack functions
    --c-unpack                       Generate unpack functions
    --c-clear                        Generate clear functions
    --c-destroy                      Generate destroy functions
    --c-print                        Generate print functions

Switches accepted by the Python code generator
    --py-pack                        Generate pack functions
    --py-unpack                      Generate unpack functions
    --py-recv                        Generate recv functions
    --py-mx-send                     Generate MX send functions
    --py-mx-bcast                    Generate MX broadcast functions
Normally, Tyger will be called with the name of an input file containing type descriptions, one or more options telling it what output files to generate, and a number of options telling it which functions to generate for each type and each language. Tyger tries to be smart about which header files it includes (when generating C code) or modules it imports (when generating Python code), so it will only add #include or import statements to the generated code if they are required.
Tyger can generate C or Python code. For C code it will generate:
For Python code it will generate a single Python file that contains, for each defined type:
a Packer class that knows how to pack and unpack the type.
Tyger supports the following data types:
Booleans represent a true or false value.
In C and Python, the type used for these is bool.
Booleans are serialized as a single byte, where 0 represents false and 1 represents true.
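The single-byte encoding described above can be sketched in a few lines of Python 3. The function names pack_bool and unpack_bool are hypothetical and are not taken from Tyger's run-time library. Rejecting bytes other than 0 and 1 is also an assumption, since the text does not say how such input is handled.

```python
def pack_bool(value):
    """Encode a boolean as a single byte: 1 for true, 0 for false."""
    return b"\x01" if value else b"\x00"


def unpack_bool(data):
    """Decode a boolean from the first byte of data.

    Returns a (value, bytes_used) tuple.
    """
    byte = data[0:1]
    if byte == b"\x00":
        return False, 1
    if byte == b"\x01":
        return True, 1
    # Assumption: anything other than 0 or 1 is treated as malformed input.
    raise ValueError("invalid boolean byte: %r" % (byte,))
```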
Integers are simple whole numbers. Their name is either int or uint (to indicate a signed or an unsigned integer), followed by 8, 16, 32 or 64 (to indicate the number of bits they contain).
In C, the type used for these is one of the stdint.h types (uint8_t etc.). In Python the type for all of these is int.
Integers are serialized as one or more bytes, most-significant byte first.
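To make the byte order concrete, here is a minimal sketch that uses Python's standard struct module. The helper names and the INT_FORMATS table are hypothetical; they illustrate the wire format described above and are not Tyger's generated code. Signed values are assumed to use two's complement, which the text does not state explicitly.

```python
import struct

# Hypothetical mapping from Tyger integer type names to struct format codes.
# The ">" prefix selects big-endian (most-significant byte first), no padding.
INT_FORMATS = {
    "int8": ">b", "uint8": ">B",
    "int16": ">h", "uint16": ">H",
    "int32": ">i", "uint32": ">I",
    "int64": ">q", "uint64": ">Q",
}


def pack_int(type_name, value):
    """Return the big-endian byte representation of value."""
    return struct.pack(INT_FORMATS[type_name], value)


def unpack_int(type_name, data):
    """Decode an integer from the start of data.

    Returns a (value, bytes_used) tuple.
    """
    fmt = INT_FORMATS[type_name]
    size = struct.calcsize(fmt)
    (value,) = struct.unpack(fmt, data[:size])
    return value, size
```

With these helpers, pack_int("uint16", 258) yields the two bytes 0x01 0x02.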
Floating point numbers are standard IEEE-754 floating point numbers. Their name is float followed by either 32 (to indicate a standard, single precision float with 32 bits) or 64 (to indicate a double precision float with 64 bits).
In C, the type for these is float or double. In Python, the type for both of these is float.
Floating point numbers are serialized as a 4-byte or 8-byte sequence, most-significant byte first.
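As an illustration of this layout, the following sketch packs float32 and float64 values with Python's struct module. The function names are hypothetical and not part of Tyger; only the format (IEEE-754, big-endian) comes from the description above.

```python
import struct


def pack_float32(value):
    """Encode value as a 4-byte big-endian IEEE-754 single."""
    return struct.pack(">f", value)


def pack_float64(value):
    """Encode value as an 8-byte big-endian IEEE-754 double."""
    return struct.pack(">d", value)


def unpack_float32(data):
    """Decode a float32 from the start of data; return (value, bytes_used)."""
    (value,) = struct.unpack(">f", data[:4])
    return value, 4


def unpack_float64(data):
    """Decode a float64 from the start of data; return (value, bytes_used)."""
    (value,) = struct.unpack(">d", data[:8])
    return value, 8
```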
There are two types of strings: astring is a string of single-byte ASCII characters, ustring is a string of multibyte Unicode characters.
In C, these are arrays of char or wchar_t characters. In Python 2, str objects are used for astrings and unicode objects for ustrings. In Python 3, str objects are used for both astrings and ustrings.
Strings are serialized using a 4-byte unsigned big-endian byte (not character!) count, followed by the character data (encoded as UTF-8 in the case of a ustring).
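The difference between a byte count and a character count matters for ustrings, because a single character may occupy several bytes in UTF-8. The sketch below shows this in Python 3; the helper names are hypothetical and not taken from Tyger's run-time library.

```python
import struct


def pack_astring(text):
    """Encode an ASCII string as a 4-byte big-endian byte count plus data."""
    data = text.encode("ascii")
    return struct.pack(">I", len(data)) + data


def pack_ustring(text):
    """Encode a Unicode string as a 4-byte byte count plus UTF-8 data."""
    data = text.encode("utf-8")
    # The count is the number of encoded bytes, not the number of characters.
    return struct.pack(">I", len(data)) + data


def unpack_ustring(data):
    """Decode a ustring from the start of data; return (text, bytes_used)."""
    (count,) = struct.unpack(">I", data[:4])
    return data[4:4 + count].decode("utf-8"), 4 + count
```

For example, a ustring holding the single character U+00E9 is sent with a count of 2, because that character takes two bytes in UTF-8.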
Enumerations are simple mappings of symbolic names to integers, as used in many programming languages.
In C these are translated to enums. In Python, they are translated to a class that contains class variables with the given names and values.
They are serialized as unsigned big-endian integers. The number of bytes used depends on their maximum value (one byte if the maximum value is 255 or less, two bytes if it is 65535 or less, etc.). The maximum size allowed is four bytes, for a maximum enum value of 4,294,967,295 or 2^32 - 1.
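The size rule can be sketched as follows in Python 3. The function names are hypothetical. The sketch also reads the "etc." above as continuing in one-byte steps, so that a three-byte encoding is possible; that is an assumption, and if Tyger only uses sizes of 1, 2 or 4 bytes the loop would need adjusting.

```python
def enum_byte_count(max_value):
    """Return the number of bytes needed for an enum with this maximum value."""
    if not 0 <= max_value <= 0xFFFFFFFF:
        raise ValueError("enum values must fit in four bytes")
    size = 1
    # Assumption: sizes grow one byte at a time (1, 2, 3, 4).
    while max_value > (1 << (8 * size)) - 1:
        size += 1
    return size


def pack_enum(value, max_value):
    """Encode value as an unsigned big-endian integer of the enum's size."""
    return value.to_bytes(enum_byte_count(max_value), "big")
```

Note that the size depends on the largest value in the enum, not on the value being sent: pack_enum(3, 300) produces two bytes.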
Structs are abstract types containing multiple internal fields. Fields may be designated optional by prefixing the field with the keyword "opt".
In C these are translated to typedef struct definitions. For each field the struct contains, a struct member is added. If a field is optional, a pointer to a struct member is added. If an optional field is not present this pointer is set to NULL.
In Python, structs are translated to classes that have a member variable for each field.
They are serialized by simply serializing each field in turn. Optional fields are preceded by a byte set to 1 if the field is present (in which case it follows immediately) or 0 if it isn't.
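To show how the presence byte fits in, here is a sketch for a hypothetical struct with three fields in this order: an astring called name, an optional ustring called creator, and a bool called visible. The function names are made up for this illustration and do not reflect the code Tyger generates.

```python
import struct


def pack_string(text, encoding):
    """Encode a string as a 4-byte big-endian byte count plus data."""
    data = text.encode(encoding)
    return struct.pack(">I", len(data)) + data


def pack_item(name, visible, creator=None):
    """Serialize the hypothetical struct field by field, in declared order."""
    out = pack_string(name, "ascii")            # astring name
    if creator is None:
        out += b"\x00"                          # optional field absent
    else:
        out += b"\x01"                          # optional field present...
        out += pack_string(creator, "utf-8")    # ...followed by its data
    out += b"\x01" if visible else b"\x00"      # bool visible
    return out
```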
Unions are abstract types that contain one of a number of possible internal fields, as indicated by a "discriminator". A discriminator must be an integer type. A field may be void, indicating that there is no additional data required for that discriminator value.
In C, these are translated to a typedef struct that contains the discriminator and a union u that contains the internal fields. In Python, they are translated to a class that contains the discriminator and a member named u that is set to the indicated internal field.
Unions are serialized by first serializing the discriminator as an unsigned big-endian 32-bit integer and then the internal field indicated by the discriminator.
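A sketch of this scheme, for a hypothetical union with three choices: discriminator 0 carries nothing (void), 1 carries a uint16, and 2 carries an astring. The function name and the choice of fields are assumptions made for the example.

```python
import struct


def pack_union(discriminator, value=None):
    """Serialize the hypothetical union: discriminator first, then the field."""
    # The discriminator always goes out as an unsigned big-endian 32-bit int.
    out = struct.pack(">I", discriminator)
    if discriminator == 0:
        pass                                    # void: no additional data
    elif discriminator == 1:
        out += struct.pack(">H", value)         # uint16 field
    elif discriminator == 2:
        data = value.encode("ascii")            # astring field
        out += struct.pack(">I", len(data)) + data
    else:
        raise ValueError("unknown discriminator: %d" % discriminator)
    return out
```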
Arrays consist of a number of identically-typed fields.
In C these are translated to standard arrays. In Python they are translated to lists.
They are serialized by first serializing the member count as an unsigned big-endian 32-bit integer and then the contained members.
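For instance, an array of int32 members could be packed and unpacked as in the sketch below. The function names are hypothetical; only the layout (a 32-bit member count followed by the members) comes from the description above.

```python
import struct


def pack_int32_array(values):
    """Encode a list of int32 values as a member count plus the members."""
    out = struct.pack(">I", len(values))
    for value in values:
        out += struct.pack(">i", value)
    return out


def unpack_int32_array(data):
    """Decode an int32 array from the start of data.

    Returns a (values, bytes_used) tuple.
    """
    (count,) = struct.unpack(">I", data[:4])
    end = 4 + 4 * count
    values = list(struct.unpack(">%di" % count, data[4:end]))
    return values, end
```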
A constant has a name and a value. The value can be any of the built-in types: int, float, astring and ustring.
In C a constant is defined as a const in the generated .c file and declared as an extern const in the accompanying .h file. In Python, it is set as a global variable in the generated .py file.
Constants are not serialized. They are available to both the sender and receiver (or writer and reader) in the generated code.
This section describes the input files for Tyger. These are used to specify, using a high-level description, the types to be generated.
The examples shown here are derived from the test code that comes with Tyger. So if you want to see them in their "natural habitat", see some more examples, and see the C and Python code that is generated based on these definitions: go to the top-level directory of the software, type "make test" and look at the generated code (Objects.c, Objects.h and Objects.py).
New types can be aliased from existing ones with the following syntax:
<new_type_name> = <existing_type_name>
For example, to define a new type called Coordinate and make it a signed 32-bit integer you would use:
Coordinate = int32
Structs are defined as follows:
<new_type_name> = struct {
    <type_of_first_element> <name_of_first_element>
    <type_of_second_element> <name_of_second_element>
    ...
}
You can add an arbitrary number of fields to a struct.
So if you wanted to define a type called Object that contains an ASCII string name, an optional Unicode string creator, a boolean visible and a field named shape with type Shape, you'd do that as follows:
Object = struct {
    astring name
    opt ustring creator
    bool visible
    Shape shape
}
This is how an enumeration is defined:
<new_type_name> = enum {
    <first_symbolic_name> [ = <first_value> ]
    <second_symbolic_name> [ = <second_value> ]
    ...
}
Again, you can add an arbitrary number of entries, as long as the greatest value is 2^32 - 1 or less.
As indicated, values are optional. If a value is not specified, it is set to the previous one incremented by 1. If the first value is not specified it is set to 0.
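The numbering rule can be expressed as a short sketch. The function assign_enum_values is hypothetical and only models the rule stated above; it is not part of Tyger. Entries are given as (name, value) pairs, with None standing for an omitted value.

```python
def assign_enum_values(entries):
    """Resolve enum values: omitted ones become the previous value plus 1."""
    values = {}
    previous = -1  # so that an omitted first value becomes 0
    for name, explicit in entries:
        value = previous + 1 if explicit is None else explicit
        values[name] = value
        previous = value
    return values
```

So an enum declared with entries A, B, C = 10 and D would end up with A = 0, B = 1, C = 10 and D = 11.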
For example, if we wanted to create an enum called ShapeType to indicate different types of shapes, we might do it like this:
ShapeType = enum {
    ST_NONE = 0
    ST_LINE = 1
    ST_POLYGON = 2
    ST_PLANE = 3
    ST_SPHERE = 4
}
A union is defined like this:
<new_type_name> = union(<discriminator_type> <discriminator_name>) {
    <discriminator_value_for_first_choice>: <type_of_first_choice> <name_of_first_choice>
    <discriminator_value_for_second_choice>: <type_of_second_choice> <name_of_second_choice>
    ...
}
The value of the discriminator (i.e. the variable named <discriminator_name>) determines which field is active.
For example, let's say we wanted to define a union named Shape, which would contain one of a number of internal fields based on shape_type (which is a discriminator field of type ShapeType as shown above). This is how we would do that:
Shape = union(ShapeType shape_type) {
    ST_NONE: void
    ST_LINE: Line line
    ST_POLYGON: Polygon polygon
    ST_PLANE: Plane plane
    ST_SPHERE: Sphere sphere
}
Constants are defined like this:
<constant_name> = const <constant_type> <constant_value>
For example, to define a constant unsigned 32-bit int named Dimensions with the value 3, you'd write:
Dimensions = const uint32 3
This section describes the command line options for the tyger command. First we'll talk about the general options, then we'll look at the options specific to the C and Python code generators.
This section describes Tyger's general options.
-V or --version
Print version and exit.
-c or --c-src <C-source-output>
Output C source file to the file given in <C-source-output>.
-h or --c-hdr <C-header-output>
Output C header file to the file given in <C-header-output>.
-p or --python <python-output>
Output python code to the file given in <python-output>.
-i or --indent <indent-string>
Use the string given in <indent-string> as one level of indent.
This section explains the switches accepted by the C code generator.
--c-packsize
Generate packsize functions.
--c-pack
Generate pack functions
--c-unpack
Generate unpack functions
--c-wrap
Generate wrap functions
--c-unwrap
Generate unwrap functions
--c-read-fd
Generate functions to read from an fd
--c-write-fd
Generate functions to write to an fd
--c-read-fp
Generate functions to read from an FP
--c-write-fp
Generate functions to write to an FP
--c-print-fp
Generate print functions
--c-create
Generate create functions
--c-set
Generate set functions
--c-copy
Generate copy functions
--c-clear
Generate clear functions
--c-destroy
Generate destroy functions
--c-mx-send
Generate MX send functions
--c-mx-bcast
Generate MX broadcast functions
This section explains the switches accepted by the Python code generator. In Python it is much easier to get added functionality by combining generated functions and existing language features, so we need far fewer generated functions.