DBToaster - C++ Code Generation


C++ Code Generation

Warning: This API is subject to changes in future releases.

Note: Compiling and running queries with the C++ backend requires g++ 4.8 or above. Please refer to Installation for more details.

1. Quickstart Guide

DBToaster generates C++ code for incrementally maintaining the results of a given set of queries when cpp is set as the output language (using the -l flag). In this case, the compiler produces a C++ header file that contains a set of data structures (tlq_t, data_t and Program) required for linking with the driver program.

Let's consider the following SQL query:

$> cat examples/queries/simple/rs_example1.sql
CREATE TABLE R(A int, B int) 
  FROM FILE 'examples/data/tiny/r.dat' LINE DELIMITED
  CSV (fields := ',');

CREATE STREAM S(B int, C int) 
  FROM FILE 'examples/data/tiny/s.dat' LINE DELIMITED
  CSV (fields := ',');

SELECT SUM(r.A*s.C) as RESULT FROM R r, S s WHERE r.B = s.B;

The corresponding C++ header file can be obtained by running:

$> bin/dbtoaster examples/queries/simple/rs_example1.sql -l cpp -o rs_example1.hpp

Alternatively, DBToaster can build a standalone binary (if the -c [binary name] flag is present) by compiling the generated header file against the driver program lib/dbt_c++/main.cpp, which executes the generated code and prints the results.

Running the compiled binary will result in the following output:

$> ./rs_example1
<snap>
        <RESULT>156</RESULT>
</snap>

If the generated binary is run with the --async flag, it will also print intermediary results as frequently as possible while the generated program is running in a separate thread.

$> ./rs_example1 --async
Initializing program:
Running program:
<snap>
        <RESULT>0</RESULT>
</snap>
<snap>
        <RESULT>0</RESULT>
</snap>
<snap>
        <RESULT>0</RESULT>
</snap>
<snap>
        <RESULT>0</RESULT>
</snap>
<snap>
        <RESULT>9</RESULT>
</snap>
<snap>
        <RESULT>74</RESULT>
</snap>
<snap>
        <RESULT>141</RESULT>
</snap>
Printing final result:
<snap>
        <RESULT>156</RESULT>
</snap>

2. C++ API Guide

The DBToaster C++ code generator produces a header file containing three main type definitions in the dbtoaster namespace: tlq_t, data_t and Program. Additionally, snapshot_t is predefined as a garbage-collected pointer to tlq_t. What follows is a brief description of these types; a more detailed presentation can be found in the Reference section.

tlq_t encapsulates the materialized views directly needed for computing the results and offers functions for retrieving them.
data_t extends tlq_t with auxiliary materialized views needed for maintaining the results and offers trigger functions for incrementally updating them.
Program represents the execution engine of the generated program. It encapsulates a data_t object and provides implementations to a set of abstract functions of the IProgram class used for running the program. Default implementations for some of these functions are inherited from the ProgramBase class while others are generated depending on the previously defined tlq_t and data_t types.

2.1. Executing the Program

The execution of a program can be controlled through the functions: IProgram::init(), IProgram::run(), IProgram::is_finished(), IProgram::process_streams() and IProgram::process_stream_event().

virtual void IProgram::init(): Loads the tuples of static tables and initializes the materialized views based on that data. The definition of this function is generated as part of the Program class.
void IProgram::run( bool async = false ): Executes the program by invoking the Program::process_streams() function. If parameter async is set to true the execution takes place in a separate thread. This is a standard function defined by the IProgram class.
bool IProgram::is_finished(): Tests whether the program has finished or not. Especially relevant when the program is run in asynchronous mode. This is a standard function defined by the IProgram class.
virtual void IProgram::process_streams(): Reads stream events from various sources and invokes IProgram::process_stream_event() on each event. The default implementation of this function (ProgramBase::process_streams()) reads events from the sources specified in the SQL program.
virtual void IProgram::process_stream_event(event_t& ev): Processes each stream event passing through the system. The default implementation of this function (ProgramBase::process_stream_event()) performs the incremental maintenance work by invoking the trigger function that corresponds to the event type ev.type for stream ev.id, with the arguments contained in ev.data.
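
The dispatch step described for process_stream_event() can be pictured with the following self-contained sketch. It does not use the DBToaster headers: Event, EventType and MiniProgram are hypothetical stand-ins for event_t, the event type and ProgramBase, and the lookup table is only an assumption about how an (event type, relation id) pair might be mapped to a trigger function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the library's event types; the real
// definitions live in the DBToaster runtime and may differ.
enum EventType { kInsertTuple, kDeleteTuple };

struct Event {
    EventType type;           // kind of update
    int id;                   // relation identifier
    std::vector<long> data;   // tuple fields
};

class MiniProgram {
public:
    typedef std::function<void(const std::vector<long>&)> Trigger;

    virtual ~MiniProgram() {}

    // Associates a trigger with an (event type, relation id) pair.
    void register_trigger(EventType type, int id, Trigger t) {
        triggers_[std::make_pair(type, id)] = t;
    }

    // Looks up the trigger for the event and calls it with the tuple.
    virtual void process_stream_event(const Event& ev) {
        std::map<std::pair<EventType, int>, Trigger>::iterator it =
            triggers_.find(std::make_pair(ev.type, ev.id));
        assert(it != triggers_.end() && "no trigger registered");
        it->second(ev.data);
    }

    // Feeds a batch of events through process_stream_event(), one by one.
    void process_streams(const std::vector<Event>& events) {
        for (std::size_t i = 0; i < events.size(); ++i)
            process_stream_event(events[i]);
    }

private:
    std::map<std::pair<EventType, int>, Trigger> triggers_;
};
```

Because process_stream_event() is virtual in this sketch as well, a subclass can wrap it with logging or filtering and then delegate to the base version.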

2.2. Retrieving the Results

The snapshot_t IProgram::get_snapshot() function returns a snapshot of the results of the program. The query results can then be obtained by calling the appropriate get_TLQ_NAME() function on the snapshot object, as described in the reference of tlq_t. If the program is running in asynchronous mode, the snapshot is guaranteed to be consistent.

Currently, the mechanism for taking snapshots is trivial, in that a snapshot consists of a full copy of the tlq_t object associated with the program. Consequently, the time required to obtain such a snapshot is linear in the size of the result set.
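
As a rough illustration of copy-based snapshots, the sketch below keeps the live results behind a mutex and hands out a reference-counted copy. Results, Engine, Snapshot and lookup are hypothetical names, and the use of a mutex is an assumption made for this example; it is not a description of how the DBToaster runtime synchronizes its threads.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for tlq_t: holds only the query results.
struct Results {
    std::map<long, long> by_key;   // grouped result, key -> aggregate
};

// Reference-counted, read-only handle, similar in spirit to snapshot_t.
typedef std::shared_ptr<const Results> Snapshot;

class Engine {
public:
    // Applies one update to the live results.
    void add(long key, long delta) {
        std::lock_guard<std::mutex> lock(mutex_);
        live_.by_key[key] += delta;
    }

    // Copies the live results while holding the lock, so the copy never
    // reflects a half-applied update. Cost is linear in the result size.
    Snapshot get_snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::make_shared<Results>(live_);
    }

private:
    mutable std::mutex mutex_;
    Results live_;
};

// Reads one aggregate from a snapshot; absent keys read as 0.
long lookup(const Snapshot& snap, long key) {
    assert(snap && "null snapshot");
    std::map<long, long>::const_iterator it = snap->by_key.find(key);
    return it == snap->by_key.end() ? 0 : it->second;
}
```

A snapshot taken this way stays unchanged while later updates are applied, which is why it can be read safely from the thread that requested it.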

2.3. Basic Example

We will use as an example the C++ code generated for the rs_example1.sql SQL program introduced above. In the interest of clarity some implementation details are omitted.

$> bin/dbtoaster examples/queries/simple/rs_example1.sql -l cpp -o rs_example1.hpp
#include <lib/dbt_c++/program_base.hpp>

namespace dbtoaster {

    /* Definitions of auxiliary maps for storing materialized views. */
    ...
    ...
    ...

    /* Type definition providing a way to access the results of the SQL */
    /* program */
    struct tlq_t{
        tlq_t()
        {}
    
        ...
        
        /* Functions returning / computing the results of top level */
        /* queries */
        long get_RESULT(){
            ...
        }

    protected:

        /* Data structures used for storing/computing top level queries */
        ...
    };
    
    /* Type definition providing a way to incrementally maintain the */
    /* results of the SQL program */
    struct data_t : tlq_t{
        data_t()
        {}
    
        /* Registering relations and trigger functions */
        void register_data(ProgramBase<tlq_t>& pb) {
            ...
        }

        /* Trigger functions for table relations */
        void on_insert_R(long R_A, long R_B) {
            ...
        }
        
        /* Trigger functions for stream relations */
        void on_insert_S(long S_B, long S_C) {
            ...
        }
        
        void on_delete_S(long S_B, long S_C) {
            ...
        }
        
        void on_system_ready_event() {
            ...
        }

    private:

        /* Data structures used for storing materialized views */
        ...
    };

    /* Type definition providing a way to execute the SQL program */
    class Program : public ProgramBase<tlq_t>
    {
    public:
        Program(int argc = 0, char* argv[] = 0) : 
                ProgramBase<tlq_t>(argc,argv) 
        {
            data.register_data(*this);

            /* Specifying data sources */
            ...
        }

        /* Imports data for static tables and performs view */
        /* initialization based on it. */
        void init() {
            process_tables();
            data.on_system_ready_event();
        }
    
        /* Saves a snapshot of the data required to obtain the results */
        /* of top level queries. */
        snapshot_t take_snapshot(){
            return snapshot_t( new tlq_t((tlq_t&)data) );
        }
    
    private:
        data_t data;
    };

}

Below is an example of how the API can be used to execute the generated program and print its results:

#include <cstring>
#include <iostream>

#include "rs_example1.hpp"

using namespace std;

int main(int argc, char* argv[]) {
    bool async = argc > 1 && !strcmp(argv[1],"--async");
    
    dbtoaster::Program p;
    dbtoaster::Program::snapshot_t snap;

    cout << "Initializing program:" << endl;
    p.init();

    cout << "Running program:" << endl;
    p.run( async );
    while( !p.is_finished() )
    {
       snap = p.get_snapshot();
       cout << "RESULT: " << snap->get_RESULT() << endl;
    }

    cout << "Printing final result:" << endl;
    snap = p.get_snapshot();
    cout << "RESULT: " << snap->get_RESULT() << endl;

    return 0;
}

2.4. Custom Execution

Custom event processing can be performed on each stream event by overriding the virtual function void IProgram::process_stream_event(event_t& ev), while still delegating the basic processing of the event to Program::process_stream_event().

Example: Custom event processing.

namespace dbtoaster{
    class CustomProgram_1 : public Program
    {
    public:        
        void process_stream_event(event_t& ev) {
            cout << "on_" << event_name[ev.type] << "_";
            cout << get_relation_name(ev.id) << "(" << ev.data << ")" << endl;

            Program::process_stream_event(ev);
        }        
    };
}

Stream events can be manually read from custom sources and fed into the system by overriding the virtual function void IProgram::process_streams() and calling process_stream_event() for each event read.

Example: Custom event sourcing.

namespace dbtoaster{
    class CustomProgram_2 : public Program
    {
    public:        
        void process_streams() {
            
            for( long i = 1; i <= 10; i++ ) {
                event_args_t ev_args;
                ev_args.push_back(i);
                ev_args.push_back(i+10);
                event_t ev( insert_tuple, get_relation_id("S"), ev_args);

                process_stream_event(ev);
            }
        }        
    };
}
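
When events come from an external source rather than a loop counter, an overridden process_streams() first has to turn raw input into argument lists. The sketch below shows one way to do that for comma-separated lines. The helpers parse_csv_line and read_events are hypothetical, and the emit callback stands in for building an event_t and calling process_stream_event(); nothing here is part of the DBToaster API.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parses one comma-separated line such as "3,13" into its integer fields.
// std::stol throws if a field is not a valid integer.
std::vector<long> parse_csv_line(const std::string& line) {
    std::vector<long> fields;
    std::istringstream in(line);
    std::string cell;
    while (std::getline(in, cell, ','))
        fields.push_back(std::stol(cell));
    return fields;
}

// Reads every non-empty line from `source` and hands the parsed tuple to
// `emit`. Returns the number of tuples emitted.
template <typename Emit>
std::size_t read_events(std::istream& source, Emit emit) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(source, line)) {
        if (line.empty()) continue;   // skip blank lines
        std::vector<long> args = parse_csv_line(line);
        assert(args.size() == 2 && "S(B, C) expects two fields");
        emit(args);
        ++count;
    }
    return count;
}
```

Inside a class such as CustomProgram_2, the callback would presumably fill an event_args_t from the parsed fields and forward the resulting event to process_stream_event().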

3. C++ Generated Code Reference

3.1. `struct tlq_t`

The tlq_t type contains all the relevant data structures for computing the results of the SQL program, also called the top level queries. It provides a set of functions named get_TLQ_NAME that return the top level query result labeled TLQ_NAME. For our example, the generated tlq_t has a function named get_RESULT that returns the query result corresponding to SELECT SUM(r.A*s.C) as RESULT ... in rs_example1.sql.

3.1.1. Queries computing collections

In the example above, the result consisted of a single value. If, however, the query has a GROUP BY clause, its result is a collection and the corresponding get_RESULT function will return a MultiHashMap.

Let's consider the following example:

$> cat examples/queries/simple/rs_example2.sql
CREATE STREAM R(A int, B int) 
  FROM FILE 'examples/data/tiny/r.dat' LINE DELIMITED
  CSV (fields := ',');

CREATE STREAM S(B int, C int) 
  FROM FILE 'examples/data/tiny/s.dat' LINE DELIMITED
  CSV (fields := ',');

SELECT r.B, SUM(r.A*s.C) as RESULT_1, SUM(r.A+s.C) as RESULT_2 FROM R r, S s WHERE r.B = s.B GROUP BY r.B;

The generated code defines two collection types RESULT_1_map and RESULT_2_map and two corresponding entry types: RESULT_1_entry and RESULT_2_entry. These entry structures have a set of key fields corresponding to the GROUP BY clause, in our case R_B, and an additional value field, __av, storing the aggregated value of the top level query for each key in the collection. Finally, tlq_t contains two functions get_RESULT_1 and get_RESULT_2 returning the top level query results as RESULT_1_map and RESULT_2_map objects.

    /* Definitions of auxiliary maps for storing materialized views. */
    struct RESULT_1_entry {
        long R_B; long __av;
        ...
    };
    typedef multi_index_container<RESULT_1_entry, ... > RESULT_1_map;

    ...
    
    struct RESULT_2_entry {
        long R_B; long __av;
        ...
    };
    typedef multi_index_container<RESULT_2_entry, ... > RESULT_2_map;
    
    ...
    
    /* Type definition providing a way to access the results of the SQL program */
    struct tlq_t{
        tlq_t()
        {}
    
        /* Serialization Code */
        ...

        /* Functions returning / computing the results of top level queries */
        RESULT_1_map& get_RESULT_1(){
            ...
        }
        RESULT_2_map& get_RESULT_2(){
            ...
        }

    protected:

        /* Data structures used for storing / computing top level queries */
        RESULT_1_map RESULT_1;
        RESULT_2_map RESULT_2;

    };

If the given query has no aggregates, the COUNT(*) aggregate is computed by default; consequently, the resulting collections are guaranteed not to contain duplicate keys.
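
To see why an implicit COUNT(*) rules out duplicate keys, consider a query without aggregates such as a plain projection of one column. The sketch below stores each distinct key once together with its multiplicity. CountedResult is a hypothetical class written for this illustration, and erasing a key when its count reaches zero is an assumption, not a statement about the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for the result collection of a query without
// aggregates. Each distinct key is stored once and the mapped value holds
// its multiplicity, i.e. COUNT(*).
class CountedResult {
public:
    // A new tuple with key b raises the multiplicity of b by one.
    void on_insert(long b) { ++counts_[b]; }

    // Removing a tuple lowers the multiplicity; empty groups are dropped.
    void on_delete(long b) {
        std::map<long, long>::iterator it = counts_.find(b);
        assert(it != counts_.end() && "deleting a tuple that was never inserted");
        if (--(it->second) == 0) counts_.erase(it);
    }

    // Multiplicity of key b, or 0 when the key is absent.
    long count(long b) const {
        std::map<long, long>::const_iterator it = counts_.find(b);
        return it == counts_.end() ? 0 : it->second;
    }

    std::size_t distinct_keys() const { return counts_.size(); }

private:
    std::map<long, long> counts_;
};
```

Inserting the same key twice yields one entry with count 2 rather than two entries, which matches the shape of the entry types shown above: key fields plus a single aggregated value.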

3.2. `struct data_t`

The data_t type contains all the relevant data structures and trigger functions for incrementally maintaining the results of the SQL program.

For each stream-based relation STREAM_X present in the SQL program, it provides a pair of trigger functions named on_insert_STREAM_X() and on_delete_STREAM_X() that incrementally maintain the query results when a tuple is inserted into or deleted from STREAM_X. For the query presented above (rs_example1.sql), the generated data_t has the trigger functions void on_insert_S(long S_B, long S_C) and void on_delete_S(long S_B, long S_C).

For static table based relations only the insertion trigger is required and will get called when processing the static tables in the initialization phase of the program.
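
The following hand-written sketch shows what such triggers could compute for rs_example1.sql, where R is a static table and S is a stream. It is not DBToaster output: the struct RsExample1, the auxiliary map sum_a_by_b and the ready flag are assumptions chosen to keep the example small, and the real generated code may organize its views differently.

```cpp
#include <cassert>
#include <unordered_map>

// Illustration of the maintenance logic for
//   SELECT SUM(r.A*s.C) FROM R r, S s WHERE r.B = s.B
// with R loaded once at start-up and S arriving as a stream.
struct RsExample1 {
    long result = 0;                             // current SUM(r.A*s.C)
    std::unordered_map<long, long> sum_a_by_b;   // auxiliary view: B -> SUM(A) over R
    bool ready = false;                          // set once all of R is loaded

    // Called for each row of R during initialization. S is still empty at
    // that point, so only the auxiliary view changes.
    void on_insert_R(long a, long b) {
        assert(!ready && "R is static: no inserts after initialization");
        sum_a_by_b[b] += a;
    }

    void on_system_ready_event() { ready = true; }

    // A new S tuple joins with every R row sharing its B value, so the
    // result grows by SUM(A for that B) * C.
    void on_insert_S(long b, long c) { result += sum_a(b) * c; }

    // Deletion applies the same delta with the opposite sign.
    void on_delete_S(long b, long c) { result -= sum_a(b) * c; }

private:
    // Reads the auxiliary view without creating entries for unknown keys.
    long sum_a(long b) const {
        std::unordered_map<long, long>::const_iterator it = sum_a_by_b.find(b);
        return it == sum_a_by_b.end() ? 0 : it->second;
    }
};
```

Each stream event is handled with a single hash lookup and one multiplication; the join is never recomputed from scratch, which is the point of incremental maintenance.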

3.3. `class Program`

Finally, Program is a class that implements the IProgram interface and provides the basic functionality for reading static table tuples and stream events from their sources, initializing the relevant data structures, running the SQL program and retrieving its results.