DBToaster - DBToaster SQL Reference

< Command-Line Reference | DBToaster StdLib Reference >

DBToaster SQL Reference

1. CREATE FUNCTION

Declare a forward reference to a user defined function in the target language.

create_function :=
  CREATE FUNCTION <name> ( <arguments> ) RETURNS <type> AS <definition>

arguments := [<var_1> <type_1> [, <var_2> <type_2> [, ...]]]

definition := EXTERNAL '<external_name>';

Use create function to declare a user-defined primitive-valued function in the target language. At this time, DBToaster does not create target-language specific declarations are created, so the function must be in-scope within the generated code. Once declared, a UDF may be used in any arithmetic expression within DBToaster. For example, the following block illustrates use of the math.h cos() and sin() functions for C++ targetted code.

CREATE FUNCTION cos ( x double ) RETURNS double AS EXTERNAL 'sin';
CREATE FUNCTION sin ( x double ) RETURNS double AS EXTERNAL 'cos';

SELECT r.distance * cos(r.angle) AS x, 
       r.distance * sin(r.angle) AS y,
FROM RadialMeasurements r;

2. CREATE TABLE/STREAM

Declare a relation for use in the query.

create_statement := 
  CREATE { TABLE | STREAM } <name> ( <schema> ) 
         [<source_declaration>]

schema := [<var_1> <type_1> [, <var_2> <type_2> [, ...]]]

source_declaration := source_stream source_adaptor

source_stream := 
  FROM FILE '<path>' {
      FIXEDWIDTH <bytes_per_row>
    | LINE DELIMITED
    | '<delim_string>' DELIMITED
  } 

source_adaptor := 
  <adaptor_name> (
    [<param_1> := '<value>' [, <param_2> := '<value>' [, ...]]]
  )

A create statement defines a relation named name with the indicated schema and declares a method for automatically populating/updating rows of that relation.

Each relation may be declared to be either a Stream or a Table:

Tables are static data sources. A table is read in prior to query monitoring, and must remain constant once monitoring has started.
Streams are dynamic data sources. Stream updates are read in one tuple at a time as data becomes available, and query views are updated after every update to a stream.

The source declaration allows DBToaster (either in the interpreter, or the generated source code) to automatically update the relation. The source declaration is optional when using DBToaster to generate source code. User programs may manially inject updates to relations, or manually declare sources during initialization of the DBToaster-genertaed source code.

A source declaration consists of stream and adaptor components. The stream component defines where data should be read from, and how records in the data are delimited. At present, DBToaster only supports reading tuples from files.

If the same file is referenced multiple times, the file will only be scanned once, and events will be generated in the order in which they appear in the file.

The adaptor declares how to parse fields out of each record. See below for documentation on DBToaster's standard adaptors package.

2.1. Example

CREATE STREAM R(a int, b date)
FROM FILE 'examples/data/r.dat' LINE DELIMITED 
CSV (fields := '|')

3. INCLUDE

Import a secondary SQL file.

include_statement := INCLUDE 'file'

Import the contents of the selected file into DBToaster. The file path is interpreted relative to the current working directory.

4. SELECT

Declare a query to monitor.

select_statement := 
  SELECT <target_1> [, <target_2> [, ...]] 
  FROM <source_1> [, <source_2> [, ...]]
  WHERE <condition>
  [GROUP BY <group_vars>]

target := <expression> [[AS] <target_name>] | * | *.* 
        | <source_name>.*

source := <relation_name> [[AS] <source_name>]
  | (<select_statement>) [AS] <source_name>
  | <source> [NATURAL] JOIN <source> [ON <condition>]

expression :=  (<expression>) | <int> | <float> | '<string>' 
  | <var> | <source>.<var>
  | <expression> { + | - | * | / } <expression>
  | -<expression>
  | (SELECT <expression> FROM ...)
  | SUM(<expression>) | COUNT(* | <expression>) 
  | AVG(<expression>) | COUNT(DISTINCT [var1, [var2, [...]]])
  | <inline_function>([<expr_1> [, <expr_2> [, ...]]])
  | DATE('yyyy-mm-dd')
  | EXTRACT({year|month|day} FROM <date>)
  | CASE <expression> WHEN <expression> THEN <expression> [, ...] 
                      [ELSE <expression>] END
  | CASE WHEN <condition> THEN <expression> [, ...] 
         [ELSE <expression>] END

condition := (<condition>) | true | false | not (<condition>)
  | <expression> { < | <= | > | >= | = | <> } <expression>
  | <expression> { < | <= | > | >= | = | <> } { SOME | ALL } 
                   <select_statement>
  | <condition> AND <condition> | <condition> OR <condition>
  | EXISTS <select_statement>
  | <expression> BETWEEN <expression> AND <expression>
  | <expression> IN <select_statement>
  | <expression> LIKE <matchstring>
  

DBToaster SQL's SELECT operation differs from the SQL-92 standard. Full support for the SQL-standard SELECT is planned, and will be part of a future release.

Aggregates

DBToaster currently has support for the SUM, COUNT, COUNT DISTINCT, and AVG aggregates. MIN and MAX are not presently supported. Also, see the note on NULL values below.

Types

DBToaster presently supports integer, floating point, string, and date types. char and varchar types are treated as strings of unbounded length.

Conditional Predicates

DBToaster presently supports boolean expressions over arithmetic comparisons (=, <>, <, <=, >, >=), existential/universal quantification (SOME/ALL/EXISTS), BETWEEN, IN, and LIKE.

SELECT syntax

SELECT [FROM] [WHERE] [GROUP BY] queries are supported. The DISTINCT, UNION, LIMIT, ORDER BY, and HAVING clauses are not presently supported. The HAVING operator may be simulated by use of nested queries:

SELECT A, SUM(B) AS sumb FROM R HAVING SUM(C) > 2

is equivalent to

SELECT A, sumb FROM (
  SELECT A, SUM(B) AS sumb, SUM(C) as sumc FROM R
)
WHERE sumc > 2

NULL values

DBToaster does not presently support NULL values. The SUM or AVG of an empty table is 0, and not NULL. OUTER JOINS are not supported.

Incremental Computation with Floating Point Numbers

There are several subtle issues that arise when performing incremental computations with floating point numbers:

When using division in conjunction with aggregates, be aware that SUM and AVG return 0 for empty tables. Once a result value becomes NAN or INFTY, it will no longer be incrementally maintainable. We are working on a long-term fix. In the meantime, there are two workarounds for this problem. For some queries, you can coerce the aggregate value to be nonzero using the LISTMAX standard function (See the example query tpch/query8.sql in the DBToaster distribution for an example of how to do this). For most queries, the -F EXPRESSIVE-TLQs optimization flag will typically materialize the divisor as a separate map (the division will be evaluated when accessing results).
The floating point standards for most target languages (including OCaml, Scala, and C++) do not have well-defined semantics for equality tests over floating point numbers. Consequently, queries with floating-point group-by variables might produce non-unique groups (since two equivalent floating point numbers are not considered to be equal). We are working on a long-term fix. In the meantime, the issue can be addressed by using CAST_INT, or CAST_STRING to convert floating point numbers into canonical forms.

Other Notes

DBToaster does not allow non-aggregate queries to evaluate to singleton values. That is, the query
SELECT 1 FROM R WHERE R.A = (SELECT A FROM S)
is a compile-time error in DBToaster (while such a query would instead produce a run time error if it returned more than one tuple in SQL-92). An equivalent, valid query would be:

SELECT 1 FROM R WHERE R.A IN (SELECT A FROM S)
Variable scoping rules are slightly stricter than the SQL standard (you may need to use fully qualified names in some additional cases).

See DBToaster StdLib Reference for the documentation on DBToaster's standard function library.

DBToaster maintains query results in the form of either multi-key dictionaries (a.k.a., maps, hashmaps, etc...), or singleton primitive-typed values. Each query result is assigned a name based on the query (see documentation for your target language's code generator for details on how to access the results).

Non-aggregate queries produce a dictionary named "COUNT". Each entry in the dictionary has a key formed from the target fields of the SELECT. Values are the number of times the tuple occurs in the output (i.e., the query includes an implicit group-by COUNT(*) aggregate).
Singleton (non-grouping) aggregate queries produce a primitive-typed result for each aggregate target in the SELECT. The result names are assigned based on the name of each target (i.e., using the name following the optional AS clause, or a procedurally generated name otherwise).
Group-by aggregate queries produce a dictionary for each aggregate target. The non-aggregate (group-by) targets are used as keys for each entry (as for non-aggregate queries), and the value is the aggregate value for each group. The dictionaries are named based on the name of each aggregate target (as for singleton aggregate queries)

If multiple SELECT statements occur in the same file, the result names of each query will be prefixed with "QUERY#_", where # is an integer.

5. EXAMPLES

CREATE STREAM R(A int, B int);
CREATE STREAM S(B int, C int);

5.1. Non-aggregate query

SELECT * FROM R;

Generates a single dictionary named COUNT, mapping from the tuple "<R.A, R.B>" to the number of time each tuple occurs in R.

5.2. Aggregate query

SELECT SUM(R.A * S.C) AS sum_ac FROM R NATURAL JOIN S;

Generates a single constant integer named SUM_AC containing the query result.

5.3. Aggregate group-by query (one group-by var)

SELECT S.C, SUM(R.A) AS sum_a 
FROM R NATURAL JOIN S 
GROUP BY S.C;

Generates a dictionary named SUM_A mapping from values of S.C to the sums of R.A.

5.4. Aggregate group-by query (multiple group-by vars)

SELECT R.A, R.B, COUNT(*) AS foo FROM R GROUP BY R.A, R.B;

Generates a single dictionary named FOO, mapping from the tuple "<R.A, R.B>" to the number of time each tuple occurs in R.

5.5. Query with multiple aggregates

SELECT SUM(R.A) AS sum_a, SUM(S.C) AS sum_c 
FROM R NATURAL JOIN S 
GROUP BY S.C;

Generates two dictionaries named SUM_A and SUM_C, respectively containing the sums of R.A and S.C.

5.6. Multiple Queries

SELECT SUM(R.A) AS SUM_A FROM R;
SELECT SUM(S.C) SUM_C FROM S;

Generates two dictionaries named QUERY_1_SUM_A and QUERY_2_SUM_C, respectively containing the sums of R.A and S.C.

< Command-Line Reference | DBToaster StdLib Reference >