1. CREATE FUNCTION
Declare a forward reference to a user defined function in the target language.Use create function to declare a user-defined primitive-valued function in the target language. At this time, DBToaster does not create target-language specific declarations are created, so the function must be in-scope within the generated code. Once declared, a UDF may be used in any arithmetic expression within DBToaster. For example, the following block illustrates use of the math.h cos() and sin() functions for C++ targetted code.
2. CREATE TABLE/STREAM
Declare a relation for use in the query.A create statement defines a relation named name with the indicated schema and declares a method for automatically populating/updating rows of that relation.
Each relation may be declared to be either a Stream or a Table:
- Tables are static data sources. A table is read in prior to query monitoring, and must remain constant once monitoring has started.
- Streams are dynamic data sources. Stream updates are read in one tuple at a time as data becomes available, and query views are updated after every update to a stream.
The source declaration allows DBToaster (either in the interpreter, or the generated source code) to automatically update the relation. The source declaration is optional when using DBToaster to generate source code. User programs may manially inject updates to relations, or manually declare sources during initialization of the DBToaster-genertaed source code.
A source declaration consists of stream and adaptor components. The stream component defines where data should be read from, and how records in the data are delimited. At present, DBToaster only supports reading tuples from files.
If the same file is referenced multiple times, the file will only be scanned once, and events will be generated in the order in which they appear in the file.
The adaptor declares how to parse fields out of each record. See below for documentation on DBToaster's standard adaptors package.
2.1. Example
3. INCLUDE
Import a secondary SQL file.4. SELECT
Declare a query to monitor.DBToaster SQL's SELECT operation differs from the SQL-92 standard. Full support for the SQL-standard SELECT is planned, and will be part of a future release.
- Aggregates
- DBToaster currently has support for the SUM, COUNT, COUNT DISTINCT, and AVG aggregates. MIN and MAX are not presently supported. Also, see the note on NULL values below.
- Types
- DBToaster presently supports integer, floating point, string, and date types. char and varchar types are treated as strings of unbounded length.
- Conditional Predicates
- DBToaster presently supports boolean expressions over arithmetic comparisons (=, <>, <, <=, >, >=), existential/universal quantification (SOME/ALL/EXISTS), BETWEEN, IN, and LIKE.
- SELECT syntax
- SELECT [FROM] [WHERE] [GROUP BY] queries are supported. The DISTINCT, UNION, LIMIT, ORDER BY, and HAVING clauses are not presently supported. The HAVING operator may be simulated by use of nested queries:
SELECT A, SUM(B) AS sumb FROM R HAVING SUM(C) > 2is equivalent toSELECT A, sumb FROM ( SELECT A, SUM(B) AS sumb, SUM(C) as sumc FROM R ) WHERE sumc > 2
- NULL values
- DBToaster does not presently support NULL values. The SUM or AVG of an empty table is 0, and not NULL. OUTER JOINS are not supported.
- Incremental Computation with Floating Point Numbers
- There are several subtle issues that arise when performing incremental computations with floating point numbers:
- When using division in conjunction with aggregates, be aware that SUM and AVG return 0 for empty tables. Once a result value becomes NAN or INFTY, it will no longer be incrementally maintainable. We are working on a long-term fix. In the meantime, there are two workarounds for this problem. For some queries, you can coerce the aggregate value to be nonzero using the LISTMAX standard function (See the example query tpch/query8.sql in the DBToaster distribution for an example of how to do this). For most queries, the -F EXPRESSIVE-TLQs optimization flag will typically materialize the divisor as a separate map (the division will be evaluated when accessing results).
- The floating point standards for most target languages (including OCaml, Scala, and C++) do not have well-defined semantics for equality tests over floating point numbers. Consequently, queries with floating-point group-by variables might produce non-unique groups (since two equivalent floating point numbers are not considered to be equal). We are working on a long-term fix. In the meantime, the issue can be addressed by using CAST_INT, or CAST_STRING to convert floating point numbers into canonical forms.
- Other Notes
-
- DBToaster does not allow non-aggregate queries to evaluate to singleton values. That is, the query
SELECT 1 FROM R WHERE R.A = (SELECT A FROM S)is a compile-time error in DBToaster (while such a query would instead produce a run time error if it returned more than one tuple in SQL-92). An equivalent, valid query would be:
SELECT 1 FROM R WHERE R.A IN (SELECT A FROM S) - Variable scoping rules are slightly stricter than the SQL standard (you may need to use fully qualified names in some additional cases).
- DBToaster does not allow non-aggregate queries to evaluate to singleton values. That is, the query
See DBToaster StdLib Reference for the documentation on DBToaster's standard function library.
DBToaster maintains query results in the form of either multi-key dictionaries (a.k.a., maps, hashmaps, etc...), or singleton primitive-typed values. Each query result is assigned a name based on the query (see documentation for your target language's code generator for details on how to access the results).
- Non-aggregate queries produce a dictionary named "COUNT". Each entry in the dictionary has a key formed from the target fields of the SELECT. Values are the number of times the tuple occurs in the output (i.e., the query includes an implicit group-by COUNT(*) aggregate).
- Singleton (non-grouping) aggregate queries produce a primitive-typed result for each aggregate target in the SELECT. The result names are assigned based on the name of each target (i.e., using the name following the optional AS clause, or a procedurally generated name otherwise).
- Group-by aggregate queries produce a dictionary for each aggregate target. The non-aggregate (group-by) targets are used as keys for each entry (as for non-aggregate queries), and the value is the aggregate value for each group. The dictionaries are named based on the name of each aggregate target (as for singleton aggregate queries)
If multiple SELECT statements occur in the same file, the result names of each query will be prefixed with "QUERY#_", where # is an integer.