Storing Basic Value Types
parquet-writer currently supports storing boolean and numeric
data types.
The following table lists the value types that can be written
to an output Parquet file, together with the parquet-writer name
used for each type in the JSON layout declaration.
| Value Types | parquet-writer name |
|---|---|
| Boolean | |
| Signed Integers | |
| Unsigned Integers | |
| Floating Point | |
In addition to flat data columns of these basic value types,
parquet-writer supports writing data columns that are
nested data structures whose fields are composed
of these basic value types.
More specifically, parquet-writer supports:
- 1, 2, and 3 dimensional lists of these value types
- Struct data types having any number of named fields (like a C/C++ struct)
- 1, 2, and 3 dimensional lists of struct data types
More information on how to declare and write Parquet files containing these nested structures is contained in later sections.
Declaring Columns of Basic Value Types
Declaring a column layout for storing values of the basic data types above is done using JSON as follows:
{
  "fields": [
    {"name": "column0", "type": "float"},
    {"name": "column1", "type": "int32"}
  ]
}
That is, the layout must specify a fields array containing JSON objects
of the form:
{"name": "<string>", "type": "<value-type>"}
where the name field can be any string and the type field must be one of the
parquet-writer names for the supported basic value types, as listed in the
second column of the table above.
Each element in the top-level fields array of a JSON
layout configuration corresponds to exactly one data column
in the output Parquet file.
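The one-to-one mapping between fields entries and output columns can be sketched with a small helper that assembles a layout string from (name, type) pairs. This is only an illustration: `make_layout` and `example_layout` are hypothetical names that are not part of parquet-writer, and the helper assumes the names and types contain no characters that need JSON escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of parquet-writer): emits one
// {"name": ..., "type": ...} object per requested column.
// Assumes names and types need no JSON escaping.
std::string make_layout(
    const std::vector<std::pair<std::string, std::string>>& columns) {
    std::string json = "{\"fields\": [";
    for (std::size_t i = 0; i < columns.size(); ++i) {
        // Every field must carry a value-type name.
        assert(!columns[i].second.empty() && "each field needs a type");
        if (i > 0) json += ", ";
        json += "{\"name\": \"" + columns[i].first +
                "\", \"type\": \"" + columns[i].second + "\"}";
    }
    json += "]}";
    return json;
}

// Reproduces the two-column layout used in this section.
std::string example_layout() {
    return make_layout({{"column0", "float"}, {"column1", "int32"}});
}
```

Calling `example_layout()` yields a single-line version of the two-field declaration shown earlier, with one object per column.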
Writing Columns of Basic Value Types
Given the file layout from above,
writing a value only requires a variable of the corresponding C++
type, which is passed to the Writer class's fill function along with
the name of the column to which the data should be written:
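#include <cstdint>
#include "parquet_writer.h"
...
float field0_data = 42.5f;
int32_t field1_data = 42;
...
// fill each column by name with a value of the matching C++ type
writer.fill("column0", field0_data);
writer.fill("column1", field1_data);
...
// finish the current row
writer.end_row();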
Note that the order in which the columns are filled is not important. One could also do the filling in this order:
writer.fill("column1", field1_data);
writer.fill("column0", field0_data);
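To make the order-independence concrete, here is a minimal sketch built on a stand-in class. `MockWriter`, `Value`, `Row`, and `fill_in_both_orders` are hypothetical names introduced for illustration; this is not the parquet-writer Writer class and it writes no Parquet data. It assumes that values are buffered by column name until `end_row` is called, which is one simple way such behavior could arise, and it requires C++17 for `std::variant`.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in, NOT the parquet-writer Writer class.
using Value = std::variant<float, int32_t>;
using Row = std::map<std::string, Value>;

class MockWriter {
public:
    explicit MockWriter(std::vector<std::string> columns)
        : columns_(std::move(columns)) {}

    // Buffer a value under its column name for the current row.
    void fill(const std::string& column, Value value) {
        current_[column] = value;
    }

    // Complete the row; every declared column must have been filled.
    void end_row() {
        for (const auto& name : columns_) {
            if (current_.count(name) == 0) {
                throw std::runtime_error("column not filled: " + name);
            }
        }
        rows_.push_back(current_);
        current_.clear();
    }

    const std::vector<Row>& rows() const { return rows_; }

private:
    std::vector<std::string> columns_;
    Row current_;
    std::vector<Row> rows_;
};

// Fills two rows with the same data, using a different fill order each time.
std::vector<Row> fill_in_both_orders() {
    std::vector<std::string> columns = {"column0", "column1"};
    MockWriter writer(columns);
    float field0_data = 42.5f;
    int32_t field1_data = 42;

    writer.fill("column0", field0_data);
    writer.fill("column1", field1_data);
    writer.end_row();

    writer.fill("column1", field1_data);
    writer.fill("column0", field0_data);
    writer.end_row();
    return writer.rows();
}
```

Because each value is keyed by its column name, both rows returned by `fill_in_both_orders()` compare equal regardless of the order in which the columns were filled.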