Storing Basic Value Types
parquet-writer currently supports storing boolean and numeric data types.
The following table lists the supported value types for data written to an output Parquet file, along with the parquet-writer name used for each type in the JSON layout declaration.

| Value Type | parquet-writer name |
|---|---|
| Boolean | |
| Signed Integers | |
| Unsigned Integers | |
| Floating Point | |

In addition to flat data columns of these basic value types, parquet-writer supports writing nested data columns whose fields hold these basic value types.
More specifically, parquet-writer supports:
- 1-, 2-, and 3-dimensional lists of these value types
- struct data types with any number of named fields (like a C/C++ `struct`)
- 1-, 2-, and 3-dimensional lists of struct data types

Later sections describe how to declare and write Parquet files containing these nested structures.
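As a rough mental model only, the nested shapes listed above resemble ordinary C++ containers and structs like the ones below. This sketch uses `std::vector` and a hypothetical `Hit` struct purely to illustrate the shapes; it declares nothing in parquet-writer, and the C++ types the library actually expects for nested columns may differ (they are covered in later sections).

```cpp
#include <cstdint>
#include <vector>

// Illustration only: plain C++ types with the same *shapes* as the nested
// column types listed above. These are NOT parquet-writer declarations.

// 1-, 2-, and 3-dimensional lists of a basic value type.
// Inner lists may have different lengths (ragged), as with std::vector.
using FloatList1D = std::vector<float>;
using FloatList2D = std::vector<std::vector<float>>;
using FloatList3D = std::vector<std::vector<std::vector<float>>>;

// A struct with named fields, each holding a basic value type.
// The struct and field names are hypothetical.
struct Hit {
    std::int32_t id;
    float energy;
    bool is_valid;
};

// 1-, 2-, and 3-dimensional lists of the struct type.
using HitList1D = std::vector<Hit>;
using HitList2D = std::vector<std::vector<Hit>>;
using HitList3D = std::vector<std::vector<std::vector<Hit>>>;
```

For example, `FloatList2D grid = {{1.0f, 2.0f}, {3.0f}};` is a two-element outer list whose inner lists have lengths 2 and 1.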
Declaring Columns of Basic Value Types
A column layout for storing values of the basic value types above is declared in JSON as follows:
```json
{
  "fields": [
    {"name": "column0", "type": "float"},
    {"name": "column1", "type": "int32"}
  ]
}
```
That is, the layout must contain a top-level `fields` array whose elements are JSON objects of the form:
```json
{"name": "<string>", "type": "<value-type>"}
```
where `name` can be any string, and `type` must be one of the parquet-writer names for the supported basic value types listed in the second column of the table above.
Each element of the top-level `fields` array corresponds one-to-one with a data column in the output Parquet file.
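Because each element of `fields` maps to exactly one column, a layout is essentially an ordered list of (name, type) pairs. The sketch below builds the JSON layout string from such a list. `ColumnSpec`, `json_escape`, and `make_layout` are hypothetical helpers, not part of parquet-writer, and the escaping is deliberately minimal (quotes and backslashes only).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one column declaration as a {name, type} pair.
using ColumnSpec = std::pair<std::string, std::string>;

// Escape the two characters that cannot appear raw inside a JSON string.
// Control characters are not handled in this sketch.
std::string json_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Build a single-line layout declaration: one "fields" element per column,
// in the order the columns are given.
std::string make_layout(const std::vector<ColumnSpec>& columns) {
    std::string json = "{\"fields\": [";
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (i > 0) json += ", ";
        json += "{\"name\": \"" + json_escape(columns[i].first) +
                "\", \"type\": \"" + json_escape(columns[i].second) + "\"}";
    }
    json += "]}";
    return json;
}
```

For example, passing `{"column0", "float"}` and `{"column1", "int32"}` produces `{"fields": [{"name": "column0", "type": "float"}, {"name": "column1", "type": "int32"}]}`, a single-line equivalent of the layout shown earlier.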
Writing Columns of Basic Value Types
Given the layout above, writing data only requires variables of the corresponding C++ types. Each one is passed to the `Writer` class's `fill` function along with the name of the column it should be written to:
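```cpp
#include <cstdint>  // std::int32_t
#include "parquet_writer.h"
...
// `writer` is a parquet-writer Writer object; its setup (assumed to use
// the layout above) is elided here.
float field0_data = 42.5f;       // "column0" is declared as "float"
std::int32_t field1_data = 42;   // "column1" is declared as "int32"
...
writer.fill("column0", field0_data);
writer.fill("column1", field1_data);
...
writer.end_row();  // mark the end of the current row
```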
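The pairing between C++ variable types and layout `type` names can also be made explicit at compile time. The trait below is a hypothetical sketch, not part of parquet-writer; it encodes only the two pairings that appear in this section (`float` with `"float"`, `std::int32_t` with `"int32"`) and assumes C++17 for `std::string_view`.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical compile-time mapping from the C++ type passed to fill() to
// the layout "type" name it pairs with. The primary template is declared
// but never defined, so asking for an unmapped type fails to compile.
template <typename T>
struct layout_type_name;

// Pairings taken from the layout and fill examples in this section.
template <>
struct layout_type_name<float> {
    static constexpr std::string_view value = "float";
};

template <>
struct layout_type_name<std::int32_t> {
    static constexpr std::string_view value = "int32";
};

// True if a C++ value of type T pairs with a column declared as
// `declared_type`.
template <typename T>
constexpr bool matches_layout(std::string_view declared_type) {
    return layout_type_name<T>::value == declared_type;
}
```

With this design, `matches_layout<float>("float")` is `true`, while a type with no specialization, such as `double`, is rejected at compile time rather than at run time.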
The order in which the columns are filled does not matter; the calls could equally be made in this order:
```cpp
writer.fill("column1", field1_data);
writer.fill("column0", field0_data);
```
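One way to see why fill order does not matter: if values are keyed by column name and only assembled into a row when `end_row` is called, the order of the `fill` calls cannot affect the stored row. The `MockWriter` below is a hypothetical, in-memory stand-in written only to illustrate this idea; it is not the parquet-writer `Writer` class, writes no Parquet data, and assumes C++17 for `std::variant`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// A cell holds one basic value type (a subset is used in this sketch).
using Value = std::variant<bool, std::int32_t, float>;

// Hypothetical in-memory stand-in, NOT the parquet-writer Writer class.
class MockWriter {
public:
    // Column names in layout (declaration) order.
    explicit MockWriter(std::vector<std::string> columns)
        : columns_(std::move(columns)) {}

    // Store a value for the row in progress; filling a column twice
    // overwrites the earlier value.
    void fill(const std::string& column, Value value) {
        pending_[column] = value;
    }

    // Assemble the pending values in layout order, independent of the
    // order in which fill() was called. Treating an unfilled column as an
    // error is a choice made for this sketch only.
    void end_row() {
        std::vector<Value> row;
        row.reserve(columns_.size());
        for (const auto& name : columns_) {
            auto it = pending_.find(name);
            if (it == pending_.end()) {
                throw std::runtime_error("column not filled: " + name);
            }
            row.push_back(it->second);
        }
        rows_.push_back(std::move(row));
        pending_.clear();
    }

    const std::vector<std::vector<Value>>& rows() const { return rows_; }

private:
    std::vector<std::string> columns_;      // layout column order
    std::map<std::string, Value> pending_;  // values for the current row
    std::vector<std::vector<Value>> rows_;  // completed rows
};
```

Filling `"column0"` then `"column1"`, or `"column1"` then `"column0"`, yields identical rows, because `end_row` reads the pending values by name in the layout's column order.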