Storing Lists of Basic Value Types¶
Storing one, two, and three dimensional lists whose elements
are any of the basic value types is supported
by parquet-writer
.
The table below describes the naming convention for the list type columns:
List Type |
|
---|---|
One dimensional list |
|
Two dimensional list |
|
Three dimensional list |
|
Declaring List Type Columns¶
Declaring columns whose data type is a list of the basic value types is done using JSON as follows:
{
"fields": [
{"name": "column0", "type": "list1d", "contains": {"type": "float"}},
{"name": "column1", "type": "list2d", "contains": {"type": "uint32"}},
{"name": "column2", "type": "list3d", "contains": {"type": "double"}}
]
}
As can be seen in the above, declaring list types for output data columns
requires an additional contains
object in the JSON declaration of the column.
The contains
object defines the data type to be stored in the output,
variable-lengthed list.
Writing List Type Columns¶
Writing to list type columns is done by using instances of std::vector
containing the C++ type associated with the storage type declared in the contains
field of the list type column.
For example, taking the layout declaration from the previous section:
// data for the 1D list column:
// {"name": "column0", "type": "list1d", "contains": {"type": "float"}}
std::vector<float> column0_data{1.2, 2.3, 3.4, 4.3};
// data for the 2D list column:
// {"name": "column1", "type": "list2d", "contains": {"type": "uint32"}}
std::vector<std::vector<uint32_t>> column1_data{
{1}, {2, 2}, {3, 3, 3}
};
// data for the 3D list column:
// {"name": "column2", "type": "list3d", "contains": {"type": "double"}}
std::vector<std::vector<std::vector<double>>> column2_data{
{ {1.1}, {2.2, 2.2}, {3.3, 3.3, 3.3} },
{ {3.1, 3.1, 3.1}, {2.2, 2.2}, {1.1} }
};
// fill using the usual "fill" method
writer.fill("column0", column0_data);
writer.fill("column1", column1_data);
writer.fill("column2", column2_data);
Columns of list type can be variably lengthed. That is, rows of list type columns do not all have to have the same number of contained elements (no padding is necessary). Indeed, one row of a given list column can have many elements while the next row is empty. For example:
// fill a row's "column0" field with length 3 1D list
std::vector<float> column0_data{1.2, 2.3, 3.4};
writer.fill("column0", column0_data);
writer.end_row();
// fill a second row's "column0" field with a length 0 1D list
column0_data.clear();
writer.fill("column0", column_data);
writer.end_row();