Storing Lists of Structs¶
Storing lists containing elements that are of type struct is supported.
Declaring Lists of Structs¶
Declaring columns that contain lists whose elements are of type struct
is done by composing the list type
and struct type declarations.
For example, the following declares a one-dimensional list containing struct-type elements that have three named fields:
{
"fields": [
{
"name": "structlist", "type": "list1d",
"contains": { "type": "struct",
"fields": [
{"name": "field0", "type": "float"},
{"name": "field1", "type": "int32"},
{"name": "field2", "type": "list1d", "contains": {"type": "float"}}
]
}
}
]
}
To declare two- or three-dimensional lists,
one would simply swap the type field for the structlist column
from list1d to either list2d or list3d.
Writing Lists of Structs¶
Writing to columns that contain lists of struct-type elements is done by
building up instances of std::vector containing elements of either
field_map_t or field_buffer_t.
For example, writing a one-dimensional list containing the three-field struct elements described above would be done as follows:
namespace pw = parquetwriter;
// 1D vector of struct elements
std::vector<pw::field_map_t> structlist_data;
// fill the 1D vector with struct data elements
for(...) {
// generate struct field data
float field0_data{42.42};
int32_t field1_data{42};
std::vector<float> field2_data{42.0, 42.1, 42.2};
// create the struct element
pw::field_map_t struct_data{
{"field0", field0_data},
{"field1", field1_data},
{"field2", field2_data}
};
// append to the struct list
structlist_data.push_back(struct_data);
}
// call "fill" as usual
writer.fill("structlist", structlist_data);
The two-dimensional case:
namespace pw = parquetwriter;
// 2D vector of struct elements
std::vector<std::vector<pw::field_map_t>> structlist_data;
// fill the 2D vector with struct data elements
for(...) {
std::vector<pw::field_map_t> inner_structlist_data;
for(...) {
pw::field_map_t struct_data{
{"field0", field0_data},
{"field1", field1_data},
{"field2", field2_data}
};
inner_structlist_data.push_back(struct_data);
}
structlist_data.push_back(inner_structlist_data);
}
// call "fill" as usual
writer.fill("structlist", structlist_data);
And the three-dimensional case:
namespace pw = parquetwriter;
// 3D vector of struct elements
std::vector<std::vector<std::vector<pw::field_map_t>>> structlist_data;
// fill the 3D vector with struct data elements
for(...) {
std::vector<std::vector<pw::field_map_t>> inner_structlist_data;
for(...) {
std::vector<pw::field_map_t> inner_inner_structlist_data;
for(...) {
pw::field_map_t struct_data{
{"field0", field0_data},
{"field1", field1_data},
{"field2", field2_data}
};
inner_inner_structlist_data.push_back(struct_data);
}
inner_structlist_data.push_back(inner_inner_structlist_data);
}
structlist_data.push_back(inner_structlist_data);
}
// call "fill" as usual
writer.fill("structlist", structlist_data);
Constraints¶
Warning
The struct type elements contained in lists of struct cannot
themselves contain fields that are of type struct.
For simplicity, any list type data column whose elements are of type struct,
cannot contain struct type elements that have
fields that are themselves of type struct.
For example, the following Parquet file layout declaration is not allowed:
{
"fields": [
{
"name": "structlist",
"type": "list1d",
"contains": {
"type": "struct",
"fields": [
{"name": "field0", "type": "float"},
{
"name": "inner_struct", "type": "struct",
"fields": [{"name": "inner_field0", "type": "float"}]
}
]
}
]
}
Note
The above list1d type column is not allowd since its struct typed
elements are declared as having an internal struct typed column (the field named
inner_struct).