Storing Structs That Have Struct Fields
Struct-type columns may contain fields that are themselves of type struct.
Declaring Structs That Have Struct Fields
A struct-type column containing a named field that is itself of type struct (with its own set of named fields) is declared as follows:
{
  "fields": [
    {
      "name": "outer_struct", "type": "struct",
      "fields": [
        {"name": "outer_field0", "type": "float"},
        {
          "name": "inner_struct", "type": "struct",
          "fields": [
            {"name": "inner_field0", "type": "float"},
            {"name": "inner_field1", "type": "int32"},
            {"name": "inner_field2", "type": "list1d", "contains": {"type": "float"}}
          ]
        }
      ]
    }
  ]
}
The above describes a struct-type column named outer_struct, which has two named fields: outer_field0 and inner_struct.
The named field outer_field0 has the basic value type float.
The named field inner_struct is of type struct and has three named fields, inner_field0, inner_field1, and inner_field2, of type float, int32, and list1d[float], respectively.
Writing Structs with Struct Fields
Writing to struct-type columns that have fields of type struct is done as follows (assuming the layout declaration from the previous section):
namespace pw = parquetwriter;
// "writer" is assumed to be an already-configured writer instance
// data for the non-struct fields of the struct "outer_struct"
float outer_field0_data{42.0};
pw::field_map_t outer_struct_data{
    {"outer_field0", outer_field0_data}
};
// data for the non-struct fields of the struct "inner_struct"
float inner_field0_data{42.0};
int32_t inner_field1_data{42};
std::vector<float> inner_field2_data{42.0, 42.1, 42.2};
pw::field_map_t inner_struct_data{
    {"inner_field0", inner_field0_data},
    {"inner_field1", inner_field1_data},
    {"inner_field2", inner_field2_data}
};
// call "fill" once for each struct
writer.fill("outer_struct", outer_struct_data);
writer.fill("outer_struct.inner_struct", inner_struct_data);
As shown above, for each level of nesting of struct-typed columns/fields, one provides a field_map_t (or field_buffer_t) instance containing the data for all fields that are not of type struct.
Internal named fields of type struct are written using dot (.) notation in the call to fill, following the convention <outer_struct_name>.<inner_struct_name>, as in the example above: writer.fill("outer_struct.inner_struct", ...).
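The dot-notation convention described above can be sketched as a small helper. This is only an illustration: nested_struct_path is a hypothetical function and is not part of parquetwriter.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of parquetwriter): builds the
// "<outer_struct_name>.<inner_struct_name>" path used in the call to fill.
std::string nested_struct_path(const std::string& outer_struct_name,
                               const std::string& inner_struct_name) {
    // both names must be non-empty for the resulting path to be meaningful
    assert(!outer_struct_name.empty() && !inner_struct_name.empty());
    return outer_struct_name + "." + inner_struct_name;
}
```

Assuming fill accepts a std::string for the column name, the result could be passed directly, for example writer.fill(nested_struct_path("outer_struct", "inner_struct"), inner_struct_data).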
Constraints
Warning
A column of type struct cannot contain named fields of type struct that themselves have fields of type struct.
For simplicity, a named field of type struct within a struct-type column is not allowed to have a field of type struct itself. In other words, structs can be nested at most two levels deep.
For example, the following Parquet file layout declaration is not allowed:
{
  "fields": [
    {
      "name": "struct0", "type": "struct",
      "fields": [
        {"name": "field0", "type": "float"},
        {"name": "struct1", "type": "struct",
          "fields": [
            {"name": "inner_field0", "type": "float"},
            {"name": "struct2", "type": "struct",
              "fields": [
                {"name": "inner_inner_field0", "type": "float"}
              ]
            }
          ]
        }
      ]
    }
  ]
}
Note
The above is not allowed because the inner struct struct1 contains a struct-typed field (the field named struct2).
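The constraint can also be expressed as a depth check over a layout. The following is a minimal sketch under stated assumptions: LayoutField, struct_depth, and satisfies_struct_nesting_constraint are hypothetical names introduced here for illustration and do not reflect how parquetwriter validates layouts internally.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of a layout field (not a parquetwriter type).
struct LayoutField {
    std::string name;
    std::string type;                 // e.g. "float", "int32", "struct"
    std::vector<LayoutField> fields;  // child fields; only used when type == "struct"
};

// Returns the number of struct levels along the deepest path starting at
// `field`: 0 for a non-struct field, 1 for a struct without struct fields,
// 2 for a struct containing such a struct, and so on.
int struct_depth(const LayoutField& field) {
    // in this simplified model only struct fields carry child fields
    assert(field.type == "struct" || field.fields.empty());
    if (field.type != "struct") {
        return 0;
    }
    int deepest_child = 0;
    for (const auto& child : field.fields) {
        const int child_depth = struct_depth(child);
        if (child_depth > deepest_child) {
            deepest_child = child_depth;
        }
    }
    return 1 + deepest_child;
}

// A column satisfies the constraint if structs are nested at most two levels deep.
bool satisfies_struct_nesting_constraint(const LayoutField& column) {
    return struct_depth(column) <= 2;
}
```

With this model, the struct0 column from the disallowed layout above has a depth of three and fails the check, while the outer_struct column from the earlier section has a depth of two and passes.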