Storing Lists of Structs

Lists whose elements are of type struct can be stored.

Declaring Lists of Structs

Columns that contain lists of struct-type elements are declared by composing the list type and struct type declarations.

For example, the following declares a one-dimensional list containing struct-type elements that have three named fields:

{
  "fields": [
    {
      "name": "structlist", "type": "list1d",
      "contains": { "type": "struct",
                    "fields": [
                      {"name": "field0", "type": "float"},
                      {"name": "field1", "type": "int32"},
                      {"name": "field2", "type": "list1d", "contains": {"type": "float"}}
                    ]
                  }
    }
  ]
}

To declare a two- or three-dimensional list, change the type field of the structlist column from list1d to list2d or list3d.
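As a minimal sketch of that substitution, the helper below returns the same layout declaration with only the column type changed. The function name make_structlist_layout is hypothetical and not part of parquet-writer; how the resulting string is handed to the writer is not shown here.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of parquet-writer: returns the layout
// declaration for the "structlist" column with 1, 2, or 3 list dimensions.
std::string make_structlist_layout(int n_dims) {
    if (n_dims < 1 || n_dims > 3) {
        throw std::invalid_argument("n_dims must be 1, 2, or 3");
    }
    // Only the column's "type" changes between dimensionalities.
    const std::string list_type = "list" + std::to_string(n_dims) + "d";
    return R"({
  "fields": [
    {
      "name": "structlist", "type": ")" + list_type + R"(",
      "contains": { "type": "struct",
                    "fields": [
                      {"name": "field0", "type": "float"},
                      {"name": "field1", "type": "int32"},
                      {"name": "field2", "type": "list1d", "contains": {"type": "float"}}
                    ]
                  }
    }
  ]
})";
}
```

Calling make_structlist_layout(2) yields the declaration above with "list2d" in place of "list1d". The struct element declaration is unchanged.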

Writing Lists of Structs

Columns that contain lists of struct-type elements are written by building up std::vector instances whose elements are of type field_map_t or field_buffer_t.

For example, writing a one-dimensional list containing the three-field struct elements described above would be done as follows:

namespace pw = parquetwriter;

// 1D vector of struct elements
std::vector<pw::field_map_t> structlist_data;

// fill the 1D vector with struct data elements
for(...) {
    // generate struct field data
    float field0_data{42.42};
    int32_t field1_data{42};
    std::vector<float> field2_data{42.0, 42.1, 42.2};

    // create the struct element
    pw::field_map_t struct_data{
        {"field0", field0_data},
        {"field1", field1_data},
        {"field2", field2_data}
    };

    // append to the struct list
    structlist_data.push_back(struct_data);
}

// call "fill" as usual
writer.fill("structlist", structlist_data);

The two-dimensional case:

namespace pw = parquetwriter;

// 2D vector of struct elements
std::vector<std::vector<pw::field_map_t>> structlist_data;

// fill the 2D vector with struct data elements
for(...) {
    std::vector<pw::field_map_t> inner_structlist_data;
    for(...) {
        // generate field0_data, field1_data, and field2_data
        // as in the one-dimensional example
        pw::field_map_t struct_data{
            {"field0", field0_data},
            {"field1", field1_data},
            {"field2", field2_data}
        };
        inner_structlist_data.push_back(struct_data);
    }
    structlist_data.push_back(inner_structlist_data);
}

// call "fill" as usual
writer.fill("structlist", structlist_data);

And the three-dimensional case:

namespace pw = parquetwriter;

// 3D vector of struct elements
std::vector<std::vector<std::vector<pw::field_map_t>>> structlist_data;

// fill the 3D vector with struct data elements
for(...) {
    std::vector<std::vector<pw::field_map_t>> inner_structlist_data;
    for(...) {
        std::vector<pw::field_map_t> inner_inner_structlist_data;
        for(...) {
            // generate field0_data, field1_data, and field2_data
            // as in the one-dimensional example
            pw::field_map_t struct_data{
                {"field0", field0_data},
                {"field1", field1_data},
                {"field2", field2_data}
            };
            inner_inner_structlist_data.push_back(struct_data);
        }
        inner_structlist_data.push_back(inner_inner_structlist_data);
    }
    structlist_data.push_back(inner_structlist_data);
}

// call "fill" as usual
writer.fill("structlist", structlist_data);
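The loop bounds in the examples above are elided. The following sketch shows the two-dimensional nesting with concrete bounds, so the resulting shape is explicit. It assumes a stand-in field_map_t defined locally with std::map and std::variant, so that it compiles without the library. The library's actual pw::field_map_t may be defined differently, and the function name build_structlist_2d and the field values are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Stand-in for pw::field_map_t so the sketch compiles on its own.
// This is an assumption; the library's real definition may differ.
using field_value_t = std::variant<float, std::int32_t, std::vector<float>>;
using field_map_t = std::map<std::string, field_value_t>;

// Build an n_outer x n_inner list of three-field struct elements.
std::vector<std::vector<field_map_t>> build_structlist_2d(std::size_t n_outer,
                                                          std::size_t n_inner) {
    std::vector<std::vector<field_map_t>> structlist_data;
    structlist_data.reserve(n_outer);
    for (std::size_t i = 0; i < n_outer; i++) {
        std::vector<field_map_t> inner_structlist_data;
        inner_structlist_data.reserve(n_inner);
        for (std::size_t j = 0; j < n_inner; j++) {
            // illustrative field values
            float field0_data = static_cast<float>(i) + 0.5f;
            std::int32_t field1_data = static_cast<std::int32_t>(j);
            std::vector<float> field2_data{42.0f, 42.1f, 42.2f};

            // one struct element, keyed by the declared field names
            field_map_t struct_data{
                {"field0", field0_data},
                {"field1", field1_data},
                {"field2", field2_data}
            };
            inner_structlist_data.push_back(struct_data);
        }
        structlist_data.push_back(inner_structlist_data);
    }
    return structlist_data;
}
```

With the real library types, the returned vector would be what is passed to writer.fill("structlist", ...) as in the examples above. The three-dimensional case adds one more level of nesting in the same way.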

Constraints

Warning

The struct-type elements contained in a list of structs cannot themselves have fields of type struct.

This restriction exists for simplicity and applies to any list-type column whose elements are of type struct.

For example, the following Parquet file layout declaration is not allowed:

{
  "fields": [
    {
      "name": "structlist",
      "type": "list1d",
      "contains": {
        "type": "struct",
        "fields": [
          {"name": "field0", "type": "float"},
          {
           "name": "inner_struct", "type": "struct",
           "fields": [{"name": "inner_field0", "type": "float"}]
          }
        ]
      }
    }
  ]
}

Note

The above list1d column is not allowed because its struct-typed elements are declared with a struct-typed field (the field named inner_struct).
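The constraint can be sketched as a small check over an in-memory mirror of a column declaration. The FieldDecl type and both function names below are hypothetical and are not part of parquet-writer. The check covers only the case described here: a struct-typed field declared directly inside the struct element of a list column.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of one layout entry; not a parquet-writer type.
struct FieldDecl {
    std::string name;
    std::string type;
    std::vector<FieldDecl> fields;    // populated when type == "struct"
    std::vector<FieldDecl> contains;  // zero or one entry, used by list types
};

bool is_list_type(const std::string& type) {
    return type == "list1d" || type == "list2d" || type == "list3d";
}

// Returns true if `column` is a list of structs whose struct elements
// declare a field that is itself of type struct.
bool violates_struct_list_constraint(const FieldDecl& column) {
    if (!is_list_type(column.type) || column.contains.empty()) {
        return false;
    }
    const FieldDecl& element = column.contains.front();
    if (element.type != "struct") {
        return false;
    }
    for (const auto& field : element.fields) {
        if (field.type == "struct") {
            return true;
        }
    }
    return false;
}
```

Applied to a mirror of the declaration above, the check returns true because of the inner_struct field. It returns false for the three-field struct list declared at the start of this section.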