Miscellaneous
Adding File Metadata
Arbitrary metadata, expressed as JSON, can be added to the
output Parquet file using the parquet-writer library.
To do this, first create a JSON object containing whatever
information you wish, then provide it to your
parquetwriter::Writer instance.
Suppose we have the file metadata.json containing the following JSON:
{
"dataset_name": "example_dataset",
"foo": "bar",
"creation_date": "2021/10/11",
"bar": {
"faz": "baz"
}
}
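Instead of writing metadata.json by hand, you can also generate it programmatically, which is convenient for values such as creation_date. The following is a minimal sketch that uses only Python's standard json and datetime modules. It assumes the same keys as the example above and the file name metadata.json. The helper name write_metadata_file is hypothetical and not part of parquet-writer.

```python
import json
from datetime import date

# Arbitrary, user-chosen metadata; nested objects are allowed.
# creation_date is filled in at run time, using the same YYYY/MM/DD format as above.
metadata = {
    "dataset_name": "example_dataset",
    "foo": "bar",
    "creation_date": date.today().strftime("%Y/%m/%d"),
    "bar": {"faz": "baz"},
}


def write_metadata_file(path, data):
    """Hypothetical helper: serialize `data` as pretty-printed JSON to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4)


write_metadata_file("metadata.json", metadata)
```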
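We would pass this file to our writer instance as follows:
#include <fstream>
// open the JSON file and pass the stream to the parquetwriter::Writer instance
std::ifstream metadata_file("metadata.json");
writer.set_metadata(metadata_file);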
The above stores the contained JSON in the output Parquet file as
key:value metadata pairs.
The example Python script dump-metadata.py
(requires pyarrow) shows how to extract the metadata stored
by parquet-writer, and can be run as follows:
$ python examples/python/dump-metadata.py <file>
where <file> is a Parquet file written by parquet-writer.
Running dump-metadata.py on a file containing the metadata from above would look like this:
$ python examples/python/dump-metadata.py example_file.parquet
{
"dataset_name": "example_dataset",
"foo": "bar",
"creation_date": "2021/10/11",
"bar": {
"faz": "baz"
}
}