Miscellaneous¶
Adding File Metadata¶
Arbitrary metadata can be stored in JSON format and added to the
output Parquet file using the parquet-writer
library.
This is done by first creating a JSON object containing whatever
arbitrary information you wish and providing it to your
parquetwriter::Writer
instance.
Suppose we have the file metadata.json
containing the following JSON:
{
"dataset_name": "example_dataset",
"foo": "bar",
"creation_date": "2021/10/11",
"bar": {
"faz": "baz"
}
}
We would pass this to our writer instance as follows:
std::ifstream metadata_file("metadata.json");
writer.set_metadata(metadta_file);
The above stores the contained JSON to the Parquet file as an instance of
key:value
pairs.
The example Python script dump-metadata.py
(requires pyarrow) that extracts the metadata stored
by parquet-writer
shows how to extract the metadata and can be run as follows:
$ python examples/python/dump-metadta.py <file>
where <file>
is a Parquet file written by parquet-writer
.
Running dump-metadata.py on a file with the metadata from above woudl look like:
$ python examples/python/dump-metadata.py example_file.parquet
{
"dataset_name": "example_dataset",
"foo": "bar",
"creation_date": "2021/10/11",
"bar": {
"faz": "baz"
}
}