Issue
I am trying to write to a parquet file in Go. While writing to this file, I can get NaN values. Since NaN is defined neither in the primitive types nor in the logical types, how do I handle this value in Go? Does any existing schema work for it?
I am using the parquet Go library from here. You can find an example of code that uses a JSON schema to write to parquet with this library here.
Solution
The issue was discussed at length in xitongsys/parquet-go issue 281, with the recommendation being to use the OPTIONAL type.
Even if you don't assign a value (as in your code), a non-pointer value will be assigned a default value.
So parquet-go doesn't know whether it is null or the default value.
However:
What it comes down to is that I cannot use the OPTIONAL type; in other words, I cannot convert my structure to use pointers.
I have tried to use repetitiontype=OPTIONAL as a tag, but this leads to some weird behavior.
I would expect that tag to behave the same way as the omitempty tag in the Go standard library, i.e. if the value is not present then it is not put into the JSON.
The reason this is important is that if the field is missing or not set, then once it is encoded to parquet there is no way of telling whether the value was 0 or just not set, in the case of int64.
This illustrates the issue:
package main

import (
	"encoding/json"
	"io/ioutil"
)

type Salary struct {
	Basic, HRA, TA float64 `json:",omitempty"`
}

type Employee struct {
	FirstName, LastName, Email string `json:",omitempty"`
	Age                        int
	MonthlySalary              []Salary `json:",omitempty"`
}

func main() {
	data := Employee{
		Email: "mark@gmail.com",
		MonthlySalary: []Salary{
			{
				Basic: 15000.00,
			},
		},
	}
	file, _ := json.MarshalIndent(data, "", " ")
	_ = ioutil.WriteFile("test.json", file, 0o644)
}
which produces the following JSON:
{
 "Email": "mark@gmail.com",
 "Age": 0,
 "MonthlySalary": [
  {
   "Basic": 15000
  }
 ]
}
As you can see, the items in the struct that have the omitempty tag and that are not assigned do not appear in the JSON, i.e. HRA and TA.
On the other hand, Age does not have this tag and hence it is still included in the JSON.
This is problematic as all fields in the struct are assigned memory when this Go library writes to parquet, so if you have a big struct that is only sparsely populated it will still take the full amount of memory.
It is a bigger problem when the file is read again, as there is no way of knowing whether the value that was put in the parquet file was the empty value or was just not assigned.
I am happy to help implement an omitempty tag for this library if I can convince you of the value of having it.
That echoes issue 403 "No option to omitempty when not using pointers".
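The underlying distinction can be seen with encoding/json alone. The following is a minimal sketch, not code from either issue: Person and encode are hypothetical names introduced only for illustration. With omitempty, a plain int of 0 is dropped, whereas a pointer field is dropped only when it is nil, so a pointer to 0 survives.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person is a hypothetical type used only for this illustration.
type Person struct {
	Age    int  `json:",omitempty"` // 0 is dropped: "0" and "unset" look the same
	AgePtr *int `json:",omitempty"` // nil is dropped, but a pointer to 0 is kept
}

// encode marshals p and returns the JSON text.
func encode(p Person) string {
	b, err := json.Marshal(p)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	zero := 0
	fmt.Println(encode(Person{}))              // {}
	fmt.Println(encode(Person{AgePtr: &zero})) // {"AgePtr":0}
}
```

This is why the pointer-based OPTIONAL approach can tell "unset" from "zero", and why a non-pointer field cannot without some extra mechanism such as the omitempty-style tag requested above.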
Answered By – VonC
Answer Checked By – Marilyn (GoLangFix Volunteer)