How to parse general yaml in golang with comments preserved?

Issue

I am playing with golang yaml v3 library. The goal is to parse any yaml (that means that I don’t have predefined structure) from file with comments, be able to set or unset any value in the resulting tree and write it back to file.

However, I have encountered quite strange behavior. As you can see in the code below, if the main type passed to the Unmarshal function is interface{}, no comments are preserved and library uses maps and slices to represent the structure of yaml. On the other hand, if I use (in this case) []yaml.Node structure, it does represent all nodes internally as yaml.Node or []yaml.Node. This is more or less what I want, because it allows comment preservation. However, it is not a general solution because there are at least two distinct scenarios – either the YAML starts with an array or with a map and I am not sure how to elegantly deal with both situations.

Could you point me in the right direction and explain why the library behaves this way?

package main

import (
    "fmt"
    "reflect"
    "gopkg.in/yaml.v3"
)

// Change this to []yaml.Node and comments are preserved;
// change it to yaml.Node and it does not work.
type Document interface{}

var data string = `
# Employee records
-  martin:
    name: Martin D'vloper
    job: Developer
    skills:
      - python
      - perl
      - pascal
-  tabitha:
    name: Tabitha Bitumen
    job: Developer
    skills:
      - lisp
      - fortran
      - erlang
`

func toSlice(slice interface{}) []interface{} {
    s := reflect.ValueOf(slice)
    if s.Kind() != reflect.Slice {
        panic("toSlice() given a non-slice type")
    }

    ret := make([]interface{}, s.Len())

    for i := 0; i < s.Len(); i++ {
        ret[i] = s.Index(i).Interface()
    }

    return ret
}

func main() {
    var d Document
    err := yaml.Unmarshal([]byte(data), &d)
    if err != nil {
        panic(err)
    }

    slice := toSlice(d)
    fmt.Println(reflect.ValueOf(slice[0]).Kind())

    fmt.Println(reflect.TypeOf(d))
    fmt.Println(reflect.ValueOf(d).Kind())
    output, err := yaml.Marshal(&d)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(output))

}

Solution

"On the other hand, if I use (in this case) []yaml.Node structure, it does represent all nodes internally as yaml.Node or []yaml.Node."

That is not accurate. go-yaml lets you leave any sub-tree of your structure as a yaml.Node, possibly for later processing. Inside this node, everything is represented as a yaml.Node, and a node that is a collection (sequence or mapping) simply stores its children in its Content field, which is a slice of node pointers ([]*yaml.Node). But no node is itself represented as []yaml.Node.

When you deserialize into []yaml.Node, you deserialize the top-level node into a native structure (a slice) while leaving the children unconstructed (the process of loading a YAML node into a native structure is called construction in the spec).

go-yaml does not really support

type Document yaml.Node

but if you just do

var d yaml.Node

the comment will be preserved as well (toSlice will not work anymore obviously):

- # Employee records
  martin:
      name: Martin D'vloper
      job: Developer
      skills:
        - python
        - perl
        - pascal
- tabitha:
      name: Tabitha Bitumen
      job: Developer
      skills:
        - lisp
        - fortran
        - erlang

Now as we can see, the position of the comment differs. This is because go-yaml only records, in the yaml.Node that represents the list item, that "there has been a comment before this list item". The information about where exactly the comment was located is lost. You should be thankful that you have any information about the comment at all, because most YAML implementations discard comments far earlier, since the spec says that comments must not convey content information.

You may want to read "I want to load a YAML file, possibly edit the data, and then dump it again. How can I preserve formatting?", which goes into detail about why, when and how information is lost during loading of a YAML file. TL;DR: It is impossible (without essentially doing the parsing yourself) to load a YAML file and dump it back while preserving all formatting, and if that is your goal, YAML is the wrong tool for you.

Answered By – flyx

Answer Checked By – Mildred Charles (GoLangFix Admin)
