In Go, how to parse XML with mixed elements/chardata/elements/chardata content?

Issue

Let’s say I have an XML document in which some elements appear more than
once: first with a value, and later as empty references inside the text:

    <?xml version="1.0" encoding="UTF-8"?>
    <book category="cooking">
      <title lang="en">Everyday Italian</title>
      <author>Giada De Laurentiis</author>
      <year>2005</year>
      <price>30.00</price>
      Blah Blah Blah Bleh Blah of <year/> written by <author/>
    </book>

How can I parse this XML (or rather, how should I describe the structure)
so that these internal references are preserved?

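    // Fields must be exported for encoding/xml to fill them, and the tags
    // are relative to the <book> root element.
    type Book struct {
        Title  string `xml:"title"`
        Price  string `xml:"price"`
        Year   string `xml:"year"`
        Author string `xml:"author"`
        Blah   string // ??????? how should this field be declared
    }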

The naïve approach (https://go.dev/play/p/JVM98pCcI0D) of declaring blah as plain character data does not work, because the <year/> and <author/> references are lost.
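To make the problem concrete, here is a minimal sketch of what such a naive mapping might look like. The `NaiveBook` type, the `,chardata` tag, and the helper names `parseNaive` and `mentions` are assumptions made for illustration; the linked playground may differ in detail.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// NaiveBook is a hypothetical struct: the mixed content is mapped with the
// ",chardata" tag, which collects only the character data of <book>.
type NaiveBook struct {
	Title string `xml:"title"`
	Text  string `xml:",chardata"`
}

const doc = `<book category="cooking">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
  Blah Blah Blah Bleh Blah of <year/> written by <author/>
</book>`

// parseNaive decodes doc into a NaiveBook.
func parseNaive() (NaiveBook, error) {
	var b NaiveBook
	err := xml.Unmarshal([]byte(doc), &b)
	return b, err
}

// mentions reports whether text contains s.
func mentions(text, s string) bool {
	return strings.Contains(text, s)
}

func main() {
	b, err := parseNaive()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// %q makes the surrounding whitespace visible.
	fmt.Printf("%q\n", b.Text)
	// Neither the element markup nor the referenced value ends up in Text.
	fmt.Println(mentions(b.Text, "year"), mentions(b.Text, "2005"))
}
```

The text around the references is kept, but nothing records where `<year/>` and `<author/>` stood, which is exactly the information the question wants to preserve.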

What is the right way to define blah so that its internal structure is still available after parsing?

Solution

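A solution based on icza’s comment: implement xml.Unmarshaler on Book and walk the token stream by hand, so that elements and character data are handled in document order: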

    import (
        "encoding/xml"
        "io"
        "strings"
    )

    // Book holds the decoded values. The field set is inferred from the
    // method below; Text receives the mixed content.
    type Book struct {
        Title  string
        Author string
        Year   string
        Price  string
        Text   string
    }

    // UnmarshalXML implements xml.Unmarshaler by reading tokens one at a time.
    func (b *Book) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
        for {
            t, err := d.Token()
            if err != nil {
                if err != io.EOF {
                    return err
                }
                return nil // io.EOF: the whole <book> element has been consumed
            }

            switch t := t.(type) {
            case xml.StartElement:
                var f interface{} // field to decode into
                var r string      // replacement text for a reference
                switch t.Name.Local {
                case "title":
                    f = &b.Title
                case "author":
                    // If "author" was already decoded, assume this is the
                    // reference inside the "blah" chardata.
                    if len(b.Author) > 0 {
                        r = b.Author // use `r = "<author/>"` to keep the tag in Text instead
                    } else {
                        f = &b.Author
                    }
                case "year":
                    if len(b.Year) > 0 { // same logic as for author above
                        r = b.Year
                    } else {
                        f = &b.Year
                    }
                case "price":
                    f = &b.Price
                }
                if f != nil {
                    if err := d.DecodeElement(f, &t); err != nil {
                        return err
                    }
                }
                if len(r) > 0 {
                    b.Text += " " + r + " " // pad the replacement with spaces
                }
            case xml.CharData:
                // Whitespace-only chardata between elements is dropped.
                s := strings.TrimSpace(string(t))
                if len(s) > 0 {
                    b.Text += s
                }
            }
        }
    }
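The code above flattens the references into a single string. If the position of each reference should stay available after parsing, as the question asks, one possible variant is to collect the mixed content as a list of segments. This is only a sketch and not part of the original answer: the `Segment` type, the `Body` field, the `seen` map, and the `Render` and `parse` helpers are hypothetical names introduced here.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Segment is one piece of the mixed content: literal text when Ref is
// empty, otherwise a reference to the element named by Ref.
type Segment struct {
	Text string
	Ref  string
}

// Book has the same value fields as above; Body is an illustrative addition.
type Book struct {
	Title, Author, Year, Price string
	Body                       []Segment
}

func (b *Book) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	fields := map[string]*string{
		"title": &b.Title, "author": &b.Author, "year": &b.Year, "price": &b.Price,
	}
	seen := map[string]bool{} // elements whose value was already decoded
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return nil // the whole <book> element has been consumed
		}
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			name := t.Name.Local
			f, known := fields[name]
			if known && !seen[name] {
				// First occurrence: decode the value into the field.
				if err := d.DecodeElement(f, &t); err != nil {
					return err
				}
				seen[name] = true
				continue
			}
			if known {
				// Repeated occurrence: keep it as a reference.
				b.Body = append(b.Body, Segment{Ref: name})
			}
			// Consume the rest of the repeated or unknown element.
			if err := d.Skip(); err != nil {
				return err
			}
		case xml.CharData:
			if s := strings.TrimSpace(string(t)); s != "" {
				b.Body = append(b.Body, Segment{Text: s})
			}
		}
	}
}

// Render joins the segments, replacing each reference with its value.
func (b *Book) Render() string {
	vals := map[string]string{
		"title": b.Title, "author": b.Author, "year": b.Year, "price": b.Price,
	}
	parts := make([]string, 0, len(b.Body))
	for _, s := range b.Body {
		if s.Ref != "" {
			parts = append(parts, vals[s.Ref])
		} else {
			parts = append(parts, s.Text)
		}
	}
	return strings.Join(parts, " ")
}

const sample = `<book category="cooking">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
  Blah Blah Blah Bleh Blah of <year/> written by <author/>
</book>`

// parse decodes an XML document into a Book.
func parse(doc string) (*Book, error) {
	var b Book
	if err := xml.Unmarshal([]byte(doc), &b); err != nil {
		return nil, err
	}
	return &b, nil
}

func main() {
	b, err := parse(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", b.Body)
	fmt.Println(b.Render())
}
```

For the sample document, `Body` holds four segments (text, `year`, text, `author`), and `Render` produces `Blah Blah Blah Bleh Blah of 2005 written by Giada De Laurentiis`. Tracking first occurrences in `seen` rather than checking for a non-empty field is a design choice of this sketch; it also treats an element whose first value is empty as already decoded.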

Answered By – mkopriva

Answer Checked By – Pedro (GoLangFix Volunteer)
