Ty Scales

XML in Go

Decoding

In this section we’ll examine the rules of the xml decoder and provide examples for each.

1. If the struct has a field of type []byte or string with tag “,innerxml”, Unmarshal accumulates the raw XML nested inside the element in that field. The rest of the rules still apply.
x := `<address>
        <street>123 Main St</street>
    </address>`

type Address struct {
    Contents string `xml:",innerxml"`
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Println(addr.Contents)
//  <street>123 Main St</street>
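Since the rest of the rules still apply, a struct can keep the raw inner XML and decode individual fields in the same pass. Here is a minimal sketch of that; the Raw field and the decodeAddress helper are names made up for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Address keeps the raw inner XML and also decodes <street> as usual.
type Address struct {
	Street string `xml:"street"`
	Raw    string `xml:",innerxml"`
}

// decodeAddress is a hypothetical helper used only in this example.
func decodeAddress(doc string) (Address, error) {
	var addr Address
	err := xml.Unmarshal([]byte(doc), &addr)
	return addr, err
}

func main() {
	addr, err := decodeAddress(`<address><street>123 Main St</street></address>`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(addr.Street) // 123 Main St
	fmt.Println(addr.Raw)    // <street>123 Main St</street>
}
```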
2. If the struct has a field named XMLName of type xml.Name, Unmarshal records the element name in that field.
x := `<address>
        <street>123 Main St</street>
    </address>`

type Address struct {
    XMLName xml.Name
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("%+v", addr.XMLName.Local)
// address
3. If the XMLName field has an associated tag of the form “name” or “namespace-URL name”, the XML element must have the given name (and, optionally, name space) or else Unmarshal returns an error.
x := `<address>
        <street>123 Main St</street>
    </address>`

type Address struct {
    XMLName xml.Name `xml:"city"`
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
    //expected element type <city> but have <address>
}
4. If the XML element has an attribute whose name matches a struct field name with an associated tag containing “,attr” or the explicit name in a struct field tag of the form “name,attr”, Unmarshal records the attribute value in that field.
x := `<address id="999">
        <street>123 Main St</street>
    </address>`

type Address struct {
    Id string `xml:"id,attr"`
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Println(addr.Id)
//999
5. If the XML element has an attribute not handled by the previous rule and the struct has a field with an associated tag containing “,any,attr”, Unmarshal records the attribute value in the first such field.
x := `<address id="999" span="888">
        <street>123 Main St</street>
    </address>`

type Address struct {
    Id        string `xml:"id,attr"`
    OtherAttr string `xml:",any,attr"`
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("%+v", addr)
// {Id:999 OtherAttr:888}
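A string field can only hold one value, so it is of limited use when an element carries several attributes you did not name. As a sketch of one way around that, the “,any,attr” field can be declared as a slice of xml.Attr, which collects every attribute not handled by another field. The zone attribute and the decodeAttrs helper are made up for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Address names one attribute explicitly and collects the rest.
type Address struct {
	Id    string     `xml:"id,attr"`
	Other []xml.Attr `xml:",any,attr"`
}

// decodeAttrs is a hypothetical helper used only in this example.
func decodeAttrs(doc string) (Address, error) {
	var addr Address
	err := xml.Unmarshal([]byte(doc), &addr)
	return addr, err
}

func main() {
	addr, err := decodeAttrs(`<address id="999" span="888" zone="north"></address>`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(addr.Id) // 999
	for _, a := range addr.Other {
		fmt.Printf("%s=%s\n", a.Name.Local, a.Value)
	}
	// span=888
	// zone=north
}
```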
6. If the XML element contains character data, that data is accumulated in the first struct field that has tag “,chardata”. The struct field may have type []byte or string. If there is no such field, the character data is discarded.
x := `<address>
        <street>123 Main St</street>
    </address>`

type Address struct {
    Street struct {
        Text string `xml:",chardata"`
    } `xml:"street"`
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Println(addr.Street.Text)
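Where “,chardata” tends to earn its keep is an element that has both attributes and text, since a plain string field has nowhere to put the attribute. A small sketch, assuming a made-up kind attribute on street and a hypothetical decodeStreet helper:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Street holds both the attribute and the text of a <street> element.
type Street struct {
	Kind string `xml:"kind,attr"`
	Text string `xml:",chardata"`
}

type Address struct {
	Street Street `xml:"street"`
}

// decodeStreet is a hypothetical helper used only in this example.
func decodeStreet(doc string) (Street, error) {
	var addr Address
	err := xml.Unmarshal([]byte(doc), &addr)
	return addr.Street, err
}

func main() {
	s, err := decodeStreet(`<address><street kind="home">123 Main St</street></address>`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(s.Kind) // home
	fmt.Println(s.Text) // 123 Main St
}
```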
7. If the XML element contains comments, they are accumulated in the first struct field that has tag “,comment”. The struct field may have type []byte or string. If there is no such field, the comments are discarded.
x := `<address>
        <!-- an xml comment -->
        <street>123 Main St</street>
    </address>`

type Address struct {
    Comment string `xml:",comment"`
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Println(addr.Comment)
// an xml comment
8. If the XML element contains a sub-element whose name matches the prefix of a tag formatted as “a” or “a>b>c”, unmarshal will descend into the XML structure looking for elements with the given names, and will map the innermost elements to that struct field. A tag starting with “>” is equivalent to one starting with the field name followed by “>”.

This one is worth dividing into several examples. The most basic case is a sub-element whose name matches a tag.

x := `<address>
            <street>123 Main St</street>
    </address>`

type Address struct {
    Street string `xml:"street"`
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("%+v", addr.Street)

Now let’s see how we can deal with deeper nestings.

<address>
    <street>
        <value> 123 Main St</value>
    </street>
</address>

Option 1 is to follow the rules we have already seen and use a struct.

x := `<address>
        <street>
            <value> 123 Main St</value>
        </street>
    </address>`

type Street struct {
    Value string `xml:"value"`
}
type Address struct {
    Street Street `xml:"street"`
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("%+v", addr.Street)

Option 2 is to descend the xml using >.

x := `<address>
        <street>
            <value>123 Main St</value>
        </street>
    </address>`

type Address struct {
    Street string `xml:"street>value"`
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Println(addr.Street)
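The last sentence of rule 8 is easy to miss: a tag that starts with > borrows the field name as the first path element. The sketch below shows that shorthand. Note that the element in the document has to be spelled Street, exactly like the field, because names are matched case-sensitively. The decodeShorthand helper is made up for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

type Address struct {
	// ">value" is shorthand for "Street>value": the field name
	// supplies the first element of the path.
	Street string `xml:">value"`
}

// decodeShorthand is a hypothetical helper used only in this example.
func decodeShorthand(doc string) (Address, error) {
	var addr Address
	err := xml.Unmarshal([]byte(doc), &addr)
	return addr, err
}

func main() {
	addr, err := decodeShorthand(`<address><Street><value>123 Main St</value></Street></address>`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(addr.Street) // 123 Main St
}
```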

Element names can also be qualified with a namespace prefix. You might see xml like this

<address>
  <ns:street>123 Main St</ns:street>
</address>

and be tempted to write a struct like this:

type Address struct {
    Street string `xml:"ns:street"`
}

That won’t decode correctly! You can leave the xml tag as street or, if for some reason you need the prefix, replace the colon with a space

type Address struct {
    Street string `xml:"ns street"`
}
9. If the XML element contains a sub-element whose name matches a struct field’s XMLName tag and the struct field has no explicit name tag as per the previous rule, unmarshal maps the sub-element to that struct field.

The xml struct tag can be left off for nested structs that contain an XMLName field with a tag.

x := `<address>
        <street>
            <value> 123 Main St</value>
        </street>
    </address>`

type Street struct {
    XMLName xml.Name `xml:"street"`
    Value   string   `xml:"value"`
}
type Address struct {
    Street Street //no xml tag needed
}
var addr Address
err := xml.Unmarshal([]byte(x), &addr)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("%+v", addr.Street)

Writing a custom Unmarshaler

This example takes the value of an xml element, parses it as a date, and separates it into three struct fields.

package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"time"
)

type Time struct {
	Date Date `xml:"date"`
}

type Date struct {
	Year  int
	Month time.Month
	Day   int
}

func (date *Date) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var s string
	err := d.DecodeElement(&s, &start)
	if err != nil {
		return err
	}
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		return err
	}
	date.Year = t.Year()
	date.Month = t.Month()
	date.Day = t.Day()
	return nil
}
func main() {
	x := `<time>
			<date>2025-02-14</date>
		</time>`
	var t Time
	err := xml.Unmarshal([]byte(x), &t)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v", t.Date)
}

Debugging

Manually inspecting how an element is tokenized can be helpful when a document isn’t decoding the way you expect.

	x := `<address>
			<street>123 Main St</street>
		</address>`
	d := xml.NewDecoder(bytes.NewBuffer([]byte(x)))
	for {
		token, err := d.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			// Stop on malformed input instead of looping forever.
			log.Fatal(err)
		}
		fmt.Printf("%T, %+v\n", token, token)
	}

Dealing with large xml files

The honeymoon phase of xml in Go ends when you start dealing with deeply nested xml. Do you write a struct per layer, or do you write overly long struct tags to get the one field you need?

x := `<layer1>
        <layer2>
            <layer3>
                <layer4>Hello, World!</layer4>
            </layer3>
        </layer2>
    </layer1>`
type Layer1 struct {
    Layer2 Layer2 `xml:"layer2"`
}
type Layer2 struct {
    Layer3 Layer3 `xml:"layer3"`
}
type Layer3 struct {
    Layer4 string `xml:"layer4"`
}

// alternatively
type PathedLayer4 struct {
    Msg string `xml:"layer2>layer3>layer4"`
}

There are libraries that can simplify either approach.
