Tracing errors in Go using custom error types.

Michael Francis
7 min read · Aug 23, 2023

While I had previously sworn off writing more about errors in Go, it might help to share one of my strategies for making Go error handling more informative.

First, my overarching position: Go's choice of explicit error handling at the point of error is straightforward and makes for unambiguous implementations and code that is easy to reason about, even when you return to a code base months later. Conversely, handling an error at a distance, exception style, can lead to deceptively simple-looking code that hides harder-to-debug errors and, unfortunately, can result in exceptions being used as flow control.

There are issues with Go's errors, though. The problems occur when an error is received and passed up through a relatively long call chain. We need to answer two questions: where did the error come from, and where do I set a breakpoint in my debugger to stop at it? Exceptions provide this for free.

I’ve been prototyping some reasonably complex transactional ledger-related code over the past few months where I wish to roll back the transaction on a violated precondition. The code looks like the following.

err := db.trans(func(t *Trans) error { /* check preconditions; return an error to roll back */ return nil })

If the lambda returns an error, the system must roll back the entire set of operations. You can see this pattern used by database drivers in Go. My model allows transactions to nest; hence, a precondition failing a few levels down the call tree will trigger the rollback. Unfortunately, the precondition errors all look identical and don't carry much context. For example:

insufficient. ‘X’ in ‘Y’ requested 100 found 10 
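The rollback-on-error pattern described above can be sketched roughly as follows. This is not my ledger code: Ledger, Trans, Transact, Nested, Withdraw, and the key type are all hypothetical names invented for illustration, and the snapshot-and-restore rollback is just the simplest mechanism that shows the idea.

```go
package main

import "fmt"

// key identifies one asset held in one account (hypothetical layout).
type key struct{ account, asset string }

// Ledger is a hypothetical in-memory store of balances.
type Ledger struct {
	balances map[key]int
}

// Trans is a hypothetical transaction handle over a Ledger.
type Trans struct {
	ledger *Ledger
}

// Transact runs fn and restores the previous balances if fn returns an error.
func (l *Ledger) Transact(fn func(t *Trans) error) error {
	// Snapshot the state so it can be restored on failure.
	snapshot := make(map[key]int, len(l.balances))
	for k, v := range l.balances {
		snapshot[k] = v
	}
	if err := fn(&Trans{ledger: l}); err != nil {
		l.balances = snapshot // roll back
		return err
	}
	return nil
}

// Nested starts an inner transaction on the same ledger.
func (t *Trans) Nested(fn func(t *Trans) error) error {
	return t.ledger.Transact(fn)
}

// Withdraw enforces the precondition that enough of the asset is available.
func (t *Trans) Withdraw(account, asset string, amount int) error {
	k := key{account, asset}
	found := t.ledger.balances[k]
	if found < amount {
		return fmt.Errorf("insufficient %q in %q requested %d found %d", asset, account, amount, found)
	}
	t.ledger.balances[k] = found - amount
	return nil
}

func main() {
	l := &Ledger{balances: map[key]int{{"Y", "X"}: 10, {"Y", "Z"}: 50}}
	err := l.Transact(func(t *Trans) error {
		if err := t.Withdraw("Y", "Z", 20); err != nil {
			return err
		}
		// The failing precondition sits one level down the call tree.
		return t.Nested(func(t *Trans) error {
			return t.Withdraw("Y", "X", 100)
		})
	})
	fmt.Println(err)
	fmt.Println(l.balances[key{"Y", "Z"}])
}
```

The inner failure propagates through both levels, so the earlier withdrawal of "Z" is undone as well. The error that reaches the caller says nothing about which level produced it.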

Since the precondition is data-driven and relies on the system's state, I need more context to figure this out. The 'good' news about this code is that we evaluate the constraint in only a few locations. It is easy to place breakpoints in each of them, stop the debugger, take a meander through the stack, and figure out what is going wrong.

Using a breakpoint strategy works for me as the library developer, but what about a library user? They probably want to avoid digging into the inner logic to determine what is going wrong. The library has rich tests, so we can assume that the library is correctly functioning unless we have missed something. How do we give more context to the end user?

Custom error types to the rescue!

Quick reminder: Go's error interface is simple, a single-method interface.

type error interface {
	Error() string
}

While you can return a generic error with functions like fmt.Errorf, you can also, and are encouraged to, create your own error types. Since an error is any type that implements the Error() string method, you can carry any payload along with your error message.

Using this strategy, I can enrich my type with useful information; the user can then use a type assertion to recover the custom type and inspect the information.

if me, ok := err.(MyError); ok {
	// me is of type MyError, so I can access its fields
	more := me.moreData
	// Do something with moreData
} else {
	// err is some other type of error; I can only call its Error() string method, as it is still just the interface.
}

The ability to carry state information in the error message can be advantageous. However, it still doesn’t help with the question, where did that error come from?

Now we can use a little Go magic. Could I squirrel away the call stack when an error is first generated? Go's runtime/debug package has routines that print or format the call stack as text. But what if we want a slice of structured stack information, including the file, the line, and perhaps some additional details? Examining the debug package leads to the runtime package, and with some experimentation, we can come up with the following.

type Trace struct {
	Function string `json:"function"`
	File     string `json:"file"`
	Line     int    `json:"line"`
}

func getStack() []Trace {
	pcs := make([]uintptr, 32)
	// Skip 4 stack frames
	npcs := runtime.Callers(4, pcs)
	traces := make([]Trace, 0, npcs)
	callers := pcs[:npcs]
	ci := runtime.CallersFrames(callers)
	for {
		frame, more := ci.Next()
		traces = append(traces, Trace{
			File:     frame.File,
			Line:     frame.Line,
			Function: frame.Function,
		})
		if !more || frame.Function == "main.main" {
			break
		}
	}
	return traces
}

A close observer will notice the magic number 4 passed to runtime.Callers. This number defines how many stack frames to skip. Frame 0 is runtime.Callers itself and frame 1 is getStack; getStack is called from hostInfo (frame 2), which in turn is called from one of the wrapper functions shown later (frame 3). Skipping 4 therefore starts the trace at the code that called the wrapper. We don't want to include information about the error handling itself; we just start at the location of the error.
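The effect of the skip argument is easy to check with a small experiment. In the sketch below, capture, hostInfoLike, wrapLike, businessLogic, and shortName are hypothetical stand-ins that mirror the depth of the real call chain. The go:noinline directives are only there to keep the demonstration deterministic.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// shortName drops the package path, leaving only the function name.
func shortName(fn string) string {
	return fn[strings.LastIndex(fn, ".")+1:]
}

// capture plays the role of getStack: it returns the name of the first
// frame recorded after skipping `skip` frames.
//
//go:noinline
func capture(skip int) string {
	pcs := make([]uintptr, 8)
	n := runtime.Callers(skip, pcs)
	if n == 0 {
		return ""
	}
	frame, _ := runtime.CallersFrames(pcs[:n]).Next()
	return shortName(frame.Function)
}

// hostInfoLike plays the role of hostInfo.
//
//go:noinline
func hostInfoLike(skip int) string { return capture(skip) }

// wrapLike plays the role of a wrapper such as ErrorW.
//
//go:noinline
func wrapLike(skip int) string { return hostInfoLike(skip) }

// businessLogic is the code that detects the error and calls the wrapper.
//
//go:noinline
func businessLogic(skip int) string { return wrapLike(skip) }

func main() {
	for skip := 0; skip <= 4; skip++ {
		fmt.Printf("skip=%d -> %s\n", skip, businessLogic(skip))
	}
}
```

With this four-deep chain, skip values 1 through 4 resolve to capture, hostInfoLike, wrapLike, and businessLogic in turn. If you add or remove a layer between the wrapper and the capture routine, the constant has to change with it.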

Now, we change our custom error to be a type that wraps an existing error and carries information about the stack.


type HostInfo struct {
	HostName string  `json:"hostName"`
	Pid      int     `json:"pid"`
	Stack    []Trace `json:"stack"`
}

type WrappedError struct {
	Err      error    `json:"err"`
	HostInfo HostInfo `json:"hostInfo"`
}

func (w WrappedError) Error() string {
	return w.Err.Error()
}


The new type WrappedError additionally includes some information about the process. While not essential for our use case, the hostname and the process ID are generally valuable for distributed systems to find the location of an error.

An implementation like this works but is cumbersome to use. Thankfully, Go allows us to define convenience functions.


func hostInfo() HostInfo {
	hn, err := os.Hostname()
	if err != nil {
		panic(err)
	}
	return HostInfo{
		HostName: hn,
		Pid:      os.Getpid(),
		Stack:    getStack(),
	}
}

// Wrapper functions

func ErrorW(err error) WrappedError {
	// Return directly if this is already wrapped
	if unwrapped, ok := err.(WrappedError); ok {
		return unwrapped
	}
	return WrappedError{Err: err, HostInfo: hostInfo()}
}

Note that I check whether the error to be wrapped is already one of our wrapped errors. Capturing a call stack is costly, so we want to do it only once per error. Additionally, the stack captured at the first wrap already includes the callers further up the chain, so capturing it again at the next level would only duplicate information.

To make our lives easier, we can also define a few helpers that allow the direct creation of a wrapped error.

func Error(msg string) WrappedError {
	return WrappedError{
		// errors.New (package "errors") avoids treating msg as a format string
		Err:      errors.New(msg),
		HostInfo: hostInfo(),
	}
}

func Errorf(format string, args ...interface{}) WrappedError {
	return WrappedError{Err: fmt.Errorf(format, args...), HostInfo: hostInfo()}
}

How might you use this pattern in code? Whenever I return an error from my code, I wrap it using the ErrorW function. Following this pattern ensures that my error types and those of other libraries I do not control are consistently wrapped with the call stack at first observation in my code base.

func returnError() error {
	return fmt.Errorf("this is an error")
}

func example() error {
	if err := returnError(); err != nil {
		return ErrorW(err)
	}
	// Do something
	return nil
}

func callExample() {
	err := example()
	if me, ok := err.(WrappedError); ok {
		log.Print(me.Err)
		log.Print(me.HostInfo) // Will print the host information and stack trace
	} else if err != nil {
		// Not a WrappedError: me is a zero value here, so log err itself
		log.Print(err)
	}
}

Now, when I print an error message, I can choose to report the stack frames that led to the error, and if I am smart about how I output the message, IntelliJ or another IDE can jump to the correct line of code.
The other side effect is that I now have just one place to insert a breakpoint to get me started. Unfortunately, unlike an exception, the breakpoint will not contain the state of the running code; we have already unwound the stack. This is still exceptionally helpful in recursive code, such as my ledger code, and it is a much quicker way to get to the right place in a complex code base.
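One way to be "smart about the output" is to print each frame as a file:line pair. The sketch below assumes your IDE or terminal turns that shape into a clickable location, which depends on your tooling. formatStack is a hypothetical helper, and the Trace type is repeated only so the example stands alone.

```go
package main

import (
	"fmt"
	"strings"
)

// Trace mirrors the Trace type defined earlier in the article.
type Trace struct {
	Function string `json:"function"`
	File     string `json:"file"`
	Line     int    `json:"line"`
}

// formatStack renders one frame per line as "file:line function".
func formatStack(stack []Trace) string {
	var b strings.Builder
	for _, t := range stack {
		fmt.Fprintf(&b, "\t%s:%d %s\n", t.File, t.Line, t.Function)
	}
	return b.String()
}

func main() {
	// Illustrative frames; in real use these come from the wrapped error.
	stack := []Trace{
		{Function: "main.example", File: "/src/app/main.go", Line: 42},
		{Function: "main.main", File: "/src/app/main.go", Line: 10},
	}
	fmt.Print(formatStack(stack))
}
```

In the logging code, the helper would be called with the Stack slice carried in the wrapped error's HostInfo.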

And now, for the encore: since I'm often writing RPC calls, I also define to-JSON and from-JSON methods that allow the error to be streamed. One pitfall here is that there is no guarantee that a nested error supports streaming to JSON. Streaming interface values is complex, and treating the nested error as a string is more straightforward and safer.

The code looks like the following.

// Force errors to string form when converted to JSON
type wrappedErrorStream struct {
	Err      string   `json:"err"`
	HostInfo HostInfo `json:"hostInfo"`
}

func (w WrappedError) MarshalJSON() ([]byte, error) {
	return json.Marshal(wrappedErrorStream{
		Err:      w.Err.Error(),
		HostInfo: w.HostInfo,
	})
}

func (w *WrappedError) UnmarshalJSON(b []byte) error {
	var ws wrappedErrorStream
	err := json.Unmarshal(b, &ws)
	if err != nil {
		return err
	}
	w.HostInfo = ws.HostInfo
	w.Err = fmt.Errorf("report error : %s", ws.Err)
	return nil
}

Even if you did successfully stream the nested error's own type, the receiver may not have a way to unstream it, which is likely to cause runtime errors. We do not want that, hence the string form.

If this were a production version of the streaming code, I might include a magic number in the stream. I would use this number to ensure that the error returned will likely be of WrappedError type and, hence, safely unstreamed. Implementing the magic number is an exercise left to the reader.

The code presented here was created before the addition of error wrapping to Go (errors.Is, errors.As, errors.Unwrap, and the %w verb, introduced in Go 1.13). That wrapping supports carrying nested error context but does not solve the call-site issue.

Conclusion: if Go's error handling were built into the language rather than being a convention, the runtime could provide the above functionality directly. Zig takes the built-in approach, and I would like to see Go treat errors as a little special, including the call stack.

Thank you for reading, and I hope this provides some pointers for others.
