How to decide on the amount of concurrent actions?

Issue

I’m currently writing an encoder and (obviously) want to make it fast.

I have a working system for doing the encoding (so every goroutine is doing the same thing) but I am struggling with finding the right amount of goroutines to run the code in. I basically want to decide on a maximum amount of Goroutines that keeps the CPU busy.

The following thoughts crossed my mind:

  • If a file is only <1 kB it’s not useful to run the code in a lot of goroutines
  • The amount of goroutines should be influenced by the cores/threads available
    • running 16 goroutines on a 4×4 GHz CPU will not be a problem but what with a 4×1 GHz CPU?
    • hard to determine reliably cross-platform
  • The CPU should be busy but not as busy as to keep other programs from responding (~70-ish %?)
    • hard to decide beforehand due to clockspeed and other parameters

Now I’ve tried to decide based on these factors on how many goroutines to use but I’m not quite sure how to do so cross-platform and in a reliable way.

Attempts already made:

  • using a linear function to determine based on filesize
    • requires different functions based on CPU
  • parsing CPU-specs from lscpu
    • not cross-platform
    • requires another function to determine based on frequency

Which have not been satisfactory.

Solution

You mention in a comment that

every goroutine is reading the file that is to be encoded

But of course the file—any file—is already encoded in some way: as plain text, perhaps, or UTF-8 (stream of bytes), perhaps assembled into units of “lines”. Or it might be an image stream, such an an mpeg file, consisting of some number of frames. Or it might be a database, consisting of records. Whatever its input form is, it contains some sort of basic unit that you could feed to your (re-)encoder.

That unit, whatever it may be, is a sensible place to divide work. (How sensible, depends on what it is. See the idea of chunking below.)

Let’s say the file consists of independent lines: then use scanner.Scan to read them, and pass each line to a channel that takes lines. Spin off N, for some N, readers that read the channel, one line at a time:

ch := make(chan string)
for i := 0; i < n; i++ {
    go readAndEncode(ch)
}

// later, or immediately:
for s := bufio.NewScanner(os.Stdin); s.Scan(); {
    ch <- s.Text()
}
close(ch)

If there are 100 lines, and 4 readers, the first four ch <- s.Text() operations go fast, and the fifth one pauses until one of the readers is done encoding and goes back to reading the channel.

If individual lines are too small a unit, perhaps you should read a “chunk” (e.g., 1 MB) at a time. If the chunk has a partial line at the end, back up, or read more, until you have a whole line. Then send the entire data chunk.

Because channels copy the data, you may wish to send a reference to the chunk instead.1 This would be true of any larger data unit. (Lines tend to be short, and the overhead of copying them is generally not very large compared to the overhead of using channels in the first place. If your lines have type string, well, see the footnote.)

If line, or chunk-of-lines, are not the correct unit of work here, figure out what is. Think of goroutines as people (or busy little gophers) who each get one job to do. They can depend on someone else—another person or gopher—to do a smaller job, whatever that might be; and having ten people, or gophers, working on sub-tasks allows a supervisor to manage them. If you need to do the same job N times, and N is not unbounded, you can spin off N goroutines. If N is potentially unbounded, spin off a fixed number (maybe based on #cpus) and feed them work through a channel.


1As Burak Serdar notes, some copies can be elided automatically: e.g., strings are in effect read-only slices. Slice types have three parts: a pointer (reference) to the underlying data, a length, and a capacity. Copying a slice copies these three parts, but not the underlying data. The same goes for strings: string headers omit the capacity, so sending a string through a channel copies only the two header words. Hence many of the obvious and easy-to-code ways of chunking data will already be pretty efficient.

Answered By – torek

Answer Checked By – Terry (GoLangFix Volunteer)

Leave a Reply

Your email address will not be published.