Recursive concurrency for a quadtree in Go


I’m trying to parallelize my quadtree lookup by using goroutines to share the recursive lookup work.

My Quadtree struct looks like this:

type Quadtree struct {
    Rectangle // Boundary of the quadtree
    Points   []Point
    Ne       *Quadtree
    Se       *Quadtree
    Sw       *Quadtree
    Nw       *Quadtree
    Divided  bool
    Capacity int
    Parent   *Quadtree
}

I have a method called QueryForNearestPoints which accepts a Point and a searchRadius and returns all the points in the Quadtree within the search area. If no points are found within the given searchRadius, the searchRadius is increased and the query is tried again.

points is the channel to which points found within the search radius are sent. I’m also using a sync.WaitGroup wg to wait for all goroutines to finish.

func (q *Quadtree) QueryForNearestPoints(p Point, searchRadius float64) []QueryResult {
    searcharea := Circle{p, searchRadius}
    var testResults []QueryResult

    for {
        // Collector goroutine: drain the points channel into testResults
        // until the query signals completion on done.
        go func() {
        loop:
            for {
                select {
                case item := <-points:
                    testResults = append(testResults, item)
                case <-done:
                    break loop
                }
            }
        }()

        wg.Add(1)
        go q.query(searcharea)
        wg.Wait()
        done <- true

        if len(testResults) > 0 {
            break
        }
        // Proportionally increase the search radius if no points are found.
        searcharea = Circle{p, searcharea.Radius + searcharea.Radius*50/100}
    }
    return testResults
}

The parallelized quadtree’s query method:

func (q *Quadtree) query(area Circle) {
    defer wg.Done()
    if !q.overlapsCircle(area) {
        return
    }
    for i, point := range q.Points {
        if area.contains(point) {
            points <- QueryResult{point, i, q}
        }
    }
    if q.Divided {
        wg.Add(4)
        go q.Ne.parallelquery(area)
        go q.Se.parallelquery(area)
        go q.Sw.parallelquery(area)
        go q.Nw.parallelquery(area)
    }
}

The (q *Quadtree) parallelquery method:

func (q *Quadtree) parallelquery(area Circle) {
    semaphore <- struct{}{} // acquire a token; blocks while n are already held
    defer func() {
        <-semaphore // release the token
    }()
    q.query(area)
}

If I’m not wrong, this is a CPU-bound workload, so having thousands of goroutines isn’t going to help. I’m therefore using a buffered channel, semaphore, as a counting semaphore to make sure that at any point only n goroutines are active. (Here I made n = 4 because my computer has 4 CPUs.)
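The counting-semaphore pattern itself does what the question intends. A minimal, self-contained sketch of it (limitConcurrency and tasks are hypothetical names, not part of the question's code) that measures the highest number of tasks ever active at once:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// limitConcurrency starts `tasks` goroutines but allows at most n of them
// to do work at the same time, using a buffered channel as a counting
// semaphore. It returns the peak number of simultaneously active tasks.
func limitConcurrency(tasks, n int) int64 {
	semaphore := make(chan struct{}, n)
	var active, peak int64
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			semaphore <- struct{}{}        // acquire a token (blocks when n are taken)
			defer func() { <-semaphore }() // release the token
			cur := atomic.AddInt64(&active, 1)
			for { // record the peak concurrency seen
				p := atomic.LoadInt64(&peak)
				if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
					break
				}
			}
			atomic.AddInt64(&active, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	n := runtime.NumCPU()
	fmt.Println(limitConcurrency(1000, n) <= int64(n)) // true
}
```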

The only problem is that this approach is significantly slower than the normal non-concurrent, sequential approach. What am I missing here? How can I make this faster?


This is much slower because the work you are doing in each goroutine is very cheap, and there are very many goroutines.

For those that will criticise the terminology in this answer, I am going to use parallelism and concurrency interchangeably. Even though strictly, parallelism is a property of the runtime / computer, and concurrency is a property of the code. Some code can still be concurrent if it is running on hardware with a single processor, but it will not be running in parallel.

The reason the concurrent version here is slower is that there is still synchronisation required when using channels. Under the hood, a channel uses a mutex to synchronise the sends and receives on it. This means you are introducing contention between the goroutines.
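That built-in synchronisation is what makes fan-in over a channel safe in the first place: many senders can share one results channel without losing a value, but every send is a lock acquisition they contend on. A minimal illustration with hypothetical names (collect, results), which also shows the close-when-done pattern instead of a separate done channel:

```go
package main

import (
	"fmt"
	"sync"
)

// collect fans `workers` goroutines into one results channel, the same
// shape as the question's `points` channel. The channel's internal lock
// keeps the concurrent sends safe, but each send is a synchronisation
// point; with cheap work, that lock becomes the bottleneck.
func collect(workers, perWorker int) int {
	results := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				results <- 1 // each send takes the channel's lock
			}
		}()
	}
	go func() {
		wg.Wait()      // all senders finished…
		close(results) // …so the receiver's range loop can end
	}()
	sum := 0
	for v := range results {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(collect(8, 100)) // 800: no sends lost or duplicated
}
```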

Each recursion of query starts four more parallelquery goroutines that must be scheduled, but (using the semaphore channel) you’ve limited the number of goroutines that can make progress to 4.

After n levels of recursion you’ve got up to 4^n goroutines all trying to place a token into the semaphore channel, and this causes a lot of contention.

Running Go’s block profiler on this, you will probably find that most of the execution time is spent blocked trying to get a token from the semaphore.
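Since the per-node work is so cheap, a plain sequential recursive walk that appends matches to a slice is the natural alternative: no channel, no WaitGroup, no semaphore. A sketch on a deliberately simplified tree (node and queryRange are stand-in names, not the question's Quadtree types):

```go
package main

import "fmt"

// node is a simplified stand-in for a quadtree node: some values at this
// node plus child subtrees.
type node struct {
	values   []int
	children []*node
}

// queryRange walks the tree sequentially, appending every value in
// [lo, hi] to out. Threading the slice through the recursion replaces
// the question's points channel and collector goroutine entirely.
func queryRange(n *node, lo, hi int, out []int) []int {
	if n == nil {
		return out
	}
	for _, v := range n.values {
		if lo <= v && v <= hi {
			out = append(out, v)
		}
	}
	for _, c := range n.children {
		out = queryRange(c, lo, hi, out)
	}
	return out
}

func main() {
	root := &node{
		values: []int{1, 9},
		children: []*node{
			{values: []int{4, 12}},
			{values: []int{7}},
		},
	}
	fmt.Println(queryRange(root, 3, 10, nil)) // [9 4 7]
}
```

The same shape applies to the quadtree: an overlap test replaces the range check, and the four quadrants replace the children slice.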

Answered By – Zak

Answer Checked By – Dawn Plyler (GoLangFix Volunteer)
