Exercise: Web Crawler – concurrency not working

Issue

I am going through the golang tour and working on the final exercise to change a web crawler to crawl in parallel and not repeat a crawl ( http://tour.golang.org/#73 ). All I have changed is the crawl function.

    var used = make(map[string]bool)

    func Crawl(url string, depth int, fetcher Fetcher) {
        if depth <= 0 {
            return
        }
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("\nfound: %s %q\n\n", url, body)
        for _,u := range urls {
            if used[u] == false {
                used[u] = true
                Crawl(u, depth-1, fetcher)
            }
        }
        return
    }

In order to make it concurrent I added the go command in front of the call to the function Crawl, but instead of recursively calling the Crawl function the program only finds the “http://golang.org/” page and no other pages.

Why doesn’t the program work when I add the go command to the call of the function Crawl?

Solution

The problem seems to be, that your process is exiting before all URLs can be followed
by the crawler. Because of the concurrency, the main() procedure is exiting before
the workers are finished.

To circumvent this, you could use sync.WaitGroup:

func Crawl(url string, depth int, fetcher Fetcher, wg *sync.WaitGroup) {
    defer wg.Done()
    if depth <= 0 {
         return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("\nfound: %s %q\n\n", url, body)
    for _,u := range urls {
        if used[u] == false {
           used[u] = true
           wg.Add(1)
           go Crawl(u, depth-1, fetcher, wg)
        }
    }
    return
}

And call Crawl in main as follows:

func main() {
    wg := &sync.WaitGroup{}

    Crawl("http://golang.org/", 4, fetcher, wg)

    wg.Wait()
}

Also, don’t rely on the map being thread safe.

Answered By – nemo

Answer Checked By – Clifford M. (GoLangFix Volunteer)

Leave a Reply

Your email address will not be published.