Add unique values in an array as a value in concurrent map golang?

Issue

I am iterating over flatProduct.Catalogs slice and populating my productCatalog concurrent map in golang. I am using upsert method so that I can add only unique productID's into my productCatalog map.

This uses a linear scan to check for duplicate product IDs but in my case I have more than 700k productId’s so it is very slow for me. I am looking for ways to make it more efficient.

Below code is called by multiple goroutines in parallel that is why I am using concurrent map here to populate data into it.

var productRows []ClientProduct
err = json.Unmarshal(byteSlice, &productRows)
if err != nil {
    return err
}
for i := range productRows {
    flatProduct, err := r.Convert(spn, productRows[i])
    if err != nil {
        return err
    }
    if flatProduct.StatusCode == definitions.DONE {
        continue
    }
    r.products.Set(strconv.Itoa(flatProduct.ProductId, 10), flatProduct)
    for _, catalogId := range flatProduct.Catalogs {
        catalogValue := strconv.FormatInt(int64(catalogId), 10)
        // how can I improve below Upsert code for `productCatalog` map so that it can runs faster for me?
        r.productCatalog.Upsert(catalogValue, flatProduct.ProductId, func(exists bool, valueInMap interface{}, newValue interface{}) interface{} {
            productID := newValue.(int64)
            if valueInMap == nil {
                return []int64{productID}
            }
            oldIDs := valueInMap.([]int64)

            for _, id := range oldIDs {
                if id == productID {
                    // Already exists, don't add duplicates.
                    return oldIDs
                }
            }
            return append(oldIDs, productID)
        })
    }
}

Above upsert code is very slow for me and it takes lot of time to add unique product id’s as a value in my concurrent map. Here is how productCatalog is defined.

productCatalog *cmap.ConcurrentMap

Here is the upsert method which I am using – https://github.com/orcaman/concurrent-map/blob/master/concurrent_map.go#L56

This is how I am reading data from this cmap:

catalogProductMap := clientRepo.GetProductCatalogMap()
productIds, ok := catalogProductMap.Get("200")
var data = productIds.([]int64)
for _, pid := range data {
  ...
}

Solution

To summarize answers from comments:

The upsert function is O(n**2) where n is the length of the slice.

The problem as you also mentioned is iterating through whole slice to find duplicate. This can be avoided using another map.

Example:

r.productCatalog.Upsert(catalogValue, flatProduct.ProductId, func(exists bool, valueInMap interface{}, newValue interface{}) interface{} {
    productID := newValue.(int64)
    if valueInMap == nil {
        return map[int64]struct{}{productID: {}}
    }
    oldIDs := valueInMap.(map[int64]struct{})
    
    // value is irrelevant, no need to check if key exists 
    oldIDs[productID] = struct{}{}
    return oldIDs
})

Nested map will add lot of allocation causing lot of memory usage right?

Nope, using empty struct won’t create new allocations or increase memory usage. You can find plenty of articles/questions about empty struct and its usage. (e. g. What uses a type with empty struct has in Go?)

Note: you could use some kind of optimised search for array like binary search used by sort.Search, but it requires sorted array.

Answered By – medasx

Answer Checked By – Terry (GoLangFix Volunteer)

Leave a Reply

Your email address will not be published.