How does Hugo's Related Content algorithm work? What are the factors?


On their website they say:

Hugo uses a set of factors to identify a page’s related content based on Front Matter parameters. This can be tuned to the desired set of indices and parameters or left to Hugo’s default Related Content configuration.


But how exactly does the algorithm work? What are the factors?


The original approach is explained in gohugoio/hugo PR 3815:

Several attempts have been started to fix #98 — all of them have failed for some reason.
It is a hard problem to solve, and I think the main reason for failure has been the bottom-up-approach, i.e. we have started with the hardest problem: Solving Sherlock’s last case.

The reason I’m picking up this ball again now is this Twitter thread:

Using intersect and keywords in page params works reasonably well, but it is quadratic and will be slow to unusable for larger sites.
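To illustrate why the intersect approach is quadratic, here is a minimal Go sketch that finds related pages by comparing every page with every other page. The `page` type and both functions are hypothetical and written only for this illustration; they are not part of Hugo.

```go
package main

import "fmt"

// page is a hypothetical stand-in for a Hugo page; only its keywords matter here.
type page struct {
	title    string
	keywords []string
}

// sharedKeywords counts the keywords of b that also appear in a.
func sharedKeywords(a, b page) int {
	seen := make(map[string]bool, len(a.keywords))
	for _, k := range a.keywords {
		seen[k] = true
	}
	n := 0
	for _, k := range b.keywords {
		if seen[k] {
			n++
		}
	}
	return n
}

// pairwiseComparisons compares every page with every other page, as a
// template that intersects keywords on each page effectively does, and
// returns how many comparisons were needed.
func pairwiseComparisons(pages []page) int {
	comparisons := 0
	for i := range pages {
		for j := range pages {
			if i == j {
				continue
			}
			_ = sharedKeywords(pages[i], pages[j])
			comparisons++
		}
	}
	return comparisons
}

func main() {
	pages := make([]page, 1000)
	fmt.Println(pairwiseComparisons(pages)) // 1000 * 999 = 999000
}
```

For n pages this does n * (n - 1) comparisons, so doubling the number of pages roughly quadruples the work.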

So, instead of solving the hardest problem, I have started on this PR by outlining an interface:

type PageSearcher interface {
  Search(args ...interface{}) (Pages, error)
  SearchIndex(index string, args ...interface{}) (Pages, error)
  Similar(p *Page) (Pages, error)
  SimilarIndex(index string, p *Page) (Pages, error)
}

Naming suggestions welcomed.

The idea is that a user defines a set of indexes in config.toml:

- param: keywords
  weight: 1
- param: tags
  weight: 3

Then we lazily build some sort of index from that, and then you can do fast searches like:

{{ .Site.RegularPages.Similar . }}
{{ .Site.RegularPages.Search "hugo" }}
{{ .Site.RegularPages.SearchIndex "keywords" "hugo" | limit 10 }}
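The idea described in the PR text (weighted indexes per front matter param, built lazily, then queried instead of comparing pages pairwise) can be sketched as a small inverted index. This is only an illustration under stated assumptions, not Hugo's actual implementation: the `indexConfig`, `page`, and `searcher` types and the scoring rule (each shared term adds the weight of its index) are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// indexConfig mirrors one configured index entry (hypothetical type).
type indexConfig struct {
	param  string
	weight int
}

// page is a hypothetical stand-in for a Hugo page.
type page struct {
	title  string
	params map[string][]string // e.g. "tags" -> ["go", "hugo"]
}

// searcher holds an inverted index: param -> term -> pages using that term.
type searcher struct {
	configs []indexConfig
	pages   []*page
	once    sync.Once
	index   map[string]map[string][]*page
}

// build runs at most once, on first use (the "lazy" part).
func (s *searcher) build() {
	s.once.Do(func() {
		s.index = make(map[string]map[string][]*page)
		for _, c := range s.configs {
			terms := make(map[string][]*page)
			for _, p := range s.pages {
				for _, t := range p.params[c.param] {
					terms[t] = append(terms[t], p)
				}
			}
			s.index[c.param] = terms
		}
	})
}

// similar scores every page that shares a term with p. Each shared term
// adds the weight of the index it was found in; highest score comes first.
func (s *searcher) similar(p *page) []*page {
	s.build()
	scores := make(map[*page]int)
	for _, c := range s.configs {
		for _, t := range p.params[c.param] {
			for _, other := range s.index[c.param][t] {
				if other != p {
					scores[other] += c.weight
				}
			}
		}
	}
	result := make([]*page, 0, len(scores))
	for other := range scores {
		result = append(result, other)
	}
	sort.Slice(result, func(i, j int) bool {
		if scores[result[i]] != scores[result[j]] {
			return scores[result[i]] > scores[result[j]]
		}
		return result[i].title < result[j].title // deterministic tie-break
	})
	return result
}

func main() {
	a := &page{title: "a", params: map[string][]string{"tags": {"go"}, "keywords": {"hugo"}}}
	b := &page{title: "b", params: map[string][]string{"tags": {"go"}}}
	c := &page{title: "c", params: map[string][]string{"keywords": {"hugo"}}}
	s := &searcher{
		configs: []indexConfig{{"keywords", 1}, {"tags", 3}},
		pages:   []*page{a, b, c},
	}
	for _, p := range s.similar(a) {
		fmt.Println(p.title) // b (score 3), then c (score 1)
	}
}
```

A lookup only touches pages that share at least one term with the query page, which is what makes it cheaper than the pairwise comparison on larger sites.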

Initial implementation: gohugoio/hugo commit 3b4f17b

Answered By – VonC

Answer Checked By – Candace Johnson (GoLangFix Volunteer)
