In my previous tip I introduced word2vec. I discussed it in terms of language: this word, “mary”, shares context with this other word, “lamb”, so their embeddings move closer.

Why constrain ourselves to language?

We could pretend that “Doug likes Star Wars” is the same kind of co-occurrence. We can make a table of users, the movies they like, and movies they don’t:

| Anchor | Positive movie | Negative movie |
| --- | --- | --- |
| doug | star wars | king kong |
| doug | star trek | cinderella |
| tom | star wars | citizen kane |
| tom | battlestar galactica | the aviator |

Think about what we have:

  • Doug and Tom’s embeddings grow closer through star wars. Word2vec training here shrinks the distances Doug ←→ Star Wars and Tom ←→ Star Wars, making Doug a more similar user to Tom.
  • In the same way, battlestar galactica moves closer to star trek through doug and tom, who now sit near each other.

And just like that, we have a movie recommender system, built on the same technology behind word2vec.
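To make this concrete, here’s a minimal sketch of this style of training in PyTorch. It’s my own illustration (not code from gensim or the original word2vec): one shared embedding table over users and movies, trained with a negative-sampling-style loss on the (anchor, positive, negative) triples from the table above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# The table above, flattened into (anchor, positive, negative) triples.
triples = [
    ("doug", "star wars", "king kong"),
    ("doug", "star trek", "cinderella"),
    ("tom", "star wars", "citizen kane"),
    ("tom", "battlestar galactica", "the aviator"),
]

# One shared vocabulary over users AND movies, so they live in the same space.
vocab = sorted({item for triple in triples for item in triple})
idx = {name: i for i, name in enumerate(vocab)}

emb = nn.Embedding(len(vocab), 16)  # 16-dim embeddings, an arbitrary choice
opt = torch.optim.Adam(emb.parameters(), lr=0.05)

anchors = torch.tensor([idx[a] for a, _, _ in triples])
positives = torch.tensor([idx[p] for _, p, _ in triples])
negatives = torch.tensor([idx[n] for _, _, n in triples])

for step in range(200):
    opt.zero_grad()
    a, p, n = emb(anchors), emb(positives), emb(negatives)
    # Negative-sampling-style loss: pull each anchor toward its positive,
    # push it away from its negative.
    loss = -(F.logsigmoid((a * p).sum(dim=1))
             + F.logsigmoid(-(a * n).sum(dim=1))).mean()
    loss.backward()
    opt.step()

# doug and tom should now be more similar to each other (via star wars)
# than to movies neither of them liked.
doug = emb(torch.tensor(idx["doug"]))
tom = emb(torch.tensor(idx["tom"]))
print(F.cosine_similarity(doug, tom, dim=0).item())
```

Run it and you should see doug and tom’s cosine similarity drift upward as training pulls them both toward star wars.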

We could use this for quite a lot of domains:

  • Queries and documents
  • Images and captions

And so on!

-Doug

PS - 5 days left to sign up for Cheat at Search with Agents!
