November
18th,
2024
Elasticsearch doesn't have a straightforward way to match the 'full' field (all the tokens as a phrase).
November
3rd,
2024
Reciprocal Rank Fusion, while a useful tool, doesn't magically make hybrid search relevant
October
19th,
2024
A notebook showing the real decisions involved in computing search evaluation stats
October
13th,
2024
The lack of an objective definition of good search creates huge hazards when building search, RAG, and AI solutions
September
25th,
2024
Avoiding conflict is the death knell of organizations that leads to a lack of progress and careers that implode.
September
11th,
2024
In reality, staff engineers aren't about 'company wide' impact but a system of patronage where managers reward behaviors they value
September
11th,
2024
My GAR slides from Systematically Improving RAG Applications Sept 2024 course
September
7th,
2024
Junior engineers are foundational to whether a team can collaborate and innovate
August
9th,
2024
Finding search queries to improve is harder than you think. Here's one statistical procedure for deciding whether a query really has a problem -- or if it's just noise.
August
6th,
2024
Integrating my BM25 pandas search library, SearchArray, into BEIR, in order to embarrass myself in public.
July
31st,
2024
With normal/boring stock returns - have GOOG, MSFT, etc., run out of AI+layoff cards to play?
June
25th,
2024
All that lexical search context you need to build that RAG app
June
21st,
2024
MICES (Mix Camp E-Commerce) talk about planning e-commerce search relevance work with fast prototypes
June
17th,
2024
Berlin Buzzwords + Haystack talks about Reddit's Learning to Rank journey
May
22nd,
2024
Every team chooses a different type of NDCG; choosing your ideal is perhaps the most consequential decision
May
16th,
2024
Groundbreaking and courageous software ideas start by first impressing 3 good friends
May
8th,
2024
Be of service to others and true to your craft to build a great network
May
5th,
2024
Implementing an exponential search in Cython to speed up position intersections in SearchArray phrase search
April
28th,
2024
The actual bottlenecks are the search results we gather along the way
March
24th,
2024
As Yoda would say - A joke it is not to add slop to a search system.
March
24th,
2024
We need more than dense embeddings in our 'vector' search
January
24th,
2024
Seriously - why do we need all these vector databases? Do we need dozens of them?
January
21st,
2024
How phrase search works in SearchArray by intersecting roaring-like numpy arrays.
November
20th,
2023
Make traditional text search a core part of the Python data stack
October
15th,
2023
Convert VisualVM's profiler output to a format suitable for a flamegraph
October
13th,
2023
Software documentation that doesn't suck needs to exist with the living
October
10th,
2023
Slides from Berlin Search Tech. Meetup describing an alternative way beyond Judgments and NDCG to think about search offline evaluation.
September
12th,
2023
Helping random people, for free, can be one of the best things you can do for your career.
September
11th,
2023
Why do we rely on such a fractured, vendor-dominated database layer in our supply chain? Why aren't we more worried?
September
5th,
2023
Slides from Chicago Search Meetup, discussing real-world tradeoffs in vector search beyond just the benchmarks
August
22nd,
2023
Implementing random projections based LSH as a C Numpy function
August
21st,
2023
Badly implementing locality-sensitive hashing as a vector search solution... for science, edification, 💩, and giggles.
July
27th,
2023
Software engineering is about designing the right feedback loop(s) with limited resources.
July
8th,
2023
Search orgs fail because teams get stuck in functional silos rather than empowering their peers
June
24th,
2023
Visibly failing, learning, and discovering first principles is how you have real influence on a field
June
15th,
2023
Navigating between hyperfocus/executing vs perspective/information-gathering mental states is f*cking hard.
May
29th,
2023
Where to get started, next steps to take, how to evolve search from 0 to 1.
May
28th,
2023
Feedback is the lifeblood of getting better, but be careful who you accept feedback from.
May
13th,
2023
Idiot-proof "git cd" command to cd to repos in your project dir with fuzzy matching and tab completion.
May
6th,
2023
You can get started improving search relevance without labels and judgments, which are an imperfect model anyway.
May
1st,
2023
In this post, I have very weak, uncertain labels of relevance / not. However, in aggregate, they may be able to help us make a strong determination on the importance of ranking signals.
March
12th,
2023
If you know u and v's dot products to A1...An can you reconstruct u.v?
March
10th,
2023
Taking things up exactly one notch, from one shared reference to two, to estimate a dot product.
March
2nd,
2023
Given a reference vector `A`, where we know `u.A` and `v.A`, what can we say about `u.v`?
February
28th,
2023
Finding the probability of a dot product between two vectors lets us quantify how much information is in cosine similarity.
February
13th,
2023
What is vector search, and why, all of a sudden, are we talking about it?
December
26th,
2022
Ninety degrees isn't particularly special in 2D, but 3D and beyond, it's the expected angle between two unit vectors.
December
24th,
2022
ChatGPT unlocks information from the Web, and away from sites that abuse their users' attention with spam and writing that targets Google, not humans.
December
4th,
2022
Index some documents, provide some queries, ChatGPT will tell you the most relevant documents for those queries
December
3rd,
2022
Fred works in marketing in New York City and enjoys running. He lost his job at the beginning of the pandemic.
November
14th,
2022
With algorithm development, naive solutions provide a crucial reference implementation for your testing.
November
9th,
2022
Aliases, etc. that have made rebase-based workflows in Git feel much less advanced.
September
19th,
2022
Experiments are to search relevance correctness as unit tests are to code correctness. By definition they're a broken but necessary definition of the problem we need to get started.
September
11th,
2022
No need for local setup to play with Elasticsearch from a Jupyter notebook - just use Bonsai + Colab!
July
16th,
2022
Let's explore this key bias in search systems towards the old algorithm and how to overcome it!
June
20th,
2022
Analyzing the plausibility of guessing relevance judgments from runs in the VMware Zero Shot Kaggle Competition
June
8th,
2022
Can we simulate the likely search relevance labels just from knowing which results shifted and the outcome of an A/B test?
April
23rd,
2022
Work with amazing people you love collaborating with, the rest (mission, purpose, etc) falls out from that.
January
17th,
2022
Reimplementing LambdaMART in Python for endless tinkering and learning
November
28th,
2021
LambdaMART directly optimizes whatever search relevance ranking metric matters to your business. This article details how this neat machine learning trick works to target what matters most to your product.
November
12th,
2021
Contrasting how each language handles iteration helps understand how to work effectively in either.
May
5th,
2021
In this article: we assume users review every search result. So we need to find that sweet spot when we get to look reaaalllly fricken smart and declare, with confidence, "we have nothing else that matches your query".
April
21st,
2021
Using Pandas to compute Mean Reciprocal Rank using the MSMarco Dataset
February
21st,
2021
Judgment lists prevent search whack-a-mole. They provide a safety net for search, allowing you to innovate quickly on relevance with a high degree of confidence.
December
22nd,
2020
High-end technical consulting is a fantastic thing to do mid-career. It helps you build a personal brand, deepen soft skills, and focus on challenging technical problems. Here's why you might, or might not, want to take this step.
August
6th,
2020
Twitter gives you an illusion of influence over political events. In reality, it meaninglessly fiddles our energy away. Doing our duty requires real work in the real world.
May
20th,
2020
What I know so far on getting cheaper delivery in Charlottesville that avoids Grubhub's shenanigans and supports local restaurants
April
5th,
2020
Add friction to your Twitter login to keep yourself sane.
August
26th,
2019
Write to grow closer to the truth. Not because you have all the answers, not to get page views or win internet points. Instead write to broach a point of view and test it against your audience's norms and points of view.