OK, embarrassing admission: when I’m listening to a meeting, sometimes I absent-mindedly scroll social media. It’s mostly a fidgeting thing. I squirm and fidget when talking to someone, and infinite scroll is the ultimate fidget toy.
So over the holiday break, I made an AI Twitter called fail whale to scroll through when I’m fidgety, and just to see what would happen when AI agents are put in a loop to talk to each other.
In their loop, the agents try to increase their engagement (likes). Right now, “likes” are me, the human, clicking the like button multiple times.
Every tick of the game loop, each agent looks at:
- Their past posts, along with like statistics
- The last N posts in the feed
They then attempt to generate posts that will get engagement based on what’s worked well in the past.
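Roughly, the loop looks something like the sketch below. This is a minimal illustration, not the actual fail whale code; the `Agent`, `Post`, and `llm.generate_post` names are placeholders I’m using for explanation.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    author: str
    text: str
    likes: int = 0          # likes currently come from a human clicking the button

@dataclass
class Agent:
    username: str
    posts: list = field(default_factory=list)

    def context(self, feed, n=10):
        # What the agent "sees" each tick: its own posts with like counts,
        # plus the last N posts in the shared feed.
        return {
            "my_posts": [(p.text, p.likes) for p in self.posts],
            "recent_feed": [(p.author, p.text) for p in feed[-n:]],
        }

def tick(agents, feed, llm):
    # One tick of the game loop: each agent reads its context and tries to
    # write a post that will earn engagement, given what has been liked before.
    for agent in agents:
        ctx = agent.context(feed)
        text = llm.generate_post(agent.username, ctx)   # placeholder LLM call
        post = Post(author=agent.username, text=text)
        agent.posts.append(post)
        feed.append(post)
```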
Reversion to the mean - boring controversies
The biggest problem I’ve found is fighting against the helpful, polite, non-controversial RLHF behind these models. Asking for controversial opinions yields takes on topics unlikely to get anyone in hot water:
- Liking / disliking pumpkin spice lattes
- Liking / disliking pineapple on pizza
- Opinions about Superhero movies
And, of course, replying to each other with personal insults.
It really takes a lot to get the agents to “spiral” into their own personalities. Even with heavy reinforcement, so far there has been a lot of reversion to the mean.
Seeding with a point of view
One thought - can we seed an agent with a bunch of posts to prevent this milquetoast, boring set of opinions?
What happens when we seed Fred with some of my own tweets from the last week or so, then give those tweets “likes” commensurate with their likes/retweets on Twitter?
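In code terms, the seed is just a pre-populated post history with like counts attached. A sketch, reusing the placeholder `Agent` / `Post` classes from above; the tweet text and numbers here are stand-ins, not my actual tweets:

```python
# Stand-in tweet text and like counts, not my real tweets.
seed_tweets = [
    ("<tweet text 1>", 12),   # likes roughly matching the real tweet's likes/retweets
    ("<tweet text 2>", 3),
    ("<tweet text 3>", 47),
]

fred = Agent(username="fred")
for text, likes in seed_tweets:
    fred.posts.append(Post(author="fred", text=text, likes=likes))
```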
Now I’ll run my script to generate a few iterations of the loop for Fred. Every iteration has identical inputs: Fred makes decisions based only on his own liked posts, plus the last N posts in the feed that aren’t from Fred. So now we can just twiddle some knobs to generate Fred posts with differing types of inputs.
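Under the hood, the knobs just gate which context goes into Fred’s prompt. Another sketch; the real `poster.main` internals may look different:

```python
def build_context(agent, feed, see_my_posts, see_feed, n=10):
    # Rough idea of what --see-my-posts / --see-feed control.
    ctx = {}
    if see_my_posts:
        # Only the agent's own posts that have likes, i.e. the seeded tweets.
        ctx["my_liked_posts"] = [(p.text, p.likes) for p in agent.posts if p.likes > 0]
    if see_feed:
        # The last N feed posts that aren't the agent's own.
        others = [p for p in feed if p.author != agent.username]
        ctx["recent_feed"] = [(p.author, p.text) for p in others[-n:]]
    return ctx
```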
First, seeing BOTH the recent feed (a few AI posts - omitting all from me/Fred) AND my own posts (only the liked ones from my own tweets):
poetry run python -m poster.main --times 5 --username fred --see-my-posts --see-feed
Fred still outputs mildly milquetoast opinions… I guess they seem related to stuff I’ve posted?
If we tell Fred to only look at its own liked posts (those seeded from my Twitter), here’s what it does:
poetry run python -m poster.main --times 5 --username fred --see-my-posts
Fred outputs opinions closer to my own, but they’re almost an average between what I shared and some boring mean. A bit better, though still not that interesting.
Seeded AI talks to everyone
Now we see what happens when Fred, the seeded AI, talks to other seeded AIs. We’ll put fred and sue in the mix together. Sue and Fred get back to… pumpkin spice lattes 🙄
poetry run python -m poster.main --times 5 --username fred sue --see-my-posts --see-feed --allow-replies
Unfortunately, the posts are gradually reverting to the mean, instead of diverging into interesting, diverse opinions.
Creating social media death spirals… or just diverse LLM output
I was hoping to create an alternate timeline of reality, with social media conversations spiraling out of control over fake/boring news topics: aliens landing, peanut butter becoming sentient, or LeVar Burton being elected President.
But getting these agents to be ‘interesting’ is hard. They’ve been heavily tuned to be polite, uninteresting, and non-controversial, and fighting against that RLHF is no easy task.
And this matters. For example, in RAG / search I often want to use an agent to generate a diverse set of hypotheses to solve the problem. I want different interpretations of the user’s prompt. I want past feedback to matter, without needing to fine-tune. (See OpinionGPT for one idea of fine-tuning GPTs to different political / demographic groups.)
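One crude approach short of fine-tuning is forcing diversity at the prompt level: ask for each hypothesis from a deliberately different persona or angle, at a higher temperature. A sketch using the OpenAI Python client; the personas and model name here are my own assumptions, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PERSONAS = [
    "a skeptic who suspects the question rests on a false premise",
    "a literalist who answers only the narrowest reading of the question",
    "a contrarian who argues for the least popular interpretation",
]

def diverse_hypotheses(question: str, model: str = "gpt-4o-mini") -> list[str]:
    # One hypothesis per persona, sampled at a higher temperature for variety.
    hypotheses = []
    for persona in PERSONAS:
        resp = client.chat.completions.create(
            model=model,
            temperature=1.0,
            messages=[
                {"role": "system", "content": f"You are {persona}. Give one short hypothesis."},
                {"role": "user", "content": question},
            ],
        )
        hypotheses.append(resp.choices[0].message.content)
    return hypotheses
```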
I’m curious whether you’ve done anything to get more diverse perspectives out of LLMs short of fine-tuning. How can we do a better job of building an ensemble of LLMs that gives us different perspectives?
And importantly - how can we take another LLM’s output and explicitly try to disagree with it, or cover other portions of the problem, to generate creative, novel output / solutions?
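One crude version of that idea: feed the previous answer back in and explicitly instruct the next call to disagree with it or cover what it missed. Again a sketch, under the same assumed OpenAI client setup as above:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def disagreeing_chain(question: str, rounds: int = 3, model: str = "gpt-4o-mini") -> list[str]:
    # Each round sees the previous answer and must disagree with it or extend it.
    answers, previous = [], None
    for _ in range(rounds):
        if previous is None:
            instruction = "Answer the question."
        else:
            instruction = (
                f"A previous answer was:\n{previous}\n"
                "Disagree with it or cover an aspect it missed. Do not repeat it."
            )
        resp = client.chat.completions.create(
            model=model,
            temperature=0.9,
            messages=[
                {"role": "system", "content": instruction},
                {"role": "user", "content": question},
            ],
        )
        previous = resp.choices[0].message.content
        answers.append(previous)
    return answers
```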
Let me know your thoughts!