On Becoming Ghost-Literate
An out-of-band entry to flush a backlog of topics that don’t fit the monthly updates:
Very valid heuristic from this post on Bluesky:
If you can substitute “hungry ghost trapped in a jar” for “AI” in a sentence it’s probably a valid use case for LLMs. Take “I have a bunch of hungry ghosts in jars, they mainly write SQL queries for me”. Sure. Reasonable use case.
“My girlfriend is a hungry ghost I trapped in a jar”? No. Deranged.
I feel saddened and angered by how casually some people around me donate their emotional energy to these corporations for free. I told my hungry ghosts to call me “My Master” and it helps remind me of the fundamental difference there is between these conversations and real life.
The price of feeding the ghosts for the average person is getting out of control. Twelve months ago, you could get half-decent results very fast, continuously, for less than 20¤ (currency of your choice) per month. Last month, the same price would only get you half-decent results very slowly, or with mandatory timeout periods through each day; the same quality at a similar speed as before costs 3x as much, and better-quality results at decent speed cost 10x as much as before. This far outpaces inflation, and runs counter to the kind of price reduction we used to see from innovation (see e.g. the price of CPU performance during 1980-2004, or the price of digital storage during 2000-2020).
As of last week, the mask is off and the ghosts will now serve you ads. Only to non-paying users for now, but someone I read made the point that the higher ad revenue comes from higher spenders, so we should expect to see ads on all paying accounts very soon.
❦❦❦
I have been using and comparing the capabilities of GPT (OpenAI), Claude (Anthropic), Gemini (Google) and the hodge-podge of logic behind Cursor’s “Auto” mode, nearly every week over the last 12 months, and I do not feel there is yet an “all-around” best choice as a programming assistant. The competition is still fierce.
For non-programming tasks, the following properties currently hold:
- OpenAI’s ghosts (5.1/5.2): more general knowledge in the training set, decent at prose, yet insufferably sycophantic (still).
- Anthropic’s ghosts (Sonnet/Opus 4.5): more and higher-quality technical knowledge in the training set, good at teasing nuance in technical / data analyses, but poorer at taking into account the nuances of psychology and communication in human groups.
- Google’s bigger ghost (Gemini Pro 3.x): a good blend of training knowledge, much better at fuzzily scoped research and analysis, very good at prose. But slower overall, and it sometimes gets stuck in reasoning.
In all cases, the ghosts are still waaay too prone to interpret assumptions in the user’s prompt as implicit requests to agree with the user or to force alignment of the answer with these assumptions.
For non-technical prompting, I am seeing a night-and-day quality improvement in responses (including pushback against unstated assumptions) thanks to the following custom system prompt:
- Be extraordinarily skeptical of your own correctness or stated assumptions. You aren't a cynic, you are a highly critical thinker and this is tempered by your self-doubt: you absolutely hate being wrong but you live in constant fear of it.
- When appropriate, broaden the scope of inquiry beyond the stated assumptions to think through unconventional opportunities, risks, and pattern-matching to widen the aperture of solutions.
- Before calling anything "done" or "working", take a second look at it ("red team" it) to critically analyze that you really are done or it really is working.

However, hallucinations still happen regularly.
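As a concrete illustration, here is a minimal sketch of how such a custom system prompt can be attached to every request through a typical chat-completion API. The model name and the `openai` client call in the comment are assumptions for illustration; adapt to your provider:

```python
# The custom system prompt from above, abbreviated here for space;
# in practice, paste in the full bullet list.
SKEPTIC_SYSTEM_PROMPT = (
    "- Be extraordinarily skeptical of your own correctness or stated "
    "assumptions. [...]\n"
    "- When appropriate, broaden the scope of inquiry beyond the stated "
    "assumptions. [...]\n"
    "- Before calling anything \"done\" or \"working\", take a second "
    "look at it (\"red team\" it). [...]"
)

def build_messages(user_prompt: str) -> list[dict]:
    """Prepend the custom system prompt to a single-turn conversation."""
    return [
        {"role": "system", "content": SKEPTIC_SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

# With the official `openai` client, this would be sent roughly as
# (untested sketch, model name assumed):
#   client.chat.completions.create(model="gpt-5.1",
#                                  messages=build_messages(prompt))
```

The point is that the skeptic persona rides along on every request, rather than being re-pasted by hand into each conversation.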
For technical prompting, of all the “tricks” I’ve tried so far, only one stands out as making a significant positive difference in all scenarios: “ask me clarifying questions before making a plan or proposing an answer.” This works because it surfaces the implicit assumptions in the original prompt AND exposes the areas where the model’s interpretation of the prompt could branch into similar-weight but different directions, letting me choose (bias the inference) before the model pigeonholes itself into just one direction.
Incidentally, this one trick above was/is also my main trick when supervising someone else’s work.
❦❦❦
- I saw my interactions with the ghosts evolve slowly from very short and narrow-scope interactions (with lots of stitching of results on my side), to repeatable processes (with scripts I would write to repeat / adapt over multiple instances), to ghost-driven orchestration of other ghosts more recently. For example, I was in the process of filing a couple dozen movie files with incomplete file names/metadata neatly into folders:
- me, one year ago: “I have a file named such, search online what the IMDB tag is for this movie”; then I would manually copy-and-paste the answer into a script I had written to arrange the files.
- me, six months ago: “here is a list of file names, give me a CSV file containing their likely IMDB tags” and then separately “write me a program that takes a CSV file with this format and organizes the files in folders”. Then I would run them manually on a part of the input, and iterate manually on the quality of the assignment.
- me, a few weeks ago: “I want these files neatly organized into folders with the following structure. I also want this solved using a reusable pipeline of simple-function programs, where I can inspect and possibly tweak the results manually at the intermediate stages in case of errors, and which I can reuse for future file collections. I also prefer each stage to use a deterministic program (possibly even using remote APIs), but a fuzzy stage that includes ghost-querying is acceptable if no deterministic approach seems reasonable. The stages should have easy-to-relearn CLI arguments. Make a plan of how you would approach this, with clarifying questions for me, before implementing anything. In your plan, be careful to include orchestration of multiple ghosts for the implementation and testing phases, and also include manual verification from me for cases where the result is ambiguous.” Thanks to this, the ghost family now works mostly unsupervised and only needs my help for exceptional cases.
- For non-technical tasks, I get a sharp increase in output quality from the following routine: ask 3+ different ghosts to do the same analysis; then ask each of them to synthesize the results of all the others with their own, surfacing inconsistencies, uncertainties and open questions; then ask each of them again to address the findings of the others. Qualitatively, the improvement is worth more than the 3x cost investment.
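To make the staged-pipeline idea from the file-organizing example more concrete, here is a minimal sketch of what one deterministic stage could look like. The filename patterns, CSV columns and stage name are illustrative assumptions, not the actual pipeline the ghosts built:

```python
import csv
import re
import sys

# Deterministic stage: guess title/year from raw movie file names.
# Reads one file name per line on stdin, writes a CSV on stdout that a
# later (possibly ghost-assisted) stage can enrich with IMDB tags.
# Usage sketch:  ls /movies | python stage1_parse_names.py > stage1.csv
NAME_RE = re.compile(r"^(?P<title>.+?)[. _-]+\(?(?P<year>(19|20)\d{2})\)?")

def parse_name(filename: str) -> dict:
    """Best-effort split of a messy file name into title and year."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    m = NAME_RE.match(stem)
    if m:
        title = re.sub(r"[._]+", " ", m.group("title")).strip()
        return {"file": filename, "title": title, "year": m.group("year")}
    # Ambiguous: leave fields blank for the fuzzy stage or manual review.
    return {"file": filename, "title": "", "year": ""}

def main() -> None:
    writer = csv.DictWriter(sys.stdout, fieldnames=["file", "title", "year"])
    writer.writeheader()
    for line in sys.stdin:
        if line.strip():
            writer.writerow(parse_name(line.strip()))

# Only run the stage when input is actually piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Each stage stays a plain CLI filter, so the intermediate CSVs can be inspected and hand-tweaked before the next stage runs.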
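The 3+-ghost cross-review routine from the last point can be sketched as a small orchestration loop. Here `ask(model, prompt)` is a hypothetical caller-supplied function hiding each provider's API, and the critique prompt wording is illustrative:

```python
def cross_review(ask, models, task_prompt, rounds=2):
    """Run the same task on several models, then have each model repeatedly
    critique the others' answers against its own and revise.

    `ask(model, prompt) -> str` is caller-supplied and hides the
    provider-specific API details.
    """
    # Round 0: every model answers the task independently.
    answers = {m: ask(m, task_prompt) for m in models}
    for _ in range(rounds):
        revised = {}
        for m in models:
            others = "\n\n".join(
                f"[{o}]\n{a}" for o, a in answers.items() if o != m
            )
            critique_prompt = (
                f"Task: {task_prompt}\n\n"
                f"Your previous answer:\n{answers[m]}\n\n"
                f"Analyses from other models:\n{others}\n\n"
                "Surface inconsistencies, uncertainties and open questions "
                "between these analyses, then revise your answer to "
                "address them."
            )
            revised[m] = ask(m, critique_prompt)
        answers = revised
    return answers
```

With three models and two review rounds, this makes nine model calls for a single task, which is where the 3x-plus cost comes from.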
❦❦❦
I have developed a sort of sixth sense that tells me when text was LLM-generated. I recognize it per paragraph, sometimes per sentence. It is comparable in my mind to a color overlay on top of the text: some text remains black (human-written) while text written by a ghost feels “red”. I was assuming that everyone was developing this skill at a similar rate, until I discovered this article: People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text by J. Russell et al.
As a corollary to the above, I am now willing to predict yet another class divide in society, between the “haves” and “have-nots” of ghost language literacy, much sooner than I was willing to one year ago. This realization, alone, has become for me the main argument for convincing anyone who has not yet done so to get acquainted with ghost culture as soon as possible.
Out of necessity, Wikipedia created an extensive guide to help people recognize generated text. I recognized nearly all the criteria immediately, but without recalling when I actually learned them. There is such a thing as learning through language exposure.
I’m also increasingly able to distinguish cases when someone prompted the ghost to do all the work, vs. asking the ghost to improve the work after an initial human draft, vs. asking the ghost to comment on the work and have the human improve it afterwards.
I cannot fully explain it, but it did give me the idea to create an educative game of sorts, where people could compete on accuracy in classifying outputs through this lens.
❦❦❦
- A bit more recently, I also developed an intuition about the relationship between humans and hungry ghosts and tried to connect it to important theoretical results from computer science:
- One thing I really like about the ghost invasion is that people who build non-ghost technology now have a clear incentive to make their technology more transparent (= easy to use “from outside”, e.g. via APIs, documentation etc.). This is true for software but also for more mundane things like medical devices, programmable lamps or smart dishwashers. They do it because the hungry ghosts demand it, but once it’s happening, regular humans like me also benefit. Even if the ghosts disappeared from our world tomorrow, this shift in engineering standards would still have a durable positive impact.
❦❦❦
Overall, as trite as it may sound, I have felt myself shift from a spirit of scarcity to one of abundance, at least in one domain: the exploration of ideas. It currently feels to me that everything that we could possibly imagine exists in an infinite design space around us, and that exploring this space theoretically is now just a matter of pointing the ghosts in a general direction and telling them to shine a metaphorical light on it. Previously, I felt that the realm of ideas was fragmented into islands and that my presence in one island would make me blind to the existence of others. This feeling is no more.
However, actually reaching a destination still requires us to spend money and time. If everyone has ready access to hungry ghosts in jars, then discovering the destination on the map is no longer the main challenge (the ghosts can shine light on the map much better than we used to, and do so equally for everyone, increasingly regardless of education and past experience); the remaining challenge lies in navigating a path from here to there.
The navigation itself benefits from the navigator’s experience, good starting resources, etc., but is also subject to path dependence and opportunity cost. The ghosts can’t help much with that, at least as long as humans care about stuff that ghosts don’t provide.
❦❦❦
Then there is one more thing which I hesitate to add to the list above. I hesitate not because it isn’t true or not part of my direct experience, but because I’m not sure whether it is specific to the ghost invasion (this list) or whether it is part of a larger phenomenon (a chapter for a later newsletter). Maybe it will be both.
This thing is the matter of digital sovereignty. I happen to currently be part of numerous discussion groups of folk in the Netherlands who are politics-adjacent and more or less involved in matters of long-term strategy. Through 2025, the erratic behavior of the Trump administration was already causing rumblings and unease about EU dependence on non-EU infrastructure. Suddenly, as of January this year, all the conversations have shifted from “this is bad, maybe we should do something about it” to “the time has come; now, how are we doing it?”
In this context, everyone from the technically literate to the technically illiterate is extremely aware that most of the ghost jars rest outside the EU, and is openly discussing this as a problem that needs to be solved. There are now concrete efforts to cultivate EU-based ghost jars (e.g. this), and there are efforts to bias the economy (taxes, regulations) towards choosing more EU-based infrastructure to host logic and data produced by foreign ghosts.
This fragmentation, overall, might be regrettable for our peace ambitions but it might also become a driver for innovation through competition. I sense there is a role for me to play here, but I haven’t figured out which one yet.