June Update
“Day in the life:” I am squeezing in the typing of a first draft of this post between breakfast and leaving for my gym class before lunch. Today is a “no-meeting day,” so I am hoping for a good chunk of work this afternoon. I’m wary of letting that spill into the evening though, so I might set an alarm to peel myself off the computer and onto the piano before dinner, which might bring me into the right mindset to do some good reading afterwards. Yesterday, I only got 10 pages in, just before sleeping, and I don’t feel I learned as much as I wanted. Sometime during the day, probably when I’m back from the gym, I will also attempt to arrange travel to visit my mom next week.
Rinse and repeat: this has been the routine for most of the month (and the months before that). Some days include a long walk outdoors (I wish for more!). Other days include quality time with friends, or some other sport-like group activity in the evening. Weekends include my holy half-day of complete rest, housecleaning, and the occasional quality time with friends and/or family.
All in all, a rather industrious—and, I dare say—“normal” life. It’s a good life! Yet, a few things seriously need to change.
For one, my emotional energy is generally trending down. It’s not anywhere close to depression territory yet, but the signs are there and it’s time to take action. The root cause is simply not spending enough time with other people. Working many hours a day and reaching goals is good for my ego and my sense of accomplishment, and it keeps boredom at bay, but doing so alone constitutes an ongoing and expensive emotional tax. I simply do not spend enough social time during the week to fill my emotional quota.
Another thing taking its toll is that my right shoulder is not playing nice. There are some “just bad” days with mere constant light pain; and then there is a majority of “quite bad” days where I do something innocuous, nearly pass out from an explosion of pain, and then suffer a throbbing reminder of it for the rest of the day. That pain is also how I wake up in the middle of the night, at least a couple of times per week. My physiotherapist has given up on trying to fix this through mobility exercises alone, and my doctor should decide later this week whether the next step will be an MRI, X-rays, an ultrasound, or a combination thereof.
Then there’s the topic of my mom. Three weeks ago, she suffered another minor stroke, light enough that she didn’t lose consciousness and called the hospital herself, but severe enough that she lost most function in her arm and now needs help at home. Which she doesn’t really want to get (“too expensive”). There are services that could help her for free, but those would require her to officially retire, which (I have since learned) she has so far declined to do. Did I mention yet that she is past 70? Also, I learned about the situation indirectly, from other family members; I think she believes that she is “protecting” me by not sharing her health updates with me. She’s not taking advice or help from anyone, yet still lives a lifestyle that accelerates her aging, and this is all very tragic and sad. Next week I intend to visit, assess the situation in person, and see what reasonable next steps are available to us.
That has been on my mind lately, and I’m noticing that some days I work significantly fewer hours because I spend the rest self-soothing to stay emotionally level. It’s good that I know how to self-soothe, and, to be honest with myself, it would be somewhat unreasonable to expect top performance and an enthusiastic social life with this combination of health issues, but I feel somewhat far from “happy and flourishing” at the moment.
Separately, another thing that happened this month is that I made two therapy breakthroughs (in separate areas). They are not as momentous as the “shifts” that I described at the start of the year, but they are opening a few doors that I felt were closed forever, and a few more that I didn’t even know existed. Since I do not fully understand the consequences, there is only so much I will share publicly today. Still, this summer will be a time for new in-person experiences.
❦❦❦

One of the highlights this month is that one of my Calathea plants has decided to bloom. This is extremely rare when grown indoors. Also, this particular plant was moribund, appearing half rotten with just one or two leaves for most of the last five years, until she moved into my newly renovated living room. What a change!

A goose with goslings on a canal close to my house. One of the few perks of regular walks outdoors is that they make me more aware of the little things.
❦❦❦
Last month, I explained how I am currently building an app, a “digital companion” that helps users pace the stuff that matters to them through their day. It’s a large project, so currently I am focusing on an MVP, but that is still a rather significant amount of work. Which is OK; I have a clear overview, and I know what needs to be done. And I’m doing it.
As one usually does, I break complex problems down into smaller problems, and sometimes a smaller problem is interesting and general enough that it’s worth sharing on its own. This happened this month: I published a small library that solves one such problem, and wrote a separate blog post about it.
Another thing that happened is that I developed RSI in my hands. This is new! I’ve been working with computers for most of my life and this is the first time it has happened. What changed? Well, during this “most of my life” I was very careful to use computer programs that I could drive mostly with the keyboard, with very little time spent holding a mouse. Then, in the last two months, for the first time ever, I had a serious reason to use a combination of programs which 1) have extremely poor keyboard-only ergonomics and 2) I cannot easily replace with better programs. So I used the mouse more, and now… RSI.
What’s next? Yesterday, I added extra shortcuts to these programs, including key combinations that move the mouse cursor around and simulate mouse clicks, because some functions cannot otherwise be activated with the keyboard. I also printed a cheat sheet of all the keyboard shortcuts I needed and taped it to my desk. We shall see.
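For the curious, here is a standalone approximation of the cursor-moving idea, as a minimal Python sketch using the pynput library. The key bindings and step size are illustrative assumptions on my part; my actual setup lives inside each program’s own shortcut configuration.

```python
# Minimal sketch: global hotkeys that nudge the mouse cursor and click,
# for programs whose functions are otherwise mouse-only.
# Bindings and step size are arbitrary; adjust to taste.
from pynput import keyboard, mouse

ptr = mouse.Controller()
STEP = 25  # pixels moved per key press

def nudge(dx, dy):
    return lambda: ptr.move(dx, dy)  # relative cursor movement

def click():
    ptr.click(mouse.Button.left, 1)  # single left click at the cursor

with keyboard.GlobalHotKeys({
    '<ctrl>+<alt>+<left>':  nudge(-STEP, 0),
    '<ctrl>+<alt>+<right>': nudge(STEP, 0),
    '<ctrl>+<alt>+<up>':    nudge(0, -STEP),
    '<ctrl>+<alt>+<down>':  nudge(0, STEP),
    '<ctrl>+<alt>+<enter>': click,
}) as hotkeys:
    hotkeys.join()  # run until interrupted
```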
The state of software accessibility is a tragedy.
(What are these offending programs, you may ask? You may take a guess: we are talking about the web interfaces of Anthropic Claude, Google Gemini, ChatGPT, as well as the Cursor editor.)
❦❦❦
This also provides a convenient segue to mention the most significant thing that happened in AI-land last month: some researchers at Apple published an article titled The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. It has been discussed by virtually all the journals, websites, blogs, podcasts and discussion fora I’ve encountered. “It made a splash” is an understatement.
In a nutshell, the scientific result here is that the “intelligence” displayed by so-called “reasoning” LLMs seems to break down completely after problems reach a certain degree of complexity. Not like “the problem is more complex so I need more time / CPUs to solve it”, more like “this problem is more complex so blealghhd ssdk adf;l ;fsdf sfksdf ;’dfks”. The authors call this an “accuracy collapse.”
The going theory is that the “reasoning” the machine was doing was not actual reasoning; instead, it may have been some kind of cosplay of someone thinking: an “imitation” of the structure of reasoning chains present in the training set. In this theory, when the problems the models face are more complex than the most complex problem in the training set, there is simply no corresponding learned reasoning chain to imitate that can derive a solution. And there is no “latent intelligence” left that is able to infer additional reasoning steps.
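To make “degree of complexity” concrete: the paper measures it with puzzle families such as Tower of Hanoi, where difficulty can be dialed up one disk at a time and the optimal solution length doubles with each step. Here is a minimal sketch of that scaling (my own illustration, not the paper’s code):

```python
# Tower of Hanoi: the optimal solution is 2**n - 1 moves for n disks,
# so each extra disk doubles the amount of "reasoning" needed.
def hanoi(n, src="A", dst="C", via="B"):
    """Return the optimal move sequence for n disks as (from, to) pairs."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, via, dst)     # park n-1 disks on the spare peg
            + [(src, dst)]                  # move the largest disk
            + hanoi(n - 1, via, dst, src))  # restack the n-1 disks on top

for n in range(1, 11):
    print(f"{n} disks -> {len(hanoi(n))} moves")  # 1, 3, 7, ..., 1023
```

In the paper, model accuracy stays high for small instances and then drops to essentially zero past a threshold, rather than degrading gracefully.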
It would not be fair for me to say this was an “I told you so” moment. I had already built my own understanding that there was no “latent intelligence” to be found in LLMs, but my position comes from another angle (namely, that actual intelligence hinges on interactions with the physical world). This “accuracy collapse” came as a surprise to me too. We now need further research to fully understand what is going on here.
❦❦❦
The result above mirrors my own experience, in a way: it is something I had read about before but hadn’t yet experienced personally.
As you’d imagine, I have used some AI programming assistance in my latest project. At the very beginning, the robot was making a lot of progress quickly, with me just stating what I wanted the end result to be. Then, after my project reached a certain level of complexity, the tooling stopped making progress altogether when given “holistic” inputs. It would simply give up, or give me back complete gibberish. I had to start using my own prior skills to break down my project into well-defined and well-documented components with clear interfaces, and to break down my desired output into bite-sized tasks that the robot can still process. All the while, I was not shy about spending money on tooling: it could use all the CPU it deemed necessary. And yet, I saw this “accuracy collapse” happen in practice.
❦❦❦
In extremely related news, Anysphere (the company that makes the Cursor IDE) is raising the price of its “standard” (fully featured) subscription to $60/month. The previous “standard” subscription remains at $20/month, but its capabilities were downgraded this week. The word on the interweb streets is that they will likely wait for most professional users, funded by cheap startup money, to upgrade, and then introduce a new $200/month plan while downgrading the $60 plan.
The reality is that Anysphere is caught between a rock and a hard place. The rock is that the current technology, as imperfect as it is, is expensive to operate, and it would be tough for Anysphere to make a profit on anything below $200/month under heavy load. The hard place is that the large majority of their users (by count) are non-technical hobbyists who are leveraging Cursor as much as they can to crank out mediocre software projects. Because the tool is imperfect, these users struggle, and thus send far more LLM requests than they would at a higher skill level.
What’s likely to happen next? My prediction is that Anysphere (probably like all the others in the field) will move in two directions simultaneously. I think they will indeed raise their prices, which will cause a significant number of users to “fall off” because they can’t afford more than $20/month. (For most of the world, $20/month is a luxury.) I also think they will develop strategies to wean low-skill users off LLMs for code generation, to reduce load. Maybe they will partner with (or acquire) other products, like “low-code” tooling.
This line of thought slightly worries me. If we extend this potential future further, we see a growing gap between the AI “haves” and “have-nots”. People who already have system skills will see their productivity multiplied and earn enough to pay the exorbitant price of their tools; people who don’t will be left out, unable to afford the tooling. Wars have been fought over less than that.
Perhaps the more optimistic future would be a resurgence of schooling materials that teach low-skill users when it is and is not a good time to ask an LLM for help, so they can reduce their usage-based expenses to just the problems they can’t solve in other ways. This schooling might even become a cottage industry for intermediate-skill users. Maybe there are opportunities here to redefine the essence of education.
❦❦❦
As an interlude, consider this intriguing thought. As you might know already, the output of generative AI largely mirrors what was in its training set. So, you know, the LLMs available today are based on a training set that was cut off mid-2024. Now, consider that throughout 2024 and 2025, we have had loads of human authors pumping fresh content online that says, in many different ways, “the AI does a good job most of the time, but it’s still full of inaccuracies”. What do you think will happen when LLM training starts using this content as input? If the LLM only reproduces what is in the training set, and the training set repeatedly says (paraphrasing) that “the LLM is often inaccurate”… aren’t we likely to see inaccuracies “locked in” as an expected feature of responses?
(Yes, I understand that what I’m saying here, taken at face value, is not technically possible. But there are a few things unique to the 2024–2025 training sets referring to LLM performance that will start to “bloom” at the end of this year, perhaps in 2026. These will be interesting times.)
❦❦❦
As another angle to better understand the “quality” of LLM outputs, this month I also spent some time on a side quest comparing OpenAI (GPT-4.1 and o4-mini), Google (Gemini 2.5 Pro) and Anthropic (Claude Sonnet 4 & Opus). I also looked at DeepSeek R1, but overall I was disappointed by DeepSeek’s outputs, so I did not look at it as much and won’t mention it further.
The way I did this was to select a few private input data sets that I personally have a lot of indirect knowledge about, then query the LLMs to see how well they would discover the same knowledge, as well as how prompting changes the quality of the output.
My findings so far (summarized):
- OpenAI’s stuff is still extremely sycophantic, to the point of being annoying. It does an OK job at extracting latent information from data sets, but it struggles to integrate it into a wider context: it modifies the surrounding document too much in the process. I think this is related to its more limited token context.
- Anthropic’s Claude is extremely good at recognizing patterns that it has in its training data. Like, some things I fed in had a structural relationship with well-known previous work, and Claude was the only one of the three that spotted it immediately. Claude is also amazing at merging a piece of new data/knowledge into a larger story or document. I found its responses rather terse, though (compared to the other models); as if the system prompt were restricting it to only express things that it is 100% certain about. This is a good property for precision work (like programming, which it was specialized to be good at), but not so much for exploratory work.
- Gemini blew both out of the water in the accuracy and depth of its responses. (I’m talking about 2.5 Pro here; the Flash version was a dud.) It’s also rather good at meshing new stuff and old stuff together. However, I found Gemini much worse at staying coherent when generating a longer text/story: when prompted to generate prose around a sequence of logical arguments, it mixes parts of the arguments together, or reorders them, and then becomes unable to fix these errors when prompted afterwards. (Claude and GPT did not have this issue. I did not compare with o3.)
The part that made me raise my eyebrows is that I had carried out a simpler version of this experiment two months ago, and at the time I felt ChatGPT (o4-mini) was clearly superior for general tasks, and Claude superior for technical tasks. Gemini 2.5 Pro really moved the goalposts here, and in such a short time to boot. This makes me curious though: where will the next spearhead be? Should I automate my experiment somehow to stay on top of things? I hope to spend some quality time with other curious folk in my town and discuss these things together.
Incidentally, OpenRouter is a game-changer and I highly recommend it.
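If I ever do automate the experiment above, the plumbing would be trivial thanks to OpenRouter’s single OpenAI-compatible endpoint. A minimal sketch follows; the model slugs are from memory and should be checked against the current catalog, and the prompt is of course a placeholder:

```python
# Minimal sketch: ask the same question to several models via OpenRouter.
# Requires OPENROUTER_API_KEY in the environment; model slugs may need
# checking against the current OpenRouter catalog.
import os
import requests

MODELS = [
    "openai/o4-mini",
    "anthropic/claude-sonnet-4",
    "google/gemini-2.5-pro",
]

def ask(model: str, prompt: str) -> str:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

prompt = "..."  # one of the private data sets plus questions would go here
for model in MODELS:
    print(f"=== {model} ===")
    print(ask(model, prompt))
```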
❦❦❦
All this being said, there’s another thing I learned through all these “experiments”: I do not like the way I think after I interact with these tools. The process of reading subtly wrong answers, over and over, pointing out the mistakes, hoping for a correction and often not getting it, feels not too different from spending too much time with a bad person who is constantly trying to gaslight me.
Jim Rohn once mused, “We are the average of the five people we spend the most time with.” There’s real neuropsychology behind this observation. Meanwhile, right now, many folk (including me) are spending more time with ChatGPT (and other bots) than with real people. That will change us in the long term, in ways we do not fully understand yet.
I feel lucky that I still spend more time reading texts written by real humans than AI-generated texts. So I can still feel the difference, and this makes me sensitive to when AI-generated content twists my thinking. It feels distinctly “icky”! And I know how to take breaks away from it, meditate, read other things, etc. I just wonder how many other people realize this, and/or have the luxury of a more diverse set of inputs.
Beyond the pricing of Cursor & co, maybe people’s attitude vis-à-vis AI generators (active vs. passive) will be the cause of the greatest social divide in the coming decade. Glimpses of an unexpected-looking zombie war loom at the corner of my imagination. (I had these thoughts already before, about social media and the effects of “doom scrolling.” Now I feel there are two zombie viruses to deal with.)
❦❦❦
This is the part of the post where I would include many reading references, comment on them, etc., but this time I don’t feel like doing this work. I did read a lot (including a new book), but the associated learnings were either a subset of what I wrote above already, or some more personal stuff I would rather not share here.
Here are two thought-provoking articles you can take away, however:
In why you should be smelling more things, Adam Aleksic points out that we are in the midst of an “authenticity crisis”, and the best activism we can do to counter this trend is to go out and literally smell stuff. He also wrote other very good things, which I will let you contemplate.
Meanwhile, in Smartphones: Parts of Our Minds? Or Parasites?, Rachael Brown and Robert Brooks offer a view that smartphones are best seen as symbiotic with us, often with parasitic traits. My personal experience is that this is more true of certain devices than others, and my deliberate choice to use older and more limited technology has been shielding me from the more “parasitic” impediments described in the article.
On a related note, I attended yet another “offline” meetup this month (no devices, only people), and am scheduled for two more in July. The experience of attending these events is amazing; highly recommend.
❦❦❦
References:
- The timecond library
- Time is a Range, not a Point - introducing timecond
- Wikipedia - Repetitive strain injury
- Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- OpenRouter
- Adam Aleksic - why you should be smelling more things
- Rachael Brown and Robert Brooks - Smartphones: Parts of Our Minds? Or Parasites?