There is no I in TEAM, and there is no AI in XWord Info.
Unless you explicitly ask to see it by joining the Beta program.
Artificial Intelligence
AI is controversial. I was reminded of this when some AI experiments I was doing accidentally leaked to public pages. Some people were intrigued. Many others were outraged. Accounts were cancelled. Disparaging remarks were hurled. I quickly pulled it all.
I get it. Evil people use AI to do evil things. Many good people use AI to benefit the world too, but skepticism and fear are justified. If you’re intrigued by how computers can think about crosswords, read on. Otherwise, you’ll never see it mentioned on the site again.
There’s at least amusement value in looking at comments on published crosswords. There may be actual usefulness for constructors evaluating their own creations prior to submission.
A collection of curiosities
XWord Info started as, and continues to be, a celebration of NYT puzzles and the people who make them. Along the way, as more and more pages got added over the years, it also became a collection of curiosities. Which puzzles had the fewest blocks, or highest Scrabble points? Which constructors debuted as teens or had the most collaborators? That sort of thing. Every time I got curious about something, I added a new page.
Recently, I’ve been curious about how software could evaluate crosswords. Crosswords are a very human art form. Furthermore, they’re designed to deceive. How well could AI make the same kinds of “mental leaps” humans make to match answers to clues or figure out themes?
The answer is: surprisingly well in some cases, and incredibly poorly in others.
The process
I’ve had AI analyze the last 200 or so NYT crosswords. Along the way, I’ve experimented, and I continue to iterate.
The first step is to craft a prompt. I needed an algorithmically generated set of facts and questions that would help AI make good decisions. It had to completely describe the grid (including shades, circles, etc.), the clues and the answers. It had to be short. I ask for overall evaluations and suggestions for improvements. (Suggestions are sometimes hilariously off target.)
Next, I determined which of the many AI models to use. I had to figure out how many “tokens” to spend on each request to get reasonable results without costing too much actual money. (If “tokens” sounds like a video game, well, it kind of is.)
A parameter called “temperature” ranges from 0 to 1. Lower values give more predictable and reproducible fact-based results. Higher values allow the AI to become more “creative” and possibly “interesting” with the potential downside of more hallucinations. I settled on a value of 0.7, but I’ll keep tweaking that too.
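To make the pieces concrete, here is a minimal sketch of how a prompt and these parameters might be bundled into a single request. This is a hypothetical illustration, not XWord Info’s actual code: the `build_request` helper, the model name, and the token budget are all assumptions; only the temperature value of 0.7 comes from the text above.

```python
# Hypothetical sketch of assembling an AI crossword-evaluation request.
# The helper, model name, and token budget are illustrative assumptions.

def build_request(puzzle_facts: str, questions: str) -> dict:
    """Bundle an algorithmically generated prompt with model parameters."""
    prompt = (
        "You are evaluating a crossword puzzle.\n"
        f"Grid, clue, and answer facts:\n{puzzle_facts}\n"
        f"Questions:\n{questions}\n"
        "Give an overall evaluation and suggest improvements."
    )
    return {
        "model": "gpt-4o",       # assumed model choice
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1500,      # per-request budget: quality vs. cost
        "temperature": 0.7,      # 0 = predictable, 1 = more "creative"
    }

request = build_request(
    "15x15 grid, 38 blocks, circles at ...",
    "Is the theme consistent? Which fill is weakest?",
)
```

A dictionary like this maps directly onto the request body most chat-style AI APIs accept, so the same prompt can be replayed against different models while tweaking only the parameters.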
What AI does well
Results on themeless crosswords are often interesting. It sometimes finds culture references I’d never get on my own. It has opinions about whether sections of the grid might be too difficult, which fill feels iffy, and which clues are particularly clever.
It often fact-checks, citing what it thinks might be mistakes. Sometimes these are quite funny. Example:
ENNEAD clued as “The Brady household, including Alice, e.g.” — This is clever-ish (nine people), but it’s also pretty strained: most solvers don’t naturally categorize the Brady household as an “ennead,” and the count depends on including Alice (and arguably whether you’re counting Mike/Carol’s prior spouses, etc.). The word is real, but the example feels like it’s doing a lot of work.
It finds things that look odd in the grid, pointing out answers like MRT, ATEAT, TWOD, or RANDB. These bother me too. Another example:
“There’s no such thing ___ publicity” = AS BAD: The phrase is familiar, but the fill AS BAD can feel slightly “fragmenty” in the grid. The clue does its job, though, and Monday solvers will likely be fine.
Where AI fails
It makes mistakes. Lots of mistakes. That doesn’t mean it has no value. It means you have to read it with skepticism and be willing to wade through the obvious errors to find the gems.
AI, at least the models I’ve been using, struggles with figuring out themes. I can’t predict where it will have problems. Sometimes it nails complicated Thursday gimmicks but fails on Mondays that seem obvious. I sometimes push the AI by expanding my Jim notes, which feels like cheating, but it makes the rest of the analysis more useful. When it doesn’t understand the theme, it spends too much time complaining about flaws that aren’t real.
A common complaint is that the connection between clue and answer is too loose. Of course, that’s often the point. It’s part of what makes crosswords fun.
Analyzing your own puzzles
Constructors can already see a version of what their unpublished puzzles would look like on XWord Info using the Analyze page.
If you join the beta program, you’ll also be able to see what AI thinks. Again, it will find real ways to compliment you on your brilliance, and it will find real flaws you might want to address. And it will generate lots of bogus suggestions you’ll need to ignore. It might also be fun.
For now, I’m requesting that you don’t use this feature too often. It would be easy to blow through hundreds of (my) dollars, but feel free to try it out.
How you can participate
If this sounds intriguing, you need to join the Beta program. To keep the numbers down, at least for now, you need an Angel-level account, and you need to send me an email requesting to join. At some point, I’ll provide a page where you can join or leave the Beta on your own.
When you join, you’ll see a new AI button on recent grids.
I’ll be curious to hear what you learn.
Confession
I use AI to help me code XWord Info. It’s done a great job helping me find bugs and implement new features. It has paid for itself by making the site run more efficiently, so I could downgrade the servers I use. One page that used to take over 2 seconds of server time to construct is now rendered in under 100 milliseconds.
Like most coders who use it, I’ve found that AI has made me a better programmer. I’ve learned a lot from a system that knows way more than I do.