AI in XWI Q&A

My last post generated so much email that I thought I’d better answer some of the questions here.

I Hate AI

Not a question, but I understand. If you don’t join the Beta program, you’ll never see the AI button, and everything you do see will be generated by real humans.

Why, oh why, are you doing this?

I’m interested in crosswords, and I’ve long looked at them in statistical and mathematical ways. But the core of what makes crosswords interesting is that they are an art form, designed to twist your brain and then give you an endorphin rush when you figure them out.

How would a non-human “brain” experience this puzzle designed to deceive and delight humans? The answers say something interesting about both computers and humans.

The good news is that AI does a poor job of extracting the elements that make crosswords most enjoyable. But now and then, it comes up with ideas that at least appear to be surprisingly insightful.

How do I join the Beta?

You no longer have to send me mail. Once you’re logged in to your XWord Info account, you’ll see a link at the top called something like Your account. You’ll see a checkbox. If you’re at the Angel level, you can check or uncheck that to join or leave the program.

I may extend that to all users later, depending on feedback.

What can I do when I join?

When you look at any puzzle back to January 1, 2025, you’ll see a new View AI Analysis button. (I will probably extend further back.) Click that to see a summary of the theme, notable entries, clue highlights, construction notes, and overall assessment.

Beta constructors can also get the same information on their unpublished puzzles by uploading them here.

How reliable is this?

It varies wildly. Go for the entertainment and stay for the occasional useful gems.

AI has a tough time with some themes. Sometimes it completely nails a complex one; other times it complains about things it doesn’t understand. Themeless puzzles fare better.

In either case, there is often useful information. Its opinions should be taken with a grain of salt, but they’re often justified. It also tries to fact-check clues, though you’ll need to verify the accuracy yourself.

Here’s a recent check that amused me:

40-Across MERLOTS, “Sémillon rouge and Médoc noir”: this appears incorrect on its face (Sémillon is a grape, Médoc is a region; neither naturally maps to “Merlot(s)” in the way clued). Even if there’s a wordplay angle, it’s not coming through, and it risks feeling like a mistake rather than a trick.

Are you trying to be a blog?

Not at all. Only humans can judge the merits of a crossword. Besides, my focus is different. Blogs look at crosswords from the perspective of a solver. I’ve asked the AI here to think of itself as a constructor looking for advice and suggestions.

Constructors have long been able to upload their crosswords and generate a private online solving link to share. That’s the best way to get useful feedback, but now they can get an additional level of support by asking AI for ideas for improvement.

How should I interpret the results?

Complaints about published puzzles are often unfair; after all, those puzzles have been professionally edited and fact-checked. The AI will still point out difficult crossings or loose clues, but you can assume those are intentional, or at least needed to support other elements in the grid.

For uploaded puzzles, especially themed ones, it depends on how well the AI can figure out your trickery. Even when theme detection fails, there are still often useful ideas. A few constructors have already told me they wished they had this capability before submitting puzzles for publication.

2 comments

  1. Wait, why were you amused by the AI’s take on my clue for MERLOTS? There’s a lot of humor in there for me too, but it takes a bit to unpack.

    Let’s start with the factual basis for the clue {Sémillon rouge and Médoc noir}. Each is a synonym for the Merlot grape. This is easily verified with a quick Internet search for each.

    Irony #1: I estimate that for every thousand solvers who correctly entered MERLOTS, at most one did so based on knowledge of these two varieties of grape. I expect the reasoning was almost universally more like, “I am not familiar with either of those terms, but they sound French, they sound like wines, and red wines in particular, so let’s see whether MERLOTS works.” And because MERLOTS matches the crosses, almost no one gave it a second thought.

    Irony #2: I have never consumed an alcoholic beverage in my life. I wouldn’t know a Merlot wine from a bottle of off-road diesel. My lack of domain knowledge is so complete that I had no sense of how profoundly obscure my grape trivia was. As far as I knew, Sémillon rouge and Médoc noir might have been widely recognized among wine connoisseurs. But no, the grape variety trivia was so little known as to offend a few experts, prompting reactions along the lines of, “Listen, buddy, if these were real wines I would know it, but I have never heard of any Merlot wine by either of those names.”

Irony #3: What I normally want to use AI for is discovery of facts that are difficult to ascertain via a standard search engine. It therefore infuriates me when an AI chatbot performs *worse* than a search engine at fact discovery. The top non-AI hit for Sémillon rouge is the Wikipedia article which gives it as a synonym for Merlot grape, and likewise for Médoc noir; where does Jim’s chatbot get off confidently asserting that *neither one* ‘naturally maps to “Merlot(s)”’? Somebody unplug that idiot AI and put it out of its misery! But, hang on a second, fact-checking is totally not the purpose of hallucinatory AI in this context; a more relevant purpose would be simulating the possible response of a human solver, and the AI did a perfect job of being confused and mistaken in exactly the way a wine connoisseur might be.

    Irony #4: Despite giving a very human-like reaction, the AI would have made my puzzle worse if I had acted on its advice. My clue worked great in practice, because people could get the right answer for the wrong reason. A human could incorrectly assume that the clue named two Merlot wines, and then cancel out that wine/grape error by assuming those wines were unfamiliar ones. Linguistic associations were entirely adequate to arrive at the correct answer, so the revealingly non-comprehending AI critique turned out, in the final analysis, to be non-comprehending in an irrelevant way.

    Life is bewildering sometimes. I’m talking through it with my AI therapist.

  2. Jim’s not wrong about one thing: AI can definitely struggle with crossword puzzles, particularly with themeless clues that have a lot of serious wordplay.

    I got a big chuckle when I had the AI bot review one of my unpublished puzzles and it came back telling me the puzzle had a couple of fatal flaws that would require me to rip out, and redo from scratch, a large section of the puzzle’s fill.

    I got an even bigger chuckle when, two days later, I received Joel’s acceptance email telling me how much the editing team loved the puzzle.

    AI’s not there yet. I would consider it more an amusement than a serious constructing tool. But, amusements can be fun, too. And, who knows where this will lead in the future. Of course, as always, YMMV.
