TLDR: This research paper investigates why Sudoku difficulty varies so much across different websites. It proposes two new metrics: one based on the structural complexity of a puzzle when converted to a Satisfiability (SAT) problem, and another based on simulating human solving strategies (Nishio Human Cycles). By analyzing over a thousand puzzles from five websites, the study found that the human simulation metric correlates better with website-labeled difficulty. It also introduces a universal rating system to classify puzzles into “Universal Easy,” “Universal Medium,” and “Universal Hard,” enabling consistent comparison of difficulty levels across different Sudoku platforms.
Sudoku, the globally popular logic-based puzzle, is enjoyed by millions. However, if you’ve ever played Sudoku online, you might have noticed a puzzling inconsistency: a puzzle labeled ‘Diabolical’ on one website might feel easier than an ‘Easy’ puzzle on another. This common frustration is precisely what a new research paper, titled “Project Patti: Why can You Solve Diabolical Puzzles on one Sudoku Website but not Easy Puzzles on another Sudoku Website?” by Arman Eisenkolb-Vaithyanathan, aims to address.
The core problem lies in the subjective and varied ways different online Sudoku platforms define and categorize puzzle difficulty. Each site has its own system, leading to a lack of standardization across the board. To tackle this, the paper introduces two innovative metrics designed to objectively characterize Sudoku difficulty.
Two Novel Approaches to Difficulty
The first approach is purely computational. It converts a Sudoku puzzle into a Boolean Satisfiability (SAT) problem: the question of whether a logical formula built from true/false variables can be made true. The Sudoku rules and the pre-filled digits are translated into one large formula made up of many clauses, where each clause is an OR of a few variables or their negations and the length of a clause is the number of terms it contains. From this conversion, the paper derives a metric called ‘Clause Length Distribution.’ This metric captures the structural complexity of a puzzle, reflecting the number of pre-filled digits and their positions. A puzzle with more ‘short’ clauses (simpler logical statements) tends to be easier, while more ‘medium’ or ‘long’ clauses suggest higher complexity.
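To make the idea concrete, here is a minimal sketch of how a clause length histogram could be computed. The article does not give the paper's actual encoding or its cut-offs for ‘short,’ ‘medium,’ and ‘long,’ so everything here is an assumption: the variable numbering in `var`, the choice of a common minimal Sudoku encoding, and the single simplification pass that plugs in the pre-filled digits.

```python
from collections import Counter
from itertools import combinations

def var(r, c, d):
    """Assumed numbering: (row, col, digit) with r, c in 0..8 and d in 1..9 -> 1..729."""
    return 81 * r + 9 * c + d

def base_clauses():
    """A common minimal CNF encoding of the Sudoku rules (not necessarily the paper's)."""
    clauses = []
    for r in range(9):
        for c in range(9):
            # Each cell holds at least one digit (one clause of length 9).
            clauses.append([var(r, c, d) for d in range(1, 10)])
            # Each cell holds at most one digit (36 clauses of length 2).
            for d1, d2 in combinations(range(1, 10), 2):
                clauses.append([-var(r, c, d1), -var(r, c, d2)])
    units = []
    for i in range(9):
        units.append([(i, j) for j in range(9)])                                 # row i
        units.append([(j, i) for j in range(9)])                                 # column i
        units.append([(3 * (i // 3) + j // 3, 3 * (i % 3) + j % 3) for j in range(9)])  # box i
    for unit in units:
        for d in range(1, 10):
            # Each digit appears at least once in every row, column, and box.
            clauses.append([var(r, c, d) for r, c in unit])
    return clauses

def clause_length_histogram(grid):
    """grid: 81 characters, row by row; '1'-'9' are givens, anything else is blank.

    Plugs the givens into the formula once: satisfied clauses are dropped and
    falsified literals are removed, then the remaining clause lengths are counted.
    """
    true_lits = {var(i // 9, i % 9, int(ch))
                 for i, ch in enumerate(grid) if ch in "123456789"}
    hist = Counter()
    for clause in base_clauses():
        if any(lit in true_lits for lit in clause):
            continue  # clause already satisfied by a given
        reduced = [lit for lit in clause if -lit not in true_lits]
        hist[len(reduced)] += 1
    return hist

one_given = "5" + "." * 80           # a single 5 in the top-left cell
print(sorted(clause_length_histogram(one_given).items()))
```

Under this sketch, every extra given turns some length-2 clauses into length-1 clauses and removes some length-9 clauses, which matches the intuition that more pre-filled digits push the distribution toward shorter clauses.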
The second approach simulates how a human solves a Sudoku puzzle. It combines four popular Sudoku strategies (Naked Singles, Hidden Singles, Naked Twins, and X-wing) with a trial-and-error algorithm known as Nishio. Naked Singles are straightforward: if a cell has only one possible number, you fill it. Hidden Singles require scanning a row, column, or box to find a number that can go in only one of its cells, even though that cell still shows other candidates. Naked Twins are two cells in the same row, column, or box that share the exact same two candidates, which lets you eliminate those candidates from the other cells of that group. X-wing is more advanced: when a candidate is confined to the same two columns in each of two rows, it can be eliminated from the other cells of those columns (and likewise with rows and columns swapped). The metric derived from this simulation is ‘Nishio Human Cycles,’ which counts how many times these strategies are applied within the backtracking process to solve a puzzle. More cycles generally indicate a harder puzzle.
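The sketch below shows the general shape of such a simulation. It is not the paper's implementation. It covers only Naked Singles and Hidden Singles (Naked Twins and X-wing are left out for brevity). It also assumes that one ‘cycle’ means one successful strategy application, and it guesses in the first empty cell when the strategies get stuck. All function names are hypothetical.

```python
CELLS = [(r, c) for r in range(9) for c in range(9)]
ROWS = [[(r, c) for c in range(9)] for r in range(9)]
COLS = [[(r, c) for r in range(9)] for c in range(9)]
BOXES = [[(3 * (b // 3) + i // 3, 3 * (b % 3) + i % 3) for i in range(9)] for b in range(9)]
UNITS = ROWS + COLS + BOXES
PEERS = {cell: {p for u in UNITS if cell in u for p in u} - {cell} for cell in CELLS}
DEAD = "dead"  # sentinel: the current grid cannot be completed

def parse(grid):
    """81 characters, row by row; '1'-'9' are givens, anything else means empty (0)."""
    return {cell: int(ch) if ch in "123456789" else 0 for cell, ch in zip(CELLS, grid)}

def candidates(values, cell):
    """Digits not already used by any peer of the cell."""
    return set(range(1, 10)) - {values[p] for p in PEERS[cell]}

def find_single(values):
    """Return (cell, digit) for a naked or hidden single, None if stuck, DEAD on contradiction."""
    for cell in CELLS:                       # Naked Singles
        if values[cell] == 0:
            cand = candidates(values, cell)
            if not cand:
                return DEAD
            if len(cand) == 1:
                return cell, cand.pop()
    for unit in UNITS:                       # Hidden Singles
        for d in range(1, 10):
            if any(values[c] == d for c in unit):
                continue
            spots = [c for c in unit if values[c] == 0 and d in candidates(values, c)]
            if not spots:
                return DEAD
            if len(spots) == 1:
                return spots[0], d
    return None

def solve(values, counter):
    """Apply strategies until stuck, then try each candidate of one cell in turn."""
    values = dict(values)
    while True:
        step = find_single(values)
        if step == DEAD:
            return None
        if step is None:
            break
        cell, d = step
        values[cell] = d
        counter[0] += 1                      # assumed: one cycle per strategy application
    empties = [c for c in CELLS if values[c] == 0]
    if not empties:
        return values
    cell = empties[0]                        # uninformed choice of where to guess
    for d in sorted(candidates(values, cell)):
        trial = dict(values)
        trial[cell] = d
        result = solve(trial, counter)
        if result is not None:
            return result
    return None

def nishio_cycles(grid):
    """Return (solved grid or None, number of strategy applications used)."""
    counter = [0]
    return solve(parse(grid), counter), counter[0]

PUZZLE = ("003020600" "900305001" "001806400"
          "008102900" "700000008" "006708200"
          "002609500" "800203009" "005010300")
solution, cycles = nishio_cycles(PUZZLE)
print(cycles)
```

Strategy applications inside failed guesses still add to the count, so a puzzle that forces more trial and error accumulates more cycles in this sketch.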
Analyzing the Landscape of Sudoku Difficulty
To test these metrics, the researcher collected a dataset of 1320 Sudoku puzzles from five popular websites: New York Times, Sudoku.org.uk, Extreme Sudoku, Sudoku of the Day, and Sudoku of the Day UK. Each puzzle was analyzed using both the SAT-based and human simulation methods.
The findings revealed some interesting patterns. For four out of the five websites, the Nishio Human Cycles metric showed a strong correlation with the website’s own labeled difficulty levels. This suggests that how much ‘work’ a simulated human solver has to do aligns well with perceived difficulty. The Clause Length Distribution also showed correlation for some sites, but generally, the human simulation metric was a better indicator. An interesting anomaly was Extreme Sudoku, where neither proposed metric correlated well with its difficulty labels, suggesting its internal rating system might be quite different.
A Universal Rating System
One of the paper’s most significant contributions is the proposal of a universal rating system. Using an unsupervised classification method, the 1320 puzzles were categorized into three universal difficulty levels: Universal Easy, Universal Medium, and Universal Hard. This system, based on the distributions of Nishio Human Cycles and the percentage of short clauses, allows for consistent comparison of difficulty across different websites. For instance, a ‘Hard’ puzzle from one site can now be objectively compared to an ‘Easy’ puzzle from another, providing clarity where there was once confusion.
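As a rough illustration of how puzzles might be grouped into three levels without using any website labels, the sketch below runs plain k-means on two features per puzzle. The article does not say which unsupervised method the paper uses, so k-means is swapped in here as one common choice. The function names are hypothetical, and the numbers in `TOY` are made up purely to exercise the code; they are not results from the paper.

```python
def normalize(points):
    """Min-max scale each feature to [0, 1] so neither metric dominates the distance."""
    lo = [min(col) for col in zip(*points)]
    hi = [max(col) for col in zip(*points)]
    return [tuple((x - l) / ((h - l) or 1) for x, l, h in zip(p, lo, hi)) for p in points]

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_labels(points, k=3, iters=20):
    """Plain k-means with a deterministic start; returns one cluster index per point."""
    ordered = sorted(points)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def universal_labels(puzzles):
    """puzzles: list of (nishio_cycles, short_clause_fraction) pairs."""
    labels = kmeans_labels(normalize(puzzles))
    mean_cycles = {}
    for j in set(labels):
        vals = [p[0] for p, lab in zip(puzzles, labels) if lab == j]
        mean_cycles[j] = sum(vals) / len(vals)
    # Name the clusters by increasing average cycle count.
    names = ["Universal Easy", "Universal Medium", "Universal Hard"]
    rank = {j: names[i] for i, j in enumerate(sorted(mean_cycles, key=mean_cycles.get))}
    return [rank[lab] for lab in labels]

# Made-up (cycles, short-clause fraction) pairs, for illustration only.
TOY = [(50, 0.90), (60, 0.86), (55, 0.88),
       (210, 0.58), (200, 0.60), (220, 0.62),
       (950, 0.28), (900, 0.30), (1000, 0.25)]
for puzzle, label in zip(TOY, universal_labels(TOY)):
    print(puzzle, label)
```

Because the grouping depends only on the two metrics, a puzzle's universal level in this sketch is the same no matter which website it came from or how that site labeled it.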
The research also explores a ‘Heuristic-Based Nishio’ method, which makes informed choices rather than random ones when it has to guess, and which proved more efficient. This led to practical advice for players who are new to Sudoku: apply human strategies before resorting to trial and error, and use quick scanning techniques to make informed guesses.
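The difference between a random and an informed guess can be sketched as a choice of which cell to guess in. The article does not describe the paper's actual heuristic, so the ‘fewest candidates first’ rule below is an assumption, and the function names and the small candidate map are hypothetical.

```python
import random

def pick_random(cands, rng=random):
    """Uninformed choice: any unsolved cell, regardless of how many options it has."""
    open_cells = [cell for cell, options in cands.items() if len(options) > 1]
    return rng.choice(open_cells)

def pick_fewest_candidates(cands):
    """Informed choice (assumed heuristic): the unsolved cell with the fewest options."""
    open_cells = [cell for cell, options in cands.items() if len(options) > 1]
    return min(open_cells, key=lambda cell: len(cands[cell]))

# Hypothetical candidate map: (row, col) -> digits still possible in that cell.
TOY_CANDS = {
    (0, 0): {1, 2, 3, 4},
    (0, 1): {5},          # already determined, so never chosen for a guess
    (4, 4): {2, 7},
    (8, 8): {1, 6, 9},
}
print(pick_fewest_candidates(TOY_CANDS))
```

A guess in a cell with two candidates is right half the time, and a wrong guess immediately settles the cell, whereas a guess in a cell with four candidates is far more likely to send the solver down a dead end.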
This study sheds light on why Sudoku difficulty is so hard to pin down, and it offers tangible tools for both researchers and players to better understand and navigate the world of online Sudoku. For more detail, see the full research paper.


