TLDR: Constrained Adaptive Rejection Sampling (CARS) is a new algorithm for sampling language-model outputs that satisfy strict semantic or syntactic constraints. It uniquely combines the fidelity of rejection sampling (producing samples from the exact constrained distribution) with adaptive efficiency. CARS achieves this by recording invalid output prefixes and pruning them from future attempts, significantly reducing wasted computation and steadily improving acceptance rates. It has demonstrated superior performance in program fuzzing, molecular synthesis, and PDDL planning, offering a practical way to generate outputs that are both diverse and valid.
Language models (LMs) are becoming increasingly powerful, but their outputs often need to adhere to strict rules, whether for programming languages, data formats, or scientific applications. This challenge, known as constrained generation, has long posed a dilemma: either sample exactly from the constrained distribution at great computational expense, or generate valid outputs quickly at the cost of inaccuracy and bias.
Traditional methods have struggled with this trade-off. Rejection Sampling (RS), for instance, guarantees outputs that exactly match the desired distribution, but it is highly inefficient, often discarding the vast majority of generated content. Greedy Constrained Decoding (GCD) is fast because it enforces validity during generation, but it distorts the LM’s original probability distribution, which can lead to suboptimal results. Other approximate methods promise asymptotic accuracy but can produce biased samples early on and offer no clear stopping criterion.
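To make GCD’s bias concrete, here is a toy calculation. This is a hypothetical two-token model invented for illustration, not an example from the paper:

```python
from itertools import product

# Toy model: sequences of length 2 over tokens {"a", "b"}, with
# independent per-token probabilities. Constraint: "bb" is invalid.
p_tok = {"a": 0.1, "b": 0.9}
valid = lambda s: s != ("b", "b")
seqs = list(product("ab", repeat=2))

# Exact constrained distribution p(x | x valid): one global
# renormalization. This is what rejection sampling (and CARS) targets.
z = sum(p_tok[a] * p_tok[b] for a, b in seqs if valid((a, b)))
exact = {s: (p_tok[s[0]] * p_tok[s[1]] / z if valid(s) else 0.0) for s in seqs}

# GCD: renormalize over the locally allowed tokens at each step.
# (Both first tokens can still lead to a valid sequence, so step one
# needs no renormalization.) After a first "b", only "a" remains
# allowed and gets probability 1, inflating p("ba").
gcd = {}
for a, b in seqs:
    if not valid((a, b)):
        gcd[(a, b)] = 0.0
        continue
    second_allowed = [t for t in "ab" if valid((a, t))]
    gcd[(a, b)] = p_tok[a] * p_tok[b] / sum(p_tok[t] for t in second_allowed)

for s in seqs:
    print("".join(s), f"exact={exact[s]:.3f}", f"gcd={gcd[s]:.3f}")
# "ba": exact=0.474 but gcd=0.900 -- GCD over-weights prefixes whose
# invalid continuations were masked away.
```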
Introducing Constrained Adaptive Rejection Sampling (CARS)
A new approach, Constrained Adaptive Rejection Sampling (CARS), aims to resolve this fundamental tension. Developed by researchers from the University of Warsaw and the University of California San Diego, CARS offers a method that strictly improves the efficiency of rejection sampling without compromising the fidelity of the LM’s distribution. This means it generates samples that are both valid and truly representative of the language model’s underlying probabilities, conditioned on the given constraints.
How CARS Works
CARS builds on the idea of Adaptive Rejection Sampling but goes a significant step further. It begins by sampling from the language model without any constraints. As it generates sequences, it actively identifies and records any partial outputs (prefixes) that are guaranteed to violate the constraints. These invalid prefixes are stored in a data structure called a trie. Crucially, CARS then subtracts the probability mass associated with these invalid prefixes from future sampling attempts. This adaptive pruning ensures that the system never wastes computation by revisiting paths that have already been proven invalid. As a result, the acceptance rate of valid samples monotonically improves over time, and the final samples precisely follow the constrained distribution.
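The core loop can be sketched as follows. This is a minimal illustrative implementation, not the authors’ code: the trie node fields (`cond_prob`, `pruned`), the helper `prune`, and the interfaces `next_probs`, `dead`, and `valid` are all assumptions made for the sketch.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    children: dict = field(default_factory=dict)  # token -> Node
    cond_prob: float = 0.0    # p(token | parent prefix), cached from the LM
    pruned: float = 0.0       # fraction of this node's conditional mass proven invalid
    parent: Optional["Node"] = None

def prune(node: Node) -> None:
    """Mark a prefix as dead and propagate the lost probability mass up the trie."""
    node.pruned = 1.0
    n = node.parent
    while n is not None:
        # Unvisited children are absent from the dict and contribute zero.
        n.pruned = sum(c.cond_prob * c.pruned for c in n.children.values())
        n = n.parent

def cars_sample(next_probs, dead, valid, root: Node, max_len: int = 32):
    """One CARS sampling attempt (assumed interfaces):
    next_probs(prefix) -> {token: p(token | prefix)}  (stand-in for the LM),
    dead(prefix)       -> True iff the prefix has no valid completion,
    valid(seq)         -> True iff the finished sequence satisfies the constraint.
    Returns an accepted sequence, or None on rejection -- in which case the
    offending prefix is pruned and will never be sampled again."""
    prefix, node = [], root
    for _ in range(max_len):
        probs = next_probs(prefix)
        # Reweight each token by its surviving (non-pruned) mass. This keeps
        # the proposal exactly proportional to the LM's distribution restricted
        # to prefixes not yet proven invalid, so accepted samples follow the
        # exact constrained distribution. (Assumes the constraint is
        # satisfiable, i.e. root.pruned < 1.)
        weights = {t: p * (1.0 - node.children[t].pruned)
                      if t in node.children else p
                   for t, p in probs.items()}
        tokens = list(weights)
        token = random.choices(tokens, weights=[weights[t] for t in tokens])[0]
        if token not in node.children:
            node.children[token] = Node(cond_prob=probs[token], parent=node)
        child = node.children[token]
        prefix.append(token)
        if token == "<eos>":
            if valid(prefix[:-1]):
                return prefix[:-1]   # accepted: an exact constrained sample
            prune(child)             # complete but invalid: prune and reject
            return None
        if dead(prefix):
            prune(child)             # provably dead prefix: prune and reject
            return None
        node = child
    return None                      # length cap reached; simply reject
```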
Imagine trying to generate a valid arithmetic expression like “1+0+1”. If the model initially tries to generate “0++”, CARS would identify “0++” as an invalid prefix. In subsequent generations, if the model reaches “0+”, CARS would prevent it from choosing another “+” token, effectively guiding it towards valid continuations.
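Using the sketch above, that arithmetic example plays out as follows. The uniform toy “LM” and the grammar checks are, again, illustrative assumptions rather than anything from the paper:

```python
random.seed(0)
TOKENS = ["0", "1", "+", "<eos>"]

def next_probs(prefix):
    # Stand-in for an LM forward pass: uniform over the toy vocabulary.
    return {t: 1.0 / len(TOKENS) for t in TOKENS}

def dead(prefix):
    # A prefix is dead iff it already breaks "digit (+ digit)*":
    # positions must alternate digit, '+', digit, '+', ...
    for i, t in enumerate(prefix):
        if i % 2 == 0 and t not in "01":
            return True
        if i % 2 == 1 and t != "+":
            return True
    return False

def valid(seq):
    # A complete expression must be nonempty and end on a digit, e.g. "1+0+1".
    return len(seq) % 2 == 1 and not dead(seq)

root = Node()
accepted = []
for _ in range(20):
    s = cars_sample(next_probs, dead, valid, root)
    if s is not None:
        accepted.append("".join(s))
print(accepted)
# The first time "0+" is extended with "+", the prefix "0++" is pruned:
# its trie node gets pruned = 1.0, so the "+" continuation of "0+"
# receives zero weight in every later attempt.
```

Because each rejection permanently removes probability mass from the proposal, the acceptance rate can only improve across attempts, which is exactly the monotonicity property described above.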
Key Advantages and Applications
The research paper highlights several key contributions of CARS:
- Exactness and Efficiency: CARS is the only exact method that achieves practical efficiency, making it feasible for real-world applications where other exact methods fail due to high computational cost.
- Improved Sample Diversity: By preserving the LM’s true distribution, CARS tends to produce a wider variety of valid samples compared to approximate methods.
- Lower Amortized Cost: Over many samples, the average number of LM forward passes required per valid output is significantly reduced.
CARS has been evaluated across a variety of domains, demonstrating its effectiveness:
- Program Fuzzing: In tasks like generating SQLite regression test files or XML parsing inputs, CARS-generated seeds led to higher branch coverage in fuzzers, indicating better exploration of code paths. This is crucial for finding bugs in software.
- Molecular Synthesis: For generating valid SMILES strings (a chemical language) for drug discovery, CARS showed substantial improvements in sampling efficiency while maintaining high molecular diversity and validity, which are essential for exploring chemical space.
- PDDL Planning: In generating action plans for complex tasks, CARS proved more efficient than other exact methods, sometimes making otherwise intractable sampling problems feasible.
In summary, CARS represents a significant advancement in constrained language model generation. By combining the fidelity of exact sampling with adaptive efficiency, it provides a powerful tool for applications where both validity and diversity of generated outputs are paramount, from software testing to drug discovery.