TLDR: The Subset Sum Matching Problem (SSMP) is a new combinatorial optimization challenge introduced by J.P. Morgan researchers, abstracting financial reconciliation tasks. It involves matching subsets of numbers whose sums are within a given tolerance. The paper presents three algorithms: an optimal Mixed-Integer Linear Program (MILP) and two suboptimal methods (greedy search and dynamic programming). Experiments show that while MILP is optimal for smaller, simpler cases, the dynamic programming approach demonstrates superior scalability and efficiency for larger, more complex real-world financial reconciliation problems, especially with small error tolerances.
Researchers from J.P. Morgan Quantitative Research and J.P. Morgan AI Research have introduced a novel combinatorial optimization challenge called the Subset Sum Matching Problem (SSMP). This new problem is designed to abstract and solve complex real-world financial tasks, particularly in the area of trades reconciliation.
Combinatorial Optimization (CO) problems involve finding the best possible configuration from a discrete set of possibilities. The paper first defines a more general problem, the Subset Matching Problem (SMP), which takes two collections of items (multisets), a rule to determine if subsets from each collection form a ‘match’, and an objective function to maximize. The goal is to find a set of non-overlapping matches that yields the highest objective value.
The Subset Sum Matching Problem (SSMP) is a specific instance of SMP. In SSMP, the items in the multisets are real numbers, and a match is considered valid if the absolute difference between the sum of numbers in a subset from the first multiset and the sum of numbers in a subset from the second multiset is within a specified error tolerance (ϵ). The objective function for SSMP is designed to encourage solutions that cover more elements with a greater number of individual matches.
A primary real-world application for SSMP is financial reconciliation. This is a critical accounting process where two sets of financial records are compared to ensure accuracy and agreement. Minor discrepancies are often acceptable due to timing differences or data entry errors. SSMP directly addresses this by allowing for such small discrepancies through its error tolerance parameter. Reconciliation tasks are typically labor-intensive but are essential for confirming account consistency, detecting fraud, and maintaining data integrity.
The research paper outlines three distinct algorithms to tackle the SSMP: one optimal approach and two suboptimal methods. The optimal algorithm models SSMP as a Mixed-Integer Linear Program (MILP). This involves defining binary variables to indicate whether an element is included in a match and whether a match is formed. The MILP aims to maximize the total number of matched elements and the number of matches, subject to constraints that ensure sums are within tolerance, elements are used in at most one match, and matches are valid.
The two suboptimal algorithms are a greedy search solver and a dynamic programming (DP) solver. The greedy search algorithm works by iteratively finding and solving individual matches from the remaining elements. It employs pre-calculation and caching of subset sums to enhance its speed. The DP solver, on the other hand, is a pseudo-polynomial time algorithm. It involves discretizing real numbers into integers, reorganizing elements, building dynamic programming tables, and then backtracking to identify valid matches. This approach is particularly efficient for problems involving integers.
Experimental evaluations were conducted on both integer and real-value SSMP instances. For integer problems, the optimal MILP solver performed best when it successfully converged within the time limit. The search solver was found to be sensitive to the total number of elements, while the DP solver demonstrated more consistent performance with better running time guarantees. For real-value problems, especially those with a small error tolerance (ϵ), the MILP solver struggled with scalability and often timed out or produced less optimal results for larger instances. In contrast, the DP solver showed superior performance and efficiency for larger real-value problems. Further tests revealed that simply increasing the time allowance or providing a warm start (initializing MILP with a DP solution) did not significantly improve the MILP solver’s performance on harder problems.
Also Read:
- Unlocking Blackwell Optimality: A New Approach to Decision-Making in MDPs
- Bridging Language and Logic: How AI Models Tackle Complex Optimization Problems
This work represents a significant step in addressing complex matching problems in finance. The introduction of SSMP and the evaluation of various algorithmic approaches provide a foundation for future advancements. The researchers suggest future work could focus on developing even more efficient algorithms for SSMP and exploring its application in other real-world scenarios beyond financial reconciliation, such as task assignment or student-school matching problems. You can find the full research paper here.


