TLDR: MJKAN (Modulation Joint KAN) is a novel neural network architecture that combines the non-linear expressive power of Kolmogorov-Arnold Networks (KANs) with the computational efficiency of Multilayer Perceptrons (MLPs). It achieves this by integrating a FiLM-like modulation mechanism with Radial Basis Function (RBF) activations. In the reported experiments, MJKAN outperforms MLPs on function regression and performs competitively and stably on image and text classification. Its generalization on classification tasks is sensitive to the number of basis functions, however, so the basis size needs careful tuning to prevent overfitting.
Neural networks are at the heart of modern artificial intelligence, powering everything from image recognition to natural language processing. Two prominent architectures are Multilayer Perceptrons (MLPs) and the more recently introduced Kolmogorov-Arnold Networks (KANs). While MLPs are known for their efficiency and widespread use, KANs offer a unique approach by replacing fixed activation functions with learnable, univariate functions on each connection, inspired by the Kolmogorov-Arnold superposition theorem.
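For reference, the Kolmogorov-Arnold superposition theorem mentioned above is commonly stated as follows. The symbols $f$, $\Phi_q$, and $\phi_{q,p}$ are notation chosen here for illustration and may differ from the paper's. For any continuous function $f$ on $[0,1]^n$ there exist continuous univariate functions such that:

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

KANs take this form as their motivation: the multivariate function is built only from sums and learnable one-dimensional functions, and those one-dimensional functions sit on the network's connections.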
However, despite their theoretical elegance and promise in specific tasks like symbolic regression, KANs have faced practical hurdles. They often come with high computational costs and haven’t consistently outperformed traditional MLPs in general classification tasks. This has led researchers to explore hybrid models that can combine the best of both worlds.
Introducing MJKAN: A Hybrid Approach
A new research paper, “Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness”, introduces the Modulation Joint KAN (MJKAN). This novel neural network layer is designed to overcome the limitations of conventional KANs by integrating a FiLM (Feature-wise Linear Modulation)-like mechanism with Radial Basis Function (RBF) activations. The core idea is to create a hybrid architecture that leverages the non-linear expressive power of KANs while maintaining the computational efficiency typically associated with MLPs.
In an MJKAN layer, each input dimension is first processed by radial basis functions, and then a learned affine transformation (scaling and offset) is applied. This FiLM-like operation effectively reintroduces learnable linear weights into the KAN framework without sacrificing the non-linear capabilities of kernel activations. This design allows MJKAN to dynamically adjust the influence of different input regions, much like how an MLP’s weights work, resulting in a highly flexible layer.
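The layer described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's implementation: the Gaussian form of the RBF, the fixed evenly spaced centers on [-1, 1], the parameter shapes, and the names `rbf_features`, `init_params`, and `mjkan_like_forward` are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_features(x, centers, width):
    """Gaussian RBF responses for every input dimension.

    x: (batch, in_dim), centers: (num_basis,)
    returns: (batch, in_dim, num_basis)
    """
    return np.exp(-(((x[..., None] - centers) / width) ** 2))


def init_params(in_dim, out_dim, num_basis, seed=0):
    """Hypothetical parameterization: fixed centers, learnable scale and offset."""
    rng = np.random.default_rng(seed)
    return {
        # Assumption: centers evenly spaced on [-1, 1] and kept fixed.
        "centers": np.linspace(-1.0, 1.0, num_basis),
        # Assumption: width tied to the spacing between centers.
        "width": 2.0 / max(num_basis - 1, 1),
        # FiLM-like scale, one weight per (output, input, basis) triple.
        "gamma": rng.normal(0.0, 0.1, size=(out_dim, in_dim, num_basis)),
        # FiLM-like offset, one per output unit.
        "beta": np.zeros(out_dim),
    }


def mjkan_like_forward(x, params):
    """RBF expansion per input dimension, then a learned scale and offset."""
    phi = rbf_features(x, params["centers"], params["width"])
    # Scale each basis response and sum over inputs and bases, then add the offset.
    return np.einsum("bik,oik->bo", phi, params["gamma"]) + params["beta"]


params = init_params(in_dim=3, out_dim=2, num_basis=5)
batch = np.random.default_rng(1).uniform(-1.0, 1.0, size=(4, 3))
print(mjkan_like_forward(batch, params).shape)
```

In this sketch the scale `gamma` plays the role of an MLP-style weight applied to non-linear basis responses, which is one way to read the "learnable linear weights" described above. The actual layer may place or share the scale and offset differently.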
Performance Across Diverse Tasks
The researchers evaluated MJKAN on a range of benchmarks: function regression, image classification (MNIST, CIFAR-10/100), and natural language processing (AG News, SMS Spam).
For function regression tasks, MJKAN demonstrated superior approximation capabilities, consistently outperforming MLPs. Its performance improved as the number of basis functions increased, highlighting its strength in modeling complex, non-linear functions with localized or compositional structures.
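A toy illustration of why more basis functions can help in regression is sketched below. It is not MJKAN and does not reproduce the paper's experiments: it is a plain least-squares fit of Gaussian RBF features to a sine curve. The target function, the grid, and the helper names `rbf_design` and `fit_mse` are assumptions made for illustration.

```python
import numpy as np


def rbf_design(x, num_basis):
    """Gaussian RBF design matrix with centers evenly spaced on [0, 1]."""
    centers = np.linspace(0.0, 1.0, num_basis)
    width = 1.0 / max(num_basis - 1, 1)  # assumption: width equals center spacing
    return np.exp(-(((x[:, None] - centers) / width) ** 2))


def fit_mse(num_basis, n_points=200):
    """Least-squares fit of sin(2*pi*x) on a grid; returns the fit error on that grid."""
    x = np.linspace(0.0, 1.0, n_points)
    y = np.sin(2.0 * np.pi * x)
    design = rbf_design(x, num_basis)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.mean((design @ coeffs - y) ** 2))


for k in (2, 4, 8, 16):
    print(k, fit_mse(k))
```

The error here is measured on the same grid used for fitting, so it reflects approximation capacity only and says nothing about generalization.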
However, in general classification tasks, the results were more nuanced. In image classification, MJKAN’s performance was competitive with MLPs, but it revealed a critical dependency on the number of basis functions. A smaller basis size was found to be crucial for better generalization, especially on more complex datasets like CIFAR-100. This suggests that while more basis functions increase theoretical expressiveness, they also raise the risk of overfitting if not carefully tuned to the data’s complexity.
In natural language processing tasks, MJKAN proved to be a robust and stable alternative to MLPs, delivering consistent performance across different basis sizes, although it didn’t consistently surpass the MLP baseline. This indicates its viability in text classification settings, particularly when combined with transformer-derived embeddings.
Key Takeaways
MJKAN represents a significant step towards creating more practical and versatile KAN-inspired models. It successfully combines the function approximation strengths of KANs with the efficiency of MLPs. The research underscores the importance of the basis size as a key hyperparameter, directly controlling the model’s geometric complexity and its susceptibility to overfitting. By offering a flexible, general-purpose building block, MJKAN paves the way for future hybrid architectures that are both powerfully expressive and computationally tractable.


