TLDR: SmartMixed is a novel two-phase training strategy for neural networks that allows individual neurons to learn and select their optimal activation functions. In the first ‘selection’ phase, neurons adaptively choose from a pool of candidate functions. In the second ‘fixed’ phase, these learned choices are locked in, enabling computationally efficient continued training. This approach not only improves network performance but also reveals consistent layer-wise preferences for activation functions, with early layers favoring ReLU/Leaky ReLU and deeper layers preferring ELU/SELU.
Neural networks, the backbone of modern artificial intelligence, rely heavily on activation functions to introduce non-linearity and enable complex learning. Traditionally, these networks use a single, fixed activation function like ReLU or Sigmoid across all their neurons. While this simplifies implementation, it may not be the best choice for every neuron’s specific role within the network.
SmartMixed is a two-phase training strategy that allows individual neurons to learn and select their own activation functions. The method aims to combine the benefit of adaptive activation functions, where neurons can customize their behavior, with the computational efficiency of fixed activation functions during inference.
How SmartMixed Works: A Two-Phase Journey
The SmartMixed strategy unfolds in two distinct phases:
Phase 1: The Selection Phase
In this initial phase, the neural network is given the flexibility to explore. Each neuron, except those in the output layer, learns to select its most suitable activation function from a predefined pool of candidates, which includes common functions like ReLU, Sigmoid, Tanh, Leaky ReLU, ELU, and SELU. This selection process is made differentiable using a technique called the Gumbel-Softmax estimator, which allows the network to make discrete choices while still enabling gradient-based optimization. Essentially, neurons develop a preference for certain activation functions based on their role during training.
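The selection step described above can be sketched in plain Python. This is a minimal illustration of the Gumbel-Softmax relaxation for a single neuron, not the paper's implementation: the names `gumbel_softmax_weights` and `selective_activation`, the temperature value, and the per-neuron logits are assumptions made for the example. Only the forward pass is shown; in practice an autodiff framework would carry gradients back to the logits.

```python
import math
import random

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

# Candidate pool named in the text, using the standard definitions.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "leaky_relu": lambda x: x if x > 0 else 0.01 * x,
    "elu": lambda x: x if x > 0 else math.exp(x) - 1.0,
    "selu": lambda x: SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0)),
}


def gumbel_softmax_weights(logits, temperature, rng):
    """Relaxed one-hot weights: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def selective_activation(pre_activation, logits, temperature, rng):
    """Phase 1 output of one neuron: a weighted mix of all candidate activations."""
    weights = gumbel_softmax_weights(logits, temperature, rng)
    return sum(w * fn(pre_activation) for w, fn in zip(weights, ACTIVATIONS.values()))


# Usage: one neuron with no learned preference yet (all logits equal).
rng = random.Random(0)
logits = [0.0] * len(ACTIVATIONS)
print(selective_activation(0.5, logits, temperature=1.0, rng=rng))
```

Lower temperatures push the weights toward a one-hot choice, while higher temperatures keep the mix smoother. The schedule used in the experiments is not stated here, so the value above is only a placeholder.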
Phase 2: The Fixed Phase
Once the selection phase is complete (after a predetermined number of training epochs, for example, 50 epochs in the experiments), the network transitions to the fixed phase. In this phase, each neuron’s activation function is permanently set based on the preferences learned in Phase 1. For computational efficiency, neurons that have chosen the same activation function are grouped together. This “mixed” network, now with fixed, specialized activation functions for each neuron, continues its training. By fixing the activations, the computational overhead associated with dynamic selection is removed, and the network can be optimized using standard, efficient vectorized operations. This allows the network’s weights and biases to adapt specifically to the chosen activation configuration, often leading to improved performance.
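The transition to the fixed phase can be sketched as follows. This example assumes each neuron keeps the candidate with the highest learned logit (an argmax rule, which the summary above does not spell out) and uses hypothetical helper names and made-up logits. A shortened candidate pool keeps the example small.

```python
import math
from collections import defaultdict

# Hypothetical subset of the candidate pool, kept short for the example.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "elu": lambda x: x if x > 0 else math.exp(x) - 1.0,
}
NAMES = list(ACTIVATIONS)


def freeze_choices(layer_logits):
    """Pick each neuron's highest-scoring activation (assumed argmax rule)."""
    return [NAMES[max(range(len(row)), key=row.__getitem__)] for row in layer_logits]


def group_by_activation(choices):
    """Map each activation name to the indices of the neurons that chose it."""
    groups = defaultdict(list)
    for neuron_index, name in enumerate(choices):
        groups[name].append(neuron_index)
    return dict(groups)


def mixed_forward(pre_activations, groups):
    """Apply each fixed activation to its whole group of neurons."""
    out = [0.0] * len(pre_activations)
    for name, indices in groups.items():
        fn = ACTIVATIONS[name]
        for i in indices:
            out[i] = fn(pre_activations[i])
    return out


# Toy logits for a 4-neuron layer (made-up values, not from the paper).
layer_logits = [[2.0, 0.1, 0.3], [0.2, 0.1, 1.5], [1.1, 0.0, 0.4], [0.0, 3.0, 0.2]]
choices = freeze_choices(layer_logits)
groups = group_by_activation(choices)
outputs = mixed_forward([-1.0, -1.0, 0.5, 0.0], groups)
print(choices)
print(groups)
print(outputs)
```

In a tensor library, each group would typically become one indexed slice, so every activation runs as a single vectorized call over its neurons rather than as a per-neuron weighted mix.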
Experimental Insights and Performance
The SmartMixed approach was evaluated on the MNIST handwritten digit classification dataset using various feedforward neural network architectures. The results showed a significant improvement in accuracy in Phase 2 (Mixed) compared to Phase 1 (Selective), demonstrating that fixing the learned activation choices provides a better foundation for continued training and leads to more stable and effective learning.
One of the most compelling findings from the research is the discovery of intrinsic activation function preferences across different network depths. The analysis revealed that neurons in early layers of the network strongly favored ReLU and Leaky ReLU activation functions. As the network depth increased, neurons in deeper layers progressively shifted their preference towards ELU and SELU functions. This layer-wise specialization pattern was consistent across diverse architectural configurations, suggesting that individual neurons intelligently adapt their activation functions based on their positional role within the network. This provides novel insights into the functional diversity within neural architectures.
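A layer-wise preference analysis of this kind could be tallied as in the sketch below. The function name `layer_preferences` is hypothetical, and the selections are placeholder values chosen only to show the output format. They are not measurements from the paper.

```python
from collections import Counter


def layer_preferences(choices_per_layer):
    """Return, per layer, the fraction of neurons that selected each activation."""
    summary = []
    for choices in choices_per_layer:
        counts = Counter(choices)
        total = len(choices)
        summary.append({name: count / total for name, count in counts.most_common()})
    return summary


# Placeholder selections for a 2-layer network (illustrative only, not measured results).
example = [
    ["relu", "relu", "leaky_relu", "tanh"],
    ["elu", "selu", "elu", "relu"],
]
for depth, shares in enumerate(layer_preferences(example), start=1):
    print(f"layer {depth}: {shares}")
```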
SmartMixed also demonstrated competitive performance when compared against traditional fixed activation function strategies across 18 different network architectures, consistently ranking among the top three performing approaches. While no single activation function universally dominated, SmartMixed proved to be a robust adaptive solution that performed well across diverse architectural configurations. For more technical details, refer to the full research paper.
Future Directions
The success of SmartMixed on the MNIST dataset opens up exciting avenues for future research. Evaluating this strategy on more complex and diverse datasets, as well as extending it to different types of neural networks beyond feedforward architectures, would further validate its generalizability and potential impact on neural architecture optimization.


