Hourglass MLPs: A New Approach to Neural Network Design

TLDR: Researchers propose ‘Hourglass’ (wide-narrow-wide) MLP blocks that invert the traditional ‘narrow-wide-narrow’ design. The architecture places skip connections in the higher-dimensional space and routes the residual computation through narrow bottlenecks. In experiments on MNIST and ImageNet-32, Hourglass MLPs consistently achieve better performance-parameter trade-offs on generative tasks (generative classification, denoising, super-resolution). The input projection can also stay fixed at its random initialization, which reduces the number of trainable parameters. The findings suggest re-evaluating skip connection placement in neural networks, with potential applications in Transformers and other residual architectures.

Multi-layer Perceptrons (MLPs) have long served as fundamental building blocks for neural networks. Traditionally, MLP blocks follow a ‘narrow-wide-narrow’ design: the input expands into a wider hidden space for processing and then contracts back to the output dimension. Skip connections, which are crucial for stable training and for refining representations incrementally, typically operate at the narrower input and output dimensions.

However, a recent research paper titled ‘Rethinking the Shape Convention of an MLP’ by Meng-Hsi Chen, Yu-Ang Lee, Feng-Ting Liao, and Da-shan Shiu from MediaTek Research and National Taiwan University challenges this long-standing convention. They propose a ‘wide-narrow-wide’ MLP block, which they term the ‘Hourglass’ design. It inverts the traditional approach: skip connections operate in the expanded, higher-dimensional space, while the residual computation flows through a narrow bottleneck.

The Hourglass Advantage

The core idea behind the Hourglass MLP is to use the higher-dimensional space for incremental refinement of data representations. Instead of constraining residual updates to the narrower input dimension, the design applies each update in a richer, expanded latent space. The authors hypothesize that this makes each refinement step more effective, while parameter-matched designs keep the computational cost comparable to conventional blocks.
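To make the two shapes concrete, here is a minimal NumPy sketch of a single residual block under each convention. It is an illustration, not the authors' code: the widths (64 and 256), the GELU activation, the omission of biases and normalization, and the names `residual_block`, `conv_in`, `hg_in` and so on are all assumptions chosen for this example.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU; the choice of activation is an assumption
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def residual_block(x, w_in, w_out):
    # y = x + act(x @ w_in) @ w_out; the skip path keeps the width of x
    return x + gelu(x @ w_in) @ w_out

rng = np.random.default_rng(0)
batch = 4

# Conventional block (narrow-wide-narrow): skip at width 64, hidden width 256.
d_narrow, d_hidden = 64, 256
x_conv = rng.standard_normal((batch, d_narrow))
conv_in = rng.standard_normal((d_narrow, d_hidden)) * 0.02
conv_out = rng.standard_normal((d_hidden, d_narrow)) * 0.02
y_conv = residual_block(x_conv, conv_in, conv_out)

# Hourglass block (wide-narrow-wide): skip at width 256, bottleneck width 64.
d_wide, d_bottleneck = 256, 64
x_hg = rng.standard_normal((batch, d_wide))
hg_in = rng.standard_normal((d_wide, d_bottleneck)) * 0.02
hg_out = rng.standard_normal((d_bottleneck, d_wide)) * 0.02
y_hg = residual_block(x_hg, hg_in, hg_out)

conv_params = conv_in.size + conv_out.size
hg_params = hg_in.size + hg_out.size
print(y_conv.shape, y_hg.shape, conv_params, hg_params)
```

With these illustrative widths both blocks hold the same number of weights (2 x 64 x 256); the only difference is whether the skip path carries 64 or 256 features.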

Implementing the Hourglass MLP requires an initial projection to elevate input signals to these expanded dimensions. Interestingly, the researchers propose that this initial projection can remain fixed at a random initialization throughout the entire training process. This concept, inspired by reservoir computing, offers significant practical benefits, including reduced parameter counts, lower memory bandwidth requirements, and decreased memory capacity needs, without a noticeable impact on performance, especially when the expansion factors are sufficiently large.
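The effect of freezing the input projection on the trainable parameter count can be checked with simple arithmetic. The sketch below assumes a stack made of one input projection, several bias-free Hourglass blocks, and one output projection; that layout, the function names, and every width used here are hypothetical values picked for illustration and are not taken from the paper.

```python
def hourglass_param_counts(d_in, d_wide, d_bottleneck, n_blocks, d_out):
    """Weight counts for an assumed, bias-free Hourglass stack."""
    return {
        "input_proj": d_in * d_wide,                   # lifts the input to the wide skip space
        "blocks": n_blocks * 2 * d_wide * d_bottleneck,  # wide -> narrow -> wide per block
        "output_proj": d_wide * d_out,                 # maps back to the output dimension
    }

def trainable_params(counts, freeze_input_proj):
    """Sum the counts, skipping the input projection when it stays at its random init."""
    total = sum(counts.values())
    if freeze_input_proj:
        total -= counts["input_proj"]
    return total

# Hypothetical configuration: flattened 28x28 images in and out, wide skip space.
counts = hourglass_param_counts(d_in=784, d_wide=3072, d_bottleneck=128,
                                n_blocks=4, d_out=784)
all_trainable = trainable_params(counts, freeze_input_proj=False)
frozen_input = trainable_params(counts, freeze_input_proj=True)
print(counts)
print(all_trainable, frozen_input)
```

In this made-up configuration the input projection accounts for 2,408,448 of 7,962,624 weights, so freezing it leaves 5,554,176 weights to train.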

Empirical Validation and Key Findings

To validate their hypothesis, the researchers conducted extensive architectural comparisons between conventional and Hourglass MLP stacks. They evaluated both designs on various generative tasks, including generative classification, denoising, and super-resolution, using popular image datasets like MNIST and ImageNet-32.

The results were compelling: Hourglass architectures consistently achieved superior performance-parameter Pareto frontiers across all tested tasks. This means that for a given performance level, Hourglass MLPs required fewer parameters, or for a given parameter budget, they delivered better performance compared to conventional designs. For instance, in an ImageNet-32 denoising task, an Hourglass model achieved 22.31 dB PSNR with 66 million parameters, while the best conventional model needed 75 million parameters for the same score.
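For readers unfamiliar with the term, a performance-parameter Pareto frontier is the set of models that no other model beats on both axes at once. The sketch below shows one simple way to compute it from (parameter count, PSNR) pairs. Only the two 22.31 dB points come from the figures quoted above; the other three points are made-up placeholders so the function has something to filter, and the name `pareto_frontier` is our own.

```python
def pareto_frontier(points):
    """Keep the (params, score) points that are not dominated.

    A point is dominated if another point has no more parameters and
    no lower score, and is strictly better on at least one of the two.
    """
    frontier = []
    best_score = float("-inf")
    # Sort by parameter count; among ties, try the higher score first.
    for params, score in sorted(points, key=lambda p: (p[0], -p[1])):
        if score > best_score:
            frontier.append((params, score))
            best_score = score
    return frontier

points = [
    (66_000_000, 22.31),  # Hourglass model quoted in the article
    (75_000_000, 22.31),  # best conventional model quoted in the article
    (20_000_000, 21.00),  # made-up placeholder, not a reported result
    (40_000_000, 21.50),  # made-up placeholder, not a reported result
    (50_000_000, 21.20),  # made-up placeholder, not a reported result
]
frontier = pareto_frontier(points)
print(frontier)
```

The 75-million-parameter point drops out because the 66-million-parameter model reaches the same PSNR with fewer parameters, which is the sense in which one frontier sits above the other.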

Furthermore, the study revealed distinct scaling patterns for optimal Hourglass configurations. As parameter budgets increased, the best-performing Hourglass networks favored deeper structures with wider skip connections and narrower bottleneck dimensions. This contrasts sharply with conventional MLPs, which often rely on shallower depths and very wide hidden layers.

Implications for Future Architectures

The findings suggest a significant reconsideration of skip connection placement in modern neural network architectures. The principles demonstrated by the Hourglass MLP could extend beyond simple MLPs to more complex residual networks, including Transformers and U-Net architectures. For Transformers, adapting this ‘wide-narrow-wide’ intuition would involve coordinated modifications to both self-attention and feed-forward layers, potentially leading to more compute-optimal designs with reduced parameter counts.

This research opens up new avenues for designing more efficient and powerful neural networks, pushing the boundaries of what’s possible in deep learning by rethinking fundamental architectural conventions.
