Beyond Byteplots: Improving Malware Classification with 1D Signal Representations

TLDR: A new research paper introduces a method to classify malware by converting binaries into 1D signals instead of traditional 2D images (byteplots). The 1D representation avoids the information loss introduced by 2D conversion and allows existing 2D CNNs to be adapted, or new 1D CNNs to be designed. The proposed 1D CNN achieved state-of-the-art performance on the MalNet dataset for binary, type-level and family-level malware classification, showing that a representation which preserves more of the original binary's information leads to better classification.

Malware poses a constant threat in cybersecurity, with sophisticated obfuscation techniques making traditional detection methods less effective. While dynamic analysis offers deeper insights, it demands significant resources, limiting its widespread use. For years, a popular approach has involved converting malware binaries into 2D images, known as byteplots, and then using computer vision techniques to classify them. This method has shown promise in detecting complex malware variants.

However, the 2D conversion has drawbacks: it discards information that could help classification. Part of the loss comes from “quantisation noise,” the rounding error incurred when data is converted to integer pixel values. The rest comes from artificial 2D dependencies: reshaping a linear byte sequence into a grid makes bytes that sit a full row apart in the file appear as vertical neighbours, a relationship that does not exist in the original binary. Both effects can reduce the accuracy of classification models.
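
To make the artificial 2D dependencies concrete, here is a minimal Python sketch of byteplot construction. It assumes a fixed row width and zero padding for the last row; the function name `to_byteplot` and the widths used are hypothetical choices for illustration (real pipelines typically derive the width from the file size). It shows that vertically stacked pixels are `width` bytes apart in the file, so which bytes become “neighbours” depends entirely on the chosen width.

```python
import numpy as np

def to_byteplot(data: bytes, width: int) -> np.ndarray:
    """Reshape raw bytes into a 2D uint8 grid, zero-padding the last row.

    `width` is a free parameter here (hypothetical); it is exactly the
    heuristic choice that decides which bytes end up adjacent.
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    rows = -(-arr.size // width)                 # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[: arr.size] = arr                     # copy bytes, leave padding as zeros
    return padded.reshape(rows, width)

data = bytes(range(20))                          # toy "binary": byte i has value i
img = to_byteplot(data, width=8)
print(img.shape)                                 # (3, 8)
# Column 1 stacks bytes 1, 9 and 17: a 2D kernel treats them as neighbours,
# although they are 8 bytes apart in the file.
print(img[0, 1], img[1, 1], img[2, 1])           # 1 9 17

# With a different width, byte 1 gets a different "vertical neighbour".
img10 = to_byteplot(data, width=10)
print(img10[0, 1], img10[1, 1])                  # 1 11
```

The same file therefore yields different spatial neighbourhoods for different widths, which is the kind of structure a 2D CNN may pick up even though it is not present in the binary.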

A New Perspective: 1D Signals for Malware Classification

A recent research paper, “Signal-Based Malware Classification Using 1D CNNs,” by Jack Wilkie, Hanan Hindy, Ivan Andonovic, Christos Tachtatzis, and Robert Atkinson, proposes an innovative solution to these challenges. Instead of converting malware binaries into 2D images, their work focuses on resizing them into 1D signals. This approach fundamentally changes how malware data is represented for machine learning models.

The core advantage of 1D signals is that they avoid the need for heuristic reshaping into a 2D grid, which can distort the original data structure. Furthermore, by storing these signals in a floating-point format, they bypass the quantisation noise that plagues 2D image representations. This means the 1D signals retain significantly more of the original binary’s information, leading to a better signal-to-noise ratio.
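
The float-versus-integer distinction can be illustrated with a short sketch. It resizes a byte sequence to a fixed-length float32 signal using linear interpolation (`np.interp`), an assumed stand-in chosen for brevity rather than the paper's resampling filter, and then measures what rounding to integer pixel values would discard. The function name `to_signal` and the toy input are hypothetical.

```python
import numpy as np

def to_signal(data: bytes, length: int) -> np.ndarray:
    """Resize a byte sequence to a fixed-length float32 signal.

    Linear interpolation is an illustrative assumption; the key point is that
    the resized values are kept as floats instead of being rounded to integers.
    """
    x = np.frombuffer(data, dtype=np.uint8).astype(np.float64)
    # Place the output samples evenly across the input index range [0, n - 1].
    src = np.linspace(0, x.size - 1, num=length)
    return np.interp(src, np.arange(x.size), x).astype(np.float32)

data = bytes([0, 255, 0, 255, 10, 11, 12, 13, 200, 3])
signal = to_signal(data, length=7)
# What an integer (uint8) pixel format would store instead:
quantised = np.round(signal).astype(np.uint8)
error = np.abs(signal - quantised.astype(np.float32))
print(signal.dtype, signal.shape)     # float32 (7,)
print(error.max() <= 0.5)             # True: rounding error is at most half a level
print(np.count_nonzero(error))        # 3 samples would be altered by rounding
```

Here three of the seven resampled values fall halfway between integers, so storing them as uint8 would shift each by 0.5; keeping the float32 signal preserves them.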

Adapting and Innovating with 1D Convolutional Neural Networks

The researchers demonstrated that existing 2D Convolutional Neural Network (CNN) architectures, commonly used in computer vision, can be effectively adapted to classify these 1D signals. They developed a method to convert 2D CNNs into 1D equivalents by flattening the convolution kernels and squaring the stride values: a 3×3 kernel becomes a 1D kernel of length 9, and a stride of 2 becomes a stride of 4. Because the flattened kernel has the same number of weights, and the squared stride produces as many output positions on a signal of length H×W as the original stride does on an H×W image, the adapted 1D models keep the same number of parameters and computational requirements as their 2D counterparts, yet achieve improved performance.
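
The parameter and compute bookkeeping behind this conversion can be checked with a few lines of arithmetic. The sketch below uses hypothetical helpers, not the authors' code: it assumes square kernels, ignores bias terms, and assumes “same”-style padding so that each axis has ceil(size / stride) output positions. Under those assumptions a 3×3, stride-2 layer and its flattened length-9, stride-4 counterpart have identical weight counts and multiply-accumulate (MAC) counts.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ConvSpec:
    in_ch: int
    out_ch: int
    kernel: int   # side length k for a 2D (k x k) kernel, length for a 1D kernel
    stride: int

def to_1d(conv2d: ConvSpec) -> ConvSpec:
    # Flatten the k x k kernel into k*k taps and square the stride.
    return ConvSpec(conv2d.in_ch, conv2d.out_ch, conv2d.kernel ** 2, conv2d.stride ** 2)

def weights(in_ch: int, out_ch: int, taps: int) -> int:
    # Weight count with bias ignored: one weight per tap, input channel and output channel.
    return in_ch * out_ch * taps

def macs_2d(conv: ConvSpec, h: int, w: int) -> int:
    # Assumes "same"-style padding: ceil(size / stride) output positions per axis.
    positions = math.ceil(h / conv.stride) * math.ceil(w / conv.stride)
    return positions * weights(conv.in_ch, conv.out_ch, conv.kernel ** 2)

def macs_1d(conv: ConvSpec, n: int) -> int:
    positions = math.ceil(n / conv.stride)
    return positions * weights(conv.in_ch, conv.out_ch, conv.kernel)

conv2d = ConvSpec(in_ch=64, out_ch=128, kernel=3, stride=2)
conv1d = to_1d(conv2d)
print(conv1d)  # ConvSpec(in_ch=64, out_ch=128, kernel=9, stride=4)
print(weights(64, 128, 3 ** 2) == weights(64, 128, conv1d.kernel))  # True
# A 224 x 224 image versus a 1D signal of 224 * 224 = 50176 samples:
print(macs_2d(conv2d, 224, 224), macs_1d(conv1d, 224 * 224))  # 924844032 924844032
```

When the image side is not a multiple of the stride, the two position counts can differ slightly because of rounding, so the equality above is exact only in the divisible case.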

Beyond adapting existing models, the team also developed a bespoke 1D CNN architecture. This custom model is based on the ResNet architecture, enhanced with squeeze-and-excitation layers, which reweight each channel using a summary of the whole feature map, and the GELU activation function, a smooth alternative to ReLU. This specialized 1D CNN was evaluated on the large-scale MalNet dataset, which contains over a million Android application (APK) samples.
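
As a rough illustration of the two ingredients named above, the following NumPy sketch implements a squeeze-and-excitation step over a 1D feature map of shape (channels, length) and an SE-ResNet-style residual merge. It assumes GELU (tanh approximation) inside the excitation bottleneck and after the residual addition; the original squeeze-and-excitation design uses ReLU there, and the paper's exact block layout, reduction ratio and layer ordering are not given in this article. All function names are hypothetical, and the 1D convolutions that would produce the residual branch are omitted.

```python
import numpy as np

def gelu(x: np.ndarray) -> np.ndarray:
    # Tanh approximation of GELU, a smooth alternative to ReLU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def squeeze_excite_1d(x, w1, b1, w2, b2):
    """Channel reweighting for a (channels, length) feature map.

    Shapes for reduction ratio r: w1 (C // r, C), b1 (C // r,), w2 (C, C // r), b2 (C,).
    """
    z = x.mean(axis=1)                           # squeeze: average over the length axis
    h = gelu(w1 @ z + b1)                        # excitation bottleneck (GELU assumed)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # per-channel sigmoid gate in (0, 1)
    return x * gate[:, None]                     # rescale every position of each channel

def se_residual_merge(branch, shortcut, se_params):
    # SE-ResNet-style merge: gate the convolutional branch, add the shortcut, then activate.
    return gelu(squeeze_excite_1d(branch, *se_params) + shortcut)

rng = np.random.default_rng(0)
C, L, r = 8, 32, 4
x = rng.standard_normal((C, L))
zero_params = (np.zeros((C // r, C)), np.zeros(C // r), np.zeros((C, C // r)), np.zeros(C))
y = squeeze_excite_1d(x, *zero_params)
# With all-zero weights every gate is sigmoid(0) = 0.5, so each channel is halved.
print(np.allclose(y, 0.5 * x))  # True
out = se_residual_merge(x, x, zero_params)
print(out.shape)                # (8, 32)
```

In a trained network the gate values differ per channel, letting the block emphasise channels whose global summary is informative for the current input.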

State-of-the-Art Performance and Future Implications

The proposed 1D signal-based approach achieved state-of-the-art performance on all three malware classification tasks evaluated: binary, type-level and family-level classification. Specifically, the bespoke 1D CNN recorded F1 scores of 0.874 for binary classification, 0.503 for type classification and 0.507 for family classification. These scores surpass those of leading 2D image-based models, including the ResNet, DenseNet and EfficientNet architectures and the SHERLOCK model.

The research also found that the choice of resampling filter (Lanczos performed best) and the signal length significantly affect performance, with longer signals generally retaining more information. Crucially, the 1D models consistently outperformed their 2D equivalents on both Android APK and Windows EXE files, suggesting that the approach generalises across executable formats.
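
For readers unfamiliar with Lanczos resampling, the sketch below resizes a 1D signal with a Lanczos-3 kernel, L(t) = sinc(t)·sinc(t/a) for |t| < a. It is an illustrative implementation under stated assumptions, not the one used in the paper: it widens the kernel when downsampling to suppress aliasing (a common convention in image-resizing libraries), clamps indices at the signal edges, and normalises the weights so that a constant signal stays constant. The function names are hypothetical.

```python
import numpy as np

def lanczos_kernel(t: np.ndarray, a: int = 3) -> np.ndarray:
    # L(t) = sinc(t) * sinc(t / a) inside the window |t| < a, zero outside.
    # np.sinc is the normalised sinc, sin(pi t) / (pi t).
    return np.where(np.abs(t) < a, np.sinc(t) * np.sinc(t / a), 0.0)

def lanczos_resample(x, out_len: int, a: int = 3) -> np.ndarray:
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    scale = n / out_len               # input samples per output sample
    fscale = max(scale, 1.0)          # stretch the kernel when downsampling (anti-aliasing)
    support = a * fscale              # half-width of the window in input samples
    out = np.empty(out_len)
    for i in range(out_len):
        centre = (i + 0.5) * scale - 0.5          # output sample position in input coordinates
        idx = np.arange(int(np.floor(centre - support)) + 1, int(np.ceil(centre + support)))
        w = lanczos_kernel((idx - centre) / fscale, a)
        w /= w.sum()                              # normalise: a constant input stays constant
        out[i] = w @ x[np.clip(idx, 0, n - 1)]    # clamp out-of-range indices to the edges
    return out.astype(np.float32)                 # keep floats, no integer quantisation

bytes_in = np.frombuffer(bytes(range(256)) * 4, dtype=np.uint8)
short = lanczos_resample(bytes_in, 64)
longer = lanczos_resample(bytes_in, 512)
print(short.shape, longer.shape)  # (64,) (512,)
```

Varying `out_len` is one way to explore the signal-length effect described above: shorter outputs blend more input bytes into each sample, so fine byte-level detail is smoothed away.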

This work marks a significant step forward in malware classification. By demonstrating the superior information retention and classification performance of 1D signal representations, it paves the way for future cybersecurity models to move beyond traditional image-based approaches. The ability to adapt existing 2D CNNs to this 1D modality, coupled with the development of specialized 1D architectures, offers a powerful new tool in the ongoing fight against evolving malware threats.
