New Benchmark Reveals Surprising Similarity in LLM Creative Outputs, Underscoring Human Role

TLDR: The first-ever benchmark for evaluating the creativity of Large Language Models (LLMs) in marketing, developed by Springboards in collaboration with leading industry bodies, has found that popular AI tools like ChatGPT, Gemini, and Claude exhibit remarkably similar creative outputs. The study challenges the notion of a ‘best’ AI tool for creative tasks and emphasizes the indispensable role of human creativity and judgment in achieving breakthrough outcomes.

NEW YORK, October 21, 2025 – A groundbreaking study, dubbed the ‘Creativity Benchmark,’ has unveiled a surprising uniformity in the creative outputs of leading Large Language Models (LLMs), suggesting that these advanced AI tools are more alike in their creative capabilities than widely perceived. Conducted by Springboards, an AI platform dedicated to fostering creativity in advertising, in partnership with prominent industry organizations including the 4As, ACA, APG, D&AD, IAA, IPA, and The One Club for Creativity, the research marks the world’s first comprehensive benchmark for evaluating LLM creativity in a marketing context.

The study’s core finding indicates that AI tools such as ChatGPT, Gemini, and Claude perform with striking similarity across various creative tasks. This challenges the prevailing assumption that certain AI models significantly outperform others in generating novel ideas, suggesting instead that agencies and brands should focus less on finding a ‘superior’ AI and more on how these tools are integrated into creative workflows.

According to Pip Bingemann, CEO and co-founder of Springboards, the reason for this convergence lies in the fundamental nature of LLMs. “Everyone assumes some AI tools are way better than others for creative work,” Bingemann stated. “But our tests showed the results were pretty close. Why? Because these models are machines designed to recognize patterns and give you the most probable answer—and ‘probable’ has never been called ‘creative.’ Keeping humans in the loop and optimizing for a wider range of varied ideas is crucial.”

The benchmark evaluated LLMs across three critical types of creative challenges relevant to marketing: uncovering surprising consumer insights, developing expansive campaign ideas, and formulating bold, attention-grabbing concepts. The findings consistently showed that many AI tools tended to suggest similar ideas repeatedly, highlighting a potential limitation in generating true diversity without human intervention.

Further insights from the study underscore the continued necessity of human judgment in the creative process. When AI systems were tasked with evaluating creative ideas, their scores diverged significantly from those provided by human experts. This indicates that relying on AI alone to select the best creative concepts is unreliable, reinforcing the irreplaceable value of human discernment in assessing creative quality and relevance.

Jeremy Lockhorn, SVP, Creative Technologies & Innovation at the 4As, commented on the implications of the research: “LLMs aren’t a one-size-fits-all solution—they’re general purpose tools that require human creativity to unlock breakthrough outcomes. These findings suggest agencies and brands should continue to evaluate which models are best suited for creative work – and that a multi-model approach may well be the best path forward.” Tony Hale, CEO of the Advertising Council Australia, echoed this sentiment, remarking, “This study highlights that creativity isn’t about which AI you use, it’s about how you use it.”

The research also found that traditional creativity tests, often used in psychological studies, do not reliably predict an AI’s performance on marketing-specific creative tasks, underscoring the need for specialized metrics tailored to brand work. The ‘Creativity Benchmark’ framework collected human pairwise preferences from 678 practicing creatives across 11,012 anonymized comparisons and analyzed them with Bradley-Terry models. Performance was tightly clustered: the highest-rated model beat the lowest only about 61% of the time.
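For readers unfamiliar with Bradley-Terry models, the short Python sketch below illustrates what a 61% head-to-head win rate implies. It is not the study's code: the function names and the toy win counts are hypothetical, and the fitting routine is a standard textbook iteration assumed here only for illustration. In a Bradley-Terry model, each item has a positive strength, and the probability that one item is preferred over another is its strength divided by the sum of the two.

```python
import math


def bt_win_probability(log_strength_a: float, log_strength_b: float) -> float:
    """Bradley-Terry probability that A is preferred over B,
    given each item's log-strength."""
    return 1.0 / (1.0 + math.exp(-(log_strength_a - log_strength_b)))


def implied_log_strength_gap(win_rate: float) -> float:
    """Log-strength gap that would produce the given head-to-head win rate."""
    return math.log(win_rate / (1.0 - win_rate))


def fit_bradley_terry(wins, iterations=200):
    """Estimate strengths from pairwise counts with a simple iterative update.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        scale = sum(updated)
        p = [x / scale for x in updated]
    return p


# Gap implied by the reported 61% top-versus-bottom win rate.
gap = implied_log_strength_gap(0.61)

# Hypothetical toy data (NOT from the study): two models, 100 comparisons.
toy_wins = [
    [0, 61],
    [39, 0],
]
strengths = fit_bradley_terry(toy_wins)
top_vs_bottom = strengths[0] / (strengths[0] + strengths[1])
print(f"log-strength gap: {gap:.3f}, fitted win rate: {top_vs_bottom:.2f}")
```

A 50% win rate would mean two models are indistinguishable, so a 61% rate between the best and worst model corresponds to a modest strength gap, which is consistent with the article's description of tightly clustered results.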

Ultimately, the study advocates for expert human evaluation and diversity-aware workflows, positioning AI as a powerful assistant that amplifies human ingenuity rather than replacing it. The findings serve as a critical guide for the advertising and marketing industries as they navigate the evolving landscape of AI-driven creative tools.
