Decoding Slang: How AI’s Informal Language Differs from Human Expression

TLDR: A study compared human and AI-generated slang, finding that while large language models (LLMs) are creative, their slang usage shows systematic biases. LLMs prefer coining new terms, lean towards positive topics, and their creative patterns don’t fully align with human nuances, limiting their effectiveness for complex linguistic analysis tasks.

Informal language, and slang in particular, is a constantly changing part of human communication. Understanding and generating slang has long been a significant hurdle for artificial intelligence systems. With the rise of large language models (LLMs) like GPT-4o and Llama-3, machines have become much better at handling such nuanced language. A recent research paper, titled “How do Language Models Generate Slang: A Systematic Comparison between Human and Machine-Generated Slang Usages,” examines how far that improvement goes, offering a detailed comparison between how humans and machines create and use slang.

Authored by Siyang Wu and Zhewei Sun, this study explores whether the structural knowledge of slang captured by LLMs truly aligns with human-attested usage. This alignment is critical because LLMs are increasingly being applied to tasks such as detecting and interpreting slang, and their reliability hinges on their ability to genuinely understand this informal language.

The researchers developed an evaluative framework that examined three core aspects: the general characteristics of slang usages, the creativity involved in forming new slang words (lexical coinages) and reusing existing words with new meanings, and the informativeness of these slang usages when used to train other models. By comparing human slang from the Online Slang Dictionary (OSD) with slang generated by GPT-4o and Llama-3, the study uncovered notable biases in how LLMs perceive slang.

One of the key findings was that LLMs have captured a significant amount of knowledge about the creative side of slang, but this knowledge is not aligned closely enough with human usage to support more complex linguistic analysis tasks. In other words, AI can be creative with slang, but its creativity may operate on different principles than human creativity.

To conduct their research, Wu and Sun collected a massive dataset of machine-generated slang. They prompted LLMs to create novel slang usages, each including a slang term, a definition, and a usage context. They explored different generation settings: controlled generation, where the model was given existing human-defined meanings, and uncontrolled generation, where the model relied solely on its pre-trained knowledge. They also controlled for word choice, asking models to either create entirely new terms (coinage), reuse existing words with new meanings, or generate freely.
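The generation settings described above form a small grid: two context settings crossed with three word-choice constraints. The following is a minimal sketch of how such a grid might be enumerated. The label names, the `build_prompt` helper, and the prompt wording are all assumptions made for illustration; they are not the prompts used in the paper.

```python
from itertools import product

# Hypothetical labels for the two axes described in the article.
CONTEXT_SETTINGS = ("controlled", "uncontrolled")
WORD_CHOICES = ("coinage", "reuse", "free")

# Illustrative constraint wording (an assumption, not taken from the paper).
WORD_CHOICE_TEXT = {
    "coinage": "The slang term must be a newly coined word.",
    "reuse": "The slang term must be an existing word used with a new meaning.",
    "free": "You may coin a new word or reuse an existing one.",
}


def build_prompt(context_setting, word_choice, definition=None):
    """Assemble an illustrative prompt for one cell of the grid."""
    if context_setting == "controlled":
        # Controlled generation is given an existing human-defined meaning.
        if definition is None:
            raise ValueError("controlled generation needs a definition")
        task = f"Write a slang usage that expresses this meaning: {definition!r}."
    else:
        # Uncontrolled generation relies on the model's own knowledge.
        task = "Write a novel slang usage with a meaning of your choosing."
    return (
        f"{task} {WORD_CHOICE_TEXT[word_choice]} "
        "Return the term, a definition, and an example sentence."
    )


def all_conditions():
    """Enumerate every (context setting, word choice) combination."""
    return list(product(CONTEXT_SETTINGS, WORD_CHOICES))
```

Calling `all_conditions()` yields six combinations, and each generated usage would then carry the three fields the article mentions: a term, a definition, and a usage context.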

Distinctive Characteristics of AI-Generated Slang

The study revealed clear distinctions in the characteristics of human versus machine-generated slang. Human slang from the OSD showed a balanced mix of creating new terms and reusing existing ones. In contrast, both GPT-4o and Llama-3 displayed a strong preference for producing coinages. This bias was somewhat reduced when GPT-4o was guided by human-attested slang definitions, but the proportion of word reuse remained lower than in human language.
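The coinage-versus-reuse comparison can be pictured with a simple proportion. The sketch below assumes a term counts as reuse when it already appears in a reference lexicon and as a coinage otherwise; that rule, the function names, and the toy word lists are illustrative assumptions, not the paper's procedure or data.

```python
from collections import Counter


def classify_term(term, lexicon):
    """Label a term 'reuse' if it is already in the lexicon, else 'coinage'."""
    return "reuse" if term.lower() in lexicon else "coinage"


def coinage_share(terms, lexicon):
    """Return the fraction of terms that are coinages under the rule above."""
    counts = Counter(classify_term(t, lexicon) for t in terms)
    total = sum(counts.values())
    return counts["coinage"] / total if total else 0.0


# Toy data, invented for illustration only (not from the study).
lexicon = {"salty", "ghost", "fire", "tea"}
human_terms = ["salty", "ghost", "yeet", "finsta"]
model_terms = ["vibequake", "snackcident", "fire", "glowmentum"]

human_share = coinage_share(human_terms, lexicon)
model_share = coinage_share(model_terms, lexicon)
```

With these made-up lists, half of the human terms and three quarters of the model terms are coinages, which mirrors the direction of the bias the study reports without reproducing its actual figures.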

When analyzing the word formation processes for coined terms, GPT-4o showed a bias towards creating compound words (combining two existing words verbatim), while Llama-3 exhibited less preference for specific formation types compared to human data. This indicates that different LLMs develop their own unique perceptions of slang formation.
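A compound in the sense used here joins two existing words verbatim. As a rough sketch, one could test whether a coined term splits cleanly into two known words; the `split_compound` helper and the tiny lexicon are hypothetical, and real word-formation analysis would need to handle blends, clippings, and affixation as well.

```python
def split_compound(term, lexicon):
    """Return (head, tail) if the term is two lexicon words joined verbatim.

    Returns None when no such split exists. This checks only the
    two-word compound pattern, not other formation processes.
    """
    term = term.lower()
    for i in range(1, len(term)):
        head, tail = term[:i], term[i:]
        if head in lexicon and tail in lexicon:
            return head, tail
    return None


# Toy lexicon, invented for illustration only.
toy_lexicon = {"brain", "fog", "couch", "surf", "snack"}
```

Here `split_compound("brainfog", toy_lexicon)` finds the pair `("brain", "fog")`, while a blend such as "snackcident" returns `None` because its second half is not a verbatim word.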

Another interesting observation came from topic analysis. Human slang often revolves around taboo subjects like sex and profanity, reflecting cultural dynamics. However, machine-generated slang tended to focus on more positive but less concrete concepts. The researchers hypothesize that this might be due to alignment techniques (like RLHF) used in LLMs, which steer them away from potentially offensive or controversial content towards more neutral or positive expressions.

Creativity in Coinage and Reuse

The paper also evaluated the creativity of coined slang terms. GPT-4o-generated terms were found to be more morphologically complex (having more segments) than human coinages, especially in uncontrolled settings. Llama-3, on the other hand, produced simpler constructions. Interestingly, GPT-4o’s uncontrolled coinages also demonstrated better morphological coherence, meaning the coined terms were more semantically grounded with respect to their constituent parts. This suggests that GPT-4o prefers semantically consistent new words, while human word choices can be more playful.

For word reuse, LLMs consistently generated slang usages with higher semantic novelty, meaning the new senses they assigned to existing words diverged further from those words’ conventional meanings. However, human-generated slang showed a wider creative spectrum, suggesting that human creativity is less narrowly bounded. The study also measured “surprisal in context,” a metric correlating with human processing effort, and found that machine-generated slang showed nuanced control over contextual surprisal, similar to human usages.
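Surprisal is conventionally defined as the negative log probability of a word given its context, so less predictable words score higher. The sketch below uses that standard definition with hand-picked probabilities; how the study obtains its probabilities is not described in this article, so the inputs here are placeholders rather than model outputs.

```python
import math


def surprisal_bits(probability):
    """Surprisal in bits: -log2 P(word | context)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def mean_surprisal(token_probs):
    """Average surprisal over the tokens of a term, given their probabilities."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    return sum(surprisal_bits(p) for p in token_probs) / len(token_probs)
```

A word the context makes likely (probability 0.5) carries 1 bit of surprisal, while a less expected one (probability 0.25) carries 2 bits.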

Informativeness for Downstream Tasks

To assess the informativeness of machine-generated slang, the researchers conducted a distillation experiment. They fine-tuned a smaller Llama-3-8B-Instruct model using slang generated by either humans or GPT-4o. While fine-tuning on GPT-generated slang did lead to an increase in morphological complexity in the student model’s coinages, the overall performance gains on downstream tasks like slang generation and interpretation were minimal or task-sensitive. Human-generated slang proved more informative for improving the quality of generated definitions in free-form interpretation tasks.

In summary, the research highlights that while LLMs are highly capable of generating creative slang, their underlying structural knowledge about this informal language differs significantly from human usage. LLMs exhibit specific preferences in characteristics and creative qualities, which can affect how their generated slang is perceived and used. This suggests that LLMs have not yet fully captured the nuanced structures inherent in human slang usage. For a deeper dive into the methodology and findings, see the full research paper, “How do Language Models Generate Slang: A Systematic Comparison between Human and Machine-Generated Slang Usages.”
