TLDR: This research introduces a novel framework for internal talent recommendation that analyzes enterprise email data to model two dimensions of employee fit: ‘WHAT they do’ (semantic similarity of tasks) and ‘HOW they work’ (structural characteristics of interactions). These are represented as independent graphs and adaptively fused using a Dual Graph Convolutional Network (GCN) with a gating mechanism. The model significantly outperforms traditional methods, achieving a Hit@100 of 40.9%, and demonstrates high interpretability by learning context-aware fusion strategies for different job families, such as prioritizing relational data for sales and marketing roles while applying a balanced approach for research roles.
Identifying the right talent for internal roles is a cornerstone of organizational success, ensuring continuity and leveraging existing employee strengths. However, traditional methods often fall short, relying on limited managerial perspectives, leading to overlooked qualified candidates, and struggling with static HR data that doesn’t capture dynamic work relationships.
A recent master’s thesis, titled Multi-View Graph Convolution Network for Internal Talent Recommendation Based on Enterprise Emails, by Soo Hyun Kim from Sungkyunkwan University, supervised by Jang Hyun Kim, introduces a groundbreaking framework to address these challenges. The research proposes a novel approach that models two crucial dimensions of an employee’s suitability for a position directly from enterprise email data: ‘WHAT they do’ (the semantic similarity of their tasks) and ‘HOW they work’ (the structural characteristics of their interactions and collaborations).
A Dual-Perspective Approach to Talent Discovery
The core of this framework lies in representing these two dimensions as independent graphs. The ‘Structure Network’ captures direct communication patterns and frequencies between employees, showing who interacts with whom and how often. The ‘Semantic Similarity Network,’ on the other hand, infers task-related similarities by analyzing the content of email subject lines, connecting employees whose work is semantically aligned, even if they don’t directly communicate frequently.
To process this rich, multi-dimensional data, the study employs a Dual Graph Convolutional Network (GCN). GCNs are a type of deep learning architecture particularly adept at learning relationships within graph structures, generating embeddings that incorporate both individual attributes and the context of their connections. The innovation here is an adaptive fusion strategy, specifically a ‘gating mechanism,’ which intelligently combines the insights from both the structural and semantic graphs.
Unpacking the Methodology
The research utilized email log data from a mid-sized company, encompassing over 192,000 email exchanges among 1,518 employees over six months. To maintain confidentiality, all personal identifiers were removed, and only subject lines were retained for semantic analysis. Word2Vec embeddings were used to capture the meaning of individual terms in subject lines, forming the basis for semantic similarity.
Node features for each employee included their semantic embedding (average of email subject embeddings) and four key centrality measures: degree (how many connections), closeness (how quickly they can reach others), betweenness (how often they act as a bridge), and eigenvector (influence based on connections to other influential nodes). These features were concatenated into a single vector for the GNN.
The model was trained using a weak supervision framework, where ‘positive pairs’ were defined as employees sharing the same job family and role, serving as a proxy for position similarity. A pairwise ranking loss function was used to teach the model to assign higher scores to suitable candidates.
Key Findings and Interpretability
The experimental results demonstrated a significant improvement over traditional methods. The proposed gating-based fusion model achieved a top performance of 40.9% on Hit@100, meaning it successfully identified a suitable candidate within the top 100 recommendations in nearly 41% of cases. Even the simplest single GCN model performed more than three times better than the heuristic baseline, highlighting the power of GNNs in capturing complex network relationships.
Beyond numerical superiority, the gating mechanism offered remarkable interpretability. The model learned distinct, context-aware fusion strategies for different job functions. For instance, in ‘sales and marketing’ roles, where relational networks are often paramount, the model prioritized structural (HOW) data, assigning it an overwhelming 88% weight. Conversely, for ‘research’ functions, which require both specialized expertise and peer collaboration, the model adopted a more balanced approach, weighting semantic (WHAT) and structural (HOW) information in an approximately 56:44 ratio. This ability to adapt its fusion strategy based on the job family mirrors real-world organizational dynamics.
Also Read:
- Skill-Based Explanations Enhance Student Course Selection Confidence
- Tailoring Recommendations for Explored and Unexplored Items in E-commerce
Future Directions
While the model shows practical utility, the researchers acknowledge limitations, including reliance on proxy labels and the existence of non-quantifiable factors like reputation. Future work could involve incorporating ‘Role-aware Structural Embedding’ to identify specific functional roles (e.g., ‘broker’ or ‘hub’), extending the model to consider different types and contexts of relationships (e.g., ‘intra-team’ vs. ‘inter-departmental’ collaboration using Relational GCNs), and applying time-series analysis with Dynamic GNNs to capture the evolving nature of organizational interactions.
This research offers a quantitative and comprehensive framework for internal talent discovery, minimizing the risk of candidate omission inherent in traditional methods. Its primary contribution lies in its ability to empirically determine the optimal fusion ratio between task alignment and collaborative patterns, providing important practical implications for human resource management.


