CrossLink

Enhancing Cross-domain Link Prediction via
Evolution Process Modeling

1Zhejiang University, 2Finvolution
*Equal Contribution
WWW 2025

Introduction

This paper proposes CrossLink, a novel framework for cross-domain link prediction. CrossLink learns the evolution pattern of a specific downstream graph and subsequently makes pattern-specific link predictions.

It employs a technique called conditioned link generation, which integrates both evolution and structure modeling to perform evolution-specific link prediction. This conditioned link generation is carried out by a transformer-decoder architecture, enabling efficient parallel training and inference.
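A minimal sketch of the sequence format that conditioned link generation implies: the decoder consumes link-prediction tasks interleaved with their ground-truth labels, so each prediction is conditioned on the graph's observed evolution. All names here are illustrative placeholders, not the paper's API.

```python
# Hypothetical sketch of conditioned link generation (CLG): interleave
# (task, label) pairs into one decoder input sequence so that a causal
# transformer decoder predicts each label conditioned on prior evolution.

def build_clg_sequence(tasks, labels):
    """Interleave (task, label) pairs into one decoder input sequence.

    tasks  : task tokens (e.g. embeddings of candidate node pairs)
    labels : ground-truth link labels (1 = edge formed, 0 = not)
    The final task is left unlabeled -- it is the query to predict.
    """
    seq = []
    for task, label in zip(tasks[:-1], labels):
        seq.append(("TASK", task))
        seq.append(("LABEL", label))
    seq.append(("TASK", tasks[-1]))  # query position: predict its label
    return seq

seq = build_clg_sequence(["t1", "t2", "t3"], [1, 0])
# -> [("TASK","t1"), ("LABEL",1), ("TASK","t2"), ("LABEL",0), ("TASK","t3")]
```

Because every position attends only to earlier tokens, all labels in the sequence can be predicted (and their losses computed) in one parallel decoder pass, which is what enables efficient training and inference.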

CrossLink is trained on extensive dynamic graphs across diverse domains, encompassing 6 million dynamic edges. Extensive experiments on eight untrained graphs demonstrate that CrossLink achieves state-of-the-art performance in cross-domain link prediction.

Compared to advanced baselines under the same settings, CrossLink shows an average improvement of 11.40% in Average Precision across eight graphs. Impressively, it surpasses the fully supervised performance of 8 advanced baselines on 6 untrained graphs.

Background

Dynamic links are widespread in the real world. Although they carry different semantics, they can all be modeled as a link prediction task.


Examples of dynamic links in the real world.

Link prediction (LP) is a crucial task in dynamic graph modeling. However, current methods mainly consider a single-graph setting: the model is trained using supervised learning on a given graph and then makes inferences on the same graph (the End2End setting).

This approach has several notable limitations when applied in real-world scenarios:

(1) High human/time costs: The End2End setting requires independently training different models for each graph. Each training process demands careful design and optimization of hyperparameters by experts. Additionally, the training process is time-consuming.

(2) Unsuitability for small datasets: The End2End setting typically requires a substantial number of samples for satisfactory domain-specific performance. This makes it ill-suited for small-scale application scenarios, such as B2B businesses or situations involving large graphs with limited data.

(3) Inability to learn more knowledge from different applications: Graphs in different applications may contain complementary knowledge. For instance, users purchasing items and users listening to music are both projections of human behavior. Therefore, learning from both user-item graphs and user-music graphs can help the model better understand behavior-related knowledge. However, End2End training is limited to a single graph.

Hence, we propose CrossLink, the first framework for cross-domain link prediction. CrossLink learns the evolution pattern of a specific downstream graph and subsequently makes pattern-specific link predictions.

Method

However, cross-domain link prediction faces a fundamental challenge: how to model ambiguous structures. Graphs evolve independently of one another, meaning the same structure may hold different meanings and evolve differently across various graphs.

As shown in the figure below, Graph A typically follows a triadic closure process, where two nodes with common neighbors are more likely to form edges, while Graph B exhibits a contrasting pattern. Consequently, even if a node pair (the red and blue nodes) in Graph A and Graph B has the same local structure, their ground truths differ. We refer to this type of local structure, which has diverse ground truths across graphs, as an ambiguous structure.
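The ambiguity can be made concrete with a toy example (illustrative, not from the paper): a structure-only predictor scores the red-blue pair identically in both graphs, yet the ground truths disagree.

```python
# Illustrative example of an "ambiguous structure": two graphs where the
# red-blue pair has an identical local structure (one common neighbor),
# yet the ground-truth future edge differs between the graphs.

def common_neighbors(adj, u, v):
    """Structure-only signal: shared neighbors of u and v."""
    return adj[u] & adj[v]

# Graph A: triadic closure -- the common neighbor leads to a new edge.
graph_a = {"red": {"green"}, "blue": {"green"}, "green": {"red", "blue"}}
# Graph B: identical local structure around the pair, but no edge forms.
graph_b = {"red": {"green"}, "blue": {"green"}, "green": {"red", "blue"}}

# Hypothetical ground truths: does a future red-blue edge appear?
ground_truth = {"A": 1, "B": 0}

# A structure-only predictor must output the same score for both graphs...
assert common_neighbors(graph_a, "red", "blue") == common_neighbors(graph_b, "red", "blue")
# ...so it cannot match both ground truths: local structure alone is ambiguous.
```

Only extra conditioning information, such as each graph's observed evolution, can break this tie.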

Current methods usually assume a single-graph setting, predicting future edges between two nodes solely from their local structure.

(a) shows a case of structure conflict: Graph A follows a triadic closure process, while Graph B exhibits a contrasting process. (b) shows that current methods cannot resolve this conflict, and (c) shows how prediction via evolution modeling resolves it.

Therefore, these methods struggle to effectively model ambiguous structures in the cross-domain setting. This limitation not only impedes the model's ability to accurately learn the meaning of ambiguous structures across multiple graphs but also hinders its capability to infer future edges correctly in target graphs, especially when those graphs contain many ambiguous structures.

To address these challenges, CrossLink adopts the following approach:


Framework of CrossLink. (a) Models the graph's evolution process via a sequence of link prediction tasks with ground truths; (b) Evolution-specific link prediction based on both nodes' representations and the evolution process.

The CrossLink process follows these key steps:

  • Choose one ego-graph from a random domain for analysis and processing
  • Sort all links based on their temporal appearance in the graph
  • Generate embeddings for each graph structure
  • Process the embeddings through a Transformer architecture for analysis
  • Generate label predictions through parallel computation
  • Calculate and optimize loss functions in parallel
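The steps above can be sketched as a small data-preparation routine. This is a hedged sketch of the training loop they imply; all names are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch of CrossLink-style sequence preparation: sample an
# ego-graph, order its links by time, embed them, and hand the sequence
# to a causal decoder for parallel label prediction and loss computation.

def prepare_training_sequence(ego_graph_links, embed, max_len=512):
    """Turn one ego-graph into a chronologically ordered sequence of
    link-prediction tasks, ready for parallel decoding.

    ego_graph_links : list of (src, dst, timestamp, label) tuples
    embed           : function mapping a (src, dst) pair to an embedding
    """
    # Step 2: sort all links by their temporal appearance in the graph
    ordered = sorted(ego_graph_links, key=lambda link: link[2])[:max_len]
    # Step 3: generate an embedding for each link's local structure
    tokens = [embed((src, dst)) for src, dst, _, _ in ordered]
    labels = [label for _, _, _, label in ordered]
    # Steps 4-6 would feed `tokens` through a causal transformer decoder,
    # predicting every label in parallel and averaging the losses.
    return tokens, labels

toy_links = [("a", "b", 3, 1), ("b", "c", 1, 0), ("a", "c", 2, 1)]
tokens, labels = prepare_training_sequence(toy_links, embed=lambda pair: pair)
# links re-ordered by timestamp: (b,c,1), (a,c,2), (a,b,3)
```

Because the decoder is causal, every position's label can be supervised simultaneously, which is what makes the training in steps 5-6 parallel rather than one prediction per forward pass.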

Results

Design Advantages:

  • Enhanced efficiency in processing and prediction
  • Improved generalization through in-context learning, adapting NLP concepts to cross-domain modeling

Key Insights:

  • Link prediction demonstrates consistency across various domains, making it a universally applicable task
  • CrossLink achieves significant improvements in cross-domain link prediction performance
  • The model surpasses fully supervised performance metrics across six distinct datasets

Performance of various methods on cross-domain link prediction. We report Average Precision (averaged over 3 runs; % omitted) across eight graphs.


  • Multi-domain training enhances model generalization, with CLG effectively modeling differences between domains
  • Extended evolution periods on target datasets improve prediction accuracy, highlighting the value of historical data

Analysis results of CrossLink regarding multi-domain training. (a) shows the results of ablation studies, where ``w/o'' removes a certain component of our model. (b) shows the performance of CrossLink with different maximum sequence lengths (for both training and inference). (c) shows the performance on evaluated graphs of models trained solely on a specific graph.


  • Optimal hidden size varies according to training dataset size, requiring dynamic adjustment
  • CrossLink shows scaling potential similar to prompt tuning in GPT, where performance improves with more prompts and in-context cases

Performance of CrossLink under diverse settings. (a) shows that the model's performance improves with more training samples. (b) shows that CrossLink's performance is influenced by the number of training graphs. (c) shows the best hidden size of the model under 6M training samples. (d) further shows the best hidden size under different numbers of training samples.

BibTeX

@misc{huang2024graphmodelcrossdomaindynamic,
  title={One Graph Model for Cross-domain Dynamic Link Prediction}, 
  author={Xuanwen Huang and Wei Chow and Yang Wang and Ziwei Chai and Chunping Wang and Lei Chen and Yang Yang},
  year={2024},
  eprint={2402.02168},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2402.02168}, 
}