This paper proposes CrossLink, a novel framework for cross-domain link prediction. CrossLink learns the evolution pattern of a specific downstream graph and subsequently makes pattern-specific link predictions.
It employs a technique called conditioned link generation, which integrates both evolution and structure modeling to perform evolution-specific link prediction. This conditioned link generation is carried out by a transformer-decoder architecture, enabling efficient parallel training and inference.
CrossLink is trained on extensive dynamic graphs across diverse domains, encompassing 6 million dynamic edges. Extensive experiments on eight untrained graphs demonstrate that CrossLink achieves state-of-the-art performance in cross-domain link prediction.
Compared to advanced baselines under the same settings, CrossLink shows an average improvement of 11.40% in Average Precision across eight graphs. Impressively, it surpasses the fully supervised performance of 8 advanced baselines on 6 untrained graphs.
Dynamic links are widespread in the real world. Although they carry different semantics, all of them can be modeled as a link prediction task.
Examples of dynamic links in the real world.
Link prediction (LP) is a crucial task in dynamic graph modeling. However, current methods mainly consider a single-graph setting. In this setting, the graph model is trained using supervised learning on a given graph and then makes inferences on the same graph (End2End setting).
This approach has several notable limitations when applied in real-world scenarios:
(1) High human/time costs: The End2End setting requires independently training different models for each graph. Each training process demands careful design and optimization of hyperparameters by experts. Additionally, the training process is time-consuming.
(2) Unsuitability for small datasets: The End2End setting typically requires a substantial number of samples for satisfactory domain-specific performance. This makes it ill-suited for small-scale application scenarios, such as B2B businesses or situations involving large graphs with limited data.
(3) Inability to learn more knowledge from different applications: Graphs in different applications may contain complementary knowledge. For instance, users purchasing items and users listening to music are both projections of human behavior. Therefore, learning from both user-item graphs and user-music graphs can help the model better understand behavior-related knowledge. However, End2End training is limited to a single graph.
Hence, we propose CrossLink, the first framework for cross-domain link prediction. CrossLink learns the evolution pattern of a specific downstream graph and subsequently makes pattern-specific link predictions.
However, cross-domain link prediction faces a fundamental challenge: how to model ambiguous structures. Different graphs are interdependent, meaning the same structure may hold different meanings and evolve differently across various graphs.
As shown in the figure, Graph A typically follows a triadic closure process, where two nodes with common neighbors are more likely to form an edge, while Graph B exhibits a contrasting pattern. Consequently, even if the node pair (red and blue nodes) has the same local structure in Graph A and Graph B, its ground truths differ. We refer to this type of local structure, which has diverse ground truths across graphs, as an ambiguous structure.
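The ambiguity described above can be made concrete with a toy example. The sketch below is only an illustration: the node names, the shared snapshot, the ground-truth labels, and the common-neighbor heuristic are all assumptions chosen for clarity, not data or methods from the paper. It shows that a predictor relying solely on local structure must give the same answer in both graphs, so it cannot be right in both.

```python
def common_neighbors(adj, u, v):
    """Count the neighbors shared by nodes u and v."""
    return len(adj[u] & adj[v])

# Hypothetical snapshot shared by both graphs: red and blue have two common neighbors.
snapshot = {
    "red": {"a", "b"},
    "blue": {"a", "b"},
    "a": {"red", "blue"},
    "b": {"red", "blue"},
}

# Assumed ground truths: Graph A follows triadic closure, Graph B does the opposite.
future_edge = {"graph_A": True, "graph_B": False}

def structure_only_predict(adj, u, v, threshold=1):
    """Predict a future edge purely from the common-neighbor count."""
    return common_neighbors(adj, u, v) >= threshold

# The local structure is identical, so the prediction is identical in both graphs.
prediction = structure_only_predict(snapshot, "red", "blue")
results = {name: prediction == truth for name, truth in future_edge.items()}
print(results)
```

Whatever threshold is chosen, the structure-only predictor is correct for exactly one of the two graphs, which is the situation the term "ambiguous structure" names.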
Current methods usually consider single-graph settings, focusing on predicting future edges between two nodes solely based on their local structure.
Therefore, these methods struggle to effectively model ambiguous structures under the cross-domain setting. This limitation not only impedes the model's ability to accurately learn the meaning of ambiguous structures in multiple graphs but also hinders its capability to infer future edges correctly in target graphs, especially when the graphs contain many ambiguous structures.
To address these challenges, CrossLink adopts the following approach:
Framework of CrossLink. (a) Models the graph's evolution process via a sequence of link prediction tasks with ground truths; (b) Performs evolution-specific link prediction based on both the nodes' representations and the evolution process.
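The two panels in the caption above can be mimicked with a deliberately simple stand-in. The sketch below is not the CrossLink model: instead of a transformer decoder over node representations, it uses a smoothed frequency estimate, and the function names, the common-neighbor feature, and the example histories are all hypothetical. It only illustrates the idea of conditioning a prediction on a sequence of past link prediction tasks and their ground truths.

```python
def build_context(history):
    """Panel (a), simplified: encode past link prediction tasks as (feature, label) pairs.

    `history` is a time-ordered list of (common_neighbor_count, edge_formed)
    tuples; this encoding is an assumption made for illustration.
    """
    return [(int(feature), bool(label)) for feature, label in history]

def conditioned_predict(context, query_feature, prior=0.5):
    """Panel (b), simplified: estimate P(edge | query structure, evolution context).

    Stand-in for a learned model: the link rate among context tasks whose
    structure matches the query, smoothed toward `prior` by one pseudo-sample.
    """
    matches = [label for feature, label in context if feature == query_feature]
    return (sum(matches) + prior) / (len(matches) + 1)

# Hypothetical evolution histories for two graphs with opposite patterns.
context_a = build_context([(2, True), (2, True), (0, False), (2, True)])
context_b = build_context([(2, False), (2, False), (0, True), (2, False)])

# The same query structure (two common neighbors) under two different contexts.
p_a = conditioned_predict(context_a, query_feature=2)
p_b = conditioned_predict(context_b, query_feature=2)
print(p_a, p_b)
```

With an empty context the estimate falls back to `prior`, so the query structure alone carries no signal; the evolution context is what separates the two graphs.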
The CrossLink process follows these key steps:
Design Advantages:
Key Insights:
Performance of various methods on cross-domain link prediction. We report their Average Precision (averaged over 3 runs, with the % sign omitted) across eight graphs.
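For readers unfamiliar with the metric, Average Precision can be computed as the mean of precision@k taken at the rank of each positive sample. The sketch below is a minimal pure-Python version under that standard definition; the example labels and scores are made up, ties are broken by input order (library implementations may treat ties differently), and the paper's own evaluation protocol, such as how negative edges are sampled, is not reproduced here.

```python
def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive sample appears."""
    # Rank candidate edges from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("average precision is undefined without positive labels")
    return precision_sum / hits

# Made-up example: positives land at ranks 1 and 3.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
print(round(100 * ap, 2))
```

In this example the precision is 1/1 at rank 1 and 2/3 at rank 3, so the result is their mean; multiplying by 100 gives the percentage form used in the table.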
Analysis results of CrossLink regarding multi-domain training. (a) shows the results of ablation studies, where "w/o" removes a certain component of our model. (b) shows the performance of CrossLink with different maximum sequence lengths (in both training and inference). (c) shows the performance on the evaluated graphs of a model trained solely on a specific graph.
Performance of CrossLink under diverse settings. (a) shows that the performance of the model improves with more training samples. (b) shows how the performance of CrossLink is influenced by the number of training graphs. (c) shows the best hidden size of the model under 6M training samples. (d) further shows the best hidden size under different numbers of training samples.
@misc{huang2024graphmodelcrossdomaindynamic,
title={One Graph Model for Cross-domain Dynamic Link Prediction},
author={Xuanwen Huang and Wei Chow and Yang Wang and Ziwei Chai and Chunping Wang and Lei Chen and Yang Yang},
year={2024},
eprint={2402.02168},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2402.02168},
}