How Airwallex Leverages NebulaGraph for Cross-Border Payment Solutions

The author, Xin Hao, is a risk management engineer at Airwallex and a maintainer of nebula-go. This article introduces how Airwallex uses NebulaGraph.

Airwallex

As a renowned global payment and financial platform, Airwallex provides one-stop cross-border payment solutions for enterprises of all sizes and stages of development.

When shopping, gaming, subscribing to video platforms, or using paid software, you encounter various cross-border payment scenarios, including card payments or recurring subscriptions. These payment services are facilitated by game companies, e-commerce platforms, or merchants on other platforms. Behind the scenes, Airwallex supports these enterprises by offering services such as payment acceptance, fund collection, and treasury management.

In cross-border scenarios, our clients are highly diverse. Some are gaming companies that sell in-game purchases to users in different countries. Others are e-commerce platforms, both self-operated platforms and marketplaces like Taobao that serve multiple merchants. Our clients also extend beyond the platforms themselves to include the merchants on them. These merchants are legally independent entities, so each one must be identified clearly and precisely.

Application Scenarios of NebulaGraph

We categorize NebulaGraph’s application scenarios into four main areas:

Know Your Customer (KYC)

It is well known that cross-border operations involve unique legal frameworks across jurisdictions. Commercial entities are the fundamental legal units, and we must classify and link these entities to ensure compliance with local regulations and business scopes.

Personal Information Association:
- Authorized Representatives and Legal Persons of Commercial Entities: Their information must be linked to the entities to identify who actually controls and manages them.
- Operators of Registered Accounts: When individuals register and use our services, their email addresses, phone numbers, and other details are associated with their accounts, aiding identity verification and risk assessment when needed.
Device Information Association: Links client device data, such as IP addresses and device IDs, to accounts. This helps detect abnormal login behaviors, such as access from unusual geographic locations or devices.
Network Information Association: Covers clients’ online behaviors and transaction patterns. By analyzing network activity, we identify anomalous transaction patterns or potential risks, for instance frequent large transactions or multiple transactions from different locations within a short timeframe.
Facial Recognition Association: Uses biometric technology to verify client identities.

Real-Time Transaction Correlation

During real-time transactions, NebulaGraph is used to correlate transaction data.

Device information association in this context differs from the KYC case: the device data originates from payment components integrated with the Airwallex SDK. When users initiate payments via the Airwallex SDK (mobile, web, etc.), we track device IDs, models, IP addresses, and other details.

Payment and collection information association ensures transaction legitimacy and security. In e-commerce, billing information association is key to detecting abnormal purchase patterns and potential fraud, while logistics information association tracks product flows and ensures transaction integrity.

Model Feature Extraction

For model feature extraction, raw data is preprocessed into features used for model training and real-time inference. During online graph service operations, periodic data snapshots are generated for training, while real-time interfaces compute features for inference.
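As a rough illustration of the split described above, the Python sketch below computes the same features from a frozen snapshot (for training) and from the live graph (for inference). The `AccountDeviceGraph` class, the feature names, and the account and device identifiers are all hypothetical stand-ins; the article does not describe the actual features or the service interface.

```python
from collections import defaultdict


class AccountDeviceGraph:
    """Toy in-memory stand-in for the graph service (hypothetical)."""

    def __init__(self):
        self.devices_by_account = defaultdict(set)
        self.accounts_by_device = defaultdict(set)

    def add_login(self, account, device):
        # Record an undirected account <-> device relationship.
        self.devices_by_account[account].add(device)
        self.accounts_by_device[device].add(account)

    def snapshot(self):
        """Return an independent copy to use as a frozen training snapshot."""
        copy = AccountDeviceGraph()
        for account, devices in self.devices_by_account.items():
            for device in devices:
                copy.add_login(account, device)
        return copy


def extract_features(graph, account):
    """Compute simple graph features for one account."""
    devices = graph.devices_by_account.get(account, set())
    peers = set()
    for device in devices:
        peers |= graph.accounts_by_device[device]
    peers.discard(account)  # do not count the account itself
    return {"device_count": len(devices), "shared_device_accounts": len(peers)}


live = AccountDeviceGraph()
live.add_login("acct_a", "dev_1")
live.add_login("acct_b", "dev_1")

training = live.snapshot()          # periodic snapshot used for training
live.add_login("acct_c", "dev_1")   # arrives after the snapshot was taken

training_features = extract_features(training, "acct_a")
online_features = extract_features(live, "acct_a")
```

Using one feature function for both paths keeps training and inference definitions aligned; only the graph state being read differs.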

Visualization and Interaction

Our current implementation is simplified, primarily serving internal operations teams. We provide web interfaces for operators to input queries and interact with the system on demand.

Why NebulaGraph?

Graph Relationships and KYC

Know Your Customer (KYC) is a critical risk control process to verify client identities and assess risks throughout their lifecycle. It ensures compliance and prevents fraud by validating identities, understanding financial behaviors, and detecting anomalies.

Graph databases naturally model KYC relationships. By representing clients, accounts, devices, and transactions as nodes and edges, NebulaGraph visualizes complex connections—such as legal-entity associations or account-device links. This structured approach enhances data readability and enables efficient anomaly detection, such as identifying fraudulent chains through relationship analysis.

For these reasons, a graph database was a natural fit for our KYC workloads, and we ultimately chose NebulaGraph.
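To make the node-and-edge model concrete, here is a minimal Python sketch, not NebulaGraph code, that stores typed edges in memory and finds accounts connected to a given account within a bounded number of hops. The `type:id` node naming, the edge types, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (src, edge_type, dst) triples."""
    adjacency = defaultdict(set)
    for src, edge_type, dst in edges:
        adjacency[src].add((edge_type, dst))
        adjacency[dst].add((edge_type, src))
    return adjacency


def linked_accounts(adjacency, start, max_hops=2):
    """Breadth-first search for account nodes within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for _edge_type, neighbor in adjacency.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if neighbor.startswith("account:"):
                found.add(neighbor)
            queue.append((neighbor, hops + 1))
    return found


graph = build_graph([
    ("account:a", "uses_device", "device:1"),
    ("account:b", "uses_device", "device:1"),
    ("account:b", "has_phone", "phone:9"),
    ("account:c", "has_phone", "phone:9"),
    ("account:d", "uses_device", "device:2"),
])

direct = linked_accounts(graph, "account:a", max_hops=2)
chain = linked_accounts(graph, "account:a", max_hops=4)
```

With two hops only `account:b` is reached (shared device); with four hops `account:c` also appears through the phone number it shares with `account:b`. In a graph database the same question would be expressed as a traversal query rather than hand-written search code.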

Challenges

Before NebulaGraph, we used a simple Python-based graph service for basic applications. During our early stages with smaller datasets, we extracted nodes and edges from offline data warehouses and stored them in memory.

This architecture worked initially but faced issues as data grew:

Instability: Periodic data updates every 10 minutes caused frequent failures.
Memory Inefficiency: Increasing data volumes strained memory resources.
Limited Query Flexibility: The service's basic query capabilities could not express complex scenarios, which reduced development efficiency.

These challenges drove us to adopt NebulaGraph.

NebulaGraph Selection

During evaluation, we benchmarked multiple graph databases. Although some competitors performed comparably, NebulaGraph stood out for its functional completeness and enterprise adoption, so we selected it.

System Architecture Design

Customer Relationship System Architecture

Graph Construction for Customer Relationships

We built a heterogeneous graph, that is, a graph with multiple node and edge types. This offers three key advantages:

Adaptability: Supports diverse query scenarios, such as analyzing account linkages or transaction patterns.
Scalability: New relationship types can be added effortlessly as business evolves.
TTL-Based Data Cleanup: Automatic expiration of outdated data ensures relevance.

Over time, we enriched the graph with additional relationship types, allowing granular analysis via edge attributes and weights.
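NebulaGraph expresses TTL through the `TTL_DURATION` and `TTL_COL` schema options. The Python sketch below models the expiry rule in memory (an edge is treated as expired once its timestamp plus the TTL falls before the current time) and builds an example DDL string. The `uses_device` edge type, the `last_seen` property, and the durations are hypothetical; the article does not list the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    edge_type: str
    dst: str
    last_seen: int  # Unix timestamp in seconds


def is_expired(edge, now, ttl_seconds):
    """An edge expires once last_seen + ttl_seconds is earlier than now."""
    return edge.last_seen + ttl_seconds < now


def live_edges(edges, now, ttl_seconds):
    """Keep only edges that have not expired."""
    return [e for e in edges if not is_expired(e, now, ttl_seconds)]


def create_edge_statement(edge_type, ttl_seconds):
    """Build an nGQL CREATE EDGE statement with TTL on a last_seen property."""
    return (
        f"CREATE EDGE IF NOT EXISTS {edge_type}(last_seen timestamp) "
        f'TTL_DURATION = {ttl_seconds}, TTL_COL = "last_seen";'
    )


edges = [
    Edge("account:a", "uses_device", "device:1", last_seen=1_000),
    Edge("account:a", "uses_device", "device:2", last_seen=5_000),
]
fresh = live_edges(edges, now=5_500, ttl_seconds=3_600)
ddl = create_edge_statement("uses_device", 3_600)
```

Because expiry is driven by a timestamp property, a full rewrite that refreshes the timestamp keeps active relationships alive while stale ones age out on their own.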

System Architecture

Data Ingestion
- Batch Processing (Full Data): Hourly bulk updates.
- Real-Time Data Supplement (Incremental): APIs fill missing data during new client registration.
Schema-First Approach
- Schema as Code: Field definitions and changes are codified, ensuring consistency.
- Unified Schema: Shared across services and ETL pipelines for efficiency.

Key applications of the graph service include:

Model feature extraction for real-time client and transaction analysis.
A rules engine for risk control decisions based on graph features.
Internal business services leveraging graph data.
Visualization interfaces for operational queries.

Business Complexities

Dynamic and Complex Schemas
- Multiple node types (10+ variants) with diverse attributes.
- Frequent schema adjustments (monthly).
Data Logic Changes and Cleansing
- Requires flexible workflows to accommodate rapid business shifts.

Mitigation Strategies

Centralized Schema Definition
- Define schemas in code, auto-generating changes upon commits.
- Field modifications (renaming, additions) are code-driven and automated.
TTL + Full Data Writes
- Combines TTL for data freshness with full writes for completeness.

We adopted a declarative schema management approach. Field modifications trigger auto-generated schema change statements.
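A minimal sketch of what such a declarative diff could look like, assuming schemas are represented as plain dictionaries of property name to type. The `diff_tag` helper and the `account` tag are hypothetical and this is not the team's actual tooling; the generated strings follow nGQL's `ALTER TAG ... ADD (...)` and `ALTER TAG ... DROP (...)` forms.

```python
def diff_tag(tag_name, current, desired):
    """Return nGQL ALTER TAG statements that turn `current` into `desired`.

    Both arguments map property names to nGQL type names.
    """
    changed = sorted(
        name for name in desired
        if name in current and current[name] != desired[name]
    )
    if changed:
        # Type changes are rejected rather than guessed in this sketch.
        raise ValueError(f"type changes need manual review: {changed}")

    added = [(name, typ) for name, typ in desired.items() if name not in current]
    dropped = [name for name in current if name not in desired]

    statements = []
    if added:
        columns = ", ".join(f"{name} {typ}" for name, typ in added)
        statements.append(f"ALTER TAG {tag_name} ADD ({columns});")
    if dropped:
        statements.append(f"ALTER TAG {tag_name} DROP ({', '.join(dropped)});")
    return statements


current_schema = {"email": "string", "phone": "string"}
desired_schema = {"email": "string", "country": "string", "risk_score": "int64"}

statements = diff_tag("account", current_schema, desired_schema)
for statement in statements:
    print(statement)
```

Note that a rename shows up here as one DROP plus one ADD, which would discard the old values. How the real tooling handles renames is not described in the article.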

For query result mapping, we use a "declare-and-get" method. Structs defined in code with annotations automatically map query results to fields, eliminating manual handling. SDKs or APIs dynamically fetch the latest schema, ensuring stability despite changes.
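The idea can be approximated in Python with dataclass field metadata standing in for struct annotations. This is a sketch under assumptions: the `column` helper, the `LinkedAccount` type, the column names, and the idea that each result row arrives as a dictionary keyed by column name are all hypothetical, not the team's SDK.

```python
from dataclasses import dataclass, field, fields


def column(name):
    """Declare which result column feeds this field (hypothetical helper)."""
    return field(metadata={"column": name})


@dataclass
class LinkedAccount:
    account_id: str = column("account.id")
    shared_device: str = column("device.id")
    hops: int = column("hops")


def map_row(cls, row):
    """Build a cls instance from a row dict using the declared column names."""
    kwargs = {}
    for f in fields(cls):
        col = f.metadata["column"]
        if col not in row:
            raise KeyError(f"result is missing column {col!r} for field {f.name!r}")
        kwargs[f.name] = row[col]
    return cls(**kwargs)


# Columns the struct does not declare (here "account.created_at") are ignored,
# so adding a property to the schema does not break existing readers.
row = {
    "account.id": "acct_b",
    "device.id": "dev_1",
    "hops": 2,
    "account.created_at": 1700000000,
}
record = map_row(LinkedAccount, row)
```

Callers only declare the fields they need and receive typed objects back, which is the property the "declare-and-get" approach is after.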

Transaction Correlation System Architecture

This system prioritizes real-time performance, adopting a "compute-separate, merge-on-read" architecture.

Historical Data (>4 hours): Pre-processed via hourly batch jobs into subgraphs stored in the graph database.
Real-Time Data (≤4 hours): Ingested via Kafka, split, and written to the graph database after processing.
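One way to picture the read path is the Python sketch below, which merges a pre-computed historical edge set with recent real-time edges at query time and keeps the latest timestamp per neighbor. The edge layout `(src, dst, timestamp)`, the function name, and the sample data are assumptions; the article does not describe the actual merge logic.

```python
def merge_on_read(historical_edges, realtime_edges, node):
    """Merge both edge sources for one node, keeping the newest timestamp
    per neighbor, and return neighbors ordered from newest to oldest."""
    latest = {}
    for src, dst, ts in list(historical_edges) + list(realtime_edges):
        if src != node:
            continue
        if dst not in latest or ts > latest[dst]:
            latest[dst] = ts
    return sorted(latest.items(), key=lambda item: item[1], reverse=True)


# Older than the 4-hour window: produced by the hourly batch job.
historical = [
    ("txn:1", "device:a", 1_000),
    ("txn:1", "card:x", 1_200),
]
# Within the window: written by the streaming path.
realtime = [
    ("txn:1", "device:a", 5_000),
    ("txn:1", "ip:9", 5_100),
    ("txn:2", "ip:9", 5_200),
]

neighbors = merge_on_read(historical, realtime, "txn:1")
```

Keeping the two write paths separate lets the batch job and the stream evolve independently, at the cost of a small amount of merge work on each read.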

Future Outlook

These are the limitations we currently face:

Resource Isolation: A single NebulaGraph cluster lacks resource isolation, so applications interfere with one another during query peaks.
TP/AP Compatibility: A single cluster struggles to balance transactional processing (TP) and analytical processing (AP).
Algorithm Ecosystem: Limited openness and ecosystem for graph algorithms.

Future improvements aim to enhance resource isolation, enrich graph algorithm support, and foster community collaboration to advance the technology and ecosystem.