A comprehensive Python benchmark suite for comparing RPyC (Remote Python Call) with HTTP/REST performance across multiple dimensions.
================================================================================
BENCHMARK RESULTS SUMMARY
================================================================================
RPYC_THREADED
----------------------------------------
Connection Time: 10.67ms (±6.69ms)
Latency Mean: 64.48ms (±14.33ms)
Latency Median: 64.96ms
Latency P95: 87.05ms
Latency P99: 96.98ms
Concurrent Connections: 128
Total Requests: 12800
Success Rate: 100.00%
RPYC_FORKING
----------------------------------------
Connection Time: 10.92ms (±8.36ms)
Latency Mean: 64.68ms (±13.99ms)
Latency Median: 64.91ms
Latency P95: 86.68ms
Latency P99: 97.87ms
Concurrent Connections: 128
Total Requests: 12800
Success Rate: 100.00%
HTTP_THREADED
----------------------------------------
Connection Time: 0.09ms (±0.05ms)
Latency Mean: 179.76ms (±110.44ms)
Latency Median: 159.08ms
Latency P95: 361.90ms
Latency P99: 570.77ms
Concurrent Connections: 128
Total Requests: 12800
Success Rate: 100.00%
================================================================================
- Performance Study - Academic study with localhost & LAN results
- Features & Architecture
- Quick Install
- Installation Options
- Command-Line Usage - Testing RPyC vs HTTP/Flask generically
- Python API for Existing Apps - Integrating benchmarks into your application
- How It Works
- Architecture
- Contributing
📊 Read the full performance study - Research paper with comprehensive localhost and LAN benchmarks.
Testing on Intel Core i9-9980HK (16 logical cores, 32GB RAM) across two network topologies:
| Topology | RPyC Mean Latency | HTTP Mean Latency | RPyC Advantage | Network Overhead |
|---|---|---|---|---|
| Localhost | 3.3ms | 12.3ms | 3.7x faster | Baseline |
| LAN (Parallels VM) | 4.0-4.9ms | 12.3ms | 2.5-3.1x faster | RPyC: +19-47%, HTTP: +0.5% |
Critical Insight: RPyC provides substantial latency advantages but shows greater sensitivity to network overhead. HTTP maintains consistent performance across topologies.
# Run your own benchmarks
pip install rpycbench
# Localhost testing
rpycbench-sweep --output-dir benchmarks --description "localhost only"
# LAN/remote testing
rpycbench-sweep --remote-host user@hostname --description "your topology"

Results include:
- JSON files with full system specs and metrics
- Publication-quality graphs (PNG)
- Statistical analysis (mean, median, P95, P99)
When to use RPyC: Python-to-Python communication on low-latency networks (<2ms) where 2-4x performance gain matters
When to use HTTP/REST: Cross-language systems, public APIs, or networks with variable latency where consistency matters
See the full study for detailed methodology, results, and production recommendations.
rpycbench measures RPyC vs HTTP/REST performance across five dimensions:
- Connection Time: Initial handshake and connection establishment
- Latency: Round-trip time for request/response pairs (mean, median, P95, P99)
- Bandwidth: Data transfer rates for various payload sizes
- Binary File Transfer: Large file transfers with configurable sizes and chunk sizes
- Concurrency: Performance under load with multiple simultaneous connections
What's Being Measured:
- RPyC uses binary protocol over raw sockets with Python object serialization
- HTTP uses JSON over REST with request/response overhead
- Tests measure both baseline protocol performance and real-world usage patterns
- Profiler identifies bottlenecks like excessive round trips and netref overhead
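The netref point is worth illustrating: RPyC returns proxies (netrefs) to remote objects, so code that looks local can hide many round trips. A hedged sketch (`get_items` is a hypothetical exposed method, not part of rpycbench):

```python
# Sketch: netref round-trip overhead (get_items is hypothetical).
import rpyc
from rpyc.utils.classic import obtain

conn = rpyc.connect("localhost", 18812)

# One round trip returns a netref (proxy) to a remote list.
remote_items = conn.root.get_items()

# ~100 round trips: each indexing operation crosses the network.
slow_total = sum(remote_items[i] for i in range(100))

# One bulk transfer: obtain() copies the value locally (requires pickle
# to be allowed on the connection), then the sum runs without the network.
fast_total = sum(obtain(remote_items))

conn.close()
```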
Key Capabilities:
- Comprehensive metrics: connection time, latency (P95/P99), bandwidth, system resources
- Multiple server modes: RPyC threaded/forking, HTTP threaded
- High concurrency testing: 128+ parallel connections with per-connection tracking
- Two usage modes: CLI for generic testing, Python API for application integration
- Built-in profiling: track RPyC round trips, netrefs, and call patterns
Always get the latest version with a single command:
# Using Python (works everywhere)
curl -sSL https://raw.githubusercontent.com/patrickkidd/rpycbench/main/install-latest.py | python3
# Or using bash (Linux/Mac)
curl -sSL https://raw.githubusercontent.com/patrickkidd/rpycbench/main/install-latest.sh | bash

This automatically fetches and installs the latest wheel from GitHub releases - no version string needed!
# Install specific version
pip install https://github.com/patrickkidd/rpycbench/releases/download/v0.1.0-build.123/rpycbench-0.1.0-py3-none-any.whl
# Upgrade to a specific version
pip install --upgrade --force-reinstall https://github.com/patrickkidd/rpycbench/releases/download/v0.1.0-build.123/rpycbench-0.1.0-py3-none-any.whl

Browse all releases: https://github.com/patrickkidd/rpycbench/releases
git clone https://github.com/patrickkidd/rpycbench.git
cd rpycbench
pip install -r requirements.txt
pip install -e .

Purpose: Test RPyC vs HTTP/Flask performance generically without writing code.
The command-line tool runs benchmarks comparing RPyC and HTTP servers across different scenarios. Use this to understand baseline performance characteristics before integrating into your application.
# Run all benchmarks with default settings
rpycbench
# Quick baseline (skip forking server for speed)
rpycbench --skip-rpyc-forking
# Save results to JSON
rpycbench --output results.json

# Test only latency
rpycbench --num-serial-connections 10 --num-requests 5000
# Test only concurrency
rpycbench --num-parallel-clients 50 --requests-per-client 200
# Test only binary transfers
rpycbench --test-binary-transfer --binary-file-sizes 1048576 10485760

The rpycbench-sweep command runs comprehensive benchmarks across localhost and remote hosts, collecting full system specifications and generating publication-quality graphs.
# Localhost benchmarks
rpycbench-sweep --output-dir benchmarks
# Remote host benchmarks (requires SSH)
rpycbench-sweep --remote-host user@hostname --description "your topology"

- JSON results with complete system specs (CPU, RAM, OS, Python version)
- Graphs: Connection time, latency, percentiles, bandwidth comparisons
- Reproducible on any infrastructure with SSH access
benchmarks/
├── results_local.json # Localhost results
├── results_remote.json # Remote host results
├── graphs/
│ ├── connection_time_comparison.png
│ ├── latency_comparison.png
│ ├── percentile_comparison.png
│ └── localhost_vs_lan_comparison.png
└── PERFORMANCE_STUDY.md # Generated study template
# Skip specific tests
rpycbench-sweep --skip-rpyc-forking --skip-http
# Skip graph generation (JSON only)
rpycbench-sweep --skip-graphs
# Custom description for results
rpycbench-sweep --description "AWS us-east-1 t3.xlarge"

- Research: Generate academic performance studies with reproducible methodology
- Infrastructure comparison: Benchmark different network topologies (LAN, WAN, cloud)
- Architecture decisions: Quantify RPyC vs HTTP trade-offs for your specific environment
- CI/CD: Track performance regression across deployments
See benchmarks/PERFORMANCE_STUDY.md for example output and interpretation.
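For the CI/CD case, a results file saved with --output can be gated against a latency budget. A minimal sketch, assuming the JSON schema shown in the Output section of this README (latency values in seconds; the budget is an example value):

```python
# ci_check.py - fail the build on latency regression.
# Assumes the JSON schema shown in the Output section (values in seconds).
import json
import sys

P95_BUDGET_MS = 10.0  # example threshold, tune for your environment

with open("results.json") as f:
    results = json.load(f)

p95_ms = results["rpyc_threaded"]["latency"]["p95"] * 1000
if p95_ms > P95_BUDGET_MS:
    sys.exit(f"FAIL: P95 latency {p95_ms:.2f}ms exceeds {P95_BUDGET_MS}ms budget")
print(f"OK: P95 latency {p95_ms:.2f}ms within budget")
```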
Goal: Verify RPyC and HTTP are working correctly
rpycbench \
--skip-rpyc-forking \
--num-serial-connections 10 \
--num-requests 100 \
--num-parallel-clients 5

What it tests: Basic connectivity, latency, and light concurrency
Time: ~10 seconds
Goal: Understand request/response latency characteristics
rpycbench \
--num-requests 10000 \
--num-serial-connections 1 \
--num-parallel-clients 1

What it tests: P50, P95, P99 latency with large sample size
Time: ~30 seconds
Output: Detailed percentile breakdown
Goal: Test performance under heavy concurrent load
rpycbench \
--num-parallel-clients 128 \
--requests-per-client 500 \
--skip-rpyc-forking

What it tests: 128 parallel connections making 500 requests each
Time: ~1-2 minutes
Watch for: Success rate, connection failures, resource usage
Goal: Understand data transfer rates for different payload sizes
rpycbench \
--num-serial-connections 10 \
--num-requests 100 \
--num-parallel-clients 1

What it tests: Upload/download bandwidth for 1KB - 1MB payloads
Time: ~20 seconds
Focus: Bandwidth benchmark results
Goal: Test large file transfer performance
# Test with default 64KB chunks
rpycbench --test-binary-transfer
# Compare different chunk sizes
rpycbench --test-binary-transfer --binary-chunk-size 8192
rpycbench --test-binary-transfer --binary-chunk-size 524288
# Custom file sizes
rpycbench --test-binary-transfer \
--binary-file-sizes 5242880 52428800 \
--binary-chunk-size 65536 \
--binary-iterations 5

What it tests: Multi-MB file transfers with different chunking strategies
Time: Varies by file size (can be 5-10 minutes for 500MB)
Insight: Shows impact of chunk size on throughput
Goal: Compare threaded vs forking server performance
rpycbench \
--num-parallel-clients 32 \
--requests-per-client 100

What it tests: Both threaded and forking RPyC servers under load
Time: ~1 minute
Compare: Threaded vs forking results for your workload
Goal: Simulate production-like mixed workload
rpycbench \
--num-serial-connections 50 \
--num-requests 2000 \
--num-parallel-clients 20 \
--requests-per-client 200 \
--test-binary-transfer \
--binary-file-sizes 1048576 \
--binary-chunk-size 65536 \
--output production-sim.json

What it tests: Connection establishment, latency, bandwidth, concurrency, file transfers
Time: ~3-5 minutes
Use: Baseline for production planning
Goal: Test only forking server (may be needed for CPU-bound work)
rpycbench \
--skip-rpyc-threaded \
--skip-http \
--num-parallel-clients 10

What it tests: RPyC forking server isolation
Time: ~30 seconds
Use: When GIL contention is a concern
Goal: Establish HTTP/REST baseline for comparison
rpycbench \
--skip-rpyc-threaded \
--skip-rpyc-forking \
--num-requests 5000 \
--output http-baseline.json

What it tests: Pure HTTP/Flask performance
Time: ~20 seconds
Use: Compare against existing HTTP services
Goal: Find absolute minimum latency/overhead
rpycbench \
--skip-rpyc-forking \
--num-serial-connections 1 \
--num-requests 10000 \
--num-parallel-clients 1

What it tests: Single connection, sequential requests
Time: ~30 seconds
Result: Best-case latency numbers (no concurrency overhead)
Goal: Run benchmarks with server on a remote host via SSH
# Benchmark against remote server (automatic deployment)
rpycbench --remote-host user@hostname
# Benchmark with custom host/port configuration
rpycbench --remote-host user@192.168.1.100 \
--rpyc-host 0.0.0.0 \
--http-host 0.0.0.0
# Skip HTTP and test only RPyC on remote host
rpycbench --remote-host deploy@production-server \
--skip-http \
--num-parallel-clients 32

What it does:
- Automatically deploys rpycbench to remote host via SSH
- Caches deployment (only redeploys when code changes)
- Starts server processes on remote host
- Runs benchmarks from local machine against remote server
- Cleans up remote processes when done
Requirements:
- SSH access to remote host with public key authentication
- uv installed on remote host
- Firewall allows connections on specified ports
Time: Initial deployment ~30s, cached deployments ~5s overhead
Use: Test performance across network, production-like infrastructure testing
patrick@turin:~/dev/rpycbench$ uv run rpycbench --remote-host parallels@hurin
Starting Benchmark Suite...
================================================================================
[1/3] Testing RPyC Threaded Server...
[Remote RPyC] Connecting to parallels@hurin...
[Remote Deploy] Starting deployment to remote host...
[Remote Deploy] Local code checksum: d766c9d8007e...
[Remote Deploy] Checksums differ, deploying new code...
[Remote Deploy] Packaging code...
[Remote Deploy] Transferring code to hurin...
[Remote Deploy] Extracting code on remote host...
[Remote Deploy] Found uv at: /home/parallels/.local/bin/uv
[Remote Deploy] Setting up Python environment...
[Remote Deploy] Installing dependencies...
[Remote Deploy] Deployment complete
[Remote RPyC] Starting RPyC server (threaded) binding to 0.0.0.0:18812 on hurin...
[Remote RPyC] Server started with PID 10022
[Remote RPyC] Server ready
- Connection benchmark (100 serial connections)...
- Latency benchmark (1000 requests)...
- Bandwidth benchmark...
- Concurrent benchmark (10 parallel clients)...
Starting 10 concurrent clients...
10/10 clients completed...
All 10 clients completed
[Remote RPyC] Stopping server (PID 10022)...
[Remote RPyC] Disconnected
[2/3] Testing RPyC Forking Server...
[Remote RPyC] Connecting to parallels@hurin...
[Remote Deploy] Starting deployment to remote host...
[Remote Deploy] Local code checksum: d766c9d8007e...
[Remote Deploy] Using cached deployment (checksum: d766c9d8007e...)
[Remote RPyC] Starting RPyC server (forking) binding to 0.0.0.0:18812 on hurin...
[Remote RPyC] Server started with PID 10193
[Remote RPyC] Server ready
- Connection benchmark (100 serial connections)...
- Latency benchmark (1000 requests)...
- Bandwidth benchmark...
- Concurrent benchmark (10 parallel clients)...
Starting 10 concurrent clients...
10/10 clients completed...
All 10 clients completed
[Remote RPyC] Stopping server (PID 10193)...
[Remote RPyC] Disconnected
[3/3] Testing HTTP/REST Server...
[Remote HTTP] Connecting to parallels@hurin...
[Remote Deploy] Starting deployment to remote host...
[Remote Deploy] Local code checksum: d766c9d8007e...
[Remote Deploy] Using cached deployment (checksum: d766c9d8007e...)
[Remote HTTP] Starting HTTP server binding to 0.0.0.0:5000 on hurin...
[Remote HTTP] Server started with PID 10361
[Remote HTTP] Server ready
- Connection benchmark (100 serial connections)...
- Latency benchmark (1000 requests)...
- Bandwidth benchmark...
- Concurrent benchmark (10 parallel clients)...
Starting 10 concurrent clients...
10/10 clients completed...
All 10 clients completed
[Remote HTTP] Stopping server (PID 10361)...
[Remote HTTP] Disconnected
================================================================================
All benchmarks complete!
================================================================================
BENCHMARK RESULTS SUMMARY
================================================================================
RPYC_THREADED
----------------------------------------
Connection Time: 1.76ms (±0.38ms)
Latency Mean: 7.01ms (±2.89ms)
Latency Median: 6.61ms
Latency P95: 12.00ms
Latency P99: 15.89ms
Concurrent Connections: 10
Total Requests: 1000
Success Rate: 100.00%
RPYC_FORKING
----------------------------------------
Connection Time: 1.04ms (±0.26ms)
Latency Mean: 5.20ms (±2.42ms)
Latency Median: 5.00ms
Latency P95: 6.69ms
Latency P99: 8.35ms
Concurrent Connections: 10
Total Requests: 1000
Success Rate: 100.00%
HTTP_THREADED
----------------------------------------
Connection Time: 0.10ms (±0.04ms)
Latency Mean: 17.72ms (±3.85ms)
Latency Median: 16.74ms
Latency P95: 25.81ms
Latency P99: 32.77ms
Concurrent Connections: 10
Total Requests: 1000
Success Rate: 100.00%
================================================================================
rpycbench [options]
--remote-host USER@HOST Remote host for server deployment via SSH (format: user@hostname)
Enables automatic deployment and server management on remote host
--rpyc-host HOST RPyC server host (default: localhost)
--rpyc-port PORT RPyC server port (default: 18812)
--http-host HOST HTTP server host (default: localhost)
--http-port PORT HTTP server port (default: 5000)
--skip-rpyc-threaded Skip RPyC threaded server tests
--skip-rpyc-forking Skip RPyC forking server tests
--skip-http Skip HTTP server tests
--num-serial-connections N Sample size: number of serial connections created one-at-a-time
to measure average connection establishment time (default: 100)
--num-requests N Sample size: number of requests for latency benchmark (default: 1000)
--num-parallel-clients N Number of parallel clients (simultaneous connections to
measure performance under load) (default: 10)
--requests-per-client N Requests per parallel client (default: 100)
--test-binary-transfer Enable binary file transfer benchmarks
--binary-file-sizes SIZE [SIZE ...]
File sizes in bytes (default: 1572864 134217728 524288000)
--binary-chunk-size SIZE Chunk size in bytes (default: 65536)
Run multiple times with different values to compare
--binary-iterations N Number of iterations per test (default: 3)
--output FILE, -o FILE Save JSON results to file
--quiet, -q Suppress summary output
# Comprehensive test with all options
rpycbench \
--rpyc-host localhost \
--rpyc-port 18812 \
--http-host localhost \
--http-port 5000 \
--num-serial-connections 200 \
--num-requests 5000 \
--num-parallel-clients 50 \
--requests-per-client 100 \
--test-binary-transfer \
--binary-file-sizes 1048576 10485760 52428800 \
--binary-chunk-size 65536 \
--binary-iterations 5 \
--output comprehensive-results.json
# Minimal fast test
rpycbench \
--skip-rpyc-forking \
--skip-http \
--num-serial-connections 5 \
--num-requests 50 \
--num-parallel-clients 2 \
--quiet

rpycbench-sweep [options]
Generate complete performance studies with system specifications, graphs, and reproducible methodology.
--remote-host USER@HOST Remote host for network testing (format: user@hostname)
Enables SSH-based remote deployment and testing
--output-dir PATH Output directory for results and graphs (default: benchmarks)
--skip-graphs Skip graph generation (JSON results only)
--description TEXT Description of test topology for documentation
--skip-rpyc-threaded Skip RPyC threaded server tests
--skip-rpyc-forking Skip RPyC forking server tests
--skip-http Skip HTTP server tests
benchmarks/
├── results_local.json # Localhost benchmark results
├── results_remote.json # Remote host results (if --remote-host used)
├── graphs/
│ ├── connection_time_comparison.png
│ ├── latency_comparison.png
│ ├── percentile_comparison.png
│ └── localhost_vs_lan_comparison.png # Generated when both exist
└── PERFORMANCE_STUDY.md # Optional: template for academic study
# Localhost only
rpycbench-sweep
# Remote testing with custom description
rpycbench-sweep \
--remote-host parallels@hurin \
--description "Parallels VM on same MacOS host"
# JSON only (skip graphs)
rpycbench-sweep --skip-graphs --output-dir results
# Test subset of protocols
rpycbench-sweep --skip-rpyc-forking --remote-host aws@server

Purpose: Integrate benchmarks into your existing application to measure real-world performance.
Use the Python API to:
- Measure your application's actual RPyC/HTTP performance
- Compare baseline protocol performance vs your application overhead
- Track performance over time in your codebase
- Profile specific operations in your application
from rpycbench.core.benchmark import BenchmarkContext
from rpycbench.servers.rpyc_servers import RPyCServer, create_rpyc_connection
# Start server
with RPyCServer(host='localhost', port=18812, mode='threaded'):
# Create benchmark context
with BenchmarkContext(
name="My Operation",
protocol="rpyc",
measure_latency=True,
) as bench:
conn = create_rpyc_connection('localhost', 18812)
# Measure your operations
for i in range(100):
with bench.measure_request():
result = conn.root.ping()
bench.record_request(success=True)
conn.close()
# Get results
metrics = bench.get_results()
stats = metrics.compute_statistics()
print(f"Average latency: {stats['latency']['mean']*1000:.2f}ms")
print(f"P95 latency: {stats['latency']['p95']*1000:.2f}ms")Here's a complete example of a synthetic application with remote function calls being benchmarked:
"""
Synthetic Application Example: Remote Data Processing Service
This demonstrates a realistic application with:
- Data validation
- Remote computation
- Error handling
- Performance measurement
"""
import rpyc
from rpycbench.core.benchmark import BenchmarkContext
from rpycbench.servers.rpyc_servers import RPyCServer, create_rpyc_connection
# Define your RPyC service (your application logic)
class DataProcessingService(rpyc.Service):
"""Example remote data processing service"""
def exposed_validate_data(self, data):
"""Validate input data"""
if not isinstance(data, dict):
raise ValueError("Data must be a dictionary")
if 'values' not in data:
raise ValueError("Data must contain 'values' key")
return True
def exposed_compute_statistics(self, data):
"""Compute statistics on data"""
values = data['values']
return {
'mean': sum(values) / len(values),
'min': min(values),
'max': max(values),
'count': len(values),
}
def exposed_process_batch(self, batch):
"""Process a batch of data items"""
results = []
for item in batch:
# Simulate processing
processed = {
'id': item['id'],
'result': item['value'] * 2,
'status': 'processed'
}
results.append(processed)
return results
def exposed_store_results(self, results):
"""Store processing results"""
# Simulate storage
return {'stored': len(results), 'status': 'success'}
# Your application client
class DataProcessingClient:
"""Client for data processing service"""
def __init__(self, connection):
self.conn = connection
def validate(self, data):
"""Validate data before processing"""
return self.conn.root.validate_data(data)
def compute(self, data):
"""Compute statistics"""
return self.conn.root.compute_statistics(data)
def process_batch(self, batch):
"""Process a batch of items"""
return self.conn.root.process_batch(batch)
def store(self, results):
"""Store results"""
return self.conn.root.store_results(results)
def full_pipeline(self, data, batch):
"""Full processing pipeline"""
# Validation
self.validate(data)
# Computation
stats = self.compute(data)
# Batch processing
results = self.process_batch(batch)
# Storage
store_result = self.store(results)
return {
'stats': stats,
'batch_results': results,
'storage': store_result
}
# Benchmark your application
def benchmark_application():
"""Benchmark the data processing application"""
# Start server with your service
with RPyCServer(host='localhost', port=18812, mode='threaded'):
# Connect to server
conn = create_rpyc_connection('localhost', 18812)
# Create client
client = DataProcessingClient(conn)
# Prepare test data
test_data = {
'values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}
test_batch = [
{'id': i, 'value': i * 10}
for i in range(20)
]
# Benchmark individual operations
print("=" * 80)
print("BENCHMARKING INDIVIDUAL OPERATIONS")
print("=" * 80)
# 1. Validation performance
with BenchmarkContext("Validation", "rpyc", measure_latency=True) as bench:
for _ in range(1000):
with bench.measure_request():
client.validate(test_data)
bench.record_request(success=True)
stats = bench.get_results().compute_statistics()
print(f"\nValidation:")
print(f" Mean: {stats['latency']['mean']*1000:.2f}ms")
print(f" P95: {stats['latency']['p95']*1000:.2f}ms")
# 2. Computation performance
with BenchmarkContext("Computation", "rpyc", measure_latency=True) as bench:
for _ in range(1000):
with bench.measure_request():
client.compute(test_data)
bench.record_request(success=True)
stats = bench.get_results().compute_statistics()
print(f"\nComputation:")
print(f" Mean: {stats['latency']['mean']*1000:.2f}ms")
print(f" P95: {stats['latency']['p95']*1000:.2f}ms")
# 3. Batch processing performance with bandwidth tracking
with BenchmarkContext("Batch Processing", "rpyc",
measure_latency=True,
measure_bandwidth=True) as bench:
for _ in range(100):
with bench.measure_request(bytes_sent=len(str(test_batch))):
results = client.process_batch(test_batch)
bench.record_request(success=True)
stats = bench.get_results().compute_statistics()
print(f"\nBatch Processing:")
print(f" Mean latency: {stats['latency']['mean']*1000:.2f}ms")
print(f" P95 latency: {stats['latency']['p95']*1000:.2f}ms")
print(f" Throughput: {stats['upload_bandwidth']['mean']/(1024*1024):.2f} MB/s")
# 4. Full pipeline performance
print("\n" + "=" * 80)
print("BENCHMARKING FULL PIPELINE")
print("=" * 80)
with BenchmarkContext("Full Pipeline", "rpyc", measure_latency=True) as bench:
for _ in range(100):
with bench.measure_request():
try:
result = client.full_pipeline(test_data, test_batch)
bench.record_request(success=True)
except Exception as e:
bench.record_request(success=False)
stats = bench.get_results().compute_statistics()
metrics = bench.get_results()
success_rate = (metrics.total_requests - metrics.failed_requests) / metrics.total_requests * 100
print(f"\nFull Pipeline:")
print(f" Mean latency: {stats['latency']['mean']*1000:.2f}ms")
print(f" P95 latency: {stats['latency']['p95']*1000:.2f}ms")
print(f" P99 latency: {stats['latency']['p99']*1000:.2f}ms")
print(f" Success rate: {success_rate:.1f}%")
print(f" Total requests: {metrics.total_requests}")
print(f" Failed: {metrics.failed_requests}")
conn.close()
if __name__ == '__main__':
benchmark_application()

Save this as my_app_benchmark.py and run:
python my_app_benchmark.py

Compare your application's performance against baseline protocol performance:
from rpycbench.core.benchmark import BenchmarkContext, LatencyBenchmark
from rpycbench.servers.rpyc_servers import RPyCServer, create_rpyc_connection
def compare_baseline_vs_application():
"""Compare baseline RPyC performance vs application performance"""
with RPyCServer(host='localhost', port=18812, mode='threaded'):
# 1. Baseline benchmark (direct ping)
baseline_bench = LatencyBenchmark(
name="Baseline",
protocol="rpyc",
server_mode="threaded",
connection_factory=lambda: create_rpyc_connection('localhost', 18812),
request_func=lambda conn: conn.root.ping(),
num_requests=1000,
)
baseline_metrics = baseline_bench.execute()
baseline_stats = baseline_metrics.compute_statistics()
# 2. Application benchmark
conn = create_rpyc_connection('localhost', 18812)
# Your application class with business logic
class MyApp:
def __init__(self, connection):
self.conn = connection
self.cache = {}
def process_request(self, data):
# Your application logic (validation, caching, etc.)
if data in self.cache:
return self.cache[data]
# Make remote call
result = self.conn.root.echo(data)
# Post-processing
self.cache[data] = result
return result
app = MyApp(conn)
with BenchmarkContext("Application", "rpyc", measure_latency=True) as bench:
for i in range(1000):
with bench.measure_request():
app.process_request(f"data_{i}")
bench.record_request(success=True)
app_metrics = bench.get_results()
app_stats = app_metrics.compute_statistics()
conn.close()
# Compare results
print("=" * 80)
print("BASELINE VS APPLICATION COMPARISON")
print("=" * 80)
baseline_mean = baseline_stats['latency']['mean'] * 1000
app_mean = app_stats['latency']['mean'] * 1000
overhead = app_mean - baseline_mean
overhead_pct = (overhead / baseline_mean) * 100
print(f"\nBaseline (direct RPyC call):")
print(f" Mean: {baseline_mean:.2f}ms")
print(f" P95: {baseline_stats['latency']['p95']*1000:.2f}ms")
print(f"\nApplication (with business logic):")
print(f" Mean: {app_mean:.2f}ms")
print(f" P95: {app_stats['latency']['p95']*1000:.2f}ms")
print(f"\nOverhead:")
print(f" Absolute: {overhead:.2f}ms")
print(f" Relative: {overhead_pct:.1f}%")
if overhead_pct > 50:
print(f"\n⚠️ Application overhead is {overhead_pct:.1f}% - consider optimization")
else:
print(f"\n✓ Application overhead is reasonable at {overhead_pct:.1f}%")
if __name__ == '__main__':
compare_baseline_vs_application()

from rpycbench.core.benchmark import BenchmarkContext
from rpycbench.servers.rpyc_servers import RPyCServer, create_rpyc_connection
with RPyCServer(host='localhost', port=18812, mode='threaded'):
with BenchmarkContext("My Service", "rpyc", measure_latency=True) as ctx:
conn = create_rpyc_connection('localhost', 18812)
for i in range(100):
with ctx.measure_request():
conn.root.my_method(i)
ctx.record_request(success=True)
conn.close()
stats = ctx.get_results().compute_statistics()
print(f"Mean: {stats['latency']['mean']*1000:.2f}ms")with BenchmarkContext("File Upload", "rpyc", measure_bandwidth=True) as ctx:
conn = create_rpyc_connection('localhost', 18812)
for file_data in my_files:
with ctx.measure_request(bytes_sent=len(file_data)):
conn.root.upload(file_data)
ctx.record_request(success=True)
conn.close()
stats = ctx.get_results().compute_statistics()
print(f"Upload speed: {stats['upload_bandwidth']['mean']/(1024*1024):.2f} MB/s")with BenchmarkContext("Connections", "rpyc", measure_connection=True) as ctx:
for i in range(50):
with ctx.measure_connection_time():
conn = create_rpyc_connection('localhost', 18812)
# Use connection
conn.root.ping()
conn.close()
stats = ctx.get_results().compute_statistics()
print(f"Avg connection time: {stats['connection_time']['mean']*1000:.2f}ms")with BenchmarkContext("API Calls", "rpyc", measure_latency=True) as ctx:
conn = create_rpyc_connection('localhost', 18812)
for request in requests:
with ctx.measure_request():
try:
result = conn.root.process(request)
ctx.record_request(success=True)
except Exception as e:
ctx.record_request(success=False)
# Handle error
conn.close()
metrics = ctx.get_results()
success_rate = (metrics.total_requests - metrics.failed_requests) / metrics.total_requests
print(f"Success rate: {success_rate*100:.1f}%")from rpycbench.core.benchmark import BinaryTransferBenchmark
from rpycbench.servers.rpyc_servers import RPyCServer, create_rpyc_connection
with RPyCServer(host='localhost', port=18812, mode='threaded'):
bench = BinaryTransferBenchmark(
name="My File Transfer",
protocol="rpyc",
server_mode="threaded",
connection_factory=lambda: create_rpyc_connection('localhost', 18812),
upload_func=lambda conn, data: conn.root.upload_file(data),
download_func=lambda conn, size: conn.root.download_file(size),
upload_chunked_func=lambda conn, chunks: conn.root.upload_file_chunked(chunks),
download_chunked_func=lambda conn, size, chunk_size: conn.root.download_file_chunked(size, chunk_size),
file_sizes=[1_048_576, 10_485_760], # 1MB, 10MB
chunk_size=65_536, # 64KB
iterations=5,
)
metrics = bench.execute()
for result in metrics.metadata['transfer_results']:
print(f"{result['type']}: {result['throughput_mbps']:.2f} Mbps")For comprehensive guides on using the Python API to profile and optimize RPyC applications:
- Quickstart Guide - 5-minute diagnosis for slow RPyC applications
  - Profile existing RPyC applications with zero code changes
  - Measure baseline vs application performance
  - Common scenarios: slow calls, concurrency issues, parallel clients
- Cookbook - Real-world patterns and solutions
  - Diagnostic flowchart for performance issues
  - CPU-bound vs I/O-bound bottleneck diagnosis
  - Server vs client bottleneck detection
  - Understanding the Python GIL and server modes
  - Optimization techniques with before/after examples
- API Reference - Complete API documentation
  - All benchmark classes with parameters and examples
  - Metrics interpretation guidance with red flags
  - Profiling and telemetry API
  - Visualization functions
  - Server management
- CPU vs I/O Comparison Example - Demonstrates GIL impact on performance
Profile and diagnose performance issues in your RPyC applications by tracking network round trips, netref usage, and call patterns.
from rpycbench.utils.profiler import create_profiled_connection
from rpycbench.utils.telemetry import RPyCTelemetry
from rpycbench.servers.rpyc_servers import RPyCServer
with RPyCServer(host='localhost', port=18812, mode='threaded'):
telemetry = RPyCTelemetry(
enabled=True,
track_netrefs=True,
slow_call_threshold=0.1,
deep_stack_threshold=5,
)
conn = create_profiled_connection(
host='localhost',
port=18812,
telemetry_inst=telemetry,
)
# Your remote calls are automatically tracked
for i in range(10):
conn.root.ping()
conn.close()
# Print comprehensive report
telemetry.print_summary()

- Network Round Trips: Count every remote call
- NetRef Operations: Track netref creation, access, lifecycle
- Call Stacks: Monitor nesting depth and call chains
- Slow Calls: Automatically detect calls exceeding threshold
- Performance: Latency per call, total duration, resource usage
================================================================================
RPYC TELEMETRY SUMMARY
================================================================================
Total Calls: 45
Network Round Trips: 45
NetRefs Created: 3
Active NetRefs: 1
Current Stack Depth: 0
Max Stack Depth: 3
Slow Calls (>0.1s): 2
Avg Call Duration: 12.34ms
--------------------------------------------------------------------------------
SLOW CALLS:
--------------------------------------------------------------------------------
process_large_batch 150.23ms depth=0
nested_computation 120.45ms depth=2
For more profiling examples, see the examples/profiling_*.py files in the repository.
Profile any RPyC application from the command line without modifying your code:
# Run your script with automatic profiling
python -m rpycbench.autobench myapp.py
# With arguments to your script
python -m rpycbench.autobench myapp.py --host localhost --port 18861
# Run a module
python -m rpycbench.autobench -m mymodule

How it works:
- Automatically monkey-patches rpyc.connect() and rpyc.utils.classic.connect() (sketched below)
- Tracks all RPyC calls transparently
- Prints detailed telemetry summary on exit
- Zero modifications to your application code
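The wrapping idea, greatly simplified (the real autobench also instruments individual remote calls and netrefs; this sketch only times connection setup):

```python
# Greatly simplified sketch of the monkey-patching idea. The real
# autobench tracks every remote call; this only times rpyc.connect().
import time
import rpyc

_original_connect = rpyc.connect

def _profiled_connect(*args, **kwargs):
    start = time.perf_counter()
    conn = _original_connect(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"[sketch] rpyc.connect took {elapsed_ms:.2f}ms")
    return conn

# Applied before the target script runs, so the script needs no changes.
rpyc.connect = _profiled_connect
```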
Example output:
================================================================================
RPYC TELEMETRY SUMMARY
================================================================================
Total Calls: 128
Network Round Trips: 128
NetRefs Created: 15
Active NetRefs: 3
Current Stack Depth: 0
Max Stack Depth: 5
Slow Calls (>0.1s): 3
Avg Call Duration: 15.23ms
--------------------------------------------------------------------------------
SLOW CALLS (>0.1s):
--------------------------------------------------------------------------------
remote_sys.version (getattr)
Duration: 150.23ms
Stack Depth: 2
Called from: /Users/user/myapp.py:45 in fetch_system_info
Call Stack:
├─> getattr(modules) (getattr) [5.12ms]
└─> getattr(version) (getattr) [150.23ms]
at /Users/user/myapp.py:45 in fetch_system_info
Configuration options:
# Adjust slow call threshold
python -m rpycbench.autobench myapp.py --slow-threshold 0.05
# Adjust deep stack threshold
python -m rpycbench.autobench myapp.py --deep-stack-threshold 10
# Disable netref tracking for performance
python -m rpycbench.autobench myapp.py --no-netrefs
# Disable call stack tracking
python -m rpycbench.autobench myapp.py --no-stacks

Add optional markers to identify important time windows in your application:
from rpycbench import mark
# Using context manager (recommended)
with mark.section("Establishing 128 client connections"):
for i in range(128):
connections.append(rpyc.connect(host, port))
# Using explicit start/end
mark.start("Processing batch")
process_large_batch()
mark.end()

Markers are no-ops when autobench is not running, so you can leave them in your code permanently without any performance impact.
Example output with markers:
================================================================================
PROFILING MARKERS - CRITICAL SECTIONS
================================================================================
Establishing 128 client connections
Duration: 2543.12ms
Round Trips: 128
NetRefs Created: 128
Processing batch
Duration: 856.34ms
Round Trips: 45
NetRefs Created: 12
Complete example:
#!/usr/bin/env python3
"""
Example: myapp.py
Run normally: python myapp.py
Run profiled: python -m rpycbench.autobench myapp.py
"""
import rpyc.utils.classic
try:
from rpycbench import mark
except ImportError:
# Markers become no-ops if rpycbench not installed
class mark:
@staticmethod
def start(name): pass
@staticmethod
def end(): pass
@staticmethod
def section(name):
from contextlib import contextmanager
@contextmanager
def noop():
yield
return noop()
def main():
with mark.section("Connecting to server"):
conn = rpyc.utils.classic.connect('localhost', 18812)
with mark.section("Fetching remote data"):
remote_os = conn.modules.os
cwd = remote_os.getcwd()
print(f"Remote CWD: {cwd}")
conn.close()
if __name__ == '__main__':
main()

See examples/autobench_example.py for a complete working example.
Run benchmarks against a remote server using SSH deployment:
from rpycbench.benchmarks.suite import BenchmarkSuite
suite = BenchmarkSuite(
rpyc_host='0.0.0.0',
rpyc_port=18812,
http_host='0.0.0.0',
http_port=5000,
remote_host='user@remote-server.com',
)
results = suite.run_all(
test_rpyc_threaded=True,
test_rpyc_forking=True,
test_http=True,
)
results.print_summary()

What happens:
- Connects to remote host via SSH
- Deploys rpycbench code (cached if unchanged)
- Sets up Python environment on remote host
- Starts server processes remotely
- Runs benchmarks from local machine
- Stops servers and cleans up
Using Remote Servers with BenchmarkContext:
from rpycbench.core.benchmark import BenchmarkContext
from rpycbench.remote.servers import RemoteRPyCServer
from rpycbench.servers.rpyc_servers import create_rpyc_connection
with RemoteRPyCServer(
remote_host='user@hostname',
host='0.0.0.0',
port=18812,
mode='threaded'
):
with BenchmarkContext("My Remote Test", "rpyc", measure_latency=True) as bench:
conn = create_rpyc_connection('hostname', 18812)
for i in range(100):
with bench.measure_request():
result = conn.root.ping()
bench.record_request(success=True)
conn.close()
stats = bench.get_results().compute_statistics()
print(f"Remote latency: {stats['latency']['mean']*1000:.2f}ms")Important: All benchmark servers run in separate processes from the benchmark client code. This provides true isolation without GIL interference.
- Server Lifecycle:
  - Server spawned in a separate process before each test
  - Server lifecycle automatically managed by the parent process
  - Tests run against the server
  - Server cleanly terminated after the test completes
  - Each server type tested sequentially (one at a time)
- Process Isolation (✅ no GIL interference; see the sketch after this list):
  - Server Process: Runs independently from the client
  - Client Process: Runs 128+ concurrent threads without GIL contention from the server
  - Benefits: True parallelism, accurate metrics, production-like environment
- Server Threading Models:
  - RPyC ThreadedServer: New thread per client connection (in the server process)
  - RPyC ForkingServer: Forks a new process per client (from the server process)
  - HTTP ThreadedServer: Flask handles requests in threads (in the server process)
- Client Concurrency:
  - Default: 128 concurrent connections from a single client process
  - Each connection runs in its own thread within the client process
  - Configurable: Test with any number of concurrent clients
  - Per-connection metrics tracking available
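A minimal sketch of this process-isolation pattern using plain rpyc APIs (rpycbench's server classes wrap the same idea with readiness checks and automatic cleanup):

```python
# Sketch: server in its own process, concurrent clients in this one.
import multiprocessing
import time
from concurrent.futures import ThreadPoolExecutor

import rpyc
from rpyc.utils.server import ThreadedServer


class PingService(rpyc.Service):
    def exposed_ping(self):
        return "pong"


def run_server():
    # Separate process: its GIL never contends with the client's threads.
    ThreadedServer(PingService, port=18812).start()


def client_worker(_):
    conn = rpyc.connect("localhost", 18812)
    for _ in range(100):
        conn.root.ping()
    conn.close()


if __name__ == "__main__":
    server = multiprocessing.Process(target=run_server, daemon=True)
    server.start()
    time.sleep(1.0)  # crude readiness wait (a real harness would poll)

    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(client_worker, range(8)))

    server.terminate()
```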
Measures time to establish a connection to the server.
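In essence (simplified; the real benchmark also reports the standard deviation across samples):

```python
# Sketch: the essence of the connection benchmark.
import time
import rpyc

samples = []
for _ in range(100):  # --num-serial-connections
    start = time.perf_counter()
    conn = rpyc.connect("localhost", 18812)
    samples.append(time.perf_counter() - start)
    conn.close()

mean_ms = sum(samples) / len(samples) * 1000
print(f"Connection Time: {mean_ms:.2f}ms over {len(samples)} samples")
```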
Measures request/response round-trip time with statistics:
- Mean, median, min, max
- Standard deviation
- 95th and 99th percentiles
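A minimal stdlib sketch of how these statistics can be derived from raw samples (nearest-rank percentiles; the library may use a different interpolation):

```python
# Sketch: deriving the reported statistics from raw round-trip samples.
import statistics

def latency_stats(samples):
    """samples: list of round-trip times in seconds."""
    ordered = sorted(samples)

    def percentile(p):
        # Nearest-rank percentile, clamped to the last element.
        return ordered[min(len(ordered) - 1, int(p * len(ordered)))]

    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p95": percentile(0.95),
        "p99": percentile(0.99),
    }
```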
Measures data transfer rates for various payload sizes (1KB - 1MB):
- Upload bandwidth
- Download bandwidth
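Bandwidth reduces to payload size divided by elapsed time. A sketch, where `upload` is a hypothetical exposed method rather than rpycbench API:

```python
# Sketch: upload bandwidth for one payload size (upload is hypothetical).
import time
import rpyc

conn = rpyc.connect("localhost", 18812)
payload = b"x" * (1024 * 1024)  # 1MB

start = time.perf_counter()
conn.root.upload(payload)
elapsed = time.perf_counter() - start

print(f"Upload: {len(payload) / elapsed / (1024 * 1024):.2f} MB/s")
conn.close()
```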
Measures large file transfer performance:
- Default file sizes: 1.5MB, 128MB, 500MB
- Single chunk size per run (default: 64KB)
- Tests both chunked and non-chunked transfers
- Run multiple times with different chunk sizes to compare
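Client-side, the two transfer modes differ only in how the payload is handed to the server. A sketch using the exposed method names from the BinaryTransferBenchmark example shown earlier (server setup omitted):

```python
# Sketch: chunked vs non-chunked upload, mirroring the benchmark's modes.
import rpyc

CHUNK_SIZE = 65_536          # 64KB, the default --binary-chunk-size
data = b"\x00" * 1_572_864   # 1.5MB test file

conn = rpyc.connect("localhost", 18812)

# Non-chunked: one call carries the entire payload.
conn.root.upload_file(data)

# Chunked: the payload is split into fixed-size pieces first.
chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
conn.root.upload_file_chunked(chunks)

conn.close()
```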
Measures performance with multiple simultaneous clients:
- Connection establishment under load
- Request throughput
- Success rate
- Resource usage
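Conceptually, the concurrency benchmark runs N client threads, each with its own connection and request loop (simplified; the real suite also records per-connection metrics and resource usage):

```python
# Sketch: N parallel clients, each with its own connection and loop.
from concurrent.futures import ThreadPoolExecutor
import rpyc

NUM_CLIENTS = 10      # --num-parallel-clients
REQUESTS_EACH = 100   # --requests-per-client

def client_task(_):
    conn = rpyc.connect("localhost", 18812)
    ok = 0
    for _ in range(REQUESTS_EACH):
        try:
            conn.root.ping()
            ok += 1
        except Exception:
            pass  # failures count against the success rate
    conn.close()
    return ok

with ThreadPoolExecutor(max_workers=NUM_CLIENTS) as pool:
    succeeded = sum(pool.map(client_task, range(NUM_CLIENTS)))

rate = succeeded / (NUM_CLIENTS * REQUESTS_EACH) * 100
print(f"Success Rate: {rate:.2f}%")
```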
================================================================================
BENCHMARK RESULTS SUMMARY
================================================================================
RPYC_THREADED
----------------------------------------
Connection Time: 1.23ms (±0.45ms)
Latency Mean: 2.34ms (±0.67ms)
Latency Median: 2.10ms
Latency P95: 3.56ms
Latency P99: 4.23ms
Upload Bandwidth: 45.67 MB/s
Download Bandwidth: 67.89 MB/s
Success Rate: 100.00%
{
"rpyc_threaded": {
"name": "RPyC Latency (threaded)",
"protocol": "rpyc",
"server_mode": "threaded",
"latency": {
"mean": 0.00234,
"median": 0.00210,
"p95": 0.00356,
"p99": 0.00423
}
}
}

rpycbench/
├── core/
│ ├── benchmark.py # Benchmark classes and context managers
│ └── metrics.py # Metrics collection and statistics
├── servers/
│ ├── rpyc_servers.py # RPyC server implementations
│ └── http_servers.py # HTTP/REST server implementations
├── benchmarks/
│ └── suite.py # Complete benchmark suite
├── runners/
│ └── autonomous.py # CLI entry point
└── utils/
├── telemetry.py # RPyC telemetry tracking
├── profiler.py # Connection profiling
└── visualizer.py # Telemetry visualization
- Python >= 3.8
- rpyc >= 5.3.0
- requests >= 2.31.0
- flask >= 3.0.0
- numpy >= 1.24.0
- pandas >= 2.0.0
- matplotlib >= 3.7.0
- psutil >= 5.9.0
# Install with dev dependencies
pip install -e ".[dev]"
# Run all tests
pytest rpycbench/tests/
# Run with coverage
pytest rpycbench/tests/ --cov=rpycbench --cov-report=html

Contributions are welcome! Please feel free to submit pull requests.
See LICENSE file for details.