LoginChecker: Billion-Scale Membership Testing

A comprehensive implementation and comparison of membership testing algorithms for billion-scale datasets. This project implements and benchmarks five different approaches: Linear Search, Binary Search, Disk-based Hash Table, Bloom Filter, and Cuckoo Filter.

Features

Billion-scale capability with memory-mapped storage
Five membership testing algorithms: Linear Search, Binary Search, Hash Table, Bloom Filter, Cuckoo Filter
Performance benchmarking with detailed metrics
Memory-efficient persistent storage
Comprehensive test suite

Algorithm Complexity Summary

Algorithm	Time Complexity	Space Complexity	Scale Limit	Key Features
Linear Search	O(n)	O(1)	100K	Simple, no preprocessing
Binary Search	O(log n)	O(n)	100M	Requires sorted data
Hash Table	O(1) avg, O(n) worst	O(n)	1B+	Memory-mapped, persistent
Bloom Filter	O(k)	O(m)	1B+	Probabilistic, no false negatives
Cuckoo Filter	O(1) expected	O(n)	1B+	Supports deletions, exact FPR

Installation

Prerequisites

Python 3.12+
Git

Setup

git clone https://github.com/FahdSeddik/LoginChecker.git
cd LoginChecker
uv sync
python data/generate.py

Usage

Quick Start

from data.reader import UsernameReader
from src.algorithms.linear_search import LinearSearch
from src.data_structures.bloom_filter import BloomFilter

# Load username data
reader = UsernameReader("data/usernames.dat")

# Linear search example
linear_search = LinearSearch(reader)
result = linear_search.search("john_doe")
print(f"Found: {result.found}, Comparisons: {result.comparisons}")

# Bloom filter example
bloom = BloomFilter(capacity=1_000_000, false_positive_rate=0.01)
bloom.add("john_doe")
print(f"Contains 'john_doe': {bloom.contains('john_doe')}")

Run Benchmarks

# Quick algorithm comparison
python src/benchmark/simple_runner.py --algorithms linear_search,bloom_filter --sizes 1000,10000,100000

# Full benchmark suite (not recommended)
python src/benchmark/run_billion_benchmark.py

Run Tests

# All tests
pytest --run-slow

Project Structure

├── src/
│   ├── algorithms/          # Search algorithms
│   │   ├── algorithm.py
│   │   ├── binary_search.py
│   │   └── linear_search.py
│   ├── benchmark/           # Benchmarking tools
│   │   ├── benchmark.py
│   │   ├── run_billion_benchmark.py
│   │   └── simple_runner.py
│   └── data_structures/     # Data structures
│       ├── bloom_filter.py
│       ├── cuckoo_filter.py
│       ├── disk_hashset.py
│       └── sorted_list.py
├── data/                    # Data generation and access
│   ├── generate.py
│   ├── reader.py
│   └── writer.py
├── tests/                   # Test suite
├── report/                  # Academic report
└── pyproject.toml

Name	Name	Last commit message	Last commit date
Latest commit History 17 Commits
.vscode	.vscode
data	data
report	report
src	src
tests	tests
.gitignore	.gitignore
.python-version	.python-version
COSC 520 -Assignment 1-Sep 2025.pdf	COSC 520 -Assignment 1-Sep 2025.pdf
README.md	README.md
pyproject.toml	pyproject.toml
uv.lock	uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LoginChecker: Billion-Scale Membership Testing

Features

Algorithm Complexity Summary

Installation

Prerequisites

Setup

Usage

Quick Start

Run Benchmarks

Run Tests

Project Structure

About

Uh oh!

Releases

Packages

Languages

Search code, repositories, users, issues, pull requests...

FahdSeddik/LoginChecker

Folders and files

Latest commit

History

Repository files navigation

LoginChecker: Billion-Scale Membership Testing

Features

Algorithm Complexity Summary

Installation

Prerequisites

Setup

Usage

Quick Start

Run Benchmarks

Run Tests

Project Structure

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages