Randomized Quicksort & Hash Table with Chaining - Algorithm Efficiency and Scalability
Overview
This project implements two fundamental algorithms and data structures demonstrating algorithm efficiency and scalability:
- Randomized Quicksort Algorithm - An efficient sorting algorithm with average O(n log n) time complexity
- Hash Table with Chaining - A hash table implementation using chaining for collision resolution
Both implementations provide comprehensive test suites, performance analysis utilities, and detailed documentation for educational purposes.
Key Features
- ✅ Randomized Quicksort: Efficient sorting with randomized pivot selection to avoid worst-case performance
- ✅ Performance Analysis: Built-in utilities for comparing and analyzing algorithm performance
- ✅ Hash Table with Chaining: Complete hash table implementation with dynamic resizing
- ✅ Comprehensive Test Suite: Extensive test coverage including edge cases, stress tests, and performance benchmarks
- ✅ Well-Documented Code: Clear comments, docstrings, and educational examples
- ✅ Production-Ready: Robust error handling and graceful edge-case behavior
Architecture
Randomized Quicksort Algorithm Flow
```
Input Array: [64, 34, 25, 12, 22, 11, 90, 5]
                    ↓
┌─────────────────────────────────────┐
│ Randomized Quicksort Process        │
└─────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────────┐
│ Step 1: Randomly select pivot                   │
│ Pivot: 25 (randomly selected)                   │
│ Partition: [12, 22, 11, 5] | 25 | [64, 34, 90]  │
└─────────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────┐
│ Step 2: Recursively sort left       │
│ Array: [12, 22, 11, 5]              │
│ Pivot: 11 → [5, 11] | [12, 22]      │
└─────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────┐
│ Step 3: Recursively sort right      │
│ Array: [64, 34, 90]                 │
│ Pivot: 64 → [34, 64] | [90]         │
└─────────────────────────────────────┘
                    ↓
Output Array: [5, 11, 12, 22, 25, 34, 64, 90]
```
Hash Table with Chaining Structure
```
Hash Table (size=8)
┌─────────────────────────────────────────┐
│ Bucket 0: [Key: 8,  Value: "eight"]     │
│           [Key: 16, Value: "sixteen"]   │
│ Bucket 1: [Key: 9,  Value: "nine"]      │
│ Bucket 2: [Key: 10, Value: "ten"]       │
│           [Key: 18, Value: "eighteen"]  │
│ Bucket 3: [Key: 11, Value: "eleven"]    │
│ Bucket 4: [Key: 12, Value: "twelve"]    │
│ Bucket 5: [Key: 13, Value: "thirteen"]  │
│ Bucket 6: [Key: 14, Value: "fourteen"]  │
│ Bucket 7: [Key: 15, Value: "fifteen"]   │
└─────────────────────────────────────────┘
                    ↓
      Collision Resolution via Chaining
     (Multiple keys hash to same bucket)
```
Core Algorithm Structure
Randomized Quicksort
```
┌─────────────────────────────────────────────────────────────┐
│                    Randomized Quicksort                     │
├─────────────────────────────────────────────────────────────┤
│ Function: randomized_quicksort(arr)                         │
│ Input: Array of comparable elements                         │
│ Output: Array sorted in ascending order                     │
├─────────────────────────────────────────────────────────────┤
│ Algorithm Steps:                                            │
│ 1. If array has ≤ 1 element, return                         │
│ 2. Randomly select pivot element                            │
│ 3. Partition array around pivot                             │
│ 4. Recursively sort left subarray                           │
│ 5. Recursively sort right subarray                          │
│ 6. Combine results                                          │
└─────────────────────────────────────────────────────────────┘
```
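The steps above can be sketched as a short Python implementation (a minimal sketch; the helper names `_quicksort` and `_randomized_partition` are illustrative and may differ from the project's internals):

```python
import random

def randomized_quicksort(arr):
    """Return a new list sorted ascending; the input is left unchanged."""
    result = list(arr)  # copy -> non-destructive sort
    _quicksort(result, 0, len(result) - 1)
    return result

def _quicksort(arr, low, high):
    if low < high:                      # arrays of size <= 1 are already sorted
        p = _randomized_partition(arr, low, high)
        _quicksort(arr, low, p - 1)     # sort left of the pivot
        _quicksort(arr, p + 1, high)    # sort right of the pivot

def _randomized_partition(arr, low, high):
    # Swap a uniformly random element into the pivot slot, then run a
    # standard Lomuto partition around it.
    r = random.randint(low, high)
    arr[r], arr[high] = arr[high], arr[r]
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1                        # final pivot position

print(randomized_quicksort([64, 34, 25, 12, 22, 11, 90, 5]))
# → [5, 11, 12, 22, 25, 34, 64, 90]
```

Because the partition step sorts in place, the copy made up front is what keeps the public function non-destructive.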
Hash Table with Chaining
```
┌─────────────────────────────────────────────────────────────┐
│                  Hash Table with Chaining                   │
├─────────────────────────────────────────────────────────────┤
│ Class: HashTable                                            │
│ Operations: insert, get, delete, contains                   │
├─────────────────────────────────────────────────────────────┤
│ Key Operations:                                             │
│ 1. Hash function: h(k) = floor(m × (k × A mod 1))           │
│ 2. Collision resolution: Chaining (linked lists)            │
│ 3. Load factor management: Resize when threshold exceeded   │
│ 4. Dynamic resizing: Double size when load > 0.75           │
└─────────────────────────────────────────────────────────────┘
```
Implementation Details
Part 1: Randomized Quicksort
Core Functions
1. randomized_quicksort(arr)
- Purpose: Sort array using randomized quicksort algorithm
- Parameters: `arr` (list) - Input array to be sorted
- Returns: `list` - New array sorted in ascending order
- Space Complexity: O(n) - creates a copy of the input array
- Time Complexity:
- Average: O(n log n)
- Worst: O(n²) - rarely occurs due to randomization
- Best: O(n log n)
2. randomized_partition(arr, low, high)
- Purpose: Partition array using a randomly selected pivot
- Parameters: `arr` (list) - Array to partition; `low` (int) - Starting index; `high` (int) - Ending index
- Returns: `int` - Final position of pivot element
- Key Feature: Random pivot selection prevents worst-case O(n²) performance
3. compare_with_builtin(arr)
- Purpose: Compare randomized quicksort with Python's built-in sort
- Returns: Dictionary with timing metrics and correctness verification
4. analyze_performance(array_sizes)
- Purpose: Analyze quicksort performance across different array sizes
- Returns: List of performance metrics for each array size
5. deterministic_quicksort(arr)
- Purpose: Sort array using deterministic quicksort (first element as pivot)
- Parameters: `arr` (list) - Input array to be sorted
- Returns: `list` - New array sorted in ascending order
- Time Complexity:
- Average: O(n log n)
- Worst: O(n²) - occurs on sorted/reverse-sorted arrays
- Note: Included for empirical comparison with randomized version
Algorithm Logic
Why Randomization?
Standard quicksort can degrade to O(n²) when:
- Pivot is always the smallest element (worst case)
- Pivot is always the largest element (worst case)
- Array is already sorted or reverse sorted
Randomization ensures:
- Expected O(n log n) performance
- Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
- Very low probability of worst-case behavior
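This effect can be checked with a small counting experiment (a sketch, not the project's benchmark code): the same Lomuto-style quicksort is run on an already-sorted array, once with a first-element pivot and once with a random pivot, counting element comparisons.

```python
import random

def quicksort(arr, low, high, pick_pivot, stats):
    """Lomuto quicksort; pick_pivot(low, high) chooses the pivot index."""
    if low < high:
        p = pick_pivot(low, high)
        arr[p], arr[high] = arr[high], arr[p]   # move pivot to the end
        pivot, i = arr[high], low - 1
        for j in range(low, high):
            stats["comparisons"] += 1
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[high] = arr[high], arr[i + 1]
        quicksort(arr, low, i, pick_pivot, stats)
        quicksort(arr, i + 2, high, pick_pivot, stats)

def count_comparisons(data, pick_pivot):
    arr, stats = list(data), {"comparisons": 0}
    quicksort(arr, 0, len(arr) - 1, pick_pivot, stats)
    assert arr == sorted(data)
    return stats["comparisons"]

random.seed(0)                       # reproducible demo
sorted_input = list(range(300))      # worst case for a first-element pivot
det = count_comparisons(sorted_input, lambda lo, hi: lo)
rnd = count_comparisons(sorted_input, lambda lo, hi: random.randint(lo, hi))
print(det, rnd)  # det = 300*299/2 = 44850; rnd lands near 2n ln n ≈ 3400
```

On sorted input the first-element pivot produces maximally unbalanced partitions, so the comparison count grows quadratically, while the random pivot stays near the 2n ln n expectation quoted above.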
Part 2: Hash Table with Chaining
Core Operations
1. insert(key, value)
- Purpose: Insert or update a key-value pair
- Time Complexity: O(1) average case, O(n) worst case
- Features:
- Automatically updates if key exists
- Triggers resize when load factor exceeds threshold
2. get(key)
- Purpose: Retrieve value associated with a key
- Time Complexity: O(1) average case, O(n) worst case
- Returns: Value if key exists, None otherwise
3. delete(key)
- Purpose: Remove a key-value pair
- Time Complexity: O(1) average case, O(n) worst case
- Returns: True if key was found and deleted, False otherwise
4. contains(key)
- Purpose: Check if a key exists in the hash table
- Time Complexity: O(1) average case, O(n) worst case
- Pythonic: Supports the `in` operator
Hash Function
Multiplication Method:
h(k) = floor(m × (k × A mod 1))
where:
- `m` = table size
- `A` ≈ (√5 - 1) / 2 ≈ 0.618 (golden ratio conjugate)
This choice of `A` provides a good distribution of keys across buckets.
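The multiplication method translates directly into Python; this sketch (the function name `mult_hash` is illustrative) shows how the fractional part of k × A is scaled to a bucket index:

```python
import math

A = (math.sqrt(5) - 1) / 2  # ≈ 0.6180339887

def mult_hash(key, m):
    """h(k) = floor(m * (k * A mod 1)): take the fractional part of k*A,
    then scale it to an index in [0, m)."""
    return math.floor(m * ((key * A) % 1))

# Consecutive keys scatter across a size-8 table rather than clustering:
print([mult_hash(k, 8) for k in range(1, 6)])  # → [4, 1, 6, 3, 0]
```

The scattering of consecutive keys is exactly why this constant is a popular choice for the multiplication method.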
Collision Resolution
Chaining Strategy:
- Each bucket contains a linked list of key-value pairs
- When collision occurs, new element is appended to chain
- Allows multiple elements per bucket
- Avoids the clustering issues seen in open addressing
Dynamic Resizing
Load Factor Management:
- Default threshold: 0.75
- When load factor exceeds threshold, table size doubles
- All elements are rehashed into new table
- Maintains O(1) average performance
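Chaining and resizing combine into a short sketch (simplified relative to the project's `HashTable`: numeric keys only, and plain Python lists stand in for linked-list chains):

```python
import math

class HashTable:
    A = (math.sqrt(5) - 1) / 2  # multiplication-method constant

    def __init__(self, initial_size=8, load_factor_threshold=0.75):
        self.size = initial_size
        self.count = 0
        self.load_factor_threshold = load_factor_threshold
        self.buckets = [[] for _ in range(self.size)]

    def _hash(self, key):
        return int(self.size * ((key * self.A) % 1))

    def insert(self, key, value):
        bucket = self.buckets[self._hash(key)]
        for idx, (k, _) in enumerate(bucket):
            if k == key:                 # existing key: update in place
                bucket[idx] = (key, value)
                return
        bucket.append((key, value))      # collision: extend the chain
        self.count += 1
        if self.count / self.size > self.load_factor_threshold:
            self._resize()

    def get(self, key):
        for k, v in self.buckets[self._hash(key)]:
            if k == key:
                return v
        return None                      # missing key

    def _resize(self):
        # Double the table and rehash every stored pair.
        pairs = [p for bucket in self.buckets for p in bucket]
        self.size *= 2
        self.count = 0
        self.buckets = [[] for _ in range(self.size)]
        for k, v in pairs:
            self.insert(k, v)

ht = HashTable(initial_size=4, load_factor_threshold=0.75)
for k, v in enumerate(["a", "b", "c", "d"], start=1):
    ht.insert(k, v)
print(ht.size)  # → 8: the 4th insert pushes the load factor past 0.75
```

Note that updating an existing key does not change the count, so only genuine insertions can push the load factor over the threshold.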
Complexity Analysis
Randomized Quicksort
| Aspect | Complexity | Description |
|---|---|---|
| Time Complexity | O(n log n) | Average case - randomized pivot selection |
| Worst Case | O(n²) | Rarely occurs due to randomization |
| Best Case | O(n log n) | Balanced partitions at every level |
| Space Complexity | O(log n) | Average case recursion stack depth |
| Stability | Not Stable | Equal elements may change relative order |
Hash Table with Chaining
| Aspect | Complexity | Description |
|---|---|---|
| Time Complexity | O(1) | Average case for insert, get, delete |
| Worst Case | O(n) | All keys hash to same bucket (rare) |
| Space Complexity | O(n + m) | n elements + m buckets |
| Load Factor | 0.75 | Threshold for automatic resizing |
Usage Examples
Basic Usage - Randomized Quicksort
```python
from src.quicksort import randomized_quicksort, compare_with_builtin

# Example 1: Basic sorting
arr = [64, 34, 25, 12, 22, 11, 90, 5]
sorted_arr = randomized_quicksort(arr)
print(sorted_arr)  # Output: [5, 11, 12, 22, 25, 34, 64, 90]

# Example 2: Performance comparison
comparison = compare_with_builtin(arr)
print(f"Quicksort time: {comparison['quicksort_time']:.6f} seconds")
print(f"Built-in sort time: {comparison['builtin_time']:.6f} seconds")
print(f"Speedup ratio: {comparison['speedup']:.2f}x")
print(f"Results match: {comparison['is_correct']}")
```
Basic Usage - Hash Table
```python
from src.hash_table import HashTable

# Create hash table
ht = HashTable(initial_size=16)

# Insert key-value pairs
ht.insert(1, "apple")
ht.insert(2, "banana")
ht.insert(3, "cherry")

# Retrieve values
print(ht.get(1))  # "apple"

# Check if key exists
print(2 in ht)  # True

# Delete a key
ht.delete(2)

# Get all items
items = ht.get_all_items()
print(items)  # [(1, "apple"), (3, "cherry")]
```
Edge Cases Handled
Quicksort
```python
# Empty array
empty_arr = []
result = randomized_quicksort(empty_arr)
print(result)  # Output: []

# Single element
single = [42]
result = randomized_quicksort(single)
print(result)  # Output: [42]

# Duplicate elements
duplicates = [3, 3, 3, 3]
result = randomized_quicksort(duplicates)
print(result)  # Output: [3, 3, 3, 3]

# Negative numbers
negatives = [-5, -2, -8, 1, 3, -1, 0]
result = randomized_quicksort(negatives)
print(result)  # Output: [-8, -5, -2, -1, 0, 1, 3]
```
Hash Table
```python
# Empty hash table
ht = HashTable()
print(len(ht))  # 0
print(ht.get(1))  # None

# Collision handling
ht = HashTable(initial_size=5)
ht.insert(1, "one")
ht.insert(6, "six")  # May collide with 1
ht.insert(11, "eleven")  # May collide with 1 and 6
# All keys are stored correctly via chaining

# Load factor management
ht = HashTable(initial_size=4, load_factor_threshold=0.75)
ht.insert(1, "a")
ht.insert(2, "b")
ht.insert(3, "c")
ht.insert(4, "d")  # Triggers resize (load factor = 1.0 > 0.75)
print(ht.size)  # 8 (doubled)
```
Running the Program
Prerequisites
- Python 3.7 or higher
- No external dependencies required (uses only Python standard library)
Execution
Run Examples
```bash
python3 -m src.examples
```
Run Tests
Quick Tests (Essential functionality):
```bash
python3 run_tests.py --quick
```
Full Test Suite:
```bash
python3 run_tests.py
```
Unit Tests Only:
```bash
python3 run_tests.py --unit-only
```
Performance Benchmarks:
```bash
python3 run_tests.py --benchmark
```
Stress Tests:
```bash
python3 run_tests.py --stress
```
Negative Test Cases:
```bash
python3 run_tests.py --negative
```
Using unittest directly:
```bash
python3 -m unittest discover tests -v
```
Run Empirical Comparison
Generate Comparison Plots:
```bash
python3 -m src.generate_plots
```
Run Comparison Analysis:
```bash
python3 -m src.quicksort_comparison
```
Both commands will generate detailed performance data and visualizations comparing Randomized vs Deterministic Quicksort.
Test Cases
Randomized Quicksort Tests
The test suite includes comprehensive test cases covering:
✅ Functional Tests
- Basic sorting functionality
- Already sorted arrays (ascending/descending)
- Empty arrays and single elements
- Duplicate elements
- Negative numbers and zero values
- Large arrays (1000+ elements)
✅ Behavioral Tests
- Non-destructive sorting (original array unchanged)
- Correctness verification against built-in sort
- Partition function correctness
✅ Performance Tests
- Comparison with built-in sort
- Performance analysis across different array sizes
- Timing measurements
Deterministic Quicksort Tests
The test suite includes comprehensive test cases covering:
✅ Functional Tests
- All same scenarios as randomized quicksort
- Worst-case performance on sorted/reverse-sorted arrays
- Correctness verification
✅ Comparison Tests
- Direct comparison between randomized and deterministic quicksort
- Verification that both produce identical results
- Performance consistency tests
✅ Edge Cases
- Zero elements, single element, two elements
- All zeros, mixed positive/negative numbers
- Large value ranges
- Worst-case scenarios for deterministic quicksort
Hash Table Tests
The test suite includes comprehensive test cases covering:
✅ Functional Tests
- Basic insert, get, delete operations
- Empty hash table operations
- Collision handling
- Load factor calculation
- Dynamic resizing
✅ Behavioral Tests
- Key existence checking (`in` operator)
- Update existing keys
- Delete from chains (middle of chain)
- Get all items
✅ Edge Cases
- Empty hash table
- Single element
- All keys hash to same bucket
- Load factor threshold triggering resize
Empirical Comparison Study
Randomized vs Deterministic Quicksort
This project includes a comprehensive empirical comparison study comparing Randomized Quicksort with Deterministic Quicksort (using first element as pivot) across different input sizes and distributions.
Documentation: See QUICKSORT_COMPARISON.md for detailed analysis and results.
Visualizations: Three comprehensive plots are included:
- `quicksort_comparison_plots.png` - Overview comparison across all distributions
- `quicksort_comparison_detailed.png` - Detailed views for each distribution type
- `quicksort_speedup_comparison.png` - Speedup ratios visualization
Key Findings:
- Random Arrays: Both algorithms perform similarly (~10-15% difference)
- Sorted Arrays: Deterministic degrades to O(n²); Randomized maintains O(n log n) - up to 475x speedup
- Reverse-Sorted Arrays: Even worse degradation for deterministic - up to 857x speedup for randomized
- Repeated Elements: Similar performance for both algorithms
Running the Comparison:
```bash
# Generate plots and detailed comparison
python3 -m src.generate_plots
python3 -m src.quicksort_comparison
```
Project Structure
```
MSCS532_Assignment3/
├── src/
│   ├── __init__.py                    # Package initialization
│   ├── quicksort.py                   # Randomized & Deterministic Quicksort implementations
│   ├── quicksort_comparison.py        # Empirical comparison script
│   ├── generate_plots.py              # Plot generation script
│   ├── hash_table.py                  # Hash Table with Chaining implementation
│   └── examples.py                    # Example usage demonstrations
├── tests/
│   ├── __init__.py                    # Test package initialization
│   ├── test_quicksort.py              # Comprehensive quicksort tests
│   └── test_hash_table.py             # Comprehensive hash table tests
├── run_tests.py                       # Test runner with various options
├── README.md                          # This documentation
├── QUICKSORT_COMPARISON.md            # Empirical comparison documentation
├── quicksort_comparison_plots.png     # Overview comparison plots
├── quicksort_comparison_detailed.png  # Detailed distribution plots
├── quicksort_speedup_comparison.png   # Speedup ratio plots
├── LICENSE                            # MIT License
├── .gitignore                         # Git ignore file
└── requirements.txt                   # Python dependencies (none required)
```
Testing
Test Coverage
The project includes 41+ comprehensive test cases covering:
✅ Functional Tests
- Basic functionality for both algorithms
- Edge cases (empty, single element, duplicates)
- Correctness verification
✅ Behavioral Tests
- Non-destructive operations
- In-place modifications
- Collision resolution
- Dynamic resizing
✅ Performance Tests
- Timing comparisons
- Performance analysis across different sizes
- Benchmarking utilities
✅ Stress Tests
- Large arrays (1000+ elements)
- Many hash table operations
- Boundary conditions
✅ Negative Test Cases
- Invalid input types
- Edge cases and boundary conditions
- Error handling
Running Tests
The project includes a comprehensive test runner (run_tests.py) with multiple options:
- Quick Tests: Essential functionality tests
- Full Suite: All tests including edge cases
- Unit Tests: Standard unittest tests only
- Benchmarks: Performance comparison tests
- Stress Tests: Large-scale and boundary tests
- Negative Tests: Invalid input and error handling tests
Educational Value
This implementation serves as an excellent learning resource for:
- Algorithm Understanding: Clear demonstration of quicksort and hash table mechanics
- Randomization Techniques: Shows how randomization improves algorithm performance
- Data Structure Design: Demonstrates hash table implementation with collision resolution
- Code Quality: Demonstrates good practices in Python programming
- Testing: Comprehensive test suite showing edge case handling
- Documentation: Well-commented code with clear explanations
- Performance Analysis: Tools for understanding algorithm efficiency
Algorithm Analysis
Randomized Quicksort
Why Randomization?
- Standard quicksort can degrade to O(n²) when the pivot is always the smallest or largest element
- Randomization ensures expected O(n log n) performance
- Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
Performance Characteristics:
- Excellent average-case performance
- Non-destructive sorting (creates copy)
- Cache-friendly due to good locality of reference
Comparison with Other Algorithms:
- Faster than O(n²) algorithms (bubble, insertion, selection sort)
- Comparable to merge sort but with better space efficiency
- Generally slower than Python's built-in Timsort (optimized hybrid)
Empirical Comparison Results
Randomized vs Deterministic Quicksort:
The project includes comprehensive empirical analysis comparing Randomized Quicksort with Deterministic Quicksort (first element as pivot). Results demonstrate:
- On Random Arrays: Deterministic is ~10-15% faster (minimal overhead from randomization)
- On Sorted Arrays: Randomized is up to 475x faster (deterministic shows O(n²) worst-case)
- On Reverse-Sorted Arrays: Randomized is up to 857x faster (even worse degradation for deterministic)
- On Repeated Elements: Both perform similarly (~5% difference)
Visual Evidence: The included plots (quicksort_comparison_*.png) clearly show:
- Quadratic (O(n²)) degradation curves for deterministic quicksort on worst-case inputs
- Consistent O(n log n) performance for randomized quicksort across all distributions
- Minimal overhead of randomization on random inputs
See QUICKSORT_COMPARISON.md for detailed analysis, tables, and conclusions.
Hash Table with Chaining
Chaining vs. Open Addressing:
- Chaining stores multiple elements in the same bucket using linked lists
- Handles collisions gracefully without clustering
- Load factor threshold prevents performance degradation
Hash Function:
- Uses multiplication method: h(k) = floor(m × (k × A mod 1))
- A ≈ (√5 - 1) / 2 ≈ 0.618 (golden ratio conjugate)
- Provides good distribution of keys across buckets
Performance Considerations:
- O(1) average case performance
- Dynamic resizing maintains efficiency
- Trade-off between space and time efficiency
Performance Considerations
- Quicksort:
  - Best for general-purpose sorting
  - Randomization prevents worst-case scenarios
  - Good for medium to large arrays
- Hash Table:
  - Maintains O(1) average performance through load factor management
  - Resizing doubles table size when threshold is exceeded
  - Trade-off between space and time efficiency
Contributing
This is an educational project demonstrating algorithm implementations. Feel free to:
- Add more test cases
- Implement additional algorithms
- Improve documentation
- Optimize the implementations
- Add visualization tools
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Carlos Gutierrez
Email: cgutierrez44833@ucumberlands.edu
Created for MSCS532 Assignment 3: Understanding Algorithm Efficiency and Scalability
Acknowledgments
- Based on standard algorithm implementations from Introduction to Algorithms (CLRS)
- Educational project for algorithm analysis and data structures course