MSCS532_Assignment3/README.md
Carlos Gutierrez a7fe11fd74 Initial commit: Randomized Quicksort and Hash Table with Chaining implementation
- Implemented Randomized Quicksort algorithm with performance analysis
- Implemented Hash Table with Chaining for collision resolution
- Added comprehensive test suite (30+ test cases)
- Created test runner script with multiple test options
- Added detailed README with architecture diagrams and documentation
- Added MIT License
- Includes examples and comprehensive documentation
2025-11-04 21:35:02 -05:00


Randomized Quicksort & Hash Table with Chaining - Algorithm Efficiency and Scalability

Overview

This project implements a sorting algorithm and a data structure that together illustrate algorithm efficiency and scalability:

  1. Randomized Quicksort Algorithm - An efficient sorting algorithm with average O(n log n) time complexity
  2. Hash Table with Chaining - A hash table implementation using chaining for collision resolution

Both implementations provide comprehensive test suites, performance analysis utilities, and detailed documentation for educational purposes.

Key Features

  • Randomized Quicksort: Efficient sorting with randomized pivot selection, which makes the O(n²) worst case unlikely for any fixed input
  • Performance Analysis: Built-in utilities for comparing and analyzing algorithm performance
  • Hash Table with Chaining: Complete hash table implementation with dynamic resizing
  • Comprehensive Test Suite: Extensive test coverage including edge cases, stress tests, and performance benchmarks
  • Well-Documented Code: Clear comments, docstrings, and educational examples
  • Production-Ready: Robust error handling and comprehensive test coverage

Architecture

Randomized Quicksort Algorithm Flow

Input Array: [64, 34, 25, 12, 22, 11, 90, 5]
             ↓
    ┌─────────────────────────────────────┐
    │   Randomized Quicksort Process      │
    └─────────────────────────────────────┘
             ↓
    ┌─────────────────────────────────────────────────┐
    │  Step 1: Randomly select pivot                  │
    │  Pivot: 25 (randomly selected)                  │
    │  Partition: [12, 22, 11, 5] | 25 | [64, 34, 90] │
    └─────────────────────────────────────────────────┘
             ↓
    ┌─────────────────────────────────────┐
    │  Step 2: Recursively sort left      │
    │  Array: [12, 22, 11, 5]             │
    │  Pivot: 11 → [5, 11] | [12, 22]     │
    └─────────────────────────────────────┘
             ↓
    ┌─────────────────────────────────────┐
    │  Step 3: Recursively sort right     │
    │  Array: [64, 34, 90]                │
    │  Pivot: 64 → [34, 64] | [90]        │
    └─────────────────────────────────────┘
             ↓
Output Array: [5, 11, 12, 22, 25, 34, 64, 90]
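Step 1 of the flow above can be reproduced with a short sketch. It fixes the pivot at 25 instead of drawing it at random, and it builds new lists rather than partitioning in place, so it only illustrates the idea. The helper name `partition_around` is hypothetical and not part of the project.

```python
def partition_around(arr, pivot):
    """Split arr into (smaller, equal, larger) relative to pivot, keeping input order."""
    smaller = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]
    return smaller, equal, larger


arr = [64, 34, 25, 12, 22, 11, 90, 5]
smaller, equal, larger = partition_around(arr, 25)
print(smaller, equal, larger)  # [12, 22, 11, 5] [25] [64, 34, 90]
```

Sorting `smaller` and `larger` recursively and concatenating the three parts gives the output array shown at the bottom of the diagram.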

Hash Table with Chaining Structure

Hash Table (size=8)
┌─────────────────────────────────────────┐
│ Bucket 0: [Key: 8, Value: "eight"]      │
│           [Key: 16, Value: "sixteen"]   │
│ Bucket 1: [Key: 9, Value: "nine"]       │
│ Bucket 2: [Key: 10, Value: "ten"]       │
│           [Key: 18, Value: "eighteen"]  │
│ Bucket 3: [Key: 11, Value: "eleven"]    │
│ Bucket 4: [Key: 12, Value: "twelve"]    │
│ Bucket 5: [Key: 13, Value: "thirteen"]  │
│ Bucket 6: [Key: 14, Value: "fourteen"]  │
│ Bucket 7: [Key: 15, Value: "fifteen"]   │
└─────────────────────────────────────────┘
         ↓
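    Collision Resolution via Chaining
    (Multiple keys hash to same bucket)

Note: bucket indices in this diagram are drawn as k mod 8 for readability. The multiplication method described under Hash Function generally assigns different indices (for example, key 8 maps to bucket 7 when m = 8).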

Core Algorithm Structure

Randomized Quicksort

┌─────────────────────────────────────────────────────────────┐
│              Randomized Quicksort                           │
├─────────────────────────────────────────────────────────────┤
│  Function: randomized_quicksort(arr)                        │
│  Input: Array of comparable elements                        │
│  Output: Array sorted in ascending order                    │
├─────────────────────────────────────────────────────────────┤
│  Algorithm Steps:                                           │
│  1. If array has ≤ 1 element, return                        │
│  2. Randomly select pivot element                           │
│  3. Partition array around pivot                            │
│  4. Recursively sort left subarray                          │
│  5. Recursively sort right subarray                         │
│  6. No combine step needed (partitioning is in place)       │
└─────────────────────────────────────────────────────────────┘
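The steps in the box above can be sketched as follows. The function names `randomized_quicksort` and `randomized_partition` come from this README, but the bodies are an assumption: a Lomuto-style partition applied to a copy of the input. The project's actual code in src/quicksort.py may differ, and `_quicksort` is a hypothetical helper.

```python
import random


def randomized_partition(arr, low, high):
    """Partition arr[low..high] in place around a random pivot; return its final index."""
    pivot_index = random.randint(low, high)  # randint is inclusive on both ends
    arr[pivot_index], arr[high] = arr[high], arr[pivot_index]  # park pivot at the end
    pivot = arr[high]
    i = low - 1  # last index of the "<= pivot" region
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]  # move pivot to its final position
    return i + 1


def _quicksort(arr, low, high):
    """Recursively sort arr[low..high] in place (hypothetical helper)."""
    if low < high:
        p = randomized_partition(arr, low, high)
        _quicksort(arr, low, p - 1)
        _quicksort(arr, p + 1, high)


def randomized_quicksort(arr):
    """Return a new sorted list; the input list is left unchanged."""
    result = list(arr)
    _quicksort(result, 0, len(result) - 1)
    return result


print(randomized_quicksort([64, 34, 25, 12, 22, 11, 90, 5]))
# [5, 11, 12, 22, 25, 34, 64, 90]
```

Because partitioning happens in place on the copy, the sorted result needs no separate merge step. Note that a Lomuto partition like this one still makes O(n²) comparisons when all elements are equal, whichever pivot is drawn.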

Hash Table with Chaining

┌─────────────────────────────────────────────────────────────┐
│              Hash Table with Chaining                       │
├─────────────────────────────────────────────────────────────┤
│  Class: HashTable                                           │
│  Operations: insert, get, delete, contains                  │
├─────────────────────────────────────────────────────────────┤
│  Key Operations:                                            │
│  1. Hash function: h(k) = floor(m × (k × A mod 1))          │
│  2. Collision resolution: Chaining (linked lists)           │
│  3. Load factor management: Resize when threshold exceeded  │
│  4. Dynamic resizing: Double size when load > 0.75          │
└─────────────────────────────────────────────────────────────┘

Implementation Details

Part 1: Randomized Quicksort

Core Functions

1. randomized_quicksort(arr)
  • Purpose: Sort array using randomized quicksort algorithm
  • Parameters: arr (list) - Input array to be sorted
  • Returns: list - New array sorted in ascending order
  • Space Complexity: O(n) - Creates a copy of the input array
  • Time Complexity:
    • Average: O(n log n)
    • Worst: O(n²) - rarely occurs due to randomization
    • Best: O(n log n)
2. randomized_partition(arr, low, high)
  • Purpose: Partition array using a randomly selected pivot
  • Parameters:
    • arr (list) - Array to partition
    • low (int) - Starting index
    • high (int) - Ending index
  • Returns: int - Final position of pivot element
  • Key Feature: Random pivot selection makes the worst-case O(n²) behavior unlikely for any fixed input, but does not eliminate it
3. compare_with_builtin(arr)
  • Purpose: Compare randomized quicksort with Python's built-in sort
  • Returns: Dictionary with timing metrics and correctness verification
4. analyze_performance(array_sizes)
  • Purpose: Analyze quicksort performance across different array sizes
  • Returns: List of performance metrics for each array size

Algorithm Logic

Why Randomization?

Quicksort with a fixed pivot rule (for example, always the first or last element) degrades to O(n²) when:

  • Pivot is always the smallest element
  • Pivot is always the largest element
  • Array is already sorted or reverse sorted, which triggers one of the two cases above

Randomization ensures:

  • Expected O(n log n) performance
  • Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
  • Very low probability of worst-case behavior
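The 2n ln n estimate can be checked empirically with a small experiment. The sketch below counts element-versus-pivot comparisons for a list-based randomized quicksort on an already sorted input with distinct keys. The helper `count_comparisons`, the input size, and the trial count are all illustrative choices rather than part of the project.

```python
import math
import random


def count_comparisons(arr, rng):
    """Count pivot comparisons made by randomized quicksort on arr (distinct keys assumed)."""
    if len(arr) <= 1:
        return 0
    pivot = arr[rng.randrange(len(arr))]
    smaller = [x for x in arr if x < pivot]
    larger = [x for x in arr if x > pivot]
    # Each of the other len(arr) - 1 elements is compared with the pivot once.
    return (len(arr) - 1) + count_comparisons(smaller, rng) + count_comparisons(larger, rng)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
n, trials = 2000, 20
sorted_input = list(range(n))  # worst case for a fixed first/last pivot rule
average = sum(count_comparisons(sorted_input, rng) for _ in range(trials)) / trials
estimate = 2 * n * math.log(n)
print(f"measured average: {average:.0f}, 2n ln n: {estimate:.0f}")
```

At this input size the measured average sits somewhat below 2n ln n because of lower-order terms, and the sorted input gets no special treatment from the random pivot.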

Part 2: Hash Table with Chaining

Core Operations

1. insert(key, value)
  • Purpose: Insert or update a key-value pair
  • Time Complexity: O(1) average case, O(n) worst case
  • Features:
    • Automatically updates if key exists
    • Triggers resize when load factor exceeds threshold
2. get(key)
  • Purpose: Retrieve value associated with a key
  • Time Complexity: O(1) average case, O(n) worst case
  • Returns: Value if key exists, None otherwise
3. delete(key)
  • Purpose: Remove a key-value pair
  • Time Complexity: O(1) average case, O(n) worst case
  • Returns: True if key was found and deleted, False otherwise
4. contains(key)
  • Purpose: Check if a key exists in the hash table
  • Time Complexity: O(1) average case, O(n) worst case
  • Pythonic: Supports in operator

Hash Function

Multiplication Method:

h(k) = floor(m × (k × A mod 1))

where:

  • m = table size
  • A = (√5 - 1) / 2 ≈ 0.618 (the reciprocal of the golden ratio, a constant suggested by Knuth)
  • Provides good distribution of keys across buckets
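The formula translates almost directly into Python. The sketch below assumes non-negative integer keys and floating-point arithmetic. The function name `multiplication_hash` is hypothetical, and the project may convert keys to integers or compute the fractional part differently.

```python
import math

A = (math.sqrt(5) - 1) / 2  # approximately 0.618


def multiplication_hash(key, m):
    """Return floor(m * frac(key * A)) for an integer key and table size m."""
    fractional = (key * A) % 1.0  # fractional part of key * A
    return math.floor(m * fractional)


for key in (8, 9, 10, 11):
    print(key, multiplication_hash(key, 8))
```

For example, key 8 with m = 8 gives 8 × A ≈ 4.944, whose fractional part 0.944 times 8 is about 7.55, so the key lands in bucket 7. One practical advantage of this method is that the choice of m is not critical, so powers of two work well with size doubling.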

Collision Resolution

Chaining Strategy:

  • Each bucket contains a linked list of key-value pairs
  • When collision occurs, new element is appended to chain
  • Allows multiple elements per bucket
  • No clustering issues unlike open addressing

Dynamic Resizing

Load Factor Management:

  • Default threshold: 0.75
  • When load factor exceeds threshold, table size doubles
  • All elements are rehashed into new table
  • Keeps chains short so operations stay O(1) on average; the O(n) cost of each rehash is amortized over the inserts that preceded it
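The resize step can be sketched on a bare list of buckets. The helpers `load_factor`, `rehash_into_double`, and `mod_hash` are hypothetical, and `mod_hash` (key mod m) stands in for the project's hash function to keep the example short.

```python
def load_factor(buckets):
    """Number of stored items divided by number of buckets."""
    return sum(len(chain) for chain in buckets) / len(buckets)


def rehash_into_double(buckets, hash_fn):
    """Return a table with twice as many buckets, with every item rehashed into it."""
    new_buckets = [[] for _ in range(2 * len(buckets))]
    for chain in buckets:
        for key, value in chain:
            # The index must be recomputed because it depends on the table size.
            new_buckets[hash_fn(key, len(new_buckets))].append((key, value))
    return new_buckets


def mod_hash(key, m):
    """Stand-in hash function (assumption): key mod table size."""
    return key % m


buckets = [[] for _ in range(4)]
for key, value in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    buckets[mod_hash(key, 4)].append((key, value))

before = load_factor(buckets)  # 4 items / 4 buckets
if before > 0.75:
    buckets = rehash_into_double(buckets, mod_hash)
after = load_factor(buckets)  # 4 items / 8 buckets
print(len(buckets), before, after)  # 8 1.0 0.5
```

Doubling, rather than growing by a constant amount, is what keeps the total rehashing work proportional to the number of inserts.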

Complexity Analysis

Randomized Quicksort

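| Aspect           | Complexity | Description                                                  |
|------------------|------------|--------------------------------------------------------------|
| Average Case     | O(n log n) | Expected running time with randomized pivot selection        |
| Worst Case       | O(n²)      | Unlikely for any fixed input due to randomization            |
| Best Case        | O(n log n) | Balanced partitions at every level of recursion              |
| Space Complexity | O(n)       | Copy of the input, plus O(log n) expected recursion stack    |
| Stability        | Not stable | Equal elements may change relative order                     |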

Hash Table with Chaining

| Aspect           | Complexity | Description                               |
|------------------|------------|-------------------------------------------|
| Time Complexity  | O(1)       | Average case for insert, get, delete      |
| Worst Case       | O(n)       | All keys hash to same bucket (rare)       |
| Space Complexity | O(n + m)   | n elements + m buckets                    |
| Load Factor      | 0.75       | Threshold for automatic resizing          |
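The gap between the O(1) average case and the O(n) worst case can be made concrete by counting how many keys a lookup has to examine. The sketch below compares a hash function that spreads keys out with a deliberately bad one that sends every key to bucket 0. All helper names are hypothetical.

```python
def build_table(keys, m, hash_fn):
    """Place each key in the chain selected by hash_fn(key, m)."""
    buckets = [[] for _ in range(m)]
    for key in keys:
        buckets[hash_fn(key, m)].append(key)
    return buckets


def probes_to_find(buckets, key, hash_fn):
    """Number of chain entries examined before key is found."""
    chain = buckets[hash_fn(key, len(buckets))]
    return chain.index(key) + 1


def spreading_hash(key, m):
    return key % m


def constant_hash(key, m):
    return 0  # pathological: every key collides


keys = list(range(1000))
spread = build_table(keys, 2048, spreading_hash)
collided = build_table(keys, 2048, constant_hash)

print(probes_to_find(spread, 999, spreading_hash))   # 1
print(probes_to_find(collided, 999, constant_hash))  # 1000
```

With a reasonable hash function and a bounded load factor, chains stay short. The constant hash turns the table into a single list, and a lookup for the last inserted key scans all n entries.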

Usage Examples

Basic Usage - Randomized Quicksort

from src.quicksort import randomized_quicksort, compare_with_builtin

# Example 1: Basic sorting
arr = [64, 34, 25, 12, 22, 11, 90, 5]
sorted_arr = randomized_quicksort(arr)
print(sorted_arr)  # Output: [5, 11, 12, 22, 25, 34, 64, 90]

# Example 2: Performance comparison
comparison = compare_with_builtin(arr)
print(f"Quicksort time: {comparison['quicksort_time']:.6f} seconds")
print(f"Built-in sort time: {comparison['builtin_time']:.6f} seconds")
print(f"Speedup ratio: {comparison['speedup']:.2f}x")
print(f"Results match: {comparison['is_correct']}")

Basic Usage - Hash Table

from src.hash_table import HashTable

# Create hash table
ht = HashTable(initial_size=16)

# Insert key-value pairs
ht.insert(1, "apple")
ht.insert(2, "banana")
ht.insert(3, "cherry")

# Retrieve values
print(ht.get(1))  # "apple"

# Check if key exists
print(2 in ht)  # True

# Delete a key
ht.delete(2)

# Get all items
items = ht.get_all_items()
print(items)  # [(1, "apple"), (3, "cherry")]

Edge Cases Handled

Quicksort

# Empty array
empty_arr = []
result = randomized_quicksort(empty_arr)
print(result)  # Output: []

# Single element
single = [42]
result = randomized_quicksort(single)
print(result)  # Output: [42]

# Duplicate elements
duplicates = [3, 3, 3, 3]
result = randomized_quicksort(duplicates)
print(result)  # Output: [3, 3, 3, 3]

# Negative numbers
negatives = [-5, -2, -8, 1, 3, -1, 0]
result = randomized_quicksort(negatives)
print(result)  # Output: [-8, -5, -2, -1, 0, 1, 3]

Hash Table

# Empty hash table
ht = HashTable()
print(len(ht))  # 0
print(ht.get(1))  # None

# Collision handling
ht = HashTable(initial_size=5)
ht.insert(1, "one")
ht.insert(6, "six")  # May collide with 1
ht.insert(11, "eleven")  # May collide with 1 and 6
# All keys are stored correctly via chaining

# Load factor management
ht = HashTable(initial_size=4, load_factor_threshold=0.75)
ht.insert(1, "a")
ht.insert(2, "b")
ht.insert(3, "c")
ht.insert(4, "d")  # Triggers resize (load factor = 1.0 > 0.75)
print(ht.size)  # 8 (doubled)

Running the Program

Prerequisites

  • Python 3.7 or higher
  • No external dependencies required (uses only Python standard library)

Execution

Run Examples

python3 -m src.examples

Run Tests

Quick Tests (Essential functionality):

python3 run_tests.py --quick

Full Test Suite:

python3 run_tests.py

Unit Tests Only:

python3 run_tests.py --unit-only

Performance Benchmarks:

python3 run_tests.py --benchmark

Stress Tests:

python3 run_tests.py --stress

Negative Test Cases:

python3 run_tests.py --negative

Using unittest directly:

python3 -m unittest discover tests -v

Test Cases

Randomized Quicksort Tests

The test suite includes comprehensive test cases covering:

Functional Tests

  • Basic sorting functionality
  • Already sorted arrays (ascending/descending)
  • Empty arrays and single elements
  • Duplicate elements
  • Negative numbers and zero values
  • Large arrays (1000+ elements)

Behavioral Tests

  • Non-destructive sorting (original array unchanged)
  • Correctness verification against built-in sort
  • Partition function correctness

Performance Tests

  • Comparison with built-in sort
  • Performance analysis across different array sizes
  • Timing measurements
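A few of the checks listed above could be written with `unittest` roughly as follows. This is a sketch and not the contents of tests/test_quicksort.py. The stand-in `sort_under_test` wraps `sorted` so the example runs on its own; in the project it would presumably be replaced by `randomized_quicksort` imported from `src.quicksort`.

```python
import random
import unittest


def sort_under_test(arr):
    """Stand-in for the function being tested (assumption, so the sketch is runnable)."""
    return sorted(arr)


class SortContractTests(unittest.TestCase):
    def test_matches_builtin_on_random_input(self):
        rng = random.Random(42)  # fixed seed keeps the test repeatable
        data = [rng.randint(-1000, 1000) for _ in range(1000)]
        self.assertEqual(sort_under_test(data), sorted(data))

    def test_input_is_not_modified(self):
        data = [3, 1, 2]
        snapshot = list(data)
        sort_under_test(data)
        self.assertEqual(data, snapshot)

    def test_edge_cases(self):
        for data in ([], [42], [3, 3, 3, 3], [-5, -2, -8, 1, 3, -1, 0]):
            with self.subTest(data=data):
                self.assertEqual(sort_under_test(data), sorted(data))
```

Saved as a module under tests/, a class like this would be picked up by `python3 -m unittest discover tests -v`.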

Hash Table Tests

The test suite includes comprehensive test cases covering:

Functional Tests

  • Basic insert, get, delete operations
  • Empty hash table operations
  • Collision handling
  • Load factor calculation
  • Dynamic resizing

Behavioral Tests

  • Key existence checking (in operator)
  • Update existing keys
  • Delete from chains (middle of chain)
  • Get all items

Edge Cases

  • Empty hash table
  • Single element
  • All keys hash to same bucket
  • Load factor threshold triggering resize

Project Structure

MSCS532_Assignment3/
├── src/
│   ├── __init__.py                # Package initialization
│   ├── quicksort.py              # Randomized Quicksort implementation
│   ├── hash_table.py             # Hash Table with Chaining implementation
│   └── examples.py               # Example usage demonstrations
├── tests/
│   ├── __init__.py               # Test package initialization
│   ├── test_quicksort.py         # Comprehensive quicksort tests
│   └── test_hash_table.py        # Comprehensive hash table tests
├── run_tests.py                  # Test runner with various options
├── README.md                     # This documentation
├── LICENSE                       # MIT License
├── .gitignore                    # Git ignore file
└── requirements.txt              # Python dependencies (none required)

Testing

Test Coverage

The project includes 30+ comprehensive test cases covering:

Functional Tests

  • Basic functionality for both algorithms
  • Edge cases (empty, single element, duplicates)
  • Correctness verification

Behavioral Tests

  • Non-destructive operations
  • In-place modifications
  • Collision resolution
  • Dynamic resizing

Performance Tests

  • Timing comparisons
  • Performance analysis across different sizes
  • Benchmarking utilities

Stress Tests

  • Large arrays (1000+ elements)
  • Many hash table operations
  • Boundary conditions

Negative Test Cases

  • Invalid input types
  • Edge cases and boundary conditions
  • Error handling

Running Tests

The project includes a comprehensive test runner (run_tests.py) with multiple options:

  • Quick Tests: Essential functionality tests
  • Full Suite: All tests including edge cases
  • Unit Tests: Standard unittest tests only
  • Benchmarks: Performance comparison tests
  • Stress Tests: Large-scale and boundary tests
  • Negative Tests: Invalid input and error handling tests

Educational Value

This implementation serves as an excellent learning resource for:

  • Algorithm Understanding: Clear demonstration of quicksort and hash table mechanics
  • Randomization Techniques: Shows how randomization protects expected performance against unfavorable input orderings
  • Data Structure Design: Demonstrates hash table implementation with collision resolution
  • Code Quality: Demonstrates good practices in Python programming
  • Testing: Comprehensive test suite showing edge case handling
  • Documentation: Well-commented code with clear explanations
  • Performance Analysis: Tools for understanding algorithm efficiency

Algorithm Analysis

Randomized Quicksort

Why Randomization?

  • Standard quicksort can degrade to O(n²) when the pivot is always the smallest or largest element
  • Randomization ensures expected O(n log n) performance
  • Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
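The 2n ln n figure follows from the standard indicator-variable argument found in CLRS, sketched here for distinct elements. The notation is introduced for this sketch only: z_1 < z_2 < ... < z_n are the elements in sorted order, X is the total number of comparisons, and H_n is the n-th harmonic number. Elements z_i and z_j are compared exactly when the first pivot chosen from {z_i, ..., z_j} is z_i or z_j, which happens with probability 2 / (j - i + 1).

```latex
\begin{aligned}
\mathbb{E}[X]
  &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[z_i \text{ is compared with } z_j]
   = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} \\
  &= \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k}
   \;<\; 2n H_n = 2n \ln n + O(n), \\
2n \ln n &= (2 \ln 2)\, n \log_2 n \approx 1.39\, n \log_2 n .
\end{aligned}
```

The bound holds for every input ordering, because the probability is taken over the random pivot choices and not over the input.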

Performance Characteristics:

  • Excellent average-case performance
  • Non-destructive sorting (creates copy)
  • Cache-friendly due to good locality of reference

Comparison with Other Algorithms:

  • Faster than O(n²) algorithms (bubble, insertion, selection sort)
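  • Comparable to merge sort in expected running time; partitioning is in place and needs only O(log n) expected stack space beyond the initial copy, whereas merge sort needs O(n) auxiliary space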
  • Generally slower than Python's built-in Timsort (optimized hybrid)

Hash Table with Chaining

Chaining vs. Open Addressing:

  • Chaining stores multiple elements in the same bucket using linked lists
  • Handles collisions gracefully without clustering
  • Load factor threshold prevents performance degradation

Hash Function:

  • Uses multiplication method: h(k) = floor(m × (k × A mod 1))
  • A = (√5 - 1) / 2 ≈ 0.618 (the reciprocal of the golden ratio, a constant suggested by Knuth)
  • Provides good distribution of keys across buckets

Performance Considerations:

  • O(1) average case performance
  • Dynamic resizing maintains efficiency
  • Trade-off between space and time efficiency

Performance Considerations

  1. Quicksort:

    • Suitable for general-purpose sorting, though generally slower than Python's built-in sort
    • Randomization makes worst-case behavior unlikely regardless of input order
    • Good for medium to large arrays
  2. Hash Table:

    • Maintains O(1) average performance through load factor management
    • Resizing doubles table size when threshold is exceeded
    • Trade-off between space and time efficiency

Contributing

This is an educational project demonstrating algorithm implementations. Feel free to:

  • Add more test cases
  • Implement additional algorithms
  • Improve documentation
  • Optimize the implementations
  • Add visualization tools

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Created for MSCS532 Assignment 3: Understanding Algorithm Efficiency and Scalability

Acknowledgments

  • Based on standard algorithm implementations from Introduction to Algorithms (CLRS)
  • Educational project for algorithm analysis and data structures course