Files

Carlos Gutierrez a7fe11fd74 Initial commit: Randomized Quicksort and Hash Table with Chaining implementation

- Implemented Randomized Quicksort algorithm with performance analysis
- Implemented Hash Table with Chaining for collision resolution
- Added comprehensive test suite (30+ test cases)
- Created test runner script with multiple test options
- Added detailed README with architecture diagrams and documentation
- Added MIT License
- Includes examples and comprehensive documentation

2025-11-04 21:35:02 -05:00

20 KiB

Raw Blame History

Randomized Quicksort & Hash Table with Chaining - Algorithm Efficiency and Scalability

Overview

This project implements two fundamental algorithms and data structures demonstrating algorithm efficiency and scalability:

Randomized Quicksort Algorithm - An efficient sorting algorithm with average O(n log n) time complexity
Hash Table with Chaining - A hash table implementation using chaining for collision resolution

Both implementations provide comprehensive test suites, performance analysis utilities, and detailed documentation for educational purposes.

Key Features

✅ Randomized Quicksort: Efficient sorting with randomized pivot selection to avoid worst-case performance
✅ Performance Analysis: Built-in utilities for comparing and analyzing algorithm performance
✅ Hash Table with Chaining: Complete hash table implementation with dynamic resizing
✅ Comprehensive Test Suite: Extensive test coverage including edge cases, stress tests, and performance benchmarks
✅ Well-Documented Code: Clear comments, docstrings, and educational examples
✅ Production-Ready: Robust error handling and comprehensive test coverage

Architecture

Randomized Quicksort Algorithm Flow

Input Array: [64, 34, 25, 12, 22, 11, 90, 5]
             ↓
    ┌─────────────────────────────────────┐
    │   Randomized Quicksort Process      │
    └─────────────────────────────────────┘
             ↓
    ┌─────────────────────────────────────────────────┐
    │  Step 1: Randomly select pivot                  │
    │  Pivot: 25 (randomly selected)                  │
    │  Partition: [12, 22, 11, 5] | 25 | [64, 34, 90] │
    └─────────────────────────────────────────────────┘
             ↓
    ┌─────────────────────────────────────┐
    │  Step 2: Recursively sort left      │
    │  Array: [12, 22, 11, 5]             │
    │  Pivot: 11 → [5, 11] | [12, 22]     │
    └─────────────────────────────────────┘
             ↓
    ┌─────────────────────────────────────┐
    │  Step 3: Recursively sort right     │
    │  Array: [64, 34, 90]                │
    │  Pivot: 64 → [34, 64] | [90]        │
    └─────────────────────────────────────┘
             ↓
Output Array: [5, 11, 12, 22, 25, 34, 64, 90]

Hash Table with Chaining Structure

Hash Table (size=8)
┌─────────────────────────────────────────┐
│ Bucket 0: [Key: 8, Value: "eight"]      │
│           [Key: 16, Value: "sixteen"]   │
│ Bucket 1: [Key: 9, Value: "nine"]       │
│ Bucket 2: [Key: 10, Value: "ten"]       │
│           [Key: 18, Value: "eighteen"]  │
│ Bucket 3: [Key: 11, Value: "eleven"]    │
│ Bucket 4: [Key: 12, Value: "twelve"]    │
│ Bucket 5: [Key: 13, Value: "thirteen"]  │
│ Bucket 6: [Key: 14, Value: "fourteen"]  │
│ Bucket 7: [Key: 15, Value: "fifteen"]   │
└─────────────────────────────────────────┘
         ↓
    Collision Resolution via Chaining
    (Multiple keys hash to same bucket)

Core Algorithm Structure

Randomized Quicksort

┌─────────────────────────────────────────────────────────────┐
│              Randomized Quicksort                           │
├─────────────────────────────────────────────────────────────┤
│  Function: randomized_quicksort(arr)                        │
│  Input: Array of comparable elements                        │
│  Output: Array sorted in ascending order                    │
├─────────────────────────────────────────────────────────────┤
│  Algorithm Steps:                                           │
│  1. If array has ≤ 1 element, return                        │
│  2. Randomly select pivot element                           │
│  3. Partition array around pivot                            │
│  4. Recursively sort left subarray                          │
│  5. Recursively sort right subarray                         │
│  6. Combine results                                         │
└─────────────────────────────────────────────────────────────┘

Hash Table with Chaining

┌─────────────────────────────────────────────────────────────┐
│              Hash Table with Chaining                       │
├─────────────────────────────────────────────────────────────┤
│  Class: HashTable                                           │
│  Operations: insert, get, delete, contains                  │
├─────────────────────────────────────────────────────────────┤
│  Key Operations:                                            │
│  1. Hash function: h(k) = floor(m × (k × A mod 1))          │
│  2. Collision resolution: Chaining (linked lists)           │
│  3. Load factor management: Resize when threshold exceeded  │
│  4. Dynamic resizing: Double size when load > 0.75          │
└─────────────────────────────────────────────────────────────┘

Implementation Details

Part 1: Randomized Quicksort

Core Functions

1. `randomized_quicksort(arr)`

Purpose: Sort array using randomized quicksort algorithm
Parameters: arr (list) - Input array to be sorted
Returns: list - New array sorted in ascending order
Space Complexity: O(n) - Creates a copy of the input array
Time Complexity:
- Average: O(n log n)
- Worst: O(n²) - rarely occurs due to randomization
- Best: O(n log n)

2. `randomized_partition(arr, low, high)`

Purpose: Partition array using a randomly selected pivot
Parameters:
- arr (list) - Array to partition
- low (int) - Starting index
- high (int) - Ending index
Returns: int - Final position of pivot element
Key Feature: Random pivot selection prevents worst-case O(n²) performance

3. `compare_with_builtin(arr)`

Purpose: Compare randomized quicksort with Python's built-in sort
Returns: Dictionary with timing metrics and correctness verification

4. `analyze_performance(array_sizes)`

Purpose: Analyze quicksort performance across different array sizes
Returns: List of performance metrics for each array size

Algorithm Logic

Why Randomization?

Standard quicksort can degrade to O(n²) when:

Pivot is always the smallest element (worst case)
Pivot is always the largest element (worst case)
Array is already sorted or reverse sorted

Randomization ensures:

Expected O(n log n) performance
Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
Very low probability of worst-case behavior

Part 2: Hash Table with Chaining

Core Operations

1. `insert(key, value)`

Purpose: Insert or update a key-value pair
Time Complexity: O(1) average case, O(n) worst case
Features:
- Automatically updates if key exists
- Triggers resize when load factor exceeds threshold

2. `get(key)`

Purpose: Retrieve value associated with a key
Time Complexity: O(1) average case, O(n) worst case
Returns: Value if key exists, None otherwise

3. `delete(key)`

Purpose: Remove a key-value pair
Time Complexity: O(1) average case, O(n) worst case
Returns: True if key was found and deleted, False otherwise

4. `contains(key)`

Purpose: Check if a key exists in the hash table
Time Complexity: O(1) average case, O(n) worst case
Pythonic: Supports in operator

Hash Function

Multiplication Method:

h(k) = floor(m × (k × A mod 1))

where:

m = table size
A ≈ (√5 - 1) / 2 ≈ 0.618 (golden ratio)
Provides good distribution of keys across buckets

Collision Resolution

Chaining Strategy:

Each bucket contains a linked list of key-value pairs
When collision occurs, new element is appended to chain
Allows multiple elements per bucket
No clustering issues unlike open addressing

Dynamic Resizing

Load Factor Management:

Default threshold: 0.75
When load factor exceeds threshold, table size doubles
All elements are rehashed into new table
Maintains O(1) average performance

Complexity Analysis

Randomized Quicksort

Aspect	Complexity	Description
Time Complexity	O(n log n)	Average case - randomized pivot selection
Worst Case	O(n²)	Rarely occurs due to randomization
Best Case	O(n log n)	Already sorted arrays
Space Complexity	O(log n)	Average case recursion stack depth
Stability	Not Stable	Equal elements may change relative order

Hash Table with Chaining

Aspect	Complexity	Description
Time Complexity	O(1)	Average case for insert, get, delete
Worst Case	O(n)	All keys hash to same bucket (rare)
Space Complexity	O(n + m)	n elements + m buckets
Load Factor	0.75	Threshold for automatic resizing

Usage Examples

Basic Usage - Randomized Quicksort

from src.quicksort import randomized_quicksort, compare_with_builtin

# Example 1: Basic sorting
arr = [64, 34, 25, 12, 22, 11, 90, 5]
sorted_arr = randomized_quicksort(arr)
print(sorted_arr)  # Output: [5, 11, 12, 22, 25, 34, 64, 90]

# Example 2: Performance comparison
comparison = compare_with_builtin(arr)
print(f"Quicksort time: {comparison['quicksort_time']:.6f} seconds")
print(f"Built-in sort time: {comparison['builtin_time']:.6f} seconds")
print(f"Speedup ratio: {comparison['speedup']:.2f}x")
print(f"Results match: {comparison['is_correct']}")

Basic Usage - Hash Table

from src.hash_table import HashTable

# Create hash table
ht = HashTable(initial_size=16)

# Insert key-value pairs
ht.insert(1, "apple")
ht.insert(2, "banana")
ht.insert(3, "cherry")

# Retrieve values
print(ht.get(1))  # "apple"

# Check if key exists
print(2 in ht)  # True

# Delete a key
ht.delete(2)

# Get all items
items = ht.get_all_items()
print(items)  # [(1, "apple"), (3, "cherry")]

Edge Cases Handled

Quicksort

# Empty array
empty_arr = []
result = randomized_quicksort(empty_arr)
print(result)  # Output: []

# Single element
single = [42]
result = randomized_quicksort(single)
print(result)  # Output: [42]

# Duplicate elements
duplicates = [3, 3, 3, 3]
result = randomized_quicksort(duplicates)
print(result)  # Output: [3, 3, 3, 3]

# Negative numbers
negatives = [-5, -2, -8, 1, 3, -1, 0]
result = randomized_quicksort(negatives)
print(result)  # Output: [-8, -5, -2, -1, 0, 1, 3]

Hash Table

# Empty hash table
ht = HashTable()
print(len(ht))  # 0
print(ht.get(1))  # None

# Collision handling
ht = HashTable(initial_size=5)
ht.insert(1, "one")
ht.insert(6, "six")  # May collide with 1
ht.insert(11, "eleven")  # May collide with 1 and 6
# All keys are stored correctly via chaining

# Load factor management
ht = HashTable(initial_size=4, load_factor_threshold=0.75)
ht.insert(1, "a")
ht.insert(2, "b")
ht.insert(3, "c")
ht.insert(4, "d")  # Triggers resize (load factor = 1.0 > 0.75)
print(ht.size)  # 8 (doubled)

Running the Program

Prerequisites

Python 3.7 or higher
No external dependencies required (uses only Python standard library)

Execution

Run Examples

python3 -m src.examples

Run Tests

Quick Tests (Essential functionality):

python3 run_tests.py --quick

Full Test Suite:

python3 run_tests.py

Unit Tests Only:

python3 run_tests.py --unit-only

Performance Benchmarks:

python3 run_tests.py --benchmark

Stress Tests:

python3 run_tests.py --stress

Negative Test Cases:

python3 run_tests.py --negative

Using unittest directly:

python3 -m unittest discover tests -v

Test Cases

Randomized Quicksort Tests

The test suite includes comprehensive test cases covering:

✅ Functional Tests

Basic sorting functionality
Already sorted arrays (ascending/descending)
Empty arrays and single elements
Duplicate elements
Negative numbers and zero values
Large arrays (1000+ elements)

✅ Behavioral Tests

Non-destructive sorting (original array unchanged)
Correctness verification against built-in sort
Partition function correctness

✅ Performance Tests

Comparison with built-in sort
Performance analysis across different array sizes
Timing measurements

Hash Table Tests

The test suite includes comprehensive test cases covering:

✅ Functional Tests

Basic insert, get, delete operations
Empty hash table operations
Collision handling
Load factor calculation
Dynamic resizing

✅ Behavioral Tests

Key existence checking (in operator)
Update existing keys
Delete from chains (middle of chain)
Get all items

✅ Edge Cases

Empty hash table
Single element
All keys hash to same bucket
Load factor threshold triggering resize

Project Structure

MSCS532_Assignment3/
├── src/
│   ├── __init__.py                # Package initialization
│   ├── quicksort.py              # Randomized Quicksort implementation
│   ├── hash_table.py             # Hash Table with Chaining implementation
│   └── examples.py               # Example usage demonstrations
├── tests/
│   ├── __init__.py               # Test package initialization
│   ├── test_quicksort.py         # Comprehensive quicksort tests
│   └── test_hash_table.py        # Comprehensive hash table tests
├── run_tests.py                  # Test runner with various options
├── README.md                     # This documentation
├── LICENSE                       # MIT License
├── .gitignore                    # Git ignore file
└── requirements.txt              # Python dependencies (none required)

Testing

Test Coverage

The project includes 30+ comprehensive test cases covering:

✅ Functional Tests

Basic functionality for both algorithms
Edge cases (empty, single element, duplicates)
Correctness verification

✅ Behavioral Tests

Non-destructive operations
In-place modifications
Collision resolution
Dynamic resizing

✅ Performance Tests

Timing comparisons
Performance analysis across different sizes
Benchmarking utilities

✅ Stress Tests

Large arrays (1000+ elements)
Many hash table operations
Boundary conditions

✅ Negative Test Cases

Invalid input types
Edge cases and boundary conditions
Error handling

Running Tests

The project includes a comprehensive test runner (run_tests.py) with multiple options:

Quick Tests: Essential functionality tests
Full Suite: All tests including edge cases
Unit Tests: Standard unittest tests only
Benchmarks: Performance comparison tests
Stress Tests: Large-scale and boundary tests
Negative Tests: Invalid input and error handling tests

Educational Value

This implementation serves as an excellent learning resource for:

Algorithm Understanding: Clear demonstration of quicksort and hash table mechanics
Randomization Techniques: Shows how randomization improves algorithm performance
Data Structure Design: Demonstrates hash table implementation with collision resolution
Code Quality: Demonstrates good practices in Python programming
Testing: Comprehensive test suite showing edge case handling
Documentation: Well-commented code with clear explanations
Performance Analysis: Tools for understanding algorithm efficiency

Algorithm Analysis

Randomized Quicksort

Why Randomization?

Standard quicksort can degrade to O(n²) when the pivot is always the smallest or largest element
Randomization ensures expected O(n log n) performance
Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n

Performance Characteristics:

Excellent average-case performance
Non-destructive sorting (creates copy)
Cache-friendly due to good locality of reference

Comparison with Other Algorithms:

Faster than O(n²) algorithms (bubble, insertion, selection sort)
Comparable to merge sort but with better space efficiency
Generally slower than Python's built-in Timsort (optimized hybrid)

Hash Table with Chaining

Chaining vs. Open Addressing:

Chaining stores multiple elements in the same bucket using linked lists
Handles collisions gracefully without clustering
Load factor threshold prevents performance degradation

Hash Function:

Uses multiplication method: h(k) = floor(m × (k × A mod 1))
A ≈ (√5 - 1) / 2 ≈ 0.618 (golden ratio)
Provides good distribution of keys across buckets

Performance Considerations:

O(1) average case performance
Dynamic resizing maintains efficiency
Trade-off between space and time efficiency

Performance Considerations

Quicksort:
- Best for general-purpose sorting
- Randomization prevents worst-case scenarios
- Good for medium to large arrays
Hash Table:
- Maintains O(1) average performance through load factor management
- Resizing doubles table size when threshold is exceeded
- Trade-off between space and time efficiency

Contributing

This is an educational project demonstrating algorithm implementations. Feel free to:

Add more test cases
Implement additional algorithms
Improve documentation
Optimize the implementations
Add visualization tools

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Created for MSCS532 Assignment 3: Understanding Algorithm Efficiency and Scalability

Acknowledgments

Based on standard algorithm implementations from Introduction to Algorithms (CLRS)
Educational project for algorithm analysis and data structures course

20 KiB Raw Blame History Unescape Escape

Randomized Quicksort & Hash Table with Chaining - Algorithm Efficiency and Scalability

Overview

Key Features

Architecture

Randomized Quicksort Algorithm Flow

Hash Table with Chaining Structure

Core Algorithm Structure

Randomized Quicksort

Hash Table with Chaining

Implementation Details

Part 1: Randomized Quicksort

Core Functions

1. randomized_quicksort(arr)

2. randomized_partition(arr, low, high)

3. compare_with_builtin(arr)

4. analyze_performance(array_sizes)

Algorithm Logic

Part 2: Hash Table with Chaining

Core Operations

1. insert(key, value)

2. get(key)

3. delete(key)

4. contains(key)

Hash Function

Collision Resolution

Dynamic Resizing

Complexity Analysis

Randomized Quicksort

Hash Table with Chaining

Usage Examples

Basic Usage - Randomized Quicksort

Basic Usage - Hash Table

Edge Cases Handled

Quicksort

Hash Table

Running the Program

Prerequisites

Execution

Run Examples

Run Tests

Test Cases

Randomized Quicksort Tests

✅ Functional Tests

✅ Behavioral Tests

✅ Performance Tests

Hash Table Tests

✅ Functional Tests

✅ Behavioral Tests

✅ Edge Cases

Project Structure

Testing

Test Coverage

✅ Functional Tests

✅ Behavioral Tests

✅ Performance Tests

✅ Stress Tests

✅ Negative Test Cases

Running Tests

Educational Value

Algorithm Analysis

Randomized Quicksort

Hash Table with Chaining

Performance Considerations

Contributing

License

Author

Acknowledgments

20 KiB

Raw Blame History

1. `randomized_quicksort(arr)`

2. `randomized_partition(arr, low, high)`

3. `compare_with_builtin(arr)`

4. `analyze_performance(array_sizes)`

1. `insert(key, value)`

2. `get(key)`

3. `delete(key)`

4. `contains(key)`