MSCS532_Assignment3/README.md

# Randomized Quicksort & Hash Table with Chaining - Algorithm Efficiency and Scalability

## Overview

This project implements two fundamental algorithms and data structures demonstrating algorithm efficiency and scalability:

1. **Randomized Quicksort Algorithm** - An efficient sorting algorithm with average O(n log n) time complexity
2. **Hash Table with Chaining** - A hash table implementation using chaining for collision resolution

Both implementations provide comprehensive test suites, performance analysis utilities, and detailed documentation for educational purposes.

### Key Features

* ✅ **Randomized Quicksort**: Efficient sorting with randomized pivot selection to avoid worst-case performance
* ✅ **Performance Analysis**: Built-in utilities for comparing and analyzing algorithm performance
* ✅ **Hash Table with Chaining**: Complete hash table implementation with dynamic resizing
* ✅ **Comprehensive Test Suite**: Extensive test coverage including edge cases, stress tests, and performance benchmarks
* ✅ **Well-Documented Code**: Clear comments, docstrings, and educational examples
* ✅ **Production-Ready**: Robust error handling and comprehensive test coverage

## Architecture

### Randomized Quicksort Algorithm Flow

```
Input Array: [64, 34, 25, 12, 22, 11, 90, 5]
             ↓
    ┌─────────────────────────────────────┐
    │   Randomized Quicksort Process      │
    └─────────────────────────────────────┘
             ↓
    ┌─────────────────────────────────────────────────┐
    │  Step 1: Randomly select pivot                  │
    │  Pivot: 25 (randomly selected)                  │
    │  Partition: [12, 22, 11, 5] | 25 | [64, 34, 90] │
    └─────────────────────────────────────────────────┘
             ↓
    ┌─────────────────────────────────────┐
    │  Step 2: Recursively sort left      │
    │  Array: [12, 22, 11, 5]             │
    │  Pivot: 11 → [5, 11] | [12, 22]     │
    └─────────────────────────────────────┘
             ↓
    ┌─────────────────────────────────────┐
    │  Step 3: Recursively sort right     │
    │  Array: [64, 34, 90]                │
    │  Pivot: 64 → [34, 64] | [90]        │
    └─────────────────────────────────────┘
             ↓
Output Array: [5, 11, 12, 22, 25, 34, 64, 90]
```

### Hash Table with Chaining Structure

```
Hash Table (size=8)
┌─────────────────────────────────────────┐
│ Bucket 0: [Key: 8, Value: "eight"]      │
│           [Key: 16, Value: "sixteen"]   │
│ Bucket 1: [Key: 9, Value: "nine"]       │
│ Bucket 2: [Key: 10, Value: "ten"]       │
│           [Key: 18, Value: "eighteen"]  │
│ Bucket 3: [Key: 11, Value: "eleven"]    │
│ Bucket 4: [Key: 12, Value: "twelve"]    │
│ Bucket 5: [Key: 13, Value: "thirteen"]  │
│ Bucket 6: [Key: 14, Value: "fourteen"]  │
│ Bucket 7: [Key: 15, Value: "fifteen"]   │
└─────────────────────────────────────────┘
         ↓
    Collision Resolution via Chaining
    (Multiple keys hash to same bucket)
```

### Core Algorithm Structure

#### Randomized Quicksort

```
┌─────────────────────────────────────────────────────────────┐
│              Randomized Quicksort                           │
├─────────────────────────────────────────────────────────────┤
│  Function: randomized_quicksort(arr)                        │
│  Input: Array of comparable elements                        │
│  Output: Array sorted in ascending order                    │
├─────────────────────────────────────────────────────────────┤
│  Algorithm Steps:                                           │
│  1. If array has ≤ 1 element, return                        │
│  2. Randomly select pivot element                           │
│  3. Partition array around pivot                            │
│  4. Recursively sort left subarray                          │
│  5. Recursively sort right subarray                         │
│  6. Combine results                                         │
└─────────────────────────────────────────────────────────────┘
```

#### Hash Table with Chaining

```
┌─────────────────────────────────────────────────────────────┐
│              Hash Table with Chaining                       │
├─────────────────────────────────────────────────────────────┤
│  Class: HashTable                                           │
│  Operations: insert, get, delete, contains                  │
├─────────────────────────────────────────────────────────────┤
│  Key Operations:                                            │
│  1. Hash function: h(k) = floor(m × (k × A mod 1))          │
│  2. Collision resolution: Chaining (linked lists)           │
│  3. Load factor management: Resize when threshold exceeded  │
│  4. Dynamic resizing: Double size when load > 0.75          │
└─────────────────────────────────────────────────────────────┘
```

## Implementation Details

### Part 1: Randomized Quicksort

#### Core Functions

##### 1. `randomized_quicksort(arr)`

* **Purpose**: Sort array using randomized quicksort algorithm
* **Parameters**: `arr` (list) - Input array to be sorted
* **Returns**: `list` - New array sorted in ascending order
* **Space Complexity**: O(n) - Creates a copy of the input array
* **Time Complexity**:
  - Average: O(n log n)
  - Worst: O(n²) - rarely occurs due to randomization
  - Best: O(n log n)

##### 2. `randomized_partition(arr, low, high)`

* **Purpose**: Partition array using a randomly selected pivot
* **Parameters**:
  - `arr` (list) - Array to partition
  - `low` (int) - Starting index
  - `high` (int) - Ending index
* **Returns**: `int` - Final position of pivot element
* **Key Feature**: Random pivot selection prevents worst-case O(n²) performance

##### 3. `compare_with_builtin(arr)`

* **Purpose**: Compare randomized quicksort with Python's built-in sort
* **Returns**: Dictionary with timing metrics and correctness verification

##### 4. `analyze_performance(array_sizes)`

* **Purpose**: Analyze quicksort performance across different array sizes
* **Returns**: List of performance metrics for each array size

##### 5. `deterministic_quicksort(arr)`

* **Purpose**: Sort array using deterministic quicksort (first element as pivot)
* **Parameters**: `arr` (list) - Input array to be sorted
* **Returns**: `list` - New array sorted in ascending order
* **Time Complexity**:
  - Average: O(n log n)
  - Worst: O(n²) - occurs on sorted/reverse-sorted arrays
* **Note**: Included for empirical comparison with randomized version

#### Algorithm Logic

**Why Randomization?**

Standard quicksort can degrade to O(n²) when:
- Pivot is always the smallest element (worst case)
- Pivot is always the largest element (worst case)
- Array is already sorted or reverse sorted

Randomization ensures:
- Expected O(n log n) performance
- Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
- Very low probability of worst-case behavior

### Part 2: Hash Table with Chaining

#### Core Operations

##### 1. `insert(key, value)`

* **Purpose**: Insert or update a key-value pair
* **Time Complexity**: O(1) average case, O(n) worst case
* **Features**:
  - Automatically updates if key exists
  - Triggers resize when load factor exceeds threshold

##### 2. `get(key)`

* **Purpose**: Retrieve value associated with a key
* **Time Complexity**: O(1) average case, O(n) worst case
* **Returns**: Value if key exists, None otherwise

##### 3. `delete(key)`

* **Purpose**: Remove a key-value pair
* **Time Complexity**: O(1) average case, O(n) worst case
* **Returns**: True if key was found and deleted, False otherwise

##### 4. `contains(key)`

* **Purpose**: Check if a key exists in the hash table
* **Time Complexity**: O(1) average case, O(n) worst case
* **Pythonic**: Supports `in` operator

#### Hash Function

**Multiplication Method:**
```
h(k) = floor(m × (k × A mod 1))
```
where:
- `m` = table size
- `A` ≈ (√5 - 1) / 2 ≈ 0.618 (golden ratio)
- Provides good distribution of keys across buckets

#### Collision Resolution

**Chaining Strategy:**
- Each bucket contains a linked list of key-value pairs
- When collision occurs, new element is appended to chain
- Allows multiple elements per bucket
- No clustering issues unlike open addressing

#### Dynamic Resizing

**Load Factor Management:**
- Default threshold: 0.75
- When load factor exceeds threshold, table size doubles
- All elements are rehashed into new table
- Maintains O(1) average performance

## Complexity Analysis

### Randomized Quicksort

| Aspect               | Complexity | Description                                        |
| -------------------- | ---------- | -------------------------------------------------- |
| **Time Complexity**  | O(n log n) | Average case - randomized pivot selection          |
| **Worst Case**       | O(n²)      | Rarely occurs due to randomization                 |
| **Best Case**        | O(n log n) | Already sorted arrays                              |
| **Space Complexity** | O(log n)   | Average case recursion stack depth                 |
| **Stability**        | Not Stable | Equal elements may change relative order           |

### Hash Table with Chaining

| Aspect               | Complexity | Description                                        |
| -------------------- | ---------- | -------------------------------------------------- |
| **Time Complexity**  | O(1)       | Average case for insert, get, delete               |
| **Worst Case**       | O(n)       | All keys hash to same bucket (rare)                |
| **Space Complexity** | O(n + m)   | n elements + m buckets                             |
| **Load Factor**      | 0.75       | Threshold for automatic resizing                   |

## Usage Examples

### Basic Usage - Randomized Quicksort

```python
from src.quicksort import randomized_quicksort, compare_with_builtin

# Example 1: Basic sorting
arr = [64, 34, 25, 12, 22, 11, 90, 5]
sorted_arr = randomized_quicksort(arr)
print(sorted_arr)  # Output: [5, 11, 12, 22, 25, 34, 64, 90]

# Example 2: Performance comparison
comparison = compare_with_builtin(arr)
print(f"Quicksort time: {comparison['quicksort_time']:.6f} seconds")
print(f"Built-in sort time: {comparison['builtin_time']:.6f} seconds")
print(f"Speedup ratio: {comparison['speedup']:.2f}x")
print(f"Results match: {comparison['is_correct']}")
```

### Basic Usage - Hash Table

```python
from src.hash_table import HashTable

# Create hash table
ht = HashTable(initial_size=16)

# Insert key-value pairs
ht.insert(1, "apple")
ht.insert(2, "banana")
ht.insert(3, "cherry")

# Retrieve values
print(ht.get(1))  # "apple"

# Check if key exists
print(2 in ht)  # True

# Delete a key
ht.delete(2)

# Get all items
items = ht.get_all_items()
print(items)  # [(1, "apple"), (3, "cherry")]
```

### Edge Cases Handled

#### Quicksort

```python
# Empty array
empty_arr = []
result = randomized_quicksort(empty_arr)
print(result)  # Output: []

# Single element
single = [42]
result = randomized_quicksort(single)
print(result)  # Output: [42]

# Duplicate elements
duplicates = [3, 3, 3, 3]
result = randomized_quicksort(duplicates)
print(result)  # Output: [3, 3, 3, 3]

# Negative numbers
negatives = [-5, -2, -8, 1, 3, -1, 0]
result = randomized_quicksort(negatives)
print(result)  # Output: [-8, -5, -2, -1, 0, 1, 3]
```

#### Hash Table

```python
# Empty hash table
ht = HashTable()
print(len(ht))  # 0
print(ht.get(1))  # None

# Collision handling
ht = HashTable(initial_size=5)
ht.insert(1, "one")
ht.insert(6, "six")  # May collide with 1
ht.insert(11, "eleven")  # May collide with 1 and 6
# All keys are stored correctly via chaining

# Load factor management
ht = HashTable(initial_size=4, load_factor_threshold=0.75)
ht.insert(1, "a")
ht.insert(2, "b")
ht.insert(3, "c")
ht.insert(4, "d")  # Triggers resize (load factor = 1.0 > 0.75)
print(ht.size)  # 8 (doubled)
```

## Running the Program

### Prerequisites

* Python 3.7 or higher
* No external dependencies required (uses only Python standard library)

### Execution

#### Run Examples

```bash
python3 -m src.examples
```

#### Run Tests

**Quick Tests (Essential functionality):**
```bash
python3 run_tests.py --quick
```

**Full Test Suite:**
```bash
python3 run_tests.py
```

**Unit Tests Only:**
```bash
python3 run_tests.py --unit-only
```

**Performance Benchmarks:**
```bash
python3 run_tests.py --benchmark
```

**Stress Tests:**
```bash
python3 run_tests.py --stress
```

**Negative Test Cases:**
```bash
python3 run_tests.py --negative
```

**Using unittest directly:**
```bash
python3 -m unittest discover tests -v
```

#### Run Empirical Comparison

**Generate Comparison Plots:**
```bash
python3 -m src.generate_plots
```

**Run Comparison Analysis:**
```bash
python3 -m src.quicksort_comparison
```

Both commands will generate detailed performance data and visualizations comparing Randomized vs Deterministic Quicksort.

## Test Cases

### Randomized Quicksort Tests

The test suite includes comprehensive test cases covering:

#### ✅ **Functional Tests**

* Basic sorting functionality
* Already sorted arrays (ascending/descending)
* Empty arrays and single elements
* Duplicate elements
* Negative numbers and zero values
* Large arrays (1000+ elements)

#### ✅ **Behavioral Tests**

* Non-destructive sorting (original array unchanged)
* Correctness verification against built-in sort
* Partition function correctness

#### ✅ **Performance Tests**

* Comparison with built-in sort
* Performance analysis across different array sizes
* Timing measurements

### Deterministic Quicksort Tests

The test suite includes comprehensive test cases covering:

#### ✅ **Functional Tests**

* All same scenarios as randomized quicksort
* Worst-case performance on sorted/reverse-sorted arrays
* Correctness verification

#### ✅ **Comparison Tests**

* Direct comparison between randomized and deterministic quicksort
* Verification that both produce identical results
* Performance consistency tests

#### ✅ **Edge Cases**

* Zero elements, single element, two elements
* All zeros, mixed positive/negative numbers
* Large value ranges
* Worst-case scenarios for deterministic quicksort

### Hash Table Tests

The test suite includes comprehensive test cases covering:

#### ✅ **Functional Tests**

* Basic insert, get, delete operations
* Empty hash table operations
* Collision handling
* Load factor calculation
* Dynamic resizing

#### ✅ **Behavioral Tests**

* Key existence checking (`in` operator)
* Update existing keys
* Delete from chains (middle of chain)
* Get all items

#### ✅ **Edge Cases**

* Empty hash table
* Single element
* All keys hash to same bucket
* Load factor threshold triggering resize

## Empirical Comparison Study

### Randomized vs Deterministic Quicksort

This project includes a comprehensive empirical comparison study comparing Randomized Quicksort with Deterministic Quicksort (using first element as pivot) across different input sizes and distributions.

**Documentation**: See [`QUICKSORT_COMPARISON.md`](QUICKSORT_COMPARISON.md) for detailed analysis and results.

**Visualizations**: Three comprehensive plots are included:
- `quicksort_comparison_plots.png` - Overview comparison across all distributions
- `quicksort_comparison_detailed.png` - Detailed views for each distribution type
- `quicksort_speedup_comparison.png` - Speedup ratios visualization

**Key Findings**:
- **Random Arrays**: Both algorithms perform similarly (~10-15% difference)
- **Sorted Arrays**: Deterministic degrades to O(n²); Randomized maintains O(n log n) - up to **475x speedup**
- **Reverse-Sorted Arrays**: Even worse degradation for deterministic - up to **857x speedup** for randomized
- **Repeated Elements**: Similar performance for both algorithms

**Running the Comparison**:
```bash
# Generate plots and detailed comparison
python3 -m src.generate_plots
python3 -m src.quicksort_comparison
```

## Project Structure

```
MSCS532_Assignment3/
├── src/
│   ├── __init__.py                # Package initialization
│   ├── quicksort.py              # Randomized & Deterministic Quicksort implementations
│   ├── quicksort_comparison.py   # Empirical comparison script
│   ├── generate_plots.py         # Plot generation script
│   ├── hash_table.py             # Hash Table with Chaining implementation
│   └── examples.py               # Example usage demonstrations
├── tests/
│   ├── __init__.py               # Test package initialization
│   ├── test_quicksort.py         # Comprehensive quicksort tests
│   └── test_hash_table.py        # Comprehensive hash table tests
├── run_tests.py                  # Test runner with various options
├── README.md                     # This documentation
├── QUICKSORT_COMPARISON.md       # Empirical comparison documentation
├── quicksort_comparison_plots.png       # Overview comparison plots
├── quicksort_comparison_detailed.png    # Detailed distribution plots
├── quicksort_speedup_comparison.png     # Speedup ratio plots
├── LICENSE                       # MIT License
├── .gitignore                    # Git ignore file
└── requirements.txt              # Python dependencies (none required)
```

## Testing

### Test Coverage

The project includes **41+ comprehensive test cases** covering:

#### ✅ **Functional Tests**

* Basic functionality for both algorithms
* Edge cases (empty, single element, duplicates)
* Correctness verification

#### ✅ **Behavioral Tests**

* Non-destructive operations
* In-place modifications
* Collision resolution
* Dynamic resizing

#### ✅ **Performance Tests**

* Timing comparisons
* Performance analysis across different sizes
* Benchmarking utilities

#### ✅ **Stress Tests**

* Large arrays (1000+ elements)
* Many hash table operations
* Boundary conditions

#### ✅ **Negative Test Cases**

* Invalid input types
* Edge cases and boundary conditions
* Error handling

### Running Tests

The project includes a comprehensive test runner (`run_tests.py`) with multiple options:

- **Quick Tests**: Essential functionality tests
- **Full Suite**: All tests including edge cases
- **Unit Tests**: Standard unittest tests only
- **Benchmarks**: Performance comparison tests
- **Stress Tests**: Large-scale and boundary tests
- **Negative Tests**: Invalid input and error handling tests

## Educational Value

This implementation serves as an excellent learning resource for:

* **Algorithm Understanding**: Clear demonstration of quicksort and hash table mechanics
* **Randomization Techniques**: Shows how randomization improves algorithm performance
* **Data Structure Design**: Demonstrates hash table implementation with collision resolution
* **Code Quality**: Demonstrates good practices in Python programming
* **Testing**: Comprehensive test suite showing edge case handling
* **Documentation**: Well-commented code with clear explanations
* **Performance Analysis**: Tools for understanding algorithm efficiency

## Algorithm Analysis

### Randomized Quicksort

**Why Randomization?**
- Standard quicksort can degrade to O(n²) when the pivot is always the smallest or largest element
- Randomization ensures expected O(n log n) performance
- Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n

**Performance Characteristics:**
- Excellent average-case performance
- Non-destructive sorting (creates copy)
- Cache-friendly due to good locality of reference

**Comparison with Other Algorithms:**
- Faster than O(n²) algorithms (bubble, insertion, selection sort)
- Comparable to merge sort but with better space efficiency
- Generally slower than Python's built-in Timsort (optimized hybrid)

### Empirical Comparison Results

**Randomized vs Deterministic Quicksort:**

The project includes comprehensive empirical analysis comparing Randomized Quicksort with Deterministic Quicksort (first element as pivot). Results demonstrate:

1. **On Random Arrays**: Deterministic is ~10-15% faster (minimal overhead from randomization)
2. **On Sorted Arrays**: Randomized is **up to 475x faster** (deterministic shows O(n²) worst-case)
3. **On Reverse-Sorted Arrays**: Randomized is **up to 857x faster** (even worse degradation for deterministic)
4. **On Repeated Elements**: Both perform similarly (~5% difference)

**Visual Evidence**: The included plots (`quicksort_comparison_*.png`) clearly show:
- Exponential degradation curves for deterministic quicksort on worst-case inputs
- Consistent O(n log n) performance for randomized quicksort across all distributions
- Minimal overhead of randomization on random inputs

See [`QUICKSORT_COMPARISON.md`](QUICKSORT_COMPARISON.md) for detailed analysis, tables, and conclusions.

### Hash Table with Chaining

**Chaining vs. Open Addressing:**
- Chaining stores multiple elements in the same bucket using linked lists
- Handles collisions gracefully without clustering
- Load factor threshold prevents performance degradation

**Hash Function:**
- Uses multiplication method: h(k) = floor(m × (k × A mod 1))
- A ≈ (√5 - 1) / 2 ≈ 0.618 (golden ratio)
- Provides good distribution of keys across buckets

**Performance Considerations:**
- O(1) average case performance
- Dynamic resizing maintains efficiency
- Trade-off between space and time efficiency

## Performance Considerations

1. **Quicksort**:
   - Best for general-purpose sorting
   - Randomization prevents worst-case scenarios
   - Good for medium to large arrays

2. **Hash Table**:
   - Maintains O(1) average performance through load factor management
   - Resizing doubles table size when threshold is exceeded
   - Trade-off between space and time efficiency

## Contributing

This is an educational project demonstrating algorithm implementations. Feel free to:

* Add more test cases
* Implement additional algorithms
* Improve documentation
* Optimize the implementations
* Add visualization tools

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Author

**Carlos Gutierrez**
Email: cgutierrez44833@ucumberlands.edu

Created for MSCS532 Assignment 3: Understanding Algorithm Efficiency and Scalability

## Acknowledgments

* Based on standard algorithm implementations from Introduction to Algorithms (CLRS)
* Educational project for algorithm analysis and data structures course