Initial commit: Randomized Quicksort and Hash Table with Chaining implementation

- Implemented Randomized Quicksort algorithm with performance analysis - Implemented Hash Table with Chaining for collision resolution - Added comprehensive test suite (30+ test cases) - Created test runner script with multiple test options - Added detailed README with architecture diagrams and documentation - Added MIT License - Includes examples and comprehensive documentation
2025-11-04 21:35:02 -05:00
commit a7fe11fd74
12 changed files with 2024 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,593 @@
+# Randomized Quicksort & Hash Table with Chaining - Algorithm Efficiency and Scalability
+
+## Overview
+
+This project implements two fundamental algorithms and data structures demonstrating algorithm efficiency and scalability:
+
+1. **Randomized Quicksort Algorithm** - An efficient sorting algorithm with average O(n log n) time complexity
+2. **Hash Table with Chaining** - A hash table implementation using chaining for collision resolution
+
+Both implementations provide comprehensive test suites, performance analysis utilities, and detailed documentation for educational purposes.
+
+### Key Features
+
+* ✅ **Randomized Quicksort**: Efficient sorting with randomized pivot selection to avoid worst-case performance
+* ✅ **Performance Analysis**: Built-in utilities for comparing and analyzing algorithm performance
+* ✅ **Hash Table with Chaining**: Complete hash table implementation with dynamic resizing
+* ✅ **Comprehensive Test Suite**: Extensive test coverage including edge cases, stress tests, and performance benchmarks
+* ✅ **Well-Documented Code**: Clear comments, docstrings, and educational examples
+* ✅ **Production-Ready**: Robust error handling and comprehensive test coverage
+
+## Architecture
+
+### Randomized Quicksort Algorithm Flow
+
+```
+Input Array: [64, 34, 25, 12, 22, 11, 90, 5]
+             ↓
+    ┌─────────────────────────────────────┐
+    │   Randomized Quicksort Process      │
+    └─────────────────────────────────────┘
+             ↓
+    ┌─────────────────────────────────────────────────┐
+    │  Step 1: Randomly select pivot                  │
+    │  Pivot: 25 (randomly selected)                  │
+    │  Partition: [12, 22, 11, 5] | 25 | [64, 34, 90] │
+    └─────────────────────────────────────────────────┘
+             ↓
+    ┌─────────────────────────────────────┐
+    │  Step 2: Recursively sort left      │
+    │  Array: [12, 22, 11, 5]             │
+    │  Pivot: 11 → [5, 11] | [12, 22]     │
+    └─────────────────────────────────────┘
+             ↓
+    ┌─────────────────────────────────────┐
+    │  Step 3: Recursively sort right     │
+    │  Array: [64, 34, 90]                │
+    │  Pivot: 64 → [34, 64] | [90]        │
+    └─────────────────────────────────────┘
+             ↓
+Output Array: [5, 11, 12, 22, 25, 34, 64, 90]
+```
+
+### Hash Table with Chaining Structure
+
+```
+Hash Table (size=8)
+┌─────────────────────────────────────────┐
+│ Bucket 0: [Key: 8, Value: "eight"]      │
+│           [Key: 16, Value: "sixteen"]   │
+│ Bucket 1: [Key: 9, Value: "nine"]       │
+│ Bucket 2: [Key: 10, Value: "ten"]       │
+│           [Key: 18, Value: "eighteen"]  │
+│ Bucket 3: [Key: 11, Value: "eleven"]    │
+│ Bucket 4: [Key: 12, Value: "twelve"]    │
+│ Bucket 5: [Key: 13, Value: "thirteen"]  │
+│ Bucket 6: [Key: 14, Value: "fourteen"]  │
+│ Bucket 7: [Key: 15, Value: "fifteen"]   │
+└─────────────────────────────────────────┘
+         ↓
+    Collision Resolution via Chaining
+    (Multiple keys hash to same bucket)
+```
+
+### Core Algorithm Structure
+
+#### Randomized Quicksort
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│              Randomized Quicksort                           │
+├─────────────────────────────────────────────────────────────┤
+│  Function: randomized_quicksort(arr)                        │
+│  Input: Array of comparable elements                        │
+│  Output: Array sorted in ascending order                    │
+├─────────────────────────────────────────────────────────────┤
+│  Algorithm Steps:                                           │
+│  1. If array has ≤ 1 element, return                        │
+│  2. Randomly select pivot element                           │
+│  3. Partition array around pivot                            │
+│  4. Recursively sort left subarray                          │
+│  5. Recursively sort right subarray                         │
+│  6. Combine results                                         │
+└─────────────────────────────────────────────────────────────┘
+```
+
+#### Hash Table with Chaining
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│              Hash Table with Chaining                       │
+├─────────────────────────────────────────────────────────────┤
+│  Class: HashTable                                           │
+│  Operations: insert, get, delete, contains                  │
+├─────────────────────────────────────────────────────────────┤
+│  Key Operations:                                            │
+│  1. Hash function: h(k) = floor(m × (k × A mod 1))          │
+│  2. Collision resolution: Chaining (linked lists)           │
+│  3. Load factor management: Resize when threshold exceeded  │
+│  4. Dynamic resizing: Double size when load > 0.75          │
+└─────────────────────────────────────────────────────────────┘
+```
+
+## Implementation Details
+
+### Part 1: Randomized Quicksort
+
+#### Core Functions
+
+##### 1. `randomized_quicksort(arr)`
+
+* **Purpose**: Sort array using randomized quicksort algorithm
+* **Parameters**: `arr` (list) - Input array to be sorted
+* **Returns**: `list` - New array sorted in ascending order
+* **Space Complexity**: O(n) - Creates a copy of the input array
+* **Time Complexity**: 
+  - Average: O(n log n)
+  - Worst: O(n²) - rarely occurs due to randomization
+  - Best: O(n log n)
+
+##### 2. `randomized_partition(arr, low, high)`
+
+* **Purpose**: Partition array using a randomly selected pivot
+* **Parameters**: 
+  - `arr` (list) - Array to partition
+  - `low` (int) - Starting index
+  - `high` (int) - Ending index
+* **Returns**: `int` - Final position of pivot element
+* **Key Feature**: Random pivot selection prevents worst-case O(n²) performance
+
+##### 3. `compare_with_builtin(arr)`
+
+* **Purpose**: Compare randomized quicksort with Python's built-in sort
+* **Returns**: Dictionary with timing metrics and correctness verification
+
+##### 4. `analyze_performance(array_sizes)`
+
+* **Purpose**: Analyze quicksort performance across different array sizes
+* **Returns**: List of performance metrics for each array size
+
+#### Algorithm Logic
+
+**Why Randomization?**
+
+Standard quicksort can degrade to O(n²) when:
+- Pivot is always the smallest element (worst case)
+- Pivot is always the largest element (worst case)
+- Array is already sorted or reverse sorted
+
+Randomization ensures:
+- Expected O(n log n) performance
+- Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
+- Very low probability of worst-case behavior
+
+### Part 2: Hash Table with Chaining
+
+#### Core Operations
+
+##### 1. `insert(key, value)`
+
+* **Purpose**: Insert or update a key-value pair
+* **Time Complexity**: O(1) average case, O(n) worst case
+* **Features**: 
+  - Automatically updates if key exists
+  - Triggers resize when load factor exceeds threshold
+
+##### 2. `get(key)`
+
+* **Purpose**: Retrieve value associated with a key
+* **Time Complexity**: O(1) average case, O(n) worst case
+* **Returns**: Value if key exists, None otherwise
+
+##### 3. `delete(key)`
+
+* **Purpose**: Remove a key-value pair
+* **Time Complexity**: O(1) average case, O(n) worst case
+* **Returns**: True if key was found and deleted, False otherwise
+
+##### 4. `contains(key)`
+
+* **Purpose**: Check if a key exists in the hash table
+* **Time Complexity**: O(1) average case, O(n) worst case
+* **Pythonic**: Supports `in` operator
+
+#### Hash Function
+
+**Multiplication Method:**
+```
+h(k) = floor(m × (k × A mod 1))
+```
+where:
+- `m` = table size
+- `A` ≈ (√5 - 1) / 2 ≈ 0.618 (golden ratio)
+- Provides good distribution of keys across buckets
+
+#### Collision Resolution
+
+**Chaining Strategy:**
+- Each bucket contains a linked list of key-value pairs
+- When collision occurs, new element is appended to chain
+- Allows multiple elements per bucket
+- No clustering issues unlike open addressing
+
+#### Dynamic Resizing
+
+**Load Factor Management:**
+- Default threshold: 0.75
+- When load factor exceeds threshold, table size doubles
+- All elements are rehashed into new table
+- Maintains O(1) average performance
+
+## Complexity Analysis
+
+### Randomized Quicksort
+
+| Aspect               | Complexity | Description                                        |
+| -------------------- | ---------- | -------------------------------------------------- |
+| **Time Complexity**  | O(n log n) | Average case - randomized pivot selection          |
+| **Worst Case**       | O(n²)      | Rarely occurs due to randomization                 |
+| **Best Case**        | O(n log n) | Already sorted arrays                              |
+| **Space Complexity** | O(log n)   | Average case recursion stack depth                 |
+| **Stability**        | Not Stable | Equal elements may change relative order           |
+
+### Hash Table with Chaining
+
+| Aspect               | Complexity | Description                                        |
+| -------------------- | ---------- | -------------------------------------------------- |
+| **Time Complexity**  | O(1)       | Average case for insert, get, delete               |
+| **Worst Case**       | O(n)       | All keys hash to same bucket (rare)                |
+| **Space Complexity** | O(n + m)   | n elements + m buckets                             |
+| **Load Factor**      | 0.75       | Threshold for automatic resizing                   |
+
+## Usage Examples
+
+### Basic Usage - Randomized Quicksort
+
+```python
+from src.quicksort import randomized_quicksort, compare_with_builtin
+
+# Example 1: Basic sorting
+arr = [64, 34, 25, 12, 22, 11, 90, 5]
+sorted_arr = randomized_quicksort(arr)
+print(sorted_arr)  # Output: [5, 11, 12, 22, 25, 34, 64, 90]
+
+# Example 2: Performance comparison
+comparison = compare_with_builtin(arr)
+print(f"Quicksort time: {comparison['quicksort_time']:.6f} seconds")
+print(f"Built-in sort time: {comparison['builtin_time']:.6f} seconds")
+print(f"Speedup ratio: {comparison['speedup']:.2f}x")
+print(f"Results match: {comparison['is_correct']}")
+```
+
+### Basic Usage - Hash Table
+
+```python
+from src.hash_table import HashTable
+
+# Create hash table
+ht = HashTable(initial_size=16)
+
+# Insert key-value pairs
+ht.insert(1, "apple")
+ht.insert(2, "banana")
+ht.insert(3, "cherry")
+
+# Retrieve values
+print(ht.get(1))  # "apple"
+
+# Check if key exists
+print(2 in ht)  # True
+
+# Delete a key
+ht.delete(2)
+
+# Get all items
+items = ht.get_all_items()
+print(items)  # [(1, "apple"), (3, "cherry")]
+```
+
+### Edge Cases Handled
+
+#### Quicksort
+
+```python
+# Empty array
+empty_arr = []
+result = randomized_quicksort(empty_arr)
+print(result)  # Output: []
+
+# Single element
+single = [42]
+result = randomized_quicksort(single)
+print(result)  # Output: [42]
+
+# Duplicate elements
+duplicates = [3, 3, 3, 3]
+result = randomized_quicksort(duplicates)
+print(result)  # Output: [3, 3, 3, 3]
+
+# Negative numbers
+negatives = [-5, -2, -8, 1, 3, -1, 0]
+result = randomized_quicksort(negatives)
+print(result)  # Output: [-8, -5, -2, -1, 0, 1, 3]
+```
+
+#### Hash Table
+
+```python
+# Empty hash table
+ht = HashTable()
+print(len(ht))  # 0
+print(ht.get(1))  # None
+
+# Collision handling
+ht = HashTable(initial_size=5)
+ht.insert(1, "one")
+ht.insert(6, "six")  # May collide with 1
+ht.insert(11, "eleven")  # May collide with 1 and 6
+# All keys are stored correctly via chaining
+
+# Load factor management
+ht = HashTable(initial_size=4, load_factor_threshold=0.75)
+ht.insert(1, "a")
+ht.insert(2, "b")
+ht.insert(3, "c")
+ht.insert(4, "d")  # Triggers resize (load factor = 1.0 > 0.75)
+print(ht.size)  # 8 (doubled)
+```
+
+## Running the Program
+
+### Prerequisites
+
+* Python 3.7 or higher
+* No external dependencies required (uses only Python standard library)
+
+### Execution
+
+#### Run Examples
+
+```bash
+python3 -m src.examples
+```
+
+#### Run Tests
+
+**Quick Tests (Essential functionality):**
+```bash
+python3 run_tests.py --quick
+```
+
+**Full Test Suite:**
+```bash
+python3 run_tests.py
+```
+
+**Unit Tests Only:**
+```bash
+python3 run_tests.py --unit-only
+```
+
+**Performance Benchmarks:**
+```bash
+python3 run_tests.py --benchmark
+```
+
+**Stress Tests:**
+```bash
+python3 run_tests.py --stress
+```
+
+**Negative Test Cases:**
+```bash
+python3 run_tests.py --negative
+```
+
+**Using unittest directly:**
+```bash
+python3 -m unittest discover tests -v
+```
+
+## Test Cases
+
+### Randomized Quicksort Tests
+
+The test suite includes comprehensive test cases covering:
+
+#### ✅ **Functional Tests**
+
+* Basic sorting functionality
+* Already sorted arrays (ascending/descending)
+* Empty arrays and single elements
+* Duplicate elements
+* Negative numbers and zero values
+* Large arrays (1000+ elements)
+
+#### ✅ **Behavioral Tests**
+
+* Non-destructive sorting (original array unchanged)
+* Correctness verification against built-in sort
+* Partition function correctness
+
+#### ✅ **Performance Tests**
+
+* Comparison with built-in sort
+* Performance analysis across different array sizes
+* Timing measurements
+
+### Hash Table Tests
+
+The test suite includes comprehensive test cases covering:
+
+#### ✅ **Functional Tests**
+
+* Basic insert, get, delete operations
+* Empty hash table operations
+* Collision handling
+* Load factor calculation
+* Dynamic resizing
+
+#### ✅ **Behavioral Tests**
+
+* Key existence checking (`in` operator)
+* Update existing keys
+* Delete from chains (middle of chain)
+* Get all items
+
+#### ✅ **Edge Cases**
+
+* Empty hash table
+* Single element
+* All keys hash to same bucket
+* Load factor threshold triggering resize
+
+## Project Structure
+
+```
+MSCS532_Assignment3/
+├── src/
+│   ├── __init__.py                # Package initialization
+│   ├── quicksort.py              # Randomized Quicksort implementation
+│   ├── hash_table.py             # Hash Table with Chaining implementation
+│   └── examples.py               # Example usage demonstrations
+├── tests/
+│   ├── __init__.py               # Test package initialization
+│   ├── test_quicksort.py         # Comprehensive quicksort tests
+│   └── test_hash_table.py        # Comprehensive hash table tests
+├── run_tests.py                  # Test runner with various options
+├── README.md                     # This documentation
+├── LICENSE                       # MIT License
+├── .gitignore                    # Git ignore file
+└── requirements.txt              # Python dependencies (none required)
+```
+
+## Testing
+
+### Test Coverage
+
+The project includes **30+ comprehensive test cases** covering:
+
+#### ✅ **Functional Tests**
+
+* Basic functionality for both algorithms
+* Edge cases (empty, single element, duplicates)
+* Correctness verification
+
+#### ✅ **Behavioral Tests**
+
+* Non-destructive operations
+* In-place modifications
+* Collision resolution
+* Dynamic resizing
+
+#### ✅ **Performance Tests**
+
+* Timing comparisons
+* Performance analysis across different sizes
+* Benchmarking utilities
+
+#### ✅ **Stress Tests**
+
+* Large arrays (1000+ elements)
+* Many hash table operations
+* Boundary conditions
+
+#### ✅ **Negative Test Cases**
+
+* Invalid input types
+* Edge cases and boundary conditions
+* Error handling
+
+### Running Tests
+
+The project includes a comprehensive test runner (`run_tests.py`) with multiple options:
+
+- **Quick Tests**: Essential functionality tests
+- **Full Suite**: All tests including edge cases
+- **Unit Tests**: Standard unittest tests only
+- **Benchmarks**: Performance comparison tests
+- **Stress Tests**: Large-scale and boundary tests
+- **Negative Tests**: Invalid input and error handling tests
+
+## Educational Value
+
+This implementation serves as an excellent learning resource for:
+
+* **Algorithm Understanding**: Clear demonstration of quicksort and hash table mechanics
+* **Randomization Techniques**: Shows how randomization improves algorithm performance
+* **Data Structure Design**: Demonstrates hash table implementation with collision resolution
+* **Code Quality**: Demonstrates good practices in Python programming
+* **Testing**: Comprehensive test suite showing edge case handling
+* **Documentation**: Well-commented code with clear explanations
+* **Performance Analysis**: Tools for understanding algorithm efficiency
+
+## Algorithm Analysis
+
+### Randomized Quicksort
+
+**Why Randomization?**
+- Standard quicksort can degrade to O(n²) when the pivot is always the smallest or largest element
+- Randomization ensures expected O(n log n) performance
+- Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
+
+**Performance Characteristics:**
+- Excellent average-case performance
+- Non-destructive sorting (creates copy)
+- Cache-friendly due to good locality of reference
+
+**Comparison with Other Algorithms:**
+- Faster than O(n²) algorithms (bubble, insertion, selection sort)
+- Comparable to merge sort but with better space efficiency
+- Generally slower than Python's built-in Timsort (optimized hybrid)
+
+### Hash Table with Chaining
+
+**Chaining vs. Open Addressing:**
+- Chaining stores multiple elements in the same bucket using linked lists
+- Handles collisions gracefully without clustering
+- Load factor threshold prevents performance degradation
+
+**Hash Function:**
+- Uses multiplication method: h(k) = floor(m × (k × A mod 1))
+- A ≈ (√5 - 1) / 2 ≈ 0.618 (golden ratio)
+- Provides good distribution of keys across buckets
+
+**Performance Considerations:**
+- O(1) average case performance
+- Dynamic resizing maintains efficiency
+- Trade-off between space and time efficiency
+
+## Performance Considerations
+
+1. **Quicksort**: 
+   - Best for general-purpose sorting
+   - Randomization prevents worst-case scenarios
+   - Good for medium to large arrays
+
+2. **Hash Table**: 
+   - Maintains O(1) average performance through load factor management
+   - Resizing doubles table size when threshold is exceeded
+   - Trade-off between space and time efficiency
+
+## Contributing
+
+This is an educational project demonstrating algorithm implementations. Feel free to:
+
+* Add more test cases
+* Implement additional algorithms
+* Improve documentation
+* Optimize the implementations
+* Add visualization tools
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
+
+## Author
+
+Created for MSCS532 Assignment 3: Understanding Algorithm Efficiency and Scalability
+
+## Acknowledgments
+
+* Based on standard algorithm implementations from Introduction to Algorithms (CLRS)
+* Educational project for algorithm analysis and data structures course