Initial commit: Randomized Quicksort and Hash Table with Chaining implementation
- Implemented Randomized Quicksort algorithm with performance analysis
- Implemented Hash Table with Chaining for collision resolution
- Added comprehensive test suite (30+ test cases)
- Created test runner script with multiple test options
- Added detailed README with architecture diagrams and documentation
- Added MIT License
- Includes examples and comprehensive documentation
593
README.md
Normal file
@@ -0,0 +1,593 @@
# Randomized Quicksort & Hash Table with Chaining - Algorithm Efficiency and Scalability

## Overview

This project implements a sorting algorithm and a data structure that illustrate algorithm efficiency and scalability:

1. **Randomized Quicksort Algorithm** - A sorting algorithm with expected O(n log n) time complexity
2. **Hash Table with Chaining** - A hash table implementation using chaining for collision resolution

Both implementations come with test suites, performance analysis utilities, and detailed documentation for educational purposes.

### Key Features

* ✅ **Randomized Quicksort**: Sorting with randomized pivot selection, which makes worst-case behavior unlikely on any input
* ✅ **Performance Analysis**: Built-in utilities for comparing and analyzing algorithm performance
* ✅ **Hash Table with Chaining**: Complete hash table implementation with dynamic resizing
* ✅ **Comprehensive Test Suite**: Test coverage including edge cases, stress tests, and performance benchmarks
* ✅ **Well-Documented Code**: Clear comments, docstrings, and educational examples
* ✅ **Robust**: Error handling backed by negative and stress tests

## Architecture

### Randomized Quicksort Algorithm Flow

```
Input Array: [64, 34, 25, 12, 22, 11, 90, 5]
                  ↓
┌─────────────────────────────────────┐
│ Randomized Quicksort Process        │
└─────────────────────────────────────┘
                  ↓
┌────────────────────────────────────────────────┐
│ Step 1: Randomly select pivot                  │
│ Pivot: 25 (randomly selected)                  │
│ Partition: [12, 22, 11, 5] | 25 | [64, 34, 90] │
└────────────────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 2: Recursively sort left       │
│ Array: [12, 22, 11, 5]              │
│ Pivot: 11 → [5] | 11 | [12, 22]     │
└─────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 3: Recursively sort right      │
│ Array: [64, 34, 90]                 │
│ Pivot: 64 → [34] | 64 | [90]        │
└─────────────────────────────────────┘
                  ↓
Output Array: [5, 11, 12, 22, 25, 34, 64, 90]
```

### Hash Table with Chaining Structure

```
Hash Table (size=8)
┌────────────────────────────────────────┐
│ Bucket 0: [Key: 8, Value: "eight"]     │
│           [Key: 16, Value: "sixteen"]  │
│ Bucket 1: [Key: 9, Value: "nine"]      │
│ Bucket 2: [Key: 10, Value: "ten"]      │
│           [Key: 18, Value: "eighteen"] │
│ Bucket 3: [Key: 11, Value: "eleven"]   │
│ Bucket 4: [Key: 12, Value: "twelve"]   │
│ Bucket 5: [Key: 13, Value: "thirteen"] │
│ Bucket 6: [Key: 14, Value: "fourteen"] │
│ Bucket 7: [Key: 15, Value: "fifteen"]  │
└────────────────────────────────────────┘
                  ↓
Collision Resolution via Chaining
(Multiple keys hash to same bucket;
 bucket indices are shown as k mod 8 for simplicity)
```

### Core Algorithm Structure

#### Randomized Quicksort

```
┌─────────────────────────────────────────────────────────────┐
│ Randomized Quicksort                                        │
├─────────────────────────────────────────────────────────────┤
│ Function: randomized_quicksort(arr)                         │
│ Input: Array of comparable elements                         │
│ Output: Array sorted in ascending order                     │
├─────────────────────────────────────────────────────────────┤
│ Algorithm Steps:                                            │
│ 1. If array has ≤ 1 element, return                         │
│ 2. Randomly select pivot element                            │
│ 3. Partition array around pivot                             │
│ 4. Recursively sort left subarray                           │
│ 5. Recursively sort right subarray                          │
│ 6. Return the sorted array (no merge step needed)           │
└─────────────────────────────────────────────────────────────┘
```

#### Hash Table with Chaining

```
┌─────────────────────────────────────────────────────────────┐
│ Hash Table with Chaining                                    │
├─────────────────────────────────────────────────────────────┤
│ Class: HashTable                                            │
│ Operations: insert, get, delete, contains                   │
├─────────────────────────────────────────────────────────────┤
│ Key Operations:                                             │
│ 1. Hash function: h(k) = floor(m × (k × A mod 1))           │
│ 2. Collision resolution: Chaining (linked lists)            │
│ 3. Load factor management: Resize when threshold exceeded   │
│ 4. Dynamic resizing: Double size when load > 0.75           │
└─────────────────────────────────────────────────────────────┘
```

## Implementation Details

### Part 1: Randomized Quicksort

#### Core Functions

##### 1. `randomized_quicksort(arr)`

* **Purpose**: Sort an array using the randomized quicksort algorithm
* **Parameters**: `arr` (list) - Input array to be sorted
* **Returns**: `list` - New array sorted in ascending order
* **Space Complexity**: O(n) - Creates a copy of the input array
* **Time Complexity**:
  - Average: O(n log n)
  - Worst: O(n²) - rarely occurs due to randomization
  - Best: O(n log n)

##### 2. `randomized_partition(arr, low, high)`

* **Purpose**: Partition the array around a randomly selected pivot
* **Parameters**:
  - `arr` (list) - Array to partition
  - `low` (int) - Starting index
  - `high` (int) - Ending index
* **Returns**: `int` - Final position of the pivot element
* **Key Feature**: Random pivot selection makes the worst-case O(n²) behavior unlikely for any particular input
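
The two functions above could fit together roughly as follows. This is a minimal sketch, not the project's actual code: it assumes a Lomuto-style partition and a private helper named `_quicksort_in_place`, both of which are illustrative choices rather than details taken from `src/quicksort.py`.

```python
import random


def randomized_partition(arr, low, high):
    """Partition arr[low..high] in place around a random pivot (Lomuto scheme, assumed)."""
    pivot_index = random.randint(low, high)  # randint is inclusive on both ends
    arr[pivot_index], arr[high] = arr[high], arr[pivot_index]
    pivot = arr[high]
    i = low - 1  # last index of the "<= pivot" region
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1  # final position of the pivot


def _quicksort_in_place(arr, low, high):
    """Hypothetical helper: recursively sort arr[low..high] in place."""
    if low < high:
        p = randomized_partition(arr, low, high)
        _quicksort_in_place(arr, low, p - 1)
        _quicksort_in_place(arr, p + 1, high)


def randomized_quicksort(arr):
    """Return a new sorted list; the input list is left untouched."""
    result = list(arr)
    _quicksort_in_place(result, 0, len(result) - 1)
    return result


data = [64, 34, 25, 12, 22, 11, 90, 5]
print(randomized_quicksort(data))
```

Sorting a copy keeps the public function non-destructive while the partition step itself still works in place on that copy.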

##### 3. `compare_with_builtin(arr)`

* **Purpose**: Compare randomized quicksort with Python's built-in sort
* **Returns**: Dictionary with timing metrics and correctness verification

##### 4. `analyze_performance(array_sizes)`

* **Purpose**: Analyze quicksort performance across different array sizes
* **Returns**: List of performance metrics for each array size
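
A timing comparison of this kind might be built along the following lines. This is a hedged sketch: `compare_sorts` and `simple_quicksort` are hypothetical stand-ins, the dictionary keys mirror the ones used in the usage examples, and the definition of `speedup` as built-in time divided by quicksort time is an assumption.

```python
import random
import time


def simple_quicksort(arr):
    """Stand-in randomized quicksort (not the project's implementation)."""
    if len(arr) <= 1:
        return list(arr)
    pivot = random.choice(arr)
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return simple_quicksort(less) + equal + simple_quicksort(greater)


def compare_sorts(arr, sort_fn=simple_quicksort):
    """Time sort_fn against sorted() and check that both agree."""
    start = time.perf_counter()
    ours = sort_fn(arr)
    quicksort_time = time.perf_counter() - start

    start = time.perf_counter()
    expected = sorted(arr)
    builtin_time = time.perf_counter() - start

    return {
        "quicksort_time": quicksort_time,
        "builtin_time": builtin_time,
        # Assumed definition: values below 1.0 mean the built-in sort was faster.
        "speedup": builtin_time / quicksort_time if quicksort_time > 0 else float("inf"),
        "is_correct": ours == expected,
    }


report = compare_sorts([random.randint(0, 10_000) for _ in range(2_000)])
print(report["is_correct"], f"{report['speedup']:.2f}x")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock; a single run is still noisy, so repeated measurements give a fairer picture.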

#### Algorithm Logic

**Why Randomization?**

Quicksort with a fixed pivot rule (for example, always the first or last element) can degrade to O(n²) when:
- Pivot is always the smallest element (worst case)
- Pivot is always the largest element (worst case)
- Array is already sorted or reverse sorted

Randomization ensures:
- Expected O(n log n) performance on every input
- Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
- Very low probability of worst-case behavior
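
The effect can be checked with a small experiment that counts element comparisons on an already sorted input. The sketch below is illustrative only: `count_comparisons` is a hypothetical helper, it assumes a Lomuto partition, and it uses an explicit stack so that the fixed-pivot case does not hit Python's recursion limit.

```python
import math
import random


def count_comparisons(values, choose_pivot):
    """Quicksort a copy of values and return how many element comparisons were made."""
    arr = list(values)
    comparisons = 0
    stack = [(0, len(arr) - 1)]  # explicit stack instead of recursion
    while stack:
        low, high = stack.pop()
        if low >= high:
            continue
        p = choose_pivot(low, high)
        arr[p], arr[high] = arr[high], arr[p]
        pivot = arr[high]
        i = low - 1
        for j in range(low, high):
            comparisons += 1
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[high] = arr[high], arr[i + 1]
        stack.append((low, i))        # elements left of the pivot
        stack.append((i + 2, high))   # elements right of the pivot
    return comparisons


n = 500
already_sorted = list(range(n))
rng = random.Random(0)  # seeded so the run is repeatable

fixed = count_comparisons(already_sorted, lambda low, high: high)
randomized = count_comparisons(already_sorted, lambda low, high: rng.randint(low, high))

print(f"last-element pivot: {fixed} comparisons (n(n-1)/2 = {n * (n - 1) // 2})")
print(f"random pivot:       {randomized} comparisons (2n ln n is about {2 * n * math.log(n):.0f})")
```

With the last element as pivot, every partition of a sorted array is maximally unbalanced, so the count is exactly n(n-1)/2; the random-pivot count should land far below that.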

### Part 2: Hash Table with Chaining

#### Core Operations

##### 1. `insert(key, value)`

* **Purpose**: Insert or update a key-value pair
* **Time Complexity**: O(1) average case (amortized over resizes), O(n) worst case
* **Features**:
  - Updates the value if the key already exists
  - Triggers a resize when the load factor exceeds the threshold

##### 2. `get(key)`

* **Purpose**: Retrieve the value associated with a key
* **Time Complexity**: O(1) average case, O(n) worst case
* **Returns**: The value if the key exists, `None` otherwise

##### 3. `delete(key)`

* **Purpose**: Remove a key-value pair
* **Time Complexity**: O(1) average case, O(n) worst case
* **Returns**: `True` if the key was found and deleted, `False` otherwise

##### 4. `contains(key)`

* **Purpose**: Check if a key exists in the hash table
* **Time Complexity**: O(1) average case, O(n) worst case
* **Pythonic**: Supports the `in` operator

#### Hash Function

**Multiplication Method:**
```
h(k) = floor(m × (k × A mod 1))
```
where:
- `m` = table size
- `A` = (√5 - 1) / 2 ≈ 0.618 (the reciprocal of the golden ratio, the constant suggested by Knuth)

This choice of `A` provides a good distribution of keys across buckets.
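
The formula translates almost directly into Python. The sketch below assumes integer keys; the function name `multiplication_hash` is hypothetical, and the project's own method may handle other key types differently.

```python
import math

A = (math.sqrt(5) - 1) / 2  # about 0.618


def multiplication_hash(key, table_size):
    """Map an integer key to a bucket index in [0, table_size)."""
    fractional = (key * A) % 1.0          # fractional part of k * A
    return int(table_size * fractional)   # floor(m * frac), since the value is non-negative


for key in (1, 2, 3):
    print(key, "->", multiplication_hash(key, 8))
# 1 -> 4
# 2 -> 1
# 3 -> 6
```

Consecutive keys land in widely separated buckets, which is the property that makes this constant a common choice.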

#### Collision Resolution

**Chaining Strategy:**
- Each bucket contains a linked list of key-value pairs
- When a collision occurs, the new element is appended to the bucket's chain
- Allows multiple elements per bucket
- Avoids the clustering problems of open addressing
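
A stripped-down version of the chaining strategy is sketched below. It is not the project's `HashTable`: the class name `ChainedTable` is hypothetical, Python lists stand in for linked lists, the table has a fixed size, and `hash(key) % size` replaces the multiplication method to keep the example short.

```python
class ChainedTable:
    """Fixed-size hash table using separate chaining (illustrative only)."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]  # one chain per bucket
        self.count = 0

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # key already present: update in place
                return
        chain.append((key, value))       # new key: append to the chain
        self.count += 1

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None

    def delete(self, key):
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                del chain[i]
                self.count -= 1
                return True
        return False

    def __contains__(self, key):  # enables "key in table"
        return any(k == key for k, _ in self.buckets[self._index(key)])

    def __len__(self):
        return self.count


table = ChainedTable(size=4)
for key in (1, 5, 9):  # 1, 5 and 9 all map to bucket 1 when the size is 4
    table.insert(key, str(key))
print(table.buckets[1])
```

All three keys end up in the same chain, yet each remains individually retrievable and removable, which is the point of chaining.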

#### Dynamic Resizing

**Load Factor Management:**
- Default threshold: 0.75
- When the load factor (elements / buckets) exceeds the threshold, the table size doubles
- All elements are rehashed into the new table
- Keeps chains short, maintaining O(1) average (amortized) performance
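
The resize rule can be sketched on a plain list of buckets. The helpers `load_factor` and `resize_if_needed` are hypothetical names, and `hash(key) % size` is again used as a stand-in for the project's hash function.

```python
def load_factor(buckets):
    """Number of stored items divided by the number of buckets."""
    items = sum(len(chain) for chain in buckets)
    return items / len(buckets)


def resize_if_needed(buckets, threshold=0.75):
    """Return a table twice as large if the load factor exceeds the threshold."""
    if load_factor(buckets) <= threshold:
        return buckets
    new_buckets = [[] for _ in range(2 * len(buckets))]
    for chain in buckets:
        for key, value in chain:
            # Every item is rehashed because its index depends on the table size.
            new_buckets[hash(key) % len(new_buckets)].append((key, value))
    return new_buckets


buckets = [[] for _ in range(4)]
for key in range(4):
    buckets[hash(key) % len(buckets)].append((key, str(key)))
    buckets = resize_if_needed(buckets)

print(len(buckets), load_factor(buckets))  # 8 0.5
```

The third insert brings the load factor to exactly 0.75, which does not exceed the threshold; the fourth pushes it to 1.0 and triggers the doubling.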

## Complexity Analysis

### Randomized Quicksort

| Aspect               | Complexity | Description                                                          |
| -------------------- | ---------- | -------------------------------------------------------------------- |
| **Time Complexity**  | O(n log n) | Expected (average) case with randomized pivot selection              |
| **Worst Case**       | O(n²)      | Rarely occurs due to randomization                                   |
| **Best Case**        | O(n log n) | Partitions are balanced at every level                               |
| **Space Complexity** | O(n)       | Copy of the input, plus O(log n) expected recursion stack depth      |
| **Stability**        | Not Stable | Equal elements may change relative order                             |

### Hash Table with Chaining

| Aspect               | Complexity | Description                                        |
| -------------------- | ---------- | -------------------------------------------------- |
| **Time Complexity**  | O(1)       | Average case for insert, get, delete               |
| **Worst Case**       | O(n)       | All keys hash to same bucket (rare)                |
| **Space Complexity** | O(n + m)   | n elements + m buckets                             |
| **Load Factor**      | 0.75       | Threshold for automatic resizing                   |

## Usage Examples

### Basic Usage - Randomized Quicksort

```python
from src.quicksort import randomized_quicksort, compare_with_builtin

# Example 1: Basic sorting
arr = [64, 34, 25, 12, 22, 11, 90, 5]
sorted_arr = randomized_quicksort(arr)
print(sorted_arr)  # Output: [5, 11, 12, 22, 25, 34, 64, 90]

# Example 2: Performance comparison
comparison = compare_with_builtin(arr)
print(f"Quicksort time: {comparison['quicksort_time']:.6f} seconds")
print(f"Built-in sort time: {comparison['builtin_time']:.6f} seconds")
print(f"Speedup ratio: {comparison['speedup']:.2f}x")
print(f"Results match: {comparison['is_correct']}")
```

### Basic Usage - Hash Table

```python
from src.hash_table import HashTable

# Create hash table
ht = HashTable(initial_size=16)

# Insert key-value pairs
ht.insert(1, "apple")
ht.insert(2, "banana")
ht.insert(3, "cherry")

# Retrieve values
print(ht.get(1))  # "apple"

# Check if key exists
print(2 in ht)  # True

# Delete a key
ht.delete(2)

# Get all items
items = ht.get_all_items()
print(items)  # [(1, 'apple'), (3, 'cherry')] (order depends on bucket layout)
```

### Edge Cases Handled

#### Quicksort

```python
from src.quicksort import randomized_quicksort

# Empty array
empty_arr = []
result = randomized_quicksort(empty_arr)
print(result)  # Output: []

# Single element
single = [42]
result = randomized_quicksort(single)
print(result)  # Output: [42]

# Duplicate elements
duplicates = [3, 3, 3, 3]
result = randomized_quicksort(duplicates)
print(result)  # Output: [3, 3, 3, 3]

# Negative numbers
negatives = [-5, -2, -8, 1, 3, -1, 0]
result = randomized_quicksort(negatives)
print(result)  # Output: [-8, -5, -2, -1, 0, 1, 3]
```

#### Hash Table

```python
from src.hash_table import HashTable

# Empty hash table
ht = HashTable()
print(len(ht))  # 0
print(ht.get(1))  # None

# Collision handling
ht = HashTable(initial_size=5)
ht.insert(1, "one")
ht.insert(6, "six")  # May collide with 1
ht.insert(11, "eleven")  # May collide with 1 and 6
# All keys are stored correctly via chaining

# Load factor management
ht = HashTable(initial_size=4, load_factor_threshold=0.75)
ht.insert(1, "a")
ht.insert(2, "b")
ht.insert(3, "c")
ht.insert(4, "d")  # Triggers resize (load factor = 1.0 > 0.75)
print(ht.size)  # 8 (doubled)
```

## Running the Program

### Prerequisites

* Python 3.7 or higher
* No external dependencies required (uses only Python standard library)

### Execution

#### Run Examples

```bash
python3 -m src.examples
```

#### Run Tests

**Quick Tests (Essential functionality):**
```bash
python3 run_tests.py --quick
```

**Full Test Suite:**
```bash
python3 run_tests.py
```

**Unit Tests Only:**
```bash
python3 run_tests.py --unit-only
```

**Performance Benchmarks:**
```bash
python3 run_tests.py --benchmark
```

**Stress Tests:**
```bash
python3 run_tests.py --stress
```

**Negative Test Cases:**
```bash
python3 run_tests.py --negative
```

**Using unittest directly:**
```bash
python3 -m unittest discover tests -v
```

## Test Cases

### Randomized Quicksort Tests

The test suite includes comprehensive test cases covering:

#### ✅ **Functional Tests**

* Basic sorting functionality
* Already sorted arrays (ascending/descending)
* Empty arrays and single elements
* Duplicate elements
* Negative numbers and zero values
* Large arrays (1000+ elements)

#### ✅ **Behavioral Tests**

* Non-destructive sorting (original array unchanged)
* Correctness verification against built-in sort
* Partition function correctness

#### ✅ **Performance Tests**

* Comparison with built-in sort
* Performance analysis across different array sizes
* Timing measurements
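
A few of these checks could be written with `unittest` as shown below. This is a sketch rather than an excerpt from `tests/test_quicksort.py`: the test class name is hypothetical, and a small stand-in `randomized_quicksort` is defined inline so that the example runs on its own (the real tests would import it from `src.quicksort`).

```python
import random
import unittest


def randomized_quicksort(arr):
    """Stand-in with the documented behavior: returns a new sorted list."""
    if len(arr) <= 1:
        return list(arr)
    pivot = random.choice(arr)
    return (
        randomized_quicksort([x for x in arr if x < pivot])
        + [x for x in arr if x == pivot]
        + randomized_quicksort([x for x in arr if x > pivot])
    )


class QuicksortBehaviorTests(unittest.TestCase):
    def test_matches_builtin_sort(self):
        data = [random.randint(-100, 100) for _ in range(1000)]
        self.assertEqual(randomized_quicksort(data), sorted(data))

    def test_input_is_not_modified(self):
        data = [3, 1, 2]
        randomized_quicksort(data)
        self.assertEqual(data, [3, 1, 2])

    def test_empty_and_single_element(self):
        self.assertEqual(randomized_quicksort([]), [])
        self.assertEqual(randomized_quicksort([42]), [42])
```

Saved under `tests/` with a `test_` prefix, a file like this would be picked up by `python3 -m unittest discover tests -v`.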

### Hash Table Tests

The test suite includes comprehensive test cases covering:

#### ✅ **Functional Tests**

* Basic insert, get, delete operations
* Empty hash table operations
* Collision handling
* Load factor calculation
* Dynamic resizing

#### ✅ **Behavioral Tests**

* Key existence checking (`in` operator)
* Update existing keys
* Delete from chains (middle of chain)
* Get all items

#### ✅ **Edge Cases**

* Empty hash table
* Single element
* All keys hash to same bucket
* Load factor threshold triggering resize

## Project Structure

```
MSCS532_Assignment3/
├── src/
│   ├── __init__.py         # Package initialization
│   ├── quicksort.py        # Randomized Quicksort implementation
│   ├── hash_table.py       # Hash Table with Chaining implementation
│   └── examples.py         # Example usage demonstrations
├── tests/
│   ├── __init__.py         # Test package initialization
│   ├── test_quicksort.py   # Comprehensive quicksort tests
│   └── test_hash_table.py  # Comprehensive hash table tests
├── run_tests.py            # Test runner with various options
├── README.md               # This documentation
├── LICENSE                 # MIT License
├── .gitignore              # Git ignore file
└── requirements.txt        # Python dependencies (none required)
```

## Testing

### Test Coverage

The project includes **30+ comprehensive test cases** covering:

#### ✅ **Functional Tests**

* Basic functionality for both algorithms
* Edge cases (empty, single element, duplicates)
* Correctness verification

#### ✅ **Behavioral Tests**

* Non-destructive operations
* In-place modifications
* Collision resolution
* Dynamic resizing

#### ✅ **Performance Tests**

* Timing comparisons
* Performance analysis across different sizes
* Benchmarking utilities

#### ✅ **Stress Tests**

* Large arrays (1000+ elements)
* Many hash table operations
* Boundary conditions

#### ✅ **Negative Test Cases**

* Invalid input types
* Edge cases and boundary conditions
* Error handling

### Running Tests

The project includes a comprehensive test runner (`run_tests.py`) with multiple options:

- **Quick Tests**: Essential functionality tests
- **Full Suite**: All tests including edge cases
- **Unit Tests**: Standard unittest tests only
- **Benchmarks**: Performance comparison tests
- **Stress Tests**: Large-scale and boundary tests
- **Negative Tests**: Invalid input and error handling tests
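
One way a runner could map these options onto command-line flags is sketched below with `argparse`. The flag names come from the commands shown in this README; everything else (the function names `build_parser` and `select_mode`, making the flags mutually exclusive, and the `"full"` default) is an assumption for illustration, not a description of the real `run_tests.py`.

```python
import argparse


def build_parser():
    """Build a parser that accepts at most one test-mode flag (assumed behavior)."""
    parser = argparse.ArgumentParser(description="Run the project's test suites")
    group = parser.add_mutually_exclusive_group()
    for flag, help_text in [
        ("--quick", "essential functionality tests"),
        ("--unit-only", "standard unittest tests only"),
        ("--benchmark", "performance comparison tests"),
        ("--stress", "large-scale and boundary tests"),
        ("--negative", "invalid input and error handling tests"),
    ]:
        group.add_argument(flag, action="store_true", help=help_text)
    return parser


def select_mode(argv):
    """Return the selected mode name, or 'full' when no flag is given."""
    args = build_parser().parse_args(argv)
    # argparse turns "--unit-only" into the attribute name "unit_only".
    for mode in ("quick", "unit_only", "benchmark", "stress", "negative"):
        if getattr(args, mode):
            return mode
    return "full"


print(select_mode(["--quick"]))  # quick
print(select_mode([]))           # full
```

Passing the argument list explicitly keeps the function easy to test; a real script would typically pass `sys.argv[1:]`.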

## Educational Value

This implementation is intended as a learning resource for:

* **Algorithm Understanding**: Clear demonstration of quicksort and hash table mechanics
* **Randomization Techniques**: Shows how randomization improves expected performance
* **Data Structure Design**: Demonstrates hash table implementation with collision resolution
* **Code Quality**: Demonstrates good practices in Python programming
* **Testing**: Comprehensive test suite showing edge case handling
* **Documentation**: Well-commented code with clear explanations
* **Performance Analysis**: Tools for understanding algorithm efficiency

## Algorithm Analysis

### Randomized Quicksort

**Why Randomization?**
- Quicksort with a fixed pivot rule can degrade to O(n²) when the pivot is always the smallest or largest element
- Randomization gives expected O(n log n) performance on every input
- Expected number of comparisons: 2n ln n ≈ 1.39n log₂ n
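
The comparison count follows from a standard argument, sketched here under the assumption that all elements are distinct. Let z_1 < z_2 < ... < z_n be the elements in sorted order and let C be the total number of comparisons. Two elements z_i and z_j are compared only if one of them is the first pivot chosen from the range z_i, ..., z_j, which happens with probability 2 / (j - i + 1).

```latex
\begin{aligned}
\Pr[z_i \text{ and } z_j \text{ are compared}] &= \frac{2}{j - i + 1} \\
\mathbb{E}[C] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}
              = \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k}
              \le 2n \sum_{k=2}^{n} \frac{1}{k}
              \le 2n \ln n \\
2n \ln n &= (2 \ln 2)\, n \log_2 n \approx 1.39\, n \log_2 n
\end{aligned}
```

The last line is only a change of logarithm base, which is where the constant 1.39 comes from.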

**Performance Characteristics:**
- Excellent average-case performance
- Non-destructive sorting (creates copy)
- Sequential partition scans give good locality of reference (the benefit is smaller in CPython, where lists store object references)

**Comparison with Other Algorithms:**
- Faster than O(n²) algorithms (bubble, insertion, selection sort) on large inputs
- Comparable to merge sort in running time; in-place quicksort needs only O(log n) expected stack space, although this implementation also allocates an O(n) copy
- Generally slower than Python's built-in Timsort (an optimized hybrid implemented in C)

### Hash Table with Chaining

**Chaining vs. Open Addressing:**
- Chaining stores multiple elements in the same bucket using linked lists
- Handles collisions gracefully without clustering
- Load factor threshold prevents performance degradation

**Hash Function:**
- Uses multiplication method: h(k) = floor(m × (k × A mod 1))
- A = (√5 - 1) / 2 ≈ 0.618 (the reciprocal of the golden ratio)
- Provides good distribution of keys across buckets

**Performance Considerations:**
- O(1) average case performance
- Dynamic resizing maintains efficiency
- Trade-off between space and time efficiency

## Performance Considerations

1. **Quicksort**:
   - Well suited to general-purpose sorting
   - Randomization makes worst-case behavior unlikely for any fixed input
   - Good for medium to large arrays

2. **Hash Table**:
   - Maintains O(1) average performance through load factor management
   - Resizing doubles table size when threshold is exceeded
   - Trade-off between space and time efficiency

## Contributing

This is an educational project demonstrating algorithm implementations. Feel free to:

* Add more test cases
* Implement additional algorithms
* Improve documentation
* Optimize the implementations
* Add visualization tools

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Author

Created for MSCS532 Assignment 3: Understanding Algorithm Efficiency and Scalability

## Acknowledgments

* Based on standard algorithm implementations from Introduction to Algorithms (CLRS)
* Educational project for algorithm analysis and data structures course