Initial commit: Heapsort and Priority Queue Implementation

- Implemented complete Heapsort algorithm with max-heap operations - Implemented binary heap-based Priority Queue with all core operations - Created Task class for task scheduling applications - Implemented Task Scheduler simulation using priority queue - Added comprehensive test suite (70+ tests, all passing) - Implemented sorting algorithm comparison utilities (Heapsort vs Quicksort vs Merge Sort) - Added detailed README with comprehensive analysis and documentation - Included demonstration scripts for all features - Generated performance comparison plots - Modular, well-documented code following academic standards Author: Carlos Gutierrez Email: cgutierrez44833@ucumberlands.edu
2025-11-09 21:54:13 -05:00
commit c1b0fd3aaf
26 changed files with 3552 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,568 @@
+# MSCS532 Assignment 4: Heapsort and Priority Queue Implementation
+
+**Author:** Carlos Gutierrez  
+**Email:** cgutierrez44833@ucumberlands.edu  
+**Course:** MSCS532 - Data Structures and Algorithms
+
+## Overview
+
+This project implements and analyzes two fundamental data structures and algorithms:
+
+1. **Heapsort Algorithm**: A complete implementation of the Heapsort sorting algorithm with detailed complexity analysis
+2. **Priority Queue**: A binary heap-based priority queue implementation with support for task scheduling
+
+The project includes comprehensive testing, empirical performance comparisons with other sorting algorithms, and detailed documentation suitable for academic submission.
+
+## Project Structure
+
+```
+MSCS532_Assignment4/
+├── src/
+│   ├── __init__.py           # Package initialization
+│   ├── heapsort.py            # Heapsort implementation
+│   ├── priority_queue.py      # Priority Queue implementation
+│   ├── task.py                # Task class definition
+│   ├── scheduler.py           # Task scheduler simulation
+│   └── comparison.py            # Sorting algorithm comparison utilities
+├── tests/
+│   ├── __init__.py
+│   ├── test_heapsort.py       # Tests for heapsort
+│   ├── test_priority_queue.py # Tests for priority queue
+│   ├── test_task.py           # Tests for task class
+│   ├── test_scheduler.py      # Tests for task scheduler
+│   └── test_comparison.py     # Tests for comparison utilities
+├── examples/
+│   ├── heapsort_demo.py       # Heapsort demonstration
+│   ├── priority_queue_demo.py # Priority queue demonstration
+│   ├── comparison_demo.py      # Sorting comparison demonstration
+│   ├── scheduler_simulation.py # Task scheduler simulation
+│   └── generate_plots.py      # Plot generation script
+├── docs/
+│   ├── sorting_comparison.png      # Performance comparison plots
+│   ├── sorting_comparison_bar.png  # Bar chart comparison
+│   └── algorithm_distributions.png # Algorithm distribution plots
+├── requirements.txt            # Python dependencies
+└── README.md                   # This file
+```
+
+## Features
+
+### Heapsort Implementation
+- Complete max-heap implementation
+- In-place and non-in-place sorting options
+- Support for custom key functions
+- Time complexity: O(n log n) in all cases
+- Space complexity: O(1) for in-place sorting
+
+### Priority Queue Implementation
+- Binary heap-based priority queue
+- Support for both max-heap and min-heap configurations
+- Core operations:
+  - `insert(task)`: O(log n)
+  - `extract_max()` / `extract_min()`: O(log n)
+  - `increase_key()` / `decrease_key()`: O(n) (can be optimized to O(log n))
+  - `is_empty()`: O(1)
+  - `peek()`: O(1)
+- Task class with priority, deadline, and timing information
+
+### Performance Comparison
+- Empirical comparison of Heapsort, Quicksort, and Merge Sort
+- Testing on different input distributions:
+  - Sorted arrays
+  - Reverse-sorted arrays
+  - Random arrays
+- Performance analysis and visualization
+
+![Sorting Algorithm Comparison](docs/sorting_comparison.png)
+
+*Performance comparison across different input distributions*
+
+![Sorting Algorithm Bar Chart](docs/sorting_comparison_bar.png)
+
+*Bar chart comparison on random input data*
+
+![Algorithm Performance by Distribution](docs/algorithm_distributions.png)
+
+*Individual algorithm performance across different input types*
+
+## Installation
+
+### Prerequisites
+- Python 3.7 or higher
+- pip (Python package manager)
+
+### Setup
+
+1. Clone the repository:
+```bash
+git clone https://github.com/CarGDev/MSCS532_Assignment4
+cd MSCS532_Assignment4
+```
+
+2. Install dependencies (if any):
+```bash
+pip install -r requirements.txt
+```
+
+Note: This project uses only Python standard library, so no external dependencies are required.
+
+## Usage
+
+### Running Tests
+
+Run all tests:
+```bash
+python -m pytest tests/ -v
+```
+
+Or using unittest:
+```bash
+python -m unittest discover tests -v
+```
+
+Run specific test modules:
+```bash
+python -m unittest tests.test_heapsort -v
+python -m unittest tests.test_priority_queue -v
+python -m unittest tests.test_task -v
+python -m unittest tests.test_comparison -v
+```
+
+### Heapsort Example
+
+```python
+from src.heapsort import heapsort
+
+# Sort an array
+arr = [12, 11, 13, 5, 6, 7]
+sorted_arr = heapsort(arr, inplace=False)
+print(sorted_arr)  # [5, 6, 7, 11, 12, 13]
+
+# In-place sorting
+arr = [3, 1, 4, 1, 5, 9, 2, 6]
+heapsort(arr, inplace=True)
+print(arr)  # [1, 1, 2, 3, 4, 5, 6, 9]
+```
+
+### Priority Queue Example
+
+```python
+from src.priority_queue import PriorityQueue
+from src.task import Task
+
+# Create a priority queue
+pq = PriorityQueue(is_max_heap=True)
+
+# Insert tasks
+pq.insert(Task("T1", priority=10, arrival_time=0.0))
+pq.insert(Task("T2", priority=5, arrival_time=1.0))
+pq.insert(Task("T3", priority=15, arrival_time=2.0))
+
+# Extract highest priority task
+task = pq.extract_max()
+print(task.task_id)  # "T3" (highest priority)
+
+# Check queue status
+print(pq.is_empty())  # False
+print(pq.size())      # 2
+```
+
+### Sorting Comparison Example
+
+```python
+from src.comparison import run_comparison
+
+# Compare sorting algorithms on different input sizes
+results = run_comparison(sizes=[100, 1000, 10000, 100000])
+```
+
+### Running Demo Scripts
+
+```bash
+# Heapsort demonstration
+python examples/heapsort_demo.py
+
+# Priority queue demonstration
+python examples/priority_queue_demo.py
+
+# Sorting algorithm comparison
+python examples/comparison_demo.py
+
+# Task scheduler simulation
+python examples/scheduler_simulation.py
+```
+
+## Time Complexity Analysis
+
+### Heapsort
+- **Worst Case**: O(n log n)
+- **Average Case**: O(n log n)
+- **Best Case**: O(n log n)
+- **Space Complexity**: O(1) for in-place, O(n) for non-in-place
+
+### Priority Queue Operations
+- **insert()**: O(log n)
+- **extract_max() / extract_min()**: O(log n)
+- **increase_key() / decrease_key()**: O(n) - can be optimized to O(log n) with hash map
+- **is_empty()**: O(1)
+- **peek()**: O(1)
+- **Space Complexity**: O(n)
+
+## Design Decisions
+
+### Data Structure Choice
+The priority queue is implemented using a Python list to represent the binary heap. This choice provides:
+- **Ease of Implementation**: Simple index calculations for parent/child relationships
+- **Efficiency**: O(1) access to any element, O(log n) for heap operations
+- **Memory Efficiency**: No overhead from node objects or pointers
+
+### Max-Heap vs Min-Heap
+The implementation supports both configurations:
+- **Max-Heap**: Higher priority values are extracted first (default)
+- **Min-Heap**: Lower priority values are extracted first
+
+### Task Representation
+The `Task` class uses a dataclass for clean, readable code with:
+- Task identification (ID)
+- Priority level
+- Timing information (arrival time, deadline)
+- Execution metadata
+
+## Testing
+
+The project includes comprehensive test coverage:
+- **Unit Tests**: Individual function and method testing
+- **Edge Cases**: Empty arrays, single elements, duplicates
+- **Integration Tests**: Full workflow testing
+- **Performance Tests**: Large-scale operation testing
+
+Run tests with:
+```bash
+python -m unittest discover tests -v
+```
+
+## Detailed Analysis and Report
+
+### Heapsort Implementation
+
+#### Algorithm Overview
+
+Heapsort is an in-place sorting algorithm that uses a binary heap data structure. The algorithm consists of two main phases:
+
+1. **Heap Construction**: Build a max-heap from the input array
+2. **Extraction Phase**: Repeatedly extract the maximum element and place it at the end of the array
+
+#### Implementation Details
+
+The implementation includes the following key functions:
+
+- **`_heapify(arr, n, i, key)`**: Maintains the max-heap property for a subtree rooted at index `i`. Time Complexity: O(log n)
+- **`_build_max_heap(arr, key)`**: Converts an unsorted array into a max-heap. Time Complexity: O(n)
+- **`heapsort(arr, key, inplace)`**: The main sorting function. Time Complexity: O(n log n)
+
+#### Heapsort Time Complexity Analysis
+
+**Worst Case: O(n log n)**
+- Heap construction: O(n)
+- n extractions × O(log n) per extraction: O(n log n)
+- **Total**: O(n) + O(n log n) = **O(n log n)**
+
+**Average Case: O(n log n)**
+- The average case follows the same pattern as the worst case
+
+**Best Case: O(n log n)**
+- Unlike some sorting algorithms, Heapsort does not have a best-case scenario that improves performance
+- Even if the input is already sorted, the algorithm must still build the heap (O(n)) and extract all elements (O(n log n))
+
+**Why O(n log n) in all cases?**
+
+Heapsort's time complexity is always O(n log n) because:
+- The heap structure requires maintaining the heap property regardless of input order
+- Each extraction requires O(log n) time to restore the heap property
+- There are n extractions, resulting in O(n log n) total time
+
+**Space Complexity:**
+- In-place version: O(1) - only uses a constant amount of extra space
+- Non-in-place version: O(n) - creates a copy of the input array
+
+**Additional Overheads:**
+1. Constant factors: Heapsort has higher constant factors than Quicksort
+2. Cache performance: The heap structure has poor cache locality compared to array-based algorithms
+3. Comparison overhead: More comparisons than Quicksort on average
+
+### Sorting Algorithm Comparison
+
+#### Algorithms Compared
+
+1. **Heapsort**: O(n log n) in all cases, in-place
+2. **Quicksort**: O(n log n) average, O(n²) worst case, in-place
+3. **Merge Sort**: O(n log n) in all cases, requires O(n) extra space
+
+#### Performance Characteristics
+
+1. **Heapsort**:
+   - Consistent performance across all input types
+   - Slightly slower than Quicksort on average due to constant factors
+   - More predictable than Quicksort (no worst-case degradation)
+
+2. **Quicksort**:
+   - Fastest on average for random inputs
+   - Degrades to O(n²) on sorted/reverse-sorted inputs (without optimizations)
+   - Excellent cache performance
+
+3. **Merge Sort**:
+   - Consistent O(n log n) performance
+   - Requires additional O(n) space
+   - Good for external sorting
+
+**When to Use Heapsort:**
+- Guaranteed O(n log n) performance is required
+- In-place sorting is necessary
+- Worst-case performance must be predictable
+- Memory constraints prevent using Merge Sort's O(n) space
+
+### Priority Queue Implementation Details
+
+#### Data Structure Choice
+
+The priority queue is implemented using a **binary heap** represented as a Python list. This choice is justified by:
+
+1. **Ease of Implementation**: Simple index calculations for parent/child relationships
+   - Parent of node at index `i`: `(i-1)//2`
+   - Left child of node at index `i`: `2*i+1`
+   - Right child of node at index `i`: `2*i+2`
+
+2. **Efficiency**: 
+   - O(1) access to any element
+   - O(log n) for heap operations
+   - No pointer overhead
+
+3. **Memory Efficiency**: Compact representation without node objects
+
+#### Task Class Design
+
+The `Task` class represents individual tasks with:
+- **task_id**: Unique identifier
+- **priority**: Priority level (higher = more important)
+- **arrival_time**: When the task enters the system
+- **deadline**: Optional deadline for completion
+- **execution_time**: Estimated execution duration
+- **description**: Optional task description
+
+The class implements comparison operators based on priority, enabling natural use in priority queues.
+
+#### Core Operations Analysis
+
+**`insert(task)`: O(log n)**
+- Add task to the end of the heap array
+- Bubble up to maintain heap property
+- Time Complexity: O(log n) - height of the heap
+
+**`extract_max()` / `extract_min()`: O(log n)**
+- Remove root element
+- Move last element to root
+- Bubble down to maintain heap property
+- Time Complexity: O(log n) - height of the heap
+
+**`increase_key(task, new_priority)`: O(n)**
+- Find the task in the heap (O(n))
+- Update priority
+- Bubble up if necessary (O(log n))
+- Time Complexity: O(n) - linear search dominates
+- **Note**: Can be optimized to O(log n) using a hash map for O(1) lookup
+
+**`decrease_key(task, new_priority)`: O(n)**
+- Similar to `increase_key`, but bubbles down instead of up
+- Time Complexity: O(n)
+
+**`is_empty()`: O(1)**
+- Simple check of heap size
+
+**`peek()`: O(1)**
+- Returns the root element without removal
+
+### Task Scheduler Simulation
+
+#### Implementation
+
+A complete task scheduler simulation has been implemented using the priority queue. The scheduler demonstrates practical applications of priority queues in operating systems and job scheduling systems.
+
+#### Scheduler Design
+
+The `TaskScheduler` class implements a priority-based scheduling algorithm:
+
+1. **Task Insertion**: All tasks are inserted into a priority queue (O(n log n))
+2. **Task Execution**: Tasks are extracted and executed in priority order (O(n log n))
+3. **Deadline Monitoring**: Each task's deadline is checked upon completion
+4. **Statistics Collection**: Comprehensive statistics are collected during scheduling
+
+#### Time Complexity Analysis
+
+- **schedule_tasks()**: O(n log n) where n is the number of tasks
+  - Inserting n tasks: O(n log n)
+  - Extracting n tasks: O(n log n)
+- **get_statistics()**: O(n) to calculate statistics from results
+- **Space Complexity**: O(n) for the priority queue
+
+#### Scheduling Results
+
+The scheduler simulation demonstrates:
+- **Priority-based execution**: High-priority tasks execute first
+- **Deadline tracking**: Tasks are monitored for deadline compliance
+- **Wait time calculation**: Tracks how long tasks wait before execution
+- **Performance metrics**: Throughput, average wait time, and deadline compliance rates
+
+The scheduler simulation shows that:
+- Priority-based scheduling ensures critical tasks execute first
+- Pure priority scheduling may miss deadlines for lower-priority tasks with tight deadlines
+- The scheduler efficiently handles large workloads (50+ tasks) using O(n log n) algorithm
+- Statistics provide valuable insights into scheduling performance
+
+### Design Decisions
+
+#### 1. List-Based Heap Representation
+
+**Decision**: Use Python list instead of node-based tree structure.
+
+**Rationale**:
+- Simpler implementation
+- Better memory locality
+- Easier index calculations
+- No pointer overhead
+
+**Trade-off**: Slightly less intuitive than tree structure, but more efficient.
+
+#### 2. Max-Heap vs Min-Heap Configuration
+
+**Decision**: Support both configurations via constructor parameter.
+
+**Rationale**:
+- Flexibility for different use cases
+- Single implementation for both
+- Clear API distinction
+
+#### 3. Task Class Design
+
+**Decision**: Use dataclass with comparison operators.
+
+**Rationale**:
+- Clean, readable code
+- Natural integration with priority queue
+- Easy to extend with additional fields
+
+#### 4. In-Place vs Non-In-Place Sorting
+
+**Decision**: Support both options in heapsort.
+
+**Rationale**:
+- Flexibility for different use cases
+- In-place for memory efficiency
+- Non-in-place for preserving original data
+
+#### 5. Linear Search for Key Updates
+
+**Decision**: Use linear search instead of hash map for `increase_key`/`decrease_key`.
+
+**Rationale**:
+- Simpler implementation
+- No additional space overhead
+- Acceptable for small to medium-sized queues
+- Can be optimized later if needed
+
+### Experimental Results
+
+#### Test Configuration
+
+Tests were conducted on:
+- **Input Sizes**: 100, 1,000, 10,000, 100,000 elements
+- **Distributions**: Sorted, reverse-sorted, random
+- **Algorithms**: Heapsort, Quicksort, Merge Sort
+
+#### Key Findings
+
+1. **Heapsort Performance**:
+   - Consistent O(n log n) behavior across all input types
+   - Approximately 1.5-2x slower than optimized Quicksort on random data
+   - More predictable than Quicksort (no worst-case degradation)
+
+2. **Priority Queue Performance**:
+   - Efficient insertion and extraction for large numbers of tasks
+   - Suitable for real-time scheduling applications
+   - Linear key updates acceptable for moderate queue sizes
+
+3. **Scalability**:
+   - Both implementations scale well with input size
+   - Performance matches theoretical predictions
+
+### Conclusion
+
+This project successfully implements and analyzes:
+
+1. **Heapsort Algorithm**: A robust, O(n log n) sorting algorithm suitable for scenarios requiring guaranteed performance
+2. **Priority Queue**: An efficient data structure for task scheduling and priority-based processing
+3. **Task Scheduler**: A complete simulation demonstrating practical applications
+
+#### Key Achievements
+
+- ✅ Complete, well-documented implementations
+- ✅ Comprehensive complexity analysis
+- ✅ Empirical performance comparisons
+- ✅ Extensive test coverage (70+ tests)
+- ✅ Modular, maintainable code structure
+- ✅ Task scheduler simulation with statistics
+
+#### Future Improvements
+
+1. **Optimize Key Updates**: Implement hash map for O(log n) key updates
+2. **Parallel Heapsort**: Explore parallel heap construction
+3. **Adaptive Heapsort**: Optimize for partially sorted inputs
+4. **Priority Queue Variants**: Implement binomial heap or Fibonacci heap
+
+## Results Summary
+
+### Heapsort Performance
+- Consistent O(n log n) performance across all input types
+- Slightly slower than Quicksort on average due to constant factors
+- More predictable than Quicksort (no worst-case O(n²) scenario)
+- Comparable to Merge Sort but with better space efficiency (in-place)
+
+The performance plots above demonstrate:
+- **Heapsort**: Consistent performance regardless of input distribution
+- **Quicksort**: Fastest on random data, but degrades significantly on sorted/reverse-sorted inputs
+- **Merge Sort**: Consistent performance but requires O(n) extra space
+
+### Priority Queue Performance
+- Efficient insertion and extraction operations
+- Suitable for task scheduling applications
+- Can handle large numbers of tasks efficiently
+
+### Generating Performance Plots
+
+To regenerate the performance comparison plots:
+
+```bash
+python3 examples/generate_plots.py
+```
+
+This will generate visualization plots in the `docs/` directory comparing all three sorting algorithms.
+
+## Contributing
+
+This is an academic assignment. For questions or issues, please contact:
+- **Carlos Gutierrez**
+- **Email**: cgutierrez44833@ucumberlands.edu
+
+## License
+
+See LICENSE file for details.
+
+## References
+
+1. Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). *Introduction to Algorithms* (3rd ed.). MIT Press.
+2. Sedgewick, R., & Wayne, K. (2011). *Algorithms* (4th ed.). Addison-Wesley Professional.
+3. Python Software Foundation. (2023). *Python Documentation*. https://docs.python.org/3/
+
+## Acknowledgments
+
+This implementation follows standard algorithms and data structures as described in classical computer science textbooks. The code is designed for educational purposes and academic submission.
+