# MSCS532 Assignment 4: Heapsort and Priority Queue Implementation **Author:** Carlos Gutierrez **Email:** cgutierrez44833@ucumberlands.edu **Course:** MSCS532 - Data Structures and Algorithms ## Overview This project implements and analyzes two fundamental data structures and algorithms: 1. **Heapsort Algorithm**: A complete implementation of the Heapsort sorting algorithm with detailed complexity analysis 2. **Priority Queue**: A binary heap-based priority queue implementation with support for task scheduling The project includes comprehensive testing, empirical performance comparisons with other sorting algorithms, and detailed documentation suitable for academic submission. ## Project Structure ``` MSCS532_Assignment4/ ├── src/ │ ├── __init__.py # Package initialization │ ├── heapsort.py # Heapsort implementation │ ├── priority_queue.py # Priority Queue implementation │ ├── task.py # Task class definition │ ├── scheduler.py # Task scheduler simulation │ └── comparison.py # Sorting algorithm comparison utilities ├── tests/ │ ├── __init__.py │ ├── test_heapsort.py # Tests for heapsort │ ├── test_priority_queue.py # Tests for priority queue │ ├── test_task.py # Tests for task class │ ├── test_scheduler.py # Tests for task scheduler │ └── test_comparison.py # Tests for comparison utilities ├── examples/ │ ├── heapsort_demo.py # Heapsort demonstration │ ├── priority_queue_demo.py # Priority queue demonstration │ ├── comparison_demo.py # Sorting comparison demonstration │ ├── scheduler_simulation.py # Task scheduler simulation │ └── generate_plots.py # Plot generation script ├── docs/ │ ├── sorting_comparison.png # Performance comparison plots │ ├── sorting_comparison_bar.png # Bar chart comparison │ └── algorithm_distributions.png # Algorithm distribution plots ├── requirements.txt # Python dependencies └── README.md # This file ``` ## Features ### Heapsort Implementation - Complete max-heap implementation - In-place and non-in-place sorting options - Support for custom key functions - Time complexity: O(n log n) in all cases - Space complexity: O(1) for in-place sorting ### Priority Queue Implementation - Binary heap-based priority queue - Support for both max-heap and min-heap configurations - Core operations: - `insert(task)`: O(log n) - `extract_max()` / `extract_min()`: O(log n) - `increase_key()` / `decrease_key()`: O(n) (can be optimized to O(log n)) - `is_empty()`: O(1) - `peek()`: O(1) - Task class with priority, deadline, and timing information ### Performance Comparison - Empirical comparison of Heapsort, Quicksort, and Merge Sort - Testing on different input distributions: - Sorted arrays - Reverse-sorted arrays - Random arrays - Performance analysis and visualization ![Sorting Algorithm Comparison](docs/sorting_comparison.png) *Performance comparison across different input distributions* ![Sorting Algorithm Bar Chart](docs/sorting_comparison_bar.png) *Bar chart comparison on random input data* ![Algorithm Performance by Distribution](docs/algorithm_distributions.png) *Individual algorithm performance across different input types* ## Installation ### Prerequisites - Python 3.7 or higher - pip (Python package manager) ### Setup 1. Clone the repository: ```bash git clone https://github.com/CarGDev/MSCS532_Assignment4 cd MSCS532_Assignment4 ``` 2. Install dependencies (if any): ```bash pip install -r requirements.txt ``` Note: This project uses only Python standard library, so no external dependencies are required. ## Usage ### Running Tests Run all tests: ```bash python -m pytest tests/ -v ``` Or using unittest: ```bash python -m unittest discover tests -v ``` Run specific test modules: ```bash python -m unittest tests.test_heapsort -v python -m unittest tests.test_priority_queue -v python -m unittest tests.test_task -v python -m unittest tests.test_comparison -v ``` ### Heapsort Example ```python from src.heapsort import heapsort # Sort an array arr = [12, 11, 13, 5, 6, 7] sorted_arr = heapsort(arr, inplace=False) print(sorted_arr) # [5, 6, 7, 11, 12, 13] # In-place sorting arr = [3, 1, 4, 1, 5, 9, 2, 6] heapsort(arr, inplace=True) print(arr) # [1, 1, 2, 3, 4, 5, 6, 9] ``` ### Priority Queue Example ```python from src.priority_queue import PriorityQueue from src.task import Task # Create a priority queue pq = PriorityQueue(is_max_heap=True) # Insert tasks pq.insert(Task("T1", priority=10, arrival_time=0.0)) pq.insert(Task("T2", priority=5, arrival_time=1.0)) pq.insert(Task("T3", priority=15, arrival_time=2.0)) # Extract highest priority task task = pq.extract_max() print(task.task_id) # "T3" (highest priority) # Check queue status print(pq.is_empty()) # False print(pq.size()) # 2 ``` ### Sorting Comparison Example ```python from src.comparison import run_comparison # Compare sorting algorithms on different input sizes results = run_comparison(sizes=[100, 1000, 10000, 100000]) ``` ### Running Demo Scripts ```bash # Heapsort demonstration python examples/heapsort_demo.py # Priority queue demonstration python examples/priority_queue_demo.py # Sorting algorithm comparison python examples/comparison_demo.py # Task scheduler simulation python examples/scheduler_simulation.py ``` ## Time Complexity Analysis ### Heapsort - **Worst Case**: O(n log n) - **Average Case**: O(n log n) - **Best Case**: O(n log n) - **Space Complexity**: O(1) for in-place, O(n) for non-in-place ### Priority Queue Operations - **insert()**: O(log n) - **extract_max() / extract_min()**: O(log n) - **increase_key() / decrease_key()**: O(n) - can be optimized to O(log n) with hash map - **is_empty()**: O(1) - **peek()**: O(1) - **Space Complexity**: O(n) ## Design Decisions ### Data Structure Choice The priority queue is implemented using a Python list to represent the binary heap. This choice provides: - **Ease of Implementation**: Simple index calculations for parent/child relationships - **Efficiency**: O(1) access to any element, O(log n) for heap operations - **Memory Efficiency**: No overhead from node objects or pointers ### Max-Heap vs Min-Heap The implementation supports both configurations: - **Max-Heap**: Higher priority values are extracted first (default) - **Min-Heap**: Lower priority values are extracted first ### Task Representation The `Task` class uses a dataclass for clean, readable code with: - Task identification (ID) - Priority level - Timing information (arrival time, deadline) - Execution metadata ## Testing The project includes comprehensive test coverage: - **Unit Tests**: Individual function and method testing - **Edge Cases**: Empty arrays, single elements, duplicates - **Integration Tests**: Full workflow testing - **Performance Tests**: Large-scale operation testing Run tests with: ```bash python -m unittest discover tests -v ``` ## Detailed Analysis and Report ### Heapsort Implementation #### Algorithm Overview Heapsort is an in-place sorting algorithm that uses a binary heap data structure. The algorithm consists of two main phases: 1. **Heap Construction**: Build a max-heap from the input array 2. **Extraction Phase**: Repeatedly extract the maximum element and place it at the end of the array #### Implementation Details The implementation includes the following key functions: - **`_heapify(arr, n, i, key)`**: Maintains the max-heap property for a subtree rooted at index `i`. Time Complexity: O(log n) - **`_build_max_heap(arr, key)`**: Converts an unsorted array into a max-heap. Time Complexity: O(n) - **`heapsort(arr, key, inplace)`**: The main sorting function. Time Complexity: O(n log n) #### Heapsort Time Complexity Analysis **Worst Case: O(n log n)** - Heap construction: O(n) - n extractions × O(log n) per extraction: O(n log n) - **Total**: O(n) + O(n log n) = **O(n log n)** **Average Case: O(n log n)** - The average case follows the same pattern as the worst case **Best Case: O(n log n)** - Unlike some sorting algorithms, Heapsort does not have a best-case scenario that improves performance - Even if the input is already sorted, the algorithm must still build the heap (O(n)) and extract all elements (O(n log n)) **Why O(n log n) in all cases?** Heapsort's time complexity is always O(n log n) because: - The heap structure requires maintaining the heap property regardless of input order - Each extraction requires O(log n) time to restore the heap property - There are n extractions, resulting in O(n log n) total time **Space Complexity:** - In-place version: O(1) - only uses a constant amount of extra space - Non-in-place version: O(n) - creates a copy of the input array **Additional Overheads:** 1. Constant factors: Heapsort has higher constant factors than Quicksort 2. Cache performance: The heap structure has poor cache locality compared to array-based algorithms 3. Comparison overhead: More comparisons than Quicksort on average ### Sorting Algorithm Comparison #### Algorithms Compared 1. **Heapsort**: O(n log n) in all cases, in-place 2. **Quicksort**: O(n log n) average, O(n²) worst case, in-place 3. **Merge Sort**: O(n log n) in all cases, requires O(n) extra space #### Performance Characteristics 1. **Heapsort**: - Consistent performance across all input types - Slightly slower than Quicksort on average due to constant factors - More predictable than Quicksort (no worst-case degradation) 2. **Quicksort**: - Fastest on average for random inputs - Degrades to O(n²) on sorted/reverse-sorted inputs (without optimizations) - Excellent cache performance 3. **Merge Sort**: - Consistent O(n log n) performance - Requires additional O(n) space - Good for external sorting **When to Use Heapsort:** - Guaranteed O(n log n) performance is required - In-place sorting is necessary - Worst-case performance must be predictable - Memory constraints prevent using Merge Sort's O(n) space ### Priority Queue Implementation Details #### Data Structure Choice The priority queue is implemented using a **binary heap** represented as a Python list. This choice is justified by: 1. **Ease of Implementation**: Simple index calculations for parent/child relationships - Parent of node at index `i`: `(i-1)//2` - Left child of node at index `i`: `2*i+1` - Right child of node at index `i`: `2*i+2` 2. **Efficiency**: - O(1) access to any element - O(log n) for heap operations - No pointer overhead 3. **Memory Efficiency**: Compact representation without node objects #### Task Class Design The `Task` class represents individual tasks with: - **task_id**: Unique identifier - **priority**: Priority level (higher = more important) - **arrival_time**: When the task enters the system - **deadline**: Optional deadline for completion - **execution_time**: Estimated execution duration - **description**: Optional task description The class implements comparison operators based on priority, enabling natural use in priority queues. #### Core Operations Analysis **`insert(task)`: O(log n)** - Add task to the end of the heap array - Bubble up to maintain heap property - Time Complexity: O(log n) - height of the heap **`extract_max()` / `extract_min()`: O(log n)** - Remove root element - Move last element to root - Bubble down to maintain heap property - Time Complexity: O(log n) - height of the heap **`increase_key(task, new_priority)`: O(n)** - Find the task in the heap (O(n)) - Update priority - Bubble up if necessary (O(log n)) - Time Complexity: O(n) - linear search dominates - **Note**: Can be optimized to O(log n) using a hash map for O(1) lookup **`decrease_key(task, new_priority)`: O(n)** - Similar to `increase_key`, but bubbles down instead of up - Time Complexity: O(n) **`is_empty()`: O(1)** - Simple check of heap size **`peek()`: O(1)** - Returns the root element without removal ### Task Scheduler Simulation #### Implementation A complete task scheduler simulation has been implemented using the priority queue. The scheduler demonstrates practical applications of priority queues in operating systems and job scheduling systems. #### Scheduler Design The `TaskScheduler` class implements a priority-based scheduling algorithm: 1. **Task Insertion**: All tasks are inserted into a priority queue (O(n log n)) 2. **Task Execution**: Tasks are extracted and executed in priority order (O(n log n)) 3. **Deadline Monitoring**: Each task's deadline is checked upon completion 4. **Statistics Collection**: Comprehensive statistics are collected during scheduling #### Time Complexity Analysis - **schedule_tasks()**: O(n log n) where n is the number of tasks - Inserting n tasks: O(n log n) - Extracting n tasks: O(n log n) - **get_statistics()**: O(n) to calculate statistics from results - **Space Complexity**: O(n) for the priority queue #### Scheduling Results The scheduler simulation demonstrates: - **Priority-based execution**: High-priority tasks execute first - **Deadline tracking**: Tasks are monitored for deadline compliance - **Wait time calculation**: Tracks how long tasks wait before execution - **Performance metrics**: Throughput, average wait time, and deadline compliance rates The scheduler simulation shows that: - Priority-based scheduling ensures critical tasks execute first - Pure priority scheduling may miss deadlines for lower-priority tasks with tight deadlines - The scheduler efficiently handles large workloads (50+ tasks) using O(n log n) algorithm - Statistics provide valuable insights into scheduling performance ### Design Decisions #### 1. List-Based Heap Representation **Decision**: Use Python list instead of node-based tree structure. **Rationale**: - Simpler implementation - Better memory locality - Easier index calculations - No pointer overhead **Trade-off**: Slightly less intuitive than tree structure, but more efficient. #### 2. Max-Heap vs Min-Heap Configuration **Decision**: Support both configurations via constructor parameter. **Rationale**: - Flexibility for different use cases - Single implementation for both - Clear API distinction #### 3. Task Class Design **Decision**: Use dataclass with comparison operators. **Rationale**: - Clean, readable code - Natural integration with priority queue - Easy to extend with additional fields #### 4. In-Place vs Non-In-Place Sorting **Decision**: Support both options in heapsort. **Rationale**: - Flexibility for different use cases - In-place for memory efficiency - Non-in-place for preserving original data #### 5. Linear Search for Key Updates **Decision**: Use linear search instead of hash map for `increase_key`/`decrease_key`. **Rationale**: - Simpler implementation - No additional space overhead - Acceptable for small to medium-sized queues - Can be optimized later if needed ### Experimental Results #### Test Configuration Tests were conducted on: - **Input Sizes**: 100, 1,000, 10,000, 100,000 elements - **Distributions**: Sorted, reverse-sorted, random - **Algorithms**: Heapsort, Quicksort, Merge Sort #### Key Findings 1. **Heapsort Performance**: - Consistent O(n log n) behavior across all input types - Approximately 1.5-2x slower than optimized Quicksort on random data - More predictable than Quicksort (no worst-case degradation) 2. **Priority Queue Performance**: - Efficient insertion and extraction for large numbers of tasks - Suitable for real-time scheduling applications - Linear key updates acceptable for moderate queue sizes 3. **Scalability**: - Both implementations scale well with input size - Performance matches theoretical predictions ### Conclusion This project successfully implements and analyzes: 1. **Heapsort Algorithm**: A robust, O(n log n) sorting algorithm suitable for scenarios requiring guaranteed performance 2. **Priority Queue**: An efficient data structure for task scheduling and priority-based processing 3. **Task Scheduler**: A complete simulation demonstrating practical applications #### Key Achievements - ✅ Complete, well-documented implementations - ✅ Comprehensive complexity analysis - ✅ Empirical performance comparisons - ✅ Extensive test coverage (70+ tests) - ✅ Modular, maintainable code structure - ✅ Task scheduler simulation with statistics #### Future Improvements 1. **Optimize Key Updates**: Implement hash map for O(log n) key updates 2. **Parallel Heapsort**: Explore parallel heap construction 3. **Adaptive Heapsort**: Optimize for partially sorted inputs 4. **Priority Queue Variants**: Implement binomial heap or Fibonacci heap ## Results Summary ### Heapsort Performance - Consistent O(n log n) performance across all input types - Slightly slower than Quicksort on average due to constant factors - More predictable than Quicksort (no worst-case O(n²) scenario) - Comparable to Merge Sort but with better space efficiency (in-place) The performance plots above demonstrate: - **Heapsort**: Consistent performance regardless of input distribution - **Quicksort**: Fastest on random data, but degrades significantly on sorted/reverse-sorted inputs - **Merge Sort**: Consistent performance but requires O(n) extra space ### Priority Queue Performance - Efficient insertion and extraction operations - Suitable for task scheduling applications - Can handle large numbers of tasks efficiently ### Generating Performance Plots To regenerate the performance comparison plots: ```bash python3 examples/generate_plots.py ``` This will generate visualization plots in the `docs/` directory comparing all three sorting algorithms. ## Contributing This is an academic assignment. For questions or issues, please contact: - **Carlos Gutierrez** - **Email**: cgutierrez44833@ucumberlands.edu ## License See LICENSE file for details. ## References 1. Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). *Introduction to Algorithms* (3rd ed.). MIT Press. 2. Sedgewick, R., & Wayne, K. (2011). *Algorithms* (4th ed.). Addison-Wesley Professional. 3. Python Software Foundation. (2023). *Python Documentation*. https://docs.python.org/3/ ## Acknowledgments This implementation follows standard algorithms and data structures as described in classical computer science textbooks. The code is designed for educational purposes and academic submission.