Files
MSCS532_Assignment4/README.md
Carlos Gutierrez c1b0fd3aaf Initial commit: Heapsort and Priority Queue Implementation
- Implemented complete Heapsort algorithm with max-heap operations
- Implemented binary heap-based Priority Queue with all core operations
- Created Task class for task scheduling applications
- Implemented Task Scheduler simulation using priority queue
- Added comprehensive test suite (70+ tests, all passing)
- Implemented sorting algorithm comparison utilities (Heapsort vs Quicksort vs Merge Sort)
- Added detailed README with comprehensive analysis and documentation
- Included demonstration scripts for all features
- Generated performance comparison plots
- Modular, well-documented code following academic standards

Author: Carlos Gutierrez
Email: cgutierrez44833@ucumberlands.edu
2025-11-09 21:54:13 -05:00

569 lines
18 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# MSCS532 Assignment 4: Heapsort and Priority Queue Implementation
**Author:** Carlos Gutierrez
**Email:** cgutierrez44833@ucumberlands.edu
**Course:** MSCS532 - Data Structures and Algorithms
## Overview
This project implements and analyzes two fundamental data structures and algorithms:
1. **Heapsort Algorithm**: A complete implementation of the Heapsort sorting algorithm with detailed complexity analysis
2. **Priority Queue**: A binary heap-based priority queue implementation with support for task scheduling
The project includes comprehensive testing, empirical performance comparisons with other sorting algorithms, and detailed documentation suitable for academic submission.
## Project Structure
```
MSCS532_Assignment4/
├── src/
│ ├── __init__.py # Package initialization
│ ├── heapsort.py # Heapsort implementation
│ ├── priority_queue.py # Priority Queue implementation
│ ├── task.py # Task class definition
│ ├── scheduler.py # Task scheduler simulation
│ └── comparison.py # Sorting algorithm comparison utilities
├── tests/
│ ├── __init__.py
│ ├── test_heapsort.py # Tests for heapsort
│ ├── test_priority_queue.py # Tests for priority queue
│ ├── test_task.py # Tests for task class
│ ├── test_scheduler.py # Tests for task scheduler
│ └── test_comparison.py # Tests for comparison utilities
├── examples/
│ ├── heapsort_demo.py # Heapsort demonstration
│ ├── priority_queue_demo.py # Priority queue demonstration
│ ├── comparison_demo.py # Sorting comparison demonstration
│ ├── scheduler_simulation.py # Task scheduler simulation
│ └── generate_plots.py # Plot generation script
├── docs/
│ ├── sorting_comparison.png # Performance comparison plots
│ ├── sorting_comparison_bar.png # Bar chart comparison
│ └── algorithm_distributions.png # Algorithm distribution plots
├── requirements.txt # Python dependencies
└── README.md # This file
```
## Features
### Heapsort Implementation
- Complete max-heap implementation
- In-place and non-in-place sorting options
- Support for custom key functions
- Time complexity: O(n log n) in all cases
- Space complexity: O(1) for in-place sorting
### Priority Queue Implementation
- Binary heap-based priority queue
- Support for both max-heap and min-heap configurations
- Core operations:
- `insert(task)`: O(log n)
- `extract_max()` / `extract_min()`: O(log n)
- `increase_key()` / `decrease_key()`: O(n) (can be optimized to O(log n))
- `is_empty()`: O(1)
- `peek()`: O(1)
- Task class with priority, deadline, and timing information
### Performance Comparison
- Empirical comparison of Heapsort, Quicksort, and Merge Sort
- Testing on different input distributions:
- Sorted arrays
- Reverse-sorted arrays
- Random arrays
- Performance analysis and visualization
![Sorting Algorithm Comparison](docs/sorting_comparison.png)
*Performance comparison across different input distributions*
![Sorting Algorithm Bar Chart](docs/sorting_comparison_bar.png)
*Bar chart comparison on random input data*
![Algorithm Performance by Distribution](docs/algorithm_distributions.png)
*Individual algorithm performance across different input types*
## Installation
### Prerequisites
- Python 3.7 or higher
- pip (Python package manager)
### Setup
1. Clone the repository:
```bash
git clone https://github.com/CarGDev/MSCS532_Assignment4
cd MSCS532_Assignment4
```
2. Install dependencies (if any):
```bash
pip install -r requirements.txt
```
Note: This project uses only Python standard library, so no external dependencies are required.
## Usage
### Running Tests
Run all tests:
```bash
python -m pytest tests/ -v
```
Or using unittest:
```bash
python -m unittest discover tests -v
```
Run specific test modules:
```bash
python -m unittest tests.test_heapsort -v
python -m unittest tests.test_priority_queue -v
python -m unittest tests.test_task -v
python -m unittest tests.test_comparison -v
```
### Heapsort Example
```python
from src.heapsort import heapsort
# Sort an array
arr = [12, 11, 13, 5, 6, 7]
sorted_arr = heapsort(arr, inplace=False)
print(sorted_arr) # [5, 6, 7, 11, 12, 13]
# In-place sorting
arr = [3, 1, 4, 1, 5, 9, 2, 6]
heapsort(arr, inplace=True)
print(arr) # [1, 1, 2, 3, 4, 5, 6, 9]
```
### Priority Queue Example
```python
from src.priority_queue import PriorityQueue
from src.task import Task
# Create a priority queue
pq = PriorityQueue(is_max_heap=True)
# Insert tasks
pq.insert(Task("T1", priority=10, arrival_time=0.0))
pq.insert(Task("T2", priority=5, arrival_time=1.0))
pq.insert(Task("T3", priority=15, arrival_time=2.0))
# Extract highest priority task
task = pq.extract_max()
print(task.task_id) # "T3" (highest priority)
# Check queue status
print(pq.is_empty()) # False
print(pq.size()) # 2
```
### Sorting Comparison Example
```python
from src.comparison import run_comparison
# Compare sorting algorithms on different input sizes
results = run_comparison(sizes=[100, 1000, 10000, 100000])
```
### Running Demo Scripts
```bash
# Heapsort demonstration
python examples/heapsort_demo.py
# Priority queue demonstration
python examples/priority_queue_demo.py
# Sorting algorithm comparison
python examples/comparison_demo.py
# Task scheduler simulation
python examples/scheduler_simulation.py
```
## Time Complexity Analysis
### Heapsort
- **Worst Case**: O(n log n)
- **Average Case**: O(n log n)
- **Best Case**: O(n log n)
- **Space Complexity**: O(1) for in-place, O(n) for non-in-place
### Priority Queue Operations
- **insert()**: O(log n)
- **extract_max() / extract_min()**: O(log n)
- **increase_key() / decrease_key()**: O(n) - can be optimized to O(log n) with hash map
- **is_empty()**: O(1)
- **peek()**: O(1)
- **Space Complexity**: O(n)
## Design Decisions
### Data Structure Choice
The priority queue is implemented using a Python list to represent the binary heap. This choice provides:
- **Ease of Implementation**: Simple index calculations for parent/child relationships
- **Efficiency**: O(1) access to any element, O(log n) for heap operations
- **Memory Efficiency**: No overhead from node objects or pointers
### Max-Heap vs Min-Heap
The implementation supports both configurations:
- **Max-Heap**: Higher priority values are extracted first (default)
- **Min-Heap**: Lower priority values are extracted first
### Task Representation
The `Task` class uses a dataclass for clean, readable code with:
- Task identification (ID)
- Priority level
- Timing information (arrival time, deadline)
- Execution metadata
## Testing
The project includes comprehensive test coverage:
- **Unit Tests**: Individual function and method testing
- **Edge Cases**: Empty arrays, single elements, duplicates
- **Integration Tests**: Full workflow testing
- **Performance Tests**: Large-scale operation testing
Run tests with:
```bash
python -m unittest discover tests -v
```
## Detailed Analysis and Report
### Heapsort Implementation
#### Algorithm Overview
Heapsort is an in-place sorting algorithm that uses a binary heap data structure. The algorithm consists of two main phases:
1. **Heap Construction**: Build a max-heap from the input array
2. **Extraction Phase**: Repeatedly extract the maximum element and place it at the end of the array
#### Implementation Details
The implementation includes the following key functions:
- **`_heapify(arr, n, i, key)`**: Maintains the max-heap property for a subtree rooted at index `i`. Time Complexity: O(log n)
- **`_build_max_heap(arr, key)`**: Converts an unsorted array into a max-heap. Time Complexity: O(n)
- **`heapsort(arr, key, inplace)`**: The main sorting function. Time Complexity: O(n log n)
#### Heapsort Time Complexity Analysis
**Worst Case: O(n log n)**
- Heap construction: O(n)
- n extractions × O(log n) per extraction: O(n log n)
- **Total**: O(n) + O(n log n) = **O(n log n)**
**Average Case: O(n log n)**
- The average case follows the same pattern as the worst case
**Best Case: O(n log n)**
- Unlike some sorting algorithms, Heapsort does not have a best-case scenario that improves performance
- Even if the input is already sorted, the algorithm must still build the heap (O(n)) and extract all elements (O(n log n))
**Why O(n log n) in all cases?**
Heapsort's time complexity is always O(n log n) because:
- The heap structure requires maintaining the heap property regardless of input order
- Each extraction requires O(log n) time to restore the heap property
- There are n extractions, resulting in O(n log n) total time
**Space Complexity:**
- In-place version: O(1) - only uses a constant amount of extra space
- Non-in-place version: O(n) - creates a copy of the input array
**Additional Overheads:**
1. Constant factors: Heapsort has higher constant factors than Quicksort
2. Cache performance: The heap structure has poor cache locality compared to array-based algorithms
3. Comparison overhead: More comparisons than Quicksort on average
### Sorting Algorithm Comparison
#### Algorithms Compared
1. **Heapsort**: O(n log n) in all cases, in-place
2. **Quicksort**: O(n log n) average, O(n²) worst case, in-place
3. **Merge Sort**: O(n log n) in all cases, requires O(n) extra space
#### Performance Characteristics
1. **Heapsort**:
- Consistent performance across all input types
- Slightly slower than Quicksort on average due to constant factors
- More predictable than Quicksort (no worst-case degradation)
2. **Quicksort**:
- Fastest on average for random inputs
- Degrades to O(n²) on sorted/reverse-sorted inputs (without optimizations)
- Excellent cache performance
3. **Merge Sort**:
- Consistent O(n log n) performance
- Requires additional O(n) space
- Good for external sorting
**When to Use Heapsort:**
- Guaranteed O(n log n) performance is required
- In-place sorting is necessary
- Worst-case performance must be predictable
- Memory constraints prevent using Merge Sort's O(n) space
### Priority Queue Implementation Details
#### Data Structure Choice
The priority queue is implemented using a **binary heap** represented as a Python list. This choice is justified by:
1. **Ease of Implementation**: Simple index calculations for parent/child relationships
- Parent of node at index `i`: `(i-1)//2`
- Left child of node at index `i`: `2*i+1`
- Right child of node at index `i`: `2*i+2`
2. **Efficiency**:
- O(1) access to any element
- O(log n) for heap operations
- No pointer overhead
3. **Memory Efficiency**: Compact representation without node objects
#### Task Class Design
The `Task` class represents individual tasks with:
- **task_id**: Unique identifier
- **priority**: Priority level (higher = more important)
- **arrival_time**: When the task enters the system
- **deadline**: Optional deadline for completion
- **execution_time**: Estimated execution duration
- **description**: Optional task description
The class implements comparison operators based on priority, enabling natural use in priority queues.
#### Core Operations Analysis
**`insert(task)`: O(log n)**
- Add task to the end of the heap array
- Bubble up to maintain heap property
- Time Complexity: O(log n) - height of the heap
**`extract_max()` / `extract_min()`: O(log n)**
- Remove root element
- Move last element to root
- Bubble down to maintain heap property
- Time Complexity: O(log n) - height of the heap
**`increase_key(task, new_priority)`: O(n)**
- Find the task in the heap (O(n))
- Update priority
- Bubble up if necessary (O(log n))
- Time Complexity: O(n) - linear search dominates
- **Note**: Can be optimized to O(log n) using a hash map for O(1) lookup
**`decrease_key(task, new_priority)`: O(n)**
- Similar to `increase_key`, but bubbles down instead of up
- Time Complexity: O(n)
**`is_empty()`: O(1)**
- Simple check of heap size
**`peek()`: O(1)**
- Returns the root element without removal
### Task Scheduler Simulation
#### Implementation
A complete task scheduler simulation has been implemented using the priority queue. The scheduler demonstrates practical applications of priority queues in operating systems and job scheduling systems.
#### Scheduler Design
The `TaskScheduler` class implements a priority-based scheduling algorithm:
1. **Task Insertion**: All tasks are inserted into a priority queue (O(n log n))
2. **Task Execution**: Tasks are extracted and executed in priority order (O(n log n))
3. **Deadline Monitoring**: Each task's deadline is checked upon completion
4. **Statistics Collection**: Comprehensive statistics are collected during scheduling
#### Time Complexity Analysis
- **schedule_tasks()**: O(n log n) where n is the number of tasks
- Inserting n tasks: O(n log n)
- Extracting n tasks: O(n log n)
- **get_statistics()**: O(n) to calculate statistics from results
- **Space Complexity**: O(n) for the priority queue
#### Scheduling Results
The scheduler simulation demonstrates:
- **Priority-based execution**: High-priority tasks execute first
- **Deadline tracking**: Tasks are monitored for deadline compliance
- **Wait time calculation**: Tracks how long tasks wait before execution
- **Performance metrics**: Throughput, average wait time, and deadline compliance rates
The scheduler simulation shows that:
- Priority-based scheduling ensures critical tasks execute first
- Pure priority scheduling may miss deadlines for lower-priority tasks with tight deadlines
- The scheduler efficiently handles large workloads (50+ tasks) using O(n log n) algorithm
- Statistics provide valuable insights into scheduling performance
### Design Decisions
#### 1. List-Based Heap Representation
**Decision**: Use Python list instead of node-based tree structure.
**Rationale**:
- Simpler implementation
- Better memory locality
- Easier index calculations
- No pointer overhead
**Trade-off**: Slightly less intuitive than tree structure, but more efficient.
#### 2. Max-Heap vs Min-Heap Configuration
**Decision**: Support both configurations via constructor parameter.
**Rationale**:
- Flexibility for different use cases
- Single implementation for both
- Clear API distinction
#### 3. Task Class Design
**Decision**: Use dataclass with comparison operators.
**Rationale**:
- Clean, readable code
- Natural integration with priority queue
- Easy to extend with additional fields
#### 4. In-Place vs Non-In-Place Sorting
**Decision**: Support both options in heapsort.
**Rationale**:
- Flexibility for different use cases
- In-place for memory efficiency
- Non-in-place for preserving original data
#### 5. Linear Search for Key Updates
**Decision**: Use linear search instead of hash map for `increase_key`/`decrease_key`.
**Rationale**:
- Simpler implementation
- No additional space overhead
- Acceptable for small to medium-sized queues
- Can be optimized later if needed
### Experimental Results
#### Test Configuration
Tests were conducted on:
- **Input Sizes**: 100, 1,000, 10,000, 100,000 elements
- **Distributions**: Sorted, reverse-sorted, random
- **Algorithms**: Heapsort, Quicksort, Merge Sort
#### Key Findings
1. **Heapsort Performance**:
- Consistent O(n log n) behavior across all input types
- Approximately 1.5-2x slower than optimized Quicksort on random data
- More predictable than Quicksort (no worst-case degradation)
2. **Priority Queue Performance**:
- Efficient insertion and extraction for large numbers of tasks
- Suitable for real-time scheduling applications
- Linear key updates acceptable for moderate queue sizes
3. **Scalability**:
- Both implementations scale well with input size
- Performance matches theoretical predictions
### Conclusion
This project successfully implements and analyzes:
1. **Heapsort Algorithm**: A robust, O(n log n) sorting algorithm suitable for scenarios requiring guaranteed performance
2. **Priority Queue**: An efficient data structure for task scheduling and priority-based processing
3. **Task Scheduler**: A complete simulation demonstrating practical applications
#### Key Achievements
- ✅ Complete, well-documented implementations
- ✅ Comprehensive complexity analysis
- ✅ Empirical performance comparisons
- ✅ Extensive test coverage (70+ tests)
- ✅ Modular, maintainable code structure
- ✅ Task scheduler simulation with statistics
#### Future Improvements
1. **Optimize Key Updates**: Implement hash map for O(log n) key updates
2. **Parallel Heapsort**: Explore parallel heap construction
3. **Adaptive Heapsort**: Optimize for partially sorted inputs
4. **Priority Queue Variants**: Implement binomial heap or Fibonacci heap
## Results Summary
### Heapsort Performance
- Consistent O(n log n) performance across all input types
- Slightly slower than Quicksort on average due to constant factors
- More predictable than Quicksort (no worst-case O(n²) scenario)
- Comparable to Merge Sort but with better space efficiency (in-place)
The performance plots above demonstrate:
- **Heapsort**: Consistent performance regardless of input distribution
- **Quicksort**: Fastest on random data, but degrades significantly on sorted/reverse-sorted inputs
- **Merge Sort**: Consistent performance but requires O(n) extra space
### Priority Queue Performance
- Efficient insertion and extraction operations
- Suitable for task scheduling applications
- Can handle large numbers of tasks efficiently
### Generating Performance Plots
To regenerate the performance comparison plots:
```bash
python3 examples/generate_plots.py
```
This will generate visualization plots in the `docs/` directory comparing all three sorting algorithms.
## Contributing
This is an academic assignment. For questions or issues, please contact:
- **Carlos Gutierrez**
- **Email**: cgutierrez44833@ucumberlands.edu
## License
See LICENSE file for details.
## References
1. Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). *Introduction to Algorithms* (3rd ed.). MIT Press.
2. Sedgewick, R., & Wayne, K. (2011). *Algorithms* (4th ed.). Addison-Wesley Professional.
3. Python Software Foundation. (2023). *Python Documentation*. https://docs.python.org/3/
## Acknowledgments
This implementation follows standard algorithms and data structures as described in classical computer science textbooks. The code is designed for educational purposes and academic submission.