adding report

Carlos Gutierrez
2025-11-16 15:59:13 -05:00
parent 1b44cb389f
commit 0c9fb776ae
9 changed files with 1586 additions and 14 deletions


@@ -17,7 +17,9 @@ MSCS532_Assignment5/
│ ├── quicksort_comparison.png # Comparative performance (line plots)
│ ├── quicksort_comparison_bar.png # Deterministic vs randomized comparison
│ ├── quicksort_scalability.png # Scalability on random inputs
│ └── quicksort_worst_case.png # Worst-case behavior on sorted inputs
│ ├── quicksort_worst_case.png # Worst-case behavior on sorted inputs
│ ├── quicksort_3way_comparison.png # Three-way vs standard on duplicates (line plots)
│ └── quicksort_3way_bar.png # Three-way vs standard on duplicates (bar chart)
├── examples/
│ ├── quicksort_demo.py # Usage demonstrations
│ ├── comparison_demo.py # Benchmark walkthrough
@@ -52,18 +54,18 @@ MSCS532_Assignment5/
| Scenario | Deterministic Quicksort | Randomized Quicksort | Notes |
|---------------|-------------------------|----------------------|-------|
| Best Case | $\(O(n \log n)\)$ | $\(O(n \log n)\)$ | Balanced partitions from median pivots |
| Average Case | $\(O(n \log n)\)$ | $\(O(n \log n)\)$ | Expected logarithmic recursion depth |
| Worst Case | $\(O(n^2)\)$ | $\(O(n^2)\)$ | Occurs with highly unbalanced splits |
| Best Case | $O(n \log n)$ | $O(n \log n)$ | Balanced partitions from median pivots |
| Average Case | $O(n \log n)$ | $O(n \log n)$ | Expected logarithmic recursion depth |
| Worst Case | $O(n^2)$ | $O(n^2)$ | Occurs with highly unbalanced splits |
- **Average-case intuition:** Balanced partitions of size \(n/2\) produce the recurrence $\(T(n) = 2T(n/2) + O(n)\)$, which resolves to $\(O(n \log n)\)$.
- **Worst-case intuition:** Consistently poor pivots reduce the problem by one element, yielding $\(T(n) = T(n - 1) + O(n)\)$ and $\(O(n^2)\)$ behavior.
- **Space complexity:** $\(O(\log n)\)$ expected stack depth for balanced recursion, $\(O(n)\)$ in the worst case. Randomized pivot selection significantly decreases the probability of worst-case depth on adversarial inputs.
- **Average-case intuition:** Balanced partitions of size $n/2$ produce the recurrence $T(n) = 2T(n/2) + O(n)$, which resolves to $O(n \log n)$.
- **Worst-case intuition:** Consistently poor pivots reduce the problem by one element, yielding $T(n) = T(n - 1) + O(n)$ and $O(n^2)$ behavior.
- **Space complexity:** $O(\log n)$ expected stack depth for balanced recursion, $O(n)$ in the worst case. Randomized pivot selection significantly decreases the probability of worst-case depth on adversarial inputs.
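
The two recurrences above can be checked empirically by counting comparisons. The following is a minimal sketch rather than the repository's implementation: `count_comparisons` is a hypothetical helper, and it assumes a last-element pivot with Lomuto-style partitioning. It uses an explicit stack so the worst case can be measured without hitting Python's recursion limit.

```python
import random


def count_comparisons(values):
    """Sort a copy of `values` with last-element-pivot quicksort and
    return how many element-to-pivot comparisons were made."""
    arr = list(values)
    comparisons = 0
    # Explicit stack of (low, high) ranges instead of recursion.
    stack = [(0, len(arr) - 1)]
    while stack:
        low, high = stack.pop()
        if low >= high:
            continue
        pivot = arr[high]
        i = low
        for j in range(low, high):
            comparisons += 1
            if arr[j] < pivot:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
        arr[i], arr[high] = arr[high], arr[i]
        stack.append((low, i - 1))
        stack.append((i + 1, high))
    return comparisons


n = 512

# Sorted input: every partition peels off one element,
# matching T(n) = T(n - 1) + O(n), i.e. n(n - 1)/2 comparisons.
sorted_count = count_comparisons(range(n))

# Shuffled input: partitions are roughly balanced on average.
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
shuffled_count = count_comparisons(shuffled)

print(f"sorted: {sorted_count}, shuffled: {shuffled_count}")
```

On sorted input the count is exactly $n(n-1)/2$, while the shuffled count should land far below it, in line with the $O(n \log n)$ average case.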
## 3. Randomized Quicksort
- Randomization chooses pivots uniformly at random, ensuring that any specific pivot ordering is unlikely.
- While the theoretical worst case remains $\(O(n^2)\)$, the probability of encountering it drops exponentially with input size.
- While the theoretical worst case remains $O(n^2)$, the probability of encountering it drops exponentially with input size.
- The implementation exposes an optional `seed` to guarantee repeatable experimental runs while retaining stochastic behavior by default.
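
As an illustration of how an optional `seed` can coexist with stochastic defaults, here is a minimal sketch. The name `randomized_quicksort_sketch` and its return-a-copy behavior are assumptions made for this example; the actual `randomized_quicksort` in `src/quicksort.py` may differ.

```python
import random


def randomized_quicksort_sketch(arr, seed=None):
    """Return a new sorted list using uniformly random pivots.

    Hypothetical stand-in, not the repository's function. A private
    random.Random(seed) makes runs repeatable when a seed is given and
    leaves the global random state untouched.
    """
    rng = random.Random(seed)  # seed=None falls back to OS entropy or time
    result = list(arr)

    def sort_range(low, high):
        while low < high:
            # Pick a pivot uniformly from the range and move it to the end.
            pivot_index = rng.randint(low, high)  # inclusive bounds
            result[pivot_index], result[high] = result[high], result[pivot_index]
            pivot = result[high]
            i = low
            for j in range(low, high):
                if result[j] < pivot:
                    result[i], result[j] = result[j], result[i]
                    i += 1
            result[i], result[high] = result[high], result[i]
            # Recurse into the smaller side and loop on the larger one,
            # which keeps the stack depth logarithmic.
            if i - low < high - i:
                sort_range(low, i - 1)
                low = i + 1
            else:
                sort_range(i + 1, high)
                high = i - 1

    sort_range(0, len(result) - 1)
    return result


data = [9, 3, 7, 1, 8, 2, 5]
print(randomized_quicksort_sketch(data, seed=42))
```

Passing the same `seed` reproduces the same pivot sequence, which is what makes timing experiments repeatable; omitting it keeps pivot choices unpredictable.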
## 4. Empirical Analysis
@@ -78,7 +80,7 @@ MSCS532_Assignment5/
### Key Observations
- Randomized Quicksort consistently outperforms deterministic Quicksort on sorted and reverse-sorted arrays by avoiding degenerate partitions.
- Both versions exhibit $\(O(n \log n)\)$ scaling on random inputs, aligning with theoretical expectations.
- Both versions exhibit $O(n \log n)$ scaling on random inputs, aligning with theoretical expectations.
- Deterministic Quicksort degrades toward quadratic performance as inputs approach worst-case ordering; randomization flattens this curve.
- Three-way Quicksort (explored in examples/tests) provides strong performance on datasets with heavy duplication.
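
To make the last observation concrete, the sketch below shows three-way (Dutch national flag) partitioning. It is an assumption-laden illustration: `quicksort_3way_sketch`, its middle-element pivot, and its return-a-copy behavior are choices made here, not a description of the repository's `quicksort_3way`.

```python
def quicksort_3way_sketch(arr):
    """Return a sorted copy of `arr` using three-way partitioning."""
    result = list(arr)

    def sort_range(low, high):
        if low >= high:
            return
        pivot = result[low + (high - low) // 2]
        lt, i, gt = low, low, high
        # Invariant while scanning:
        #   result[low:lt]       < pivot
        #   result[lt:i]        == pivot
        #   result[gt + 1:high + 1] > pivot
        while i <= gt:
            if result[i] < pivot:
                result[lt], result[i] = result[i], result[lt]
                lt += 1
                i += 1
            elif result[i] > pivot:
                result[i], result[gt] = result[gt], result[i]
                gt -= 1
            else:
                i += 1
        # Elements equal to the pivot are already in place and are
        # excluded from both recursive calls.
        sort_range(low, lt - 1)
        sort_range(gt + 1, high)

    sort_range(0, len(result) - 1)
    return result


print(quicksort_3way_sketch([3, 1, 3, 2, 1, 3]))
```

Because every element equal to the pivot is settled in a single pass, an all-equal array finishes after one partition, which is why duplicate-heavy inputs favor this variant.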
@@ -90,9 +92,9 @@ In some visualizations (particularly `quicksort_worst_case.png` and `quicksort_c
**Why execution times become infinite:**
1. **Worst-case complexity:** On sorted or reverse-sorted inputs, deterministic Quicksort (using the last element as pivot) creates highly unbalanced partitions, resulting in $\(O(n^2)\)$ time complexity.
1. **Worst-case complexity:** On sorted or reverse-sorted inputs, deterministic Quicksort (using the last element as pivot) creates highly unbalanced partitions, resulting in $O(n^2)$ time complexity.
2. **Recursion depth:** For large arrays (typically ≥ 1,000 elements), the recursion reaches a depth of $\(O(n)\)$ nested calls, which can exceed Python's default recursion limit (usually 1,000) and raise a `RecursionError`.
2. **Recursion depth:** For large arrays (typically ≥ 1,000 elements), the recursion reaches a depth of $O(n)$ nested calls, which can exceed Python's default recursion limit (usually 1,000) and raise a `RecursionError`.
3. **Timeout behavior:** Even when recursion limits are increased, the quadratic time complexity means execution times grow prohibitively large. For arrays of size 5,000 or 10,000, deterministic Quicksort may take minutes or hours to complete, making it impractical for benchmarking.
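
The failure mode described in these points can be reproduced in a few lines. The sketch below is one plausible way a benchmark could turn a `RecursionError` into an infinite timing; `naive_quicksort` and `time_or_inf` are hypothetical names for this example and are not taken from `src/comparison.py`.

```python
import sys
import time


def naive_quicksort(arr, low=0, high=None):
    """Recursive in-place quicksort with the last element as pivot."""
    if high is None:
        high = len(arr) - 1
    if low >= high:
        return
    pivot = arr[high]
    i = low
    for j in range(low, high):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[high] = arr[high], arr[i]
    naive_quicksort(arr, low, i - 1)
    naive_quicksort(arr, i + 1, high)


def time_or_inf(sort_fn, data):
    """Return elapsed seconds, or float('inf') if the sort overflows the stack."""
    working_copy = list(data)
    start = time.perf_counter()
    try:
        sort_fn(working_copy)
    except RecursionError:
        return float("inf")
    return time.perf_counter() - start


# Sorted input longer than the recursion limit forces one nested call
# per element, so the sort cannot finish.
n = 2 * sys.getrecursionlimit()
worst_case_time = time_or_inf(naive_quicksort, range(n))
small_input_time = time_or_inf(naive_quicksort, range(50))

print(worst_case_time, small_input_time)
```

Recording `inf` rather than aborting lets the remaining benchmarks continue, and a plotting step can then drop non-finite values, which would show up as missing points or bars.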
@@ -112,16 +114,24 @@ In some visualizations (particularly `quicksort_worst_case.png` and `quicksort_c
![Deterministic vs Randomized Comparison](docs/quicksort_comparison_bar.png)
<sub>*Figure 2. Runtime comparison on random, sorted, and reverse-sorted arrays (n = 5,000). Missing bars for deterministic Quicksort on sorted/reverse-sorted inputs indicate execution failures due to worst-case \(O(n^2)\) performance.*</sub>
<sub>*Figure 2. Runtime comparison on random, sorted, and reverse-sorted arrays (n = 5,000). Missing bars for deterministic Quicksort on sorted/reverse-sorted inputs indicate execution failures due to worst-case $O(n^2)$ performance.*</sub>
![Scalability Analysis](docs/quicksort_scalability.png)
<sub>*Figure 3. Log-log visualization of scalability on random inputs with \(O(n \log n)\) reference.*</sub>
<sub>*Figure 3. Log-log visualization of scalability on random inputs with $O(n \log n)$ reference.*</sub>
![Worst-Case Behavior](docs/quicksort_worst_case.png)
<sub>*Figure 4. Worst-case analysis contrasting sorted and reverse-sorted distributions. Missing bars for deterministic Quicksort at larger sizes (≥1,000) indicate execution failures due to recursion limits and quadratic time complexity.*</sub>
![Three-Way Quicksort Comparison](docs/quicksort_3way_comparison.png)
<sub>*Figure 5. Three-way Quicksort vs Standard Quicksort on duplicate-heavy data. Performance comparison across three duplicate configurations: 10 unique values, 5 unique values, and all equal elements. Three-way Quicksort demonstrates superior performance, especially as duplicate frequency increases.*</sub>
![Three-Way Quicksort Bar Chart](docs/quicksort_3way_bar.png)
<sub>*Figure 6. Detailed bar chart comparison of Three-way Quicksort vs Standard Quicksort on duplicate-heavy data at specific array sizes. Shows the performance advantage of three-way partitioning when dealing with many duplicate elements.*</sub>
## Getting Started
### Prerequisites

1400
REPORT.md Normal file

File diff suppressed because it is too large

BIN
docs/quicksort_3way_bar.png Normal file

Binary file not shown. After: 181 KiB

Binary file not shown. After: 297 KiB

Binary file not shown. Before: 538 KiB, after: 538 KiB

Binary file not shown. Before: 217 KiB, after: 202 KiB

Binary file not shown. Before: 260 KiB, after: 262 KiB

Binary file not shown. Before: 189 KiB, after: 190 KiB


@@ -16,7 +16,7 @@ from src.comparison import (
    generate_array_with_duplicates,
    compare_algorithms
)
from src.quicksort import quicksort, randomized_quicksort
from src.quicksort import quicksort, randomized_quicksort, quicksort_3way
import os
@@ -300,6 +300,168 @@ def generate_performance_plots():
    print("Plots saved in the 'docs' directory.")


def generate_3way_quicksort_plot():
    """Generate visualization comparing standard Quicksort vs Three-Way Quicksort on duplicate-heavy data."""
    print("\nGenerating Three-Way Quicksort comparison plot...")

    # Ensure docs directory exists
    os.makedirs('docs', exist_ok=True)

    # Define algorithms for comparison
    algorithms = {
        'Standard Quicksort (Randomized)': lambda arr: randomized_quicksort(arr, seed=42),
        'Three-Way Quicksort': lambda arr: quicksort_3way(arr)
    }

    # Test with different duplicate levels
    sizes = [100, 500, 1000, 2000, 5000, 10000]
    duplicate_configs = [
        ('10 Unique Values', lambda size: generate_array_with_duplicates(size, unique_count=10)),
        ('5 Unique Values', lambda size: generate_array_with_duplicates(size, unique_count=5)),
        ('All Equal', lambda size: [5] * size)  # All elements are the same
    ]

    print("Running benchmarks for Three-Way Quicksort comparison...")

    # Run benchmarks once for all configurations
    all_results = {}
    for config_name, gen_func in duplicate_configs:
        array_generators = {'Test': gen_func}
        results = compare_algorithms(
            algorithms=algorithms,
            array_generators=array_generators,
            sizes=sizes,
            iterations=3
        )
        all_results[config_name] = results

    # Create figure with subplots
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
    fig.suptitle('Three-Way Quicksort vs Standard Quicksort on Duplicate-Heavy Data',
                 fontsize=16, fontweight='bold')

    colors = ['#1f77b4', '#ff7f0e']
    markers = ['o', 's']

    for config_idx, (config_name, _) in enumerate(duplicate_configs):
        ax = axes[config_idx]
        results = all_results[config_name]

        for algo_idx, algo_name in enumerate(algorithms.keys()):
            if 'Test' in results[algo_name]:
                sizes_list = sorted(results[algo_name]['Test'].keys())
                # Filter out infinite values
                valid_data = [(s, results[algo_name]['Test'][s]['mean'])
                              for s in sizes_list
                              if np.isfinite(results[algo_name]['Test'][s]['mean'])]
                if valid_data:
                    valid_sizes, valid_times = zip(*valid_data)
                    ax.plot(valid_sizes, valid_times, marker=markers[algo_idx],
                            label=algo_name, color=colors[algo_idx],
                            linewidth=2.5, markersize=8)

        ax.set_xlabel('Array Size', fontsize=11, fontweight='bold')
        ax.set_ylabel('Time (seconds)', fontsize=11, fontweight='bold')
        ax.set_title(f'{config_name}', fontsize=12, fontweight='bold')
        ax.legend(fontsize=10)
        ax.grid(True, alpha=0.3)
        ax.set_xscale('log')
        ax.set_yscale('log')

    plt.tight_layout()
    plt.savefig('docs/quicksort_3way_comparison.png', dpi=300, bbox_inches='tight')
    print("Saved: docs/quicksort_3way_comparison.png")
    plt.close()

    # Also create a bar chart for specific sizes
    print("Generating Three-Way Quicksort bar chart...")
    fig, ax = plt.subplots(figsize=(14, 8))

    # Test specific sizes with different duplicate levels
    test_sizes = [1000, 5000, 10000]
    x = np.arange(len(test_sizes))
    width = 0.25

    configs_for_bar = [
        ('10 Unique', lambda size: generate_array_with_duplicates(size, unique_count=10)),
        ('5 Unique', lambda size: generate_array_with_duplicates(size, unique_count=5)),
        ('All Equal', lambda size: [5] * size)
    ]

    bar_colors = ['#1f77b4', '#ff7f0e', '#2ca02c']

    # Run benchmarks for bar chart
    bar_results = {}
    for config_name, gen_func in configs_for_bar:
        array_generators = {'Test': gen_func}
        results = compare_algorithms(
            algorithms=algorithms,
            array_generators=array_generators,
            sizes=test_sizes,
            iterations=3
        )
        bar_results[config_name] = results

    for config_idx, (config_name, _) in enumerate(configs_for_bar):
        results = bar_results[config_name]
        standard_times = []
        threeway_times = []

        for size in test_sizes:
            std_time = None
            way3_time = None

            if 'Test' in results['Standard Quicksort (Randomized)']:
                if size in results['Standard Quicksort (Randomized)']['Test']:
                    mean_val = results['Standard Quicksort (Randomized)']['Test'][size]['mean']
                    if np.isfinite(mean_val):
                        std_time = mean_val

            if 'Test' in results['Three-Way Quicksort']:
                if size in results['Three-Way Quicksort']['Test']:
                    mean_val = results['Three-Way Quicksort']['Test'][size]['mean']
                    if np.isfinite(mean_val):
                        way3_time = mean_val

            standard_times.append(std_time if std_time is not None else np.nan)
            threeway_times.append(way3_time if way3_time is not None else np.nan)

        offset = (config_idx - 1) * width
        bars1 = ax.bar(x + offset, standard_times, width/2,
                       label=f'Standard ({config_name})',
                       color=bar_colors[config_idx], alpha=0.7)
        bars2 = ax.bar(x + offset + width/2, threeway_times, width/2,
                       label=f'Three-Way ({config_name})',
                       color=bar_colors[config_idx], alpha=0.9)

        # Add value labels
        for bars in [bars1, bars2]:
            for bar in bars:
                height = bar.get_height()
                if height > 0 and np.isfinite(height):
                    ax.text(bar.get_x() + bar.get_width()/2., height,
                            f'{height:.4f}s',
                            ha='center', va='bottom', fontsize=8, rotation=90)

    ax.set_xlabel('Array Size', fontsize=12, fontweight='bold')
    ax.set_ylabel('Time (seconds)', fontsize=12, fontweight='bold')
    ax.set_title('Three-Way Quicksort Performance on Duplicate-Heavy Data',
                 fontsize=14, fontweight='bold')
    ax.set_xticks(x)
    ax.set_xticklabels([str(s) for s in test_sizes])
    ax.legend(fontsize=10, ncol=3, loc='upper left')
    ax.grid(True, alpha=0.3, axis='y')
    ax.set_yscale('log')

    plt.tight_layout()
    plt.savefig('docs/quicksort_3way_bar.png', dpi=300, bbox_inches='tight')
    print("Saved: docs/quicksort_3way_bar.png")
    plt.close()

    print("Three-Way Quicksort plots generated successfully!")


if __name__ == '__main__':
    generate_performance_plots()
    generate_3way_quicksort_plot()