adding report

2025-11-16 15:59:13 -05:00
parent 1b44cb389f
commit 0c9fb776ae
9 changed files with 1586 additions and 14 deletions
--- a/README.md
+++ b/README.md
@@ -17,7 +17,9 @@ MSCS532_Assignment5/
 │   ├── quicksort_comparison.png           # Comparative performance (line plots)
 │   ├── quicksort_comparison_bar.png       # Deterministic vs randomized comparison
 │   ├── quicksort_scalability.png          # Scalability on random inputs
-│   └── quicksort_worst_case.png           # Worst-case behavior on sorted inputs
+│   ├── quicksort_worst_case.png           # Worst-case behavior on sorted inputs
+│   ├── quicksort_3way_comparison.png      # Three-way vs standard on duplicates (line plots)
+│   └── quicksort_3way_bar.png             # Three-way vs standard on duplicates (bar chart)
 ├── examples/
 │   ├── quicksort_demo.py                  # Usage demonstrations
 │   ├── comparison_demo.py                 # Benchmark walkthrough
@@ -52,18 +54,18 @@ MSCS532_Assignment5/

 | Scenario      | Deterministic Quicksort | Randomized Quicksort | Notes |
 |---------------|-------------------------|----------------------|-------|
-| Best Case     | $\(O(n \log n)\)$         | $\(O(n \log n)\)$      | Balanced partitions from median pivots |
-| Average Case  | $\(O(n \log n)\)$         | $\(O(n \log n)\)$      | Expected logarithmic recursion depth |
-| Worst Case    | $\(O(n^2)\)$              | $\(O(n^2)\)$           | Occurs with highly unbalanced splits |
+| Best Case     | $O(n \log n)$         | $O(n \log n)$      | Balanced partitions from median pivots |
+| Average Case  | $O(n \log n)$         | $O(n \log n)$      | Expected logarithmic recursion depth |
+| Worst Case    | $O(n^2)$              | $O(n^2)$           | Occurs with highly unbalanced splits |

- **Average-case intuition:** Balanced partitions of size \(n/2\) produce the recurrence $\(T(n) = 2T(n/2) + O(n)\)$, which resolves to $\(O(n \log n)\)$.
- **Worst-case intuition:** Consistently poor pivots reduce the problem by one element, yielding $\(T(n) = T(n - 1) + O(n)\)$ and $\(O(n^2)\)$ behavior.
- **Space complexity:** $\(O(\log n)\)$ expected stack depth for balanced recursion, $\(O(n)\)$ in the worst case. Randomized pivot selection significantly decreases the probability of worst-case depth on adversarial inputs.
+- **Average-case intuition:** Balanced partitions of size $n/2$ produce the recurrence $T(n) = 2T(n/2) + O(n)$, which resolves to $O(n \log n)$.
+- **Worst-case intuition:** Consistently poor pivots reduce the problem by one element, yielding $T(n) = T(n - 1) + O(n)$ and $O(n^2)$ behavior.
+- **Space complexity:** $O(\log n)$ expected stack depth for balanced recursion, $O(n)$ in the worst case. Randomized pivot selection significantly decreases the probability of worst-case depth on adversarial inputs.

 ## 3. Randomized Quicksort

 - Randomization chooses pivots uniformly at random, ensuring that any specific pivot ordering is unlikely.
- While the theoretical worst case remains $\(O(n^2)\)$, the probability of encountering it drops exponentially with input size.
+- While the theoretical worst case remains $O(n^2)$, the probability of encountering it drops exponentially with input size.
 - The implementation exposes an optional `seed` to guarantee repeatable experimental runs while retaining stochastic behavior by default.

 ## 4. Empirical Analysis
@@ -78,7 +80,7 @@ MSCS532_Assignment5/
 ### Key Observations

 - Randomized Quicksort consistently outperforms deterministic Quicksort on sorted and reverse-sorted arrays by avoiding degenerate partitions.
- Both versions exhibit $\(O(n \log n)\)$ scaling on random inputs, aligning with theoretical expectations.
+- Both versions exhibit $O(n \log n)$ scaling on random inputs, aligning with theoretical expectations.
 - Deterministic Quicksort degrades toward quadratic performance as inputs approach worst-case ordering; randomization flattens this curve.
 - Three-way Quicksort (explored in examples/tests) provides strong performance on datasets with heavy duplication.

@@ -90,9 +92,9 @@ In some visualizations (particularly `quicksort_worst_case.png` and `quicksort_c

 **Why execution times become infinite:**

-1. **Worst-case complexity:** On sorted or reverse-sorted inputs, deterministic Quicksort (using the last element as pivot) creates highly unbalanced partitions, resulting in $\(O(n^2)\)$ time complexity.
+1. **Worst-case complexity:** On sorted or reverse-sorted inputs, deterministic Quicksort (using the last element as pivot) creates highly unbalanced partitions, resulting in $O(n^2)$ time complexity.

-2. **Recursion depth:** For large arrays (typically ≥ 1,000 elements), the algorithm requires $\(O(n)\)$ recursive calls, which can exceed Python's default recursion limit (usually 1,000) and raise a `RecursionError`.
+2. **Recursion depth:** For large arrays (typically ≥ 1,000 elements), the algorithm requires $O(n)$ recursive calls, which can exceed Python's default recursion limit (usually 1,000) and raise a `RecursionError`.

 3. **Timeout behavior:** Even when recursion limits are increased, the quadratic time complexity means execution times grow prohibitively large. For arrays of size 5,000 or 10,000, deterministic Quicksort may take minutes or hours to complete, making it impractical for benchmarking.

@@ -112,16 +114,24 @@ In some visualizations (particularly `quicksort_worst_case.png` and `quicksort_c

 ![Deterministic vs Randomized Comparison](docs/quicksort_comparison_bar.png)

-<sub>*Figure 2. Runtime comparison on random, sorted, and reverse-sorted arrays (n = 5,000). Missing bars for deterministic Quicksort on sorted/reverse-sorted inputs indicate execution failures due to worst-case \(O(n^2)\) performance.*</sub>
+<sub>*Figure 2. Runtime comparison on random, sorted, and reverse-sorted arrays (n = 5,000). Missing bars for deterministic Quicksort on sorted/reverse-sorted inputs indicate execution failures due to worst-case $O(n^2)$ performance.*</sub>

 ![Scalability Analysis](docs/quicksort_scalability.png)

-<sub>*Figure 3. Log-log visualization of scalability on random inputs with \(O(n \log n)\) reference.*</sub>
+<sub>*Figure 3. Log-log visualization of scalability on random inputs with $O(n \log n)$ reference.*</sub>

 ![Worst-Case Behavior](docs/quicksort_worst_case.png)

 <sub>*Figure 4. Worst-case analysis contrasting sorted and reverse-sorted distributions. Missing bars for deterministic Quicksort at larger sizes (≥1,000) indicate execution failures due to recursion limits and quadratic time complexity.*</sub>

+![Three-Way Quicksort Comparison](docs/quicksort_3way_comparison.png)
+
+<sub>*Figure 5. Three-way Quicksort vs Standard Quicksort on duplicate-heavy data. Performance comparison across three duplicate configurations: 10 unique values, 5 unique values, and all equal elements. Three-way Quicksort demonstrates superior performance, especially as duplicate frequency increases.*</sub>
+
+![Three-Way Quicksort Bar Chart](docs/quicksort_3way_bar.png)
+
+<sub>*Figure 6. Detailed bar chart comparison of Three-way Quicksort vs Standard Quicksort on duplicate-heavy data at specific array sizes. Shows the performance advantage of three-way partitioning when dealing with many duplicate elements.*</sub>
+
 ## Getting Started

 ### Prerequisites
--- a/REPORT.md
+++ b/REPORT.md
--- a/docs/quicksort_3way_bar.png
+++ b/docs/quicksort_3way_bar.png
--- a/docs/quicksort_3way_comparison.png
+++ b/docs/quicksort_3way_comparison.png
--- a/docs/quicksort_comparison.png
+++ b/docs/quicksort_comparison.png
--- a/docs/quicksort_comparison_bar.png
+++ b/docs/quicksort_comparison_bar.png
--- a/docs/quicksort_scalability.png
+++ b/docs/quicksort_scalability.png
--- a/docs/quicksort_worst_case.png
+++ b/docs/quicksort_worst_case.png
--- a/examples/generate_plots.py
+++ b/examples/generate_plots.py
@@ -16,7 +16,7 @@ from src.comparison import (
    generate_array_with_duplicates,
    compare_algorithms
 )
-from src.quicksort import quicksort, randomized_quicksort
+from src.quicksort import quicksort, randomized_quicksort, quicksort_3way
 import os


@@ -300,6 +300,168 @@ def generate_performance_plots():
    print("Plots saved in the 'docs' directory.")


+def generate_3way_quicksort_plot():
+    """Generate visualization comparing standard Quicksort vs Three-Way Quicksort on duplicate-heavy data."""
+    print("\nGenerating Three-Way Quicksort comparison plot...")
+    
+    # Ensure docs directory exists
+    os.makedirs('docs', exist_ok=True)
+    
+    # Define algorithms for comparison
+    algorithms = {
+        'Standard Quicksort (Randomized)': lambda arr: randomized_quicksort(arr, seed=42),
+        'Three-Way Quicksort': lambda arr: quicksort_3way(arr)
+    }
+    
+    # Test with different duplicate levels
+    sizes = [100, 500, 1000, 2000, 5000, 10000]
+    duplicate_configs = [
+        ('10 Unique Values', lambda size: generate_array_with_duplicates(size, unique_count=10)),
+        ('5 Unique Values', lambda size: generate_array_with_duplicates(size, unique_count=5)),
+        ('All Equal', lambda size: [5] * size)  # All elements are the same
+    ]
+    
+    print("Running benchmarks for Three-Way Quicksort comparison...")
+    
+    # Run benchmarks once for all configurations
+    all_results = {}
+    for config_name, gen_func in duplicate_configs:
+        array_generators = {'Test': gen_func}
+        results = compare_algorithms(
+            algorithms=algorithms,
+            array_generators=array_generators,
+            sizes=sizes,
+            iterations=3
+        )
+        all_results[config_name] = results
+    
+    # Create figure with subplots
+    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
+    fig.suptitle('Three-Way Quicksort vs Standard Quicksort on Duplicate-Heavy Data', 
+                 fontsize=16, fontweight='bold')
+    
+    colors = ['#1f77b4', '#ff7f0e']
+    markers = ['o', 's']
+    
+    for config_idx, (config_name, _) in enumerate(duplicate_configs):
+        ax = axes[config_idx]
+        results = all_results[config_name]
+        
+        for algo_idx, algo_name in enumerate(algorithms.keys()):
+            if 'Test' in results[algo_name]:
+                sizes_list = sorted(results[algo_name]['Test'].keys())
+                # Filter out infinite values
+                valid_data = [(s, results[algo_name]['Test'][s]['mean']) 
+                             for s in sizes_list 
+                             if np.isfinite(results[algo_name]['Test'][s]['mean'])]
+                if valid_data:
+                    valid_sizes, valid_times = zip(*valid_data)
+                    ax.plot(valid_sizes, valid_times, marker=markers[algo_idx], 
+                           label=algo_name, color=colors[algo_idx],
+                           linewidth=2.5, markersize=8)
+        
+        ax.set_xlabel('Array Size', fontsize=11, fontweight='bold')
+        ax.set_ylabel('Time (seconds)', fontsize=11, fontweight='bold')
+        ax.set_title(f'{config_name}', fontsize=12, fontweight='bold')
+        ax.legend(fontsize=10)
+        ax.grid(True, alpha=0.3)
+        ax.set_xscale('log')
+        ax.set_yscale('log')
+    
+    plt.tight_layout()
+    plt.savefig('docs/quicksort_3way_comparison.png', dpi=300, bbox_inches='tight')
+    print("Saved: docs/quicksort_3way_comparison.png")
+    plt.close()
+    
+    # Also create a bar chart for specific sizes
+    print("Generating Three-Way Quicksort bar chart...")
+    fig, ax = plt.subplots(figsize=(14, 8))
+    
+    # Test specific sizes with different duplicate levels
+    test_sizes = [1000, 5000, 10000]
+    x = np.arange(len(test_sizes))
+    width = 0.25
+    
+    configs_for_bar = [
+        ('10 Unique', lambda size: generate_array_with_duplicates(size, unique_count=10)),
+        ('5 Unique', lambda size: generate_array_with_duplicates(size, unique_count=5)),
+        ('All Equal', lambda size: [5] * size)
+    ]
+    
+    bar_colors = ['#1f77b4', '#ff7f0e', '#2ca02c']
+    
+    # Run benchmarks for bar chart
+    bar_results = {}
+    for config_name, gen_func in configs_for_bar:
+        array_generators = {'Test': gen_func}
+        results = compare_algorithms(
+            algorithms=algorithms,
+            array_generators=array_generators,
+            sizes=test_sizes,
+            iterations=3
+        )
+        bar_results[config_name] = results
+    
+    for config_idx, (config_name, _) in enumerate(configs_for_bar):
+        results = bar_results[config_name]
+        standard_times = []
+        threeway_times = []
+        
+        for size in test_sizes:
+            std_time = None
+            way3_time = None
+            
+            if 'Test' in results['Standard Quicksort (Randomized)']:
+                if size in results['Standard Quicksort (Randomized)']['Test']:
+                    mean_val = results['Standard Quicksort (Randomized)']['Test'][size]['mean']
+                    if np.isfinite(mean_val):
+                        std_time = mean_val
+            
+            if 'Test' in results['Three-Way Quicksort']:
+                if size in results['Three-Way Quicksort']['Test']:
+                    mean_val = results['Three-Way Quicksort']['Test'][size]['mean']
+                    if np.isfinite(mean_val):
+                        way3_time = mean_val
+            
+            standard_times.append(std_time if std_time is not None else np.nan)
+            threeway_times.append(way3_time if way3_time is not None else np.nan)
+        
+        offset = (config_idx - 1) * width
+        bars1 = ax.bar(x + offset, standard_times, width/2, 
+                      label=f'Standard ({config_name})', 
+                      color=bar_colors[config_idx], alpha=0.7)
+        bars2 = ax.bar(x + offset + width/2, threeway_times, width/2, 
+                      label=f'Three-Way ({config_name})', 
+                      color=bar_colors[config_idx], alpha=0.9)
+        
+        # Add value labels
+        for bars in [bars1, bars2]:
+            for bar in bars:
+                height = bar.get_height()
+                if height > 0 and np.isfinite(height):
+                    ax.text(bar.get_x() + bar.get_width()/2., height,
+                           f'{height:.4f}s',
+                           ha='center', va='bottom', fontsize=8, rotation=90)
+    
+    ax.set_xlabel('Array Size', fontsize=12, fontweight='bold')
+    ax.set_ylabel('Time (seconds)', fontsize=12, fontweight='bold')
+    ax.set_title('Three-Way Quicksort Performance on Duplicate-Heavy Data', 
+                fontsize=14, fontweight='bold')
+    ax.set_xticks(x)
+    ax.set_xticklabels([str(s) for s in test_sizes])
+    ax.legend(fontsize=10, ncol=3, loc='upper left')
+    ax.grid(True, alpha=0.3, axis='y')
+    ax.set_yscale('log')
+    
+    plt.tight_layout()
+    plt.savefig('docs/quicksort_3way_bar.png', dpi=300, bbox_inches='tight')
+    print("Saved: docs/quicksort_3way_bar.png")
+    plt.close()
+    
+    print("Three-Way Quicksort plots generated successfully!")
+
+
 if __name__ == '__main__':
    generate_performance_plots()
+    generate_3way_quicksort_plot()