Initial commit: Divide-and-conquer sorting algorithms benchmark
- Implement Merge Sort and Quick Sort algorithms with instrumentation
- Add Quick Sort pivot strategies: first, last, median_of_three, random
- Create dataset generators for 5 dataset types (sorted, reverse, random, nearly_sorted, duplicates_heavy)
- Build comprehensive benchmarking CLI with metrics collection
- Add performance measurement (time, memory, comparisons, swaps)
- Configure logging with rotating file handlers
- Generate plots for time and memory vs size
- Include comprehensive test suite with pytest
- Add full documentation in README.md
.gitignore (vendored, new file, 54 lines)
@@ -0,0 +1,54 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
env/
ENV/
.venv

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# Project specific
results/
plots/
*.log

# OS
.DS_Store
Thumbs.db

# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/

# Type checking
.mypy_cache/
.ruff_cache/
README.md (new file, 368 lines)
@@ -0,0 +1,368 @@
# Divide-and-Conquer Sorting Algorithms Benchmark

A comprehensive Python project for benchmarking merge sort and quick sort across different dataset types and sizes, with detailed performance metrics, logging, and visualization.

## Project Overview

This project implements two divide-and-conquer sorting algorithms (Merge Sort and Quick Sort) and provides a benchmarking framework to evaluate their performance across various dataset characteristics:

- **Merge Sort**: Stable, O(n log n) worst-case time complexity
- **Quick Sort**: In-place, O(n log n) average-case with configurable pivot strategies

The benchmark suite measures:

- Wall-clock time (using `time.perf_counter`)
- Peak memory usage (using `tracemalloc` and `psutil`)
- Comparison and swap counts (when instrumentation is enabled)
- Correctness verification (comparing against Python's `sorted()`)
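As a rough standalone sketch of a single measurement (the project's actual measurement code lives in `src/bench/metrics.py`, whose internals are not shown here; `measure` is an illustrative helper, not the project API), `time.perf_counter` and `tracemalloc` combine like this:

```python
import time
import tracemalloc

def measure(sort_func, arr):
    """Time one call of sort_func and record its peak traced allocation."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = sort_func(list(arr))  # sort a copy so the input stays intact
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert result == sorted(arr), "correctness check against built-in sorted()"
    return result, elapsed, peak

out, secs, peak_bytes = measure(sorted, [5, 3, 1, 4, 2])
```

Note that `tracemalloc` only tracks Python-level allocations, which is why the project pairs it with `psutil` for a process-level view.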
## Project Structure

```
.
├── src/
│   ├── algorithms/
│   │   ├── merge_sort.py        # Merge sort implementation
│   │   └── quick_sort.py        # Quick sort with pivot strategies
│   └── bench/
│       ├── benchmark.py         # Main CLI benchmark runner
│       ├── datasets.py          # Dataset generators
│       ├── metrics.py           # Performance measurement utilities
│       └── logging_setup.py     # Logging configuration
├── tests/
│   └── test_sorts.py            # Comprehensive test suite
├── scripts/
│   └── run_benchmarks.sh        # Convenience script to run benchmarks
├── results/                     # Auto-created: CSV, JSON, logs
├── plots/                       # Auto-created: PNG visualizations
├── pyproject.toml               # Project configuration and dependencies
├── .gitignore
└── README.md                    # This file
```

## Installation

### Prerequisites

- Python 3.8 or higher
- pip (Python package manager)

### Setup

1. Clone the repository:
```bash
git clone <repository-url>
cd divide-and-conquer-analysis
```

2. Install dependencies:
```bash
pip install -e .
```

Or install with the development extras:
```bash
pip install -e ".[dev]"  # Includes dev dependencies (mypy, ruff, black)
```
## Quick Start

### Run a Simple Benchmark

```bash
python -m src.bench.benchmark \
  --algorithms merge,quick \
  --datasets sorted,reverse,random \
  --sizes 1000,5000,10000 \
  --runs 5 \
  --seed 42 \
  --instrument \
  --make-plots
```

### Use the Convenience Script

```bash
./scripts/run_benchmarks.sh
```

## CLI Usage

The benchmark CLI (`src.bench.benchmark`) supports the following arguments:

### Required Arguments

None; every argument has a default.

### Optional Arguments

- `--algorithms`: Comma-separated list of algorithms to benchmark
  - Options: `merge`, `quick`
  - Default: `merge,quick`

- `--pivot`: Pivot strategy for Quick Sort
  - Options: `first`, `last`, `median_of_three`, `random`
  - Default: `random`

- `--datasets`: Comma-separated list of dataset types
  - Options: `sorted`, `reverse`, `random`, `nearly_sorted`, `duplicates_heavy`
  - Default: `sorted,reverse,random,nearly_sorted,duplicates_heavy`

- `--sizes`: Comma-separated list of dataset sizes
  - Default: `1000,5000,10000,50000`
  - Example: `--sizes 1000,5000,10000,50000,100000`

- `--runs`: Number of runs per experiment (to reduce variance)
  - Default: `5`

- `--seed`: Random seed for reproducibility
  - Default: `42`

- `--outdir`: Output directory for results
  - Default: `results`

- `--log-level`: Logging level
  - Options: `DEBUG`, `INFO`, `WARNING`, `ERROR`
  - Default: `INFO`

- `--instrument`: Enable counting of comparisons and swaps
  - Flag (no value)

- `--make-plots`: Generate plots after benchmarking
  - Flag (no value)

### Example CLI Commands

**Basic benchmark with default settings:**
```bash
python -m src.bench.benchmark
```

**Full benchmark with all options:**
```bash
python -m src.bench.benchmark \
  --algorithms merge,quick \
  --pivot random \
  --datasets sorted,reverse,random,nearly_sorted,duplicates_heavy \
  --sizes 1000,5000,10000,50000 \
  --runs 5 \
  --seed 42 \
  --instrument \
  --outdir results \
  --log-level INFO \
  --make-plots
```

**Compare pivot strategies:**
```bash
for pivot in first last median_of_three random; do
  python -m src.bench.benchmark \
    --algorithms quick \
    --pivot $pivot \
    --datasets random \
    --sizes 10000,50000 \
    --runs 10 \
    --seed 42
done
```

**Quick performance check:**
```bash
python -m src.bench.benchmark \
  --algorithms merge,quick \
  --datasets random \
  --sizes 10000 \
  --runs 3 \
  --make-plots
```

## Output Files

### Results Directory (`results/`)

- **`bench_results.csv`**: Detailed results in CSV format
  - Columns: `algorithm`, `pivot`, `dataset`, `size`, `run`, `time_s`, `peak_mem_bytes`, `comparisons`, `swaps`, `seed`
  - One row per run

- **`summary.json`**: Aggregated statistics per (algorithm, dataset, size) combination
  - Includes: mean, std, best, and worst times and memory
  - Comparison and swap statistics (if instrumentation is enabled)

- **`bench.log`**: Rotating log file (max 10MB, 5 backups)
  - Contains: system info, run metadata, progress logs, errors
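The per-run rows can be reduced to the summary statistics named above with the standard `statistics` module. A minimal sketch (the `aggregate` helper is illustrative, not the project's `aggregate_metrics`):

```python
import statistics

def aggregate(times):
    """Reduce per-run timings of one experiment to mean/std/best/worst."""
    return {
        "mean": statistics.mean(times),
        "std": statistics.stdev(times) if len(times) > 1 else 0.0,
        "best": min(times),
        "worst": max(times),
    }

summary = aggregate([0.010, 0.012, 0.011])
```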
### Plots Directory (`plots/`)

- **`time_vs_size.png`**: Line chart of sorting time vs. array size
  - Separate subplot for each dataset type
  - One line per algorithm

- **`memory_vs_size.png`**: Line chart of memory usage vs. array size
  - Separate subplot for each dataset type
  - One line per algorithm

## Reproducing Results

### Generate a Plot

After running benchmarks:
```bash
python -m src.bench.benchmark \
  --algorithms merge,quick \
  --datasets random \
  --sizes 1000,5000,10000,50000 \
  --runs 5 \
  --seed 42 \
  --make-plots
```

Plots are written to the `plots/` directory.

### Generate CSV from Scratch

```bash
python -m src.bench.benchmark \
  --algorithms merge \
  --datasets sorted,reverse,random \
  --sizes 1000,5000 \
  --runs 3 \
  --seed 42 \
  --outdir results
```

Check `results/bench_results.csv` for the output.

## Logging

Logging is configured in `src/bench/logging_setup.py`:

- **Console output**: Formatted with timestamp, level, and message
- **File output**: Detailed logs with function names and line numbers
- **Rotation**: Log files rotate at 10MB, keeping 5 backups
- **Metadata**: Logs include Python version, OS, architecture, and git commit (if available)
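The rotation policy above maps onto the standard library's `logging.handlers.RotatingFileHandler`. A minimal sketch of such a setup (names and paths here are illustrative; the project's actual configuration lives in `logging_setup.py`):

```python
import logging
import logging.handlers
import os
import tempfile

# Illustrative: a rotating log capped at 10MB with 5 backups, as described above.
log_path = os.path.join(tempfile.mkdtemp(), "bench.log")
logger = logging.getLogger("bench_demo")
logger.setLevel(logging.INFO)

handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=10 * 1024 * 1024, backupCount=5
)
handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
)
logger.addHandler(handler)

logger.info("Benchmark session started")
handler.flush()
```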
### Log Levels

- `DEBUG`: Detailed diagnostic information
- `INFO`: General informational messages (default)
- `WARNING`: Warning messages
- `ERROR`: Error messages

### Example Log Output

```
2024-01-15 10:30:00 - __main__ - INFO - ================================================================================
2024-01-15 10:30:00 - __main__ - INFO - Benchmark session started
2024-01-15 10:30:00 - __main__ - INFO - Python version: 3.10.5
2024-01-15 10:30:00 - __main__ - INFO - Platform: macOS-13.0
2024-01-15 10:30:00 - __main__ - INFO - Running merge on random size=1000 run=1/5
```

## Testing

Run the test suite:

```bash
pytest tests/ -v
```

Run with coverage:

```bash
pytest tests/ --cov=src --cov-report=html
```

### Test Coverage

The test suite includes:

1. **Unit Tests**:
   - Empty arrays
   - Single-element arrays
   - Already sorted arrays
   - Reverse-sorted arrays
   - Random arrays
   - Arrays with duplicates
   - Large arrays
   - Instrumentation tests

2. **Property Tests**:
   - Comparison with Python's `sorted()` on random arrays
   - Multiple sizes and pivot strategies
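A property check of the kind described can be sketched without pytest machinery; here Python's own `sorted` stands in for the function under test (the real suite targets `merge_sort` and `quick_sort`):

```python
import random

def check_sorts_like_builtin(sort_func, trials=50, max_n=200, seed=42):
    """Property: sort_func agrees with sorted() on random integer arrays."""
    rng = random.Random(seed)
    for _ in range(trials):
        arr = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_n))]
        assert sort_func(arr) == sorted(arr)
    return True

ok = check_sorts_like_builtin(sorted)
```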
## Code Quality

### Type Checking

```bash
mypy src/ tests/
```

### Linting

```bash
ruff check src/ tests/
```

### Formatting

```bash
ruff format src/ tests/
```

Or using black:

```bash
black src/ tests/
```

## Algorithm Details

### Merge Sort

- **Time Complexity**: O(n log n) in the worst, average, and best cases
- **Space Complexity**: O(n)
- **Stability**: Stable
- **Implementation**: Recursive divide-and-conquer with merging

### Quick Sort

- **Time Complexity**: O(n log n) average-case, O(n²) worst-case
- **Space Complexity**: O(log n) average-case (recursion stack)
- **Stability**: Not stable (in-place implementation)
- **Pivot Strategies**:
  - `first`: Always use the first element (O(n²) on sorted arrays)
  - `last`: Always use the last element (O(n²) on reverse-sorted arrays)
  - `median_of_three`: Use the median of the first, middle, and last elements
  - `random`: Random pivot (good expected performance)
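The `median_of_three` rule guards against the sorted and reverse-sorted worst cases by picking a pivot that cannot be the extreme of the three sampled values. A standalone sketch of the selection rule (an illustrative helper, not the project's code):

```python
def median_of_three(a, lo, hi):
    """Return whichever of lo, mid, hi indexes the median of the three values."""
    mid = (lo + hi) // 2
    # Sort (value, index) pairs for the three candidates; pick the middle one.
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]

idx = median_of_three([9, 1, 5, 7, 3], 0, 4)  # values 9, 5, 3 -> median 5 at index 2
```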
## Dataset Types

1. **sorted**: Array already in ascending order `[0, 1, 2, ..., n-1]`
2. **reverse**: Array in descending order `[n-1, n-2, ..., 0]`
3. **random**: Random integers from the `[0, 10*n)` range
4. **nearly_sorted**: Sorted array with ~1% of elements swapped
5. **duplicates_heavy**: Array with many duplicate values (only `n/10` distinct values)
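The generators in `datasets.py` are not shown in this commit's visible diff; a sketch of how several of these types could be produced (the `make_dataset` helper is illustrative, the project's `datasets.py` is authoritative):

```python
import random

def make_dataset(n, kind, seed=None):
    """Generate a test array of the given kind (subset of the five types above)."""
    rng = random.Random(seed)
    if kind == "sorted":
        return list(range(n))
    if kind == "reverse":
        return list(range(n - 1, -1, -1))
    if kind == "random":
        return [rng.randrange(10 * n) for _ in range(n)]
    if kind == "duplicates_heavy":
        # Draw from only n // 10 distinct values
        return [rng.randrange(max(1, n // 10)) for _ in range(n)]
    raise ValueError(f"Unknown dataset type: {kind}")

arr = make_dataset(100, "duplicates_heavy", seed=42)
```

Seeding a per-call `random.Random` instance (rather than the module-level generator) keeps each dataset reproducible without disturbing other randomness in the process.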
## Performance Considerations

- Benchmarks use `time.perf_counter()` for high-resolution timing
- Memory measurement uses both `tracemalloc` and `psutil` for accuracy
- Multiple runs per experiment reduce variance
- Seeded randomness ensures reproducibility

## Contributing

1. Follow Python type hints (checked with mypy)
2. Maintain test coverage
3. Run linting before committing
4. Update the README for significant changes

## License

[Specify your license here]

## Acknowledgments

- Algorithms based on standard divide-and-conquer implementations
- Benchmarking framework inspired by best practices in performance testing
pyproject.toml (new file, 49 lines)
@@ -0,0 +1,49 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "algorithms-week2"
version = "0.1.0"
description = "Divide-and-conquer sorting algorithms benchmark"
requires-python = ">=3.8"
dependencies = [
    "numpy>=1.21.0",
    "psutil>=5.8.0",
    "matplotlib>=3.5.0",
    "pytest>=7.0.0",
    "pytest-cov>=3.0.0",
]

[project.optional-dependencies]
dev = [
    "mypy>=0.950",
    "ruff>=0.0.200",
    "black>=22.0.0",
]

[tool.mypy]
python_version = "3.8"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = false
check_untyped_defs = true

[tool.ruff]
line-length = 100
target-version = "py38"

[tool.ruff.lint]
select = ["E", "F", "W", "I"]
ignore = ["E501"]

[tool.black]
line-length = 100
target-version = ["py38"]

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
python_classes = "Test*"
python_functions = "test_*"
scripts/run_benchmarks.sh (new executable file, 20 lines)
@@ -0,0 +1,20 @@
#!/bin/bash
# Run benchmarks script

set -e

echo "Running sorting algorithm benchmarks..."

python -m src.bench.benchmark \
  --algorithms merge,quick \
  --datasets sorted,reverse,random,nearly_sorted,duplicates_heavy \
  --sizes 1000,5000,10000,50000 \
  --runs 5 \
  --seed 42 \
  --instrument \
  --outdir results \
  --log-level INFO \
  --make-plots

echo "Benchmarks completed. Check results/ and plots/ directories."
src/__init__.py (new file, 2 lines)
@@ -0,0 +1,2 @@
"""Algorithms Week 2: Divide-and-Conquer Sorting Benchmarks."""
src/algorithms/__init__.py (new file, 2 lines)
@@ -0,0 +1,2 @@
"""Sorting algorithm implementations."""
src/algorithms/merge_sort.py (new file, 53 lines)
@@ -0,0 +1,53 @@
"""Merge Sort implementation with instrumentation support."""
from typing import List, Optional, Callable


def merge_sort(
    arr: List[int],
    instrument: Optional[Callable[[str], None]] = None,
) -> List[int]:
    """
    Sort array using merge sort algorithm.

    Args:
        arr: List of integers to sort
        instrument: Optional callback function for counting operations.
            Called with 'comparison' or 'swap' strings.

    Returns:
        Sorted copy of the input array.
    """
    if len(arr) <= 1:
        return arr[:]

    def _merge(left: List[int], right: List[int]) -> List[int]:
        """Merge two sorted arrays."""
        result: List[int] = []
        i, j = 0, 0

        while i < len(left) and j < len(right):
            if instrument:
                instrument("comparison")
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1

        result.extend(left[i:])
        result.extend(right[j:])
        return result

    def _merge_sort_recursive(arr_inner: List[int]) -> List[int]:
        """Recursive merge sort helper."""
        if len(arr_inner) <= 1:
            return arr_inner[:]

        mid = len(arr_inner) // 2
        left = _merge_sort_recursive(arr_inner[:mid])
        right = _merge_sort_recursive(arr_inner[mid:])
        return _merge(left, right)

    return _merge_sort_recursive(arr)
src/algorithms/quick_sort.py (new file, 97 lines)
@@ -0,0 +1,97 @@
"""Quick Sort implementation with pivot strategies and instrumentation support."""
from typing import List, Optional, Callable, Literal
import random

PivotStrategy = Literal["first", "last", "median_of_three", "random"]


def quick_sort(
    arr: List[int],
    pivot_strategy: PivotStrategy = "first",
    instrument: Optional[Callable[[str], None]] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """
    Sort array using quick sort algorithm.

    Args:
        arr: List of integers to sort
        pivot_strategy: Strategy for selecting pivot ('first', 'last',
            'median_of_three', 'random')
        instrument: Optional callback function for counting operations.
            Called with 'comparison' or 'swap' strings.
        seed: Optional random seed for 'random' pivot strategy

    Returns:
        Sorted copy of the input array.
    """
    if len(arr) <= 1:
        return arr[:]

    arr_copy = arr[:]

    def _choose_pivot(left: int, right: int) -> int:
        """Choose pivot index based on strategy."""
        if pivot_strategy == "first":
            return left
        elif pivot_strategy == "last":
            return right
        elif pivot_strategy == "median_of_three":
            mid = (left + right) // 2
            if instrument:
                instrument("comparison")
                instrument("comparison")
            if arr_copy[left] <= arr_copy[mid] <= arr_copy[right] or \
               arr_copy[right] <= arr_copy[mid] <= arr_copy[left]:
                return mid
            elif arr_copy[mid] <= arr_copy[left] <= arr_copy[right] or \
                 arr_copy[right] <= arr_copy[left] <= arr_copy[mid]:
                return left
            else:
                return right
        elif pivot_strategy == "random":
            return random.randint(left, right)
        else:
            raise ValueError(f"Unknown pivot strategy: {pivot_strategy}")

    def _partition(left: int, right: int, pivot_idx: int) -> int:
        """Partition array around pivot and return final pivot position."""
        pivot_val = arr_copy[pivot_idx]

        # Move pivot to end
        arr_copy[pivot_idx], arr_copy[right] = arr_copy[right], arr_copy[pivot_idx]
        if instrument:
            instrument("swap")

        store_idx = left
        for i in range(left, right):
            if instrument:
                instrument("comparison")
            if arr_copy[i] <= pivot_val:
                if i != store_idx:
                    arr_copy[i], arr_copy[store_idx] = arr_copy[store_idx], arr_copy[i]
                    if instrument:
                        instrument("swap")
                store_idx += 1

        # Move pivot to final position
        arr_copy[store_idx], arr_copy[right] = arr_copy[right], arr_copy[store_idx]
        if instrument:
            instrument("swap")

        return store_idx

    def _quick_sort_recursive(left: int, right: int) -> None:
        """Recursive quick sort helper."""
        if left < right:
            pivot_idx = _choose_pivot(left, right)
            final_pivot = _partition(left, right, pivot_idx)
            _quick_sort_recursive(left, final_pivot - 1)
            _quick_sort_recursive(final_pivot + 1, right)

    if seed is not None:
        random.seed(seed)

    _quick_sort_recursive(0, len(arr_copy) - 1)
    return arr_copy
src/bench/__init__.py (new file, 2 lines)
@@ -0,0 +1,2 @@
"""Benchmarking utilities."""
src/bench/benchmark.py (new file, 433 lines)
@@ -0,0 +1,433 @@
"""Benchmark CLI for sorting algorithms."""
import argparse
import csv
import json
import sys
from pathlib import Path
from typing import List, Dict, Any, Optional
import random

from src.algorithms.merge_sort import merge_sort
from src.algorithms.quick_sort import quick_sort, PivotStrategy
from src.bench.datasets import generate_dataset, DatasetType
from src.bench.metrics import measure_sort_performance, Metrics, aggregate_metrics
from src.bench.logging_setup import setup_logging, get_logger


def parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Benchmark divide-and-conquer sorting algorithms",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument(
        "--algorithms",
        type=str,
        default="merge,quick",
        help="Comma-separated list of algorithms (merge, quick)",
    )

    parser.add_argument(
        "--pivot",
        type=str,
        default="random",
        choices=["first", "last", "median_of_three", "random"],
        help="Pivot strategy for Quick Sort",
    )

    parser.add_argument(
        "--datasets",
        type=str,
        default="sorted,reverse,random,nearly_sorted,duplicates_heavy",
        help="Comma-separated list of dataset types",
    )

    parser.add_argument(
        "--sizes",
        type=str,
        default="1000,5000,10000,50000",
        help="Comma-separated list of dataset sizes",
    )

    parser.add_argument(
        "--runs",
        type=int,
        default=5,
        help="Number of runs per experiment",
    )

    parser.add_argument(
        "--seed",
        type=int,
        default=42,
        help="Random seed for reproducibility",
    )

    parser.add_argument(
        "--outdir",
        type=str,
        default="results",
        help="Output directory for results",
    )

    parser.add_argument(
        "--log-level",
        type=str,
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        help="Logging level",
    )

    parser.add_argument(
        "--instrument",
        action="store_true",
        help="Count comparisons and swaps",
    )

    parser.add_argument(
        "--make-plots",
        action="store_true",
        help="Generate plots after benchmarking",
    )

    return parser.parse_args()


def run_benchmark(
    algorithm: str,
    pivot_strategy: Optional[str],
    dataset_type: DatasetType,
    size: int,
    runs: int,
    seed: int,
    instrument: bool,
    logger: Any,
) -> List[Dict[str, Any]]:
    """
    Run benchmark for a single algorithm/dataset/size combination.

    Returns:
        List of result dictionaries, one per run
    """
    results: List[Dict[str, Any]] = []

    # Get sort function
    if algorithm == "merge":
        sort_func = merge_sort
        sort_kwargs: Dict[str, Any] = {}
    elif algorithm == "quick":
        sort_func = quick_sort
        sort_kwargs = {
            "pivot_strategy": pivot_strategy or "first",
        }
        # Only pass seed for random pivot strategy
        if pivot_strategy == "random":
            sort_kwargs["seed"] = seed
    else:
        raise ValueError(f"Unknown algorithm: {algorithm}")

    for run_idx in range(runs):
        logger.info(
            f"Running {algorithm} on {dataset_type} size={size} run={run_idx+1}/{runs}"
        )

        # Generate dataset with unique seed per run
        dataset_seed = seed + run_idx * 1000 if seed is not None else None
        arr = generate_dataset(size, dataset_type, seed=dataset_seed)

        # For quick sort with random pivot, use unique seed per run
        if algorithm == "quick" and pivot_strategy == "random":
            sort_kwargs["seed"] = (seed + run_idx * 1000) if seed is not None else None

        # Run benchmark
        sorted_arr, metrics = measure_sort_performance(
            sort_func,
            arr,
            instrument=instrument,
            **sort_kwargs,
        )

        # Verify correctness
        expected = sorted(arr)
        if sorted_arr != expected:
            logger.error(
                f"Correctness check failed for {algorithm} on {dataset_type} "
                f"size={size} run={run_idx+1}"
            )
            logger.error(f"Expected: {expected[:10]}...")
            logger.error(f"Got: {sorted_arr[:10]}...")
            return []  # Return empty to indicate failure

        # Store result
        result = {
            "algorithm": algorithm,
            "pivot": pivot_strategy if algorithm == "quick" else None,
            "dataset": dataset_type,
            "size": size,
            "run": run_idx + 1,
            "time_s": metrics.time_seconds,
            "peak_mem_bytes": metrics.peak_memory_bytes,
            "comparisons": metrics.comparisons if instrument else None,
            "swaps": metrics.swaps if instrument else None,
            "seed": seed,
        }
        results.append(result)

    return results


def save_results_csv(results: List[Dict[str, Any]], csv_path: Path) -> None:
    """Save results to CSV file."""
    if not results:
        return

    file_exists = csv_path.exists()

    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=results[0].keys())
        if not file_exists:
            writer.writeheader()
        writer.writerows(results)


def save_summary_json(results: List[Dict[str, Any]], json_path: Path) -> None:
    """Save aggregated summary to JSON file."""
    if not results:
        return

    # Group by (algorithm, pivot, dataset, size)
    grouped: Dict[tuple, List[Metrics]] = {}

    for result in results:
        key = (
            result["algorithm"],
            result.get("pivot"),
            result["dataset"],
            result["size"],
        )

        metrics = Metrics()
        metrics.time_seconds = result["time_s"]
        metrics.peak_memory_bytes = result["peak_mem_bytes"]
        metrics.comparisons = result.get("comparisons") or 0
        metrics.swaps = result.get("swaps") or 0

        if key not in grouped:
            grouped[key] = []
        grouped[key].append(metrics)

    # Aggregate
    summary: Dict[str, Any] = {}
    for key, metrics_list in grouped.items():
        algo, pivot, dataset, size = key
        key_str = f"{algo}_{pivot or 'N/A'}_{dataset}_{size}"
        summary[key_str] = aggregate_metrics(metrics_list)
        summary[key_str]["algorithm"] = algo
        summary[key_str]["pivot"] = pivot
        summary[key_str]["dataset"] = dataset
        summary[key_str]["size"] = size

    # Merge with existing summary if it exists
    if json_path.exists():
        with open(json_path, "r") as f:
            existing = json.load(f)
        existing.update(summary)
        summary = existing

    with open(json_path, "w") as f:
        json.dump(summary, f, indent=2)


def generate_plots(results: List[Dict[str, Any]], plots_dir: Path, logger: Any) -> None:
    """Generate plots from results."""
    try:
        import matplotlib
        matplotlib.use('Agg')  # Select non-interactive backend before pyplot import
        import matplotlib.pyplot as plt
    except ImportError:
        logger.warning("matplotlib not available, skipping plots")
        return

    plots_dir.mkdir(parents=True, exist_ok=True)

    if not results:
        logger.warning("No results to plot")
        return

    # Group results by algorithm and dataset
    algorithms = sorted(set(r["algorithm"] for r in results))
    datasets = sorted(set(r["dataset"] for r in results))
    sizes = sorted(set(r["size"] for r in results))

    # Time vs size plots
    fig, axes = plt.subplots(len(datasets), 1, figsize=(10, 5 * len(datasets)))
    if len(datasets) == 1:
        axes = [axes]

    for idx, dataset in enumerate(datasets):
        ax = axes[idx]
        for algo in algorithms:
            algo_results = [
                r for r in results
                if r["algorithm"] == algo and r["dataset"] == dataset
            ]

            if not algo_results:
                continue

            # Average time per size
            size_times: Dict[int, List[float]] = {}
            for r in algo_results:
                size = r["size"]
                if size not in size_times:
                    size_times[size] = []
                size_times[size].append(r["time_s"])

            avg_times = [sum(size_times[s]) / len(size_times[s]) for s in sizes if s in size_times]
            plot_sizes = [s for s in sizes if s in size_times]

            ax.plot(plot_sizes, avg_times, marker="o", label=algo)

        ax.set_xlabel("Array Size")
        ax.set_ylabel("Time (seconds)")
        ax.set_title(f"Sorting Time vs Size - {dataset}")
        ax.legend()
        ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig(plots_dir / "time_vs_size.png", dpi=150)
    plt.close()

    # Memory vs size plots
    fig, axes = plt.subplots(len(datasets), 1, figsize=(10, 5 * len(datasets)))
    if len(datasets) == 1:
        axes = [axes]

    for idx, dataset in enumerate(datasets):
        ax = axes[idx]
        for algo in algorithms:
|
||||||
|
algo_results = [
|
||||||
|
r for r in results
|
||||||
|
if r["algorithm"] == algo and r["dataset"] == dataset
|
||||||
|
]
|
||||||
|
|
||||||
|
if not algo_results:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Average memory per size
|
||||||
|
size_memories: Dict[int, List[int]] = {}
|
||||||
|
for r in algo_results:
|
||||||
|
size = r["size"]
|
||||||
|
if size not in size_memories:
|
||||||
|
size_memories[size] = []
|
||||||
|
size_memories[size].append(r["peak_mem_bytes"])
|
||||||
|
|
||||||
|
avg_memories = [
|
||||||
|
sum(size_memories[s]) / len(size_memories[s])
|
||||||
|
for s in sizes if s in size_memories
|
||||||
|
]
|
||||||
|
plot_sizes = [s for s in sizes if s in size_memories]
|
||||||
|
|
||||||
|
ax.plot(plot_sizes, avg_memories, marker="o", label=algo)
|
||||||
|
|
||||||
|
ax.set_xlabel("Array Size")
|
||||||
|
ax.set_ylabel("Peak Memory (bytes)")
|
||||||
|
ax.set_title(f"Memory Usage vs Size - {dataset}")
|
||||||
|
ax.legend()
|
||||||
|
ax.grid(True, alpha=0.3)
|
||||||
|
|
||||||
|
plt.tight_layout()
|
||||||
|
plt.savefig(plots_dir / "memory_vs_size.png", dpi=150)
|
||||||
|
plt.close()
|
||||||
|
|
||||||
|
logger.info(f"Plots saved to {plots_dir}")
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> int:
|
||||||
|
"""Main entry point."""
|
||||||
|
args = parse_args()
|
||||||
|
|
||||||
|
# Setup paths
|
||||||
|
outdir = Path(args.outdir)
|
||||||
|
outdir.mkdir(parents=True, exist_ok=True)
|
||||||
|
plots_dir = Path("plots")
|
||||||
|
|
||||||
|
# Setup logging
|
||||||
|
setup_logging(outdir, args.log_level)
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
# Parse arguments
|
||||||
|
algorithms = [a.strip() for a in args.algorithms.split(",")]
|
||||||
|
datasets = [d.strip() for d in args.datasets.split(",")]
|
||||||
|
sizes = [int(s.strip()) for s in args.sizes.split(",")]
|
||||||
|
|
||||||
|
# Validate algorithms
|
||||||
|
valid_algorithms = {"merge", "quick"}
|
||||||
|
for algo in algorithms:
|
||||||
|
if algo not in valid_algorithms:
|
||||||
|
logger.error(f"Invalid algorithm: {algo}")
|
||||||
|
return 1
|
||||||
|
|
||||||
|
# Set random seed
|
||||||
|
if args.seed is not None:
|
||||||
|
random.seed(args.seed)
|
||||||
|
|
||||||
|
# Run benchmarks
|
||||||
|
all_results: List[Dict[str, Any]] = []
|
||||||
|
correctness_failed = False
|
||||||
|
|
||||||
|
for algorithm in algorithms:
|
||||||
|
pivot_strategy = args.pivot if algorithm == "quick" else None
|
||||||
|
|
||||||
|
for dataset_type in datasets:
|
||||||
|
for size in sizes:
|
||||||
|
try:
|
||||||
|
results = run_benchmark(
|
||||||
|
algorithm,
|
||||||
|
pivot_strategy,
|
||||||
|
dataset_type, # type: ignore
|
||||||
|
size,
|
||||||
|
args.runs,
|
||||||
|
args.seed,
|
||||||
|
args.instrument,
|
||||||
|
logger,
|
||||||
|
)
|
||||||
|
|
||||||
|
if not results:
|
||||||
|
correctness_failed = True
|
||||||
|
else:
|
||||||
|
all_results.extend(results)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(
|
||||||
|
f"Error running benchmark: {algorithm}, {dataset_type}, {size}",
|
||||||
|
exc_info=True,
|
||||||
|
)
|
||||||
|
correctness_failed = True
|
||||||
|
|
||||||
|
# Save results
|
||||||
|
csv_path = outdir / "bench_results.csv"
|
||||||
|
json_path = outdir / "summary.json"
|
||||||
|
|
||||||
|
if all_results:
|
||||||
|
save_results_csv(all_results, csv_path)
|
||||||
|
save_summary_json(all_results, json_path)
|
||||||
|
logger.info(f"Results saved to {csv_path} and {json_path}")
|
||||||
|
|
||||||
|
# Generate plots
|
||||||
|
if args.make_plots or all_results:
|
||||||
|
generate_plots(all_results, plots_dir, logger)
|
||||||
|
|
||||||
|
# Exit with error if correctness failed
|
||||||
|
if correctness_failed:
|
||||||
|
logger.error("Benchmark failed due to correctness check failures")
|
||||||
|
return 1
|
||||||
|
|
||||||
|
logger.info("Benchmark completed successfully")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main())
|
||||||
|
|
||||||
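The summary step above groups per-run rows by an `(algorithm, pivot, dataset, size)` tuple before aggregating, then flattens each tuple into a string key for `summary.json`. A minimal, self-contained sketch of that grouping pattern, using hypothetical row data and `dict.setdefault` in place of the explicit membership check:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical per-run rows, mirroring the shape produced by the benchmark CLI.
rows: List[Dict[str, Any]] = [
    {"algorithm": "quick", "pivot": "last", "dataset": "random", "size": 100, "time_s": 0.010},
    {"algorithm": "quick", "pivot": "last", "dataset": "random", "size": 100, "time_s": 0.012},
    {"algorithm": "merge", "pivot": None, "dataset": "random", "size": 100, "time_s": 0.015},
]

# Group run times under a composite tuple key.
grouped: Dict[Tuple[Any, ...], List[float]] = {}
for r in rows:
    key = (r["algorithm"], r.get("pivot"), r["dataset"], r["size"])
    grouped.setdefault(key, []).append(r["time_s"])

# One flat string key per group, as in summary.json ('N/A' for a missing pivot).
summary = {
    f"{algo}_{pivot or 'N/A'}_{dataset}_{size}": sum(ts) / len(ts)
    for (algo, pivot, dataset, size), ts in grouped.items()
}
print(summary)
```

The tuple key keeps the grouping exact; the string key is only built once per group, at output time.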
src/bench/datasets.py (new file, 54 lines)
@@ -0,0 +1,54 @@
"""Dataset generators for benchmarking."""
from typing import List, Literal, Optional
import random


DatasetType = Literal["sorted", "reverse", "random", "nearly_sorted", "duplicates_heavy"]


def generate_dataset(
    size: int,
    dataset_type: DatasetType,
    seed: Optional[int] = None,
) -> List[int]:
    """
    Generate a dataset of specified type and size.

    Args:
        size: Number of elements in the dataset
        dataset_type: Type of dataset to generate
        seed: Random seed for reproducibility

    Returns:
        List of integers with the specified characteristics
    """
    if seed is not None:
        random.seed(seed)

    if dataset_type == "sorted":
        return list(range(size))

    elif dataset_type == "reverse":
        return list(range(size - 1, -1, -1))

    elif dataset_type == "random":
        return [random.randint(0, size * 10) for _ in range(size)]

    elif dataset_type == "nearly_sorted":
        arr = list(range(size))
        # Perform a few swaps (about 1% of elements); skip for size < 2,
        # where randint(0, size - 1) would fail or the swap is a no-op
        num_swaps = max(1, size // 100) if size > 1 else 0
        for _ in range(num_swaps):
            i = random.randint(0, size - 1)
            j = random.randint(0, size - 1)
            arr[i], arr[j] = arr[j], arr[i]
        return arr

    elif dataset_type == "duplicates_heavy":
        # Generate array with many duplicate values,
        # drawn from only a small set of distinct values
        distinct_values = max(1, size // 10)
        return [random.randint(0, distinct_values - 1) for _ in range(size)]

    else:
        raise ValueError(f"Unknown dataset type: {dataset_type}")
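A quick way to sanity-check these generators is to assert the structural property each dataset type promises. A self-contained sketch, re-inlining the two non-trivial generators under the same assumptions as the module above:

```python
import random

random.seed(123)
size = 1000

# nearly_sorted: a sorted range with ~1% of positions swapped.
arr = list(range(size))
for _ in range(max(1, size // 100)):
    i, j = random.randint(0, size - 1), random.randint(0, size - 1)
    arr[i], arr[j] = arr[j], arr[i]
out_of_place = sum(1 for idx, v in enumerate(arr) if idx != v)
# Each swap displaces at most 2 elements, so at most 2% can be out of place.
assert out_of_place <= 2 * (size // 100)

# duplicates_heavy: values drawn from only size // 10 distinct values.
dups = [random.randint(0, max(1, size // 10) - 1) for _ in range(size)]
assert len(set(dups)) <= size // 10

print(out_of_place, len(set(dups)))
```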
src/bench/logging_setup.py (new file, 96 lines)
@@ -0,0 +1,96 @@
"""Logging configuration for benchmarks."""
import logging
import sys
from pathlib import Path
from logging.handlers import RotatingFileHandler
from typing import Optional
import platform


def setup_logging(
    log_dir: Path,
    log_level: str = "INFO",
    log_file: str = "bench.log",
) -> None:
    """
    Configure logging to both console and rotating file.

    Args:
        log_dir: Directory to write log files
        log_level: Logging level (DEBUG, INFO, WARNING, ERROR)
        log_file: Name of the log file
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / log_file

    # Convert string level to logging constant
    numeric_level = getattr(logging, log_level.upper(), logging.INFO)

    # Configure the root logger
    logger = logging.getLogger()
    logger.setLevel(numeric_level)

    # Remove existing handlers to avoid duplicates
    logger.handlers.clear()

    # Console handler
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(numeric_level)
    console_format = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    console_handler.setFormatter(console_format)
    logger.addHandler(console_handler)

    # File handler with rotation (10 MB max, keep 5 backups)
    file_handler = RotatingFileHandler(
        log_path,
        maxBytes=10 * 1024 * 1024,
        backupCount=5,
        encoding="utf-8",
    )
    file_handler.setLevel(numeric_level)
    file_format = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    file_handler.setFormatter(file_format)
    logger.addHandler(file_handler)

    # Log system information
    logger.info("=" * 80)
    logger.info("Benchmark session started")
    logger.info(f"Python version: {sys.version}")
    logger.info(f"Platform: {platform.platform()}")
    logger.info(f"Architecture: {platform.machine()}")

    # Try to get git commit if available
    try:
        import subprocess
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            timeout=2,
        )
        if result.returncode == 0:
            logger.info(f"Git commit: {result.stdout.strip()}")
    except (subprocess.TimeoutExpired, FileNotFoundError, subprocess.SubprocessError):
        pass

    logger.info("=" * 80)


def get_logger(name: Optional[str] = None) -> logging.Logger:
    """
    Get a logger instance.

    Args:
        name: Logger name (None returns the root logger)

    Returns:
        Logger instance
    """
    return logging.getLogger(name)
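The rotation settings above (10 MB, 5 backups) can be exercised in isolation. A minimal sketch using a tiny `maxBytes` so rollover is easy to observe; the temp-directory path and logger name are illustrative:

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
log_path = tmp / "bench.log"

logger = logging.getLogger("rotation_demo")
logger.setLevel(logging.INFO)
# maxBytes=200 forces a rollover every few records; backupCount=2 keeps
# at most bench.log.1 and bench.log.2 alongside the active bench.log.
handler = RotatingFileHandler(log_path, maxBytes=200, backupCount=2, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

for i in range(20):
    logger.info(f"message {i} padded to force rollover......")

handler.close()
files = sorted(p.name for p in tmp.iterdir())
print(files)
```

With the real 10 MB limit the behaviour is identical, just far less frequent; older records beyond the last five backups are discarded.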
src/bench/metrics.py (new file, 125 lines)
@@ -0,0 +1,125 @@
"""Performance metrics collection."""
import time
import tracemalloc
import statistics
from typing import Any, Callable, Dict, List, Tuple
import psutil
import os


class Metrics:
    """Container for benchmark metrics."""

    def __init__(self) -> None:
        self.time_seconds: float = 0.0
        self.peak_memory_bytes: int = 0
        self.comparisons: int = 0
        self.swaps: int = 0

    def to_dict(self) -> Dict[str, Any]:
        """Convert metrics to dictionary."""
        return {
            "time_s": self.time_seconds,
            "peak_mem_bytes": self.peak_memory_bytes,
            "comparisons": self.comparisons,
            "swaps": self.swaps,
        }


def measure_sort_performance(
    sort_func: Callable[..., List[int]],
    arr: List[int],
    *args: Any,
    instrument: bool = False,
    **kwargs: Any,
) -> Tuple[List[int], Metrics]:
    """
    Measure performance of a sorting function.

    Args:
        sort_func: Sorting function to benchmark
        arr: Input array to sort
        *args: Additional positional arguments for sort_func
        instrument: Whether to count comparisons and swaps
        **kwargs: Additional keyword arguments for sort_func

    Returns:
        Tuple of (sorted_array, metrics)
    """
    metrics = Metrics()

    # Setup instrumentation
    if instrument:
        counters: Dict[str, int] = {"comparison": 0, "swap": 0}

        def instrument_callback(op: str) -> None:
            if op in counters:
                counters[op] += 1

        if "instrument" not in kwargs:
            kwargs["instrument"] = instrument_callback

    # Start memory tracing
    process = psutil.Process(os.getpid())
    tracemalloc.start()

    # Measure time
    start_time = time.perf_counter()
    sorted_arr = sort_func(arr, *args, **kwargs)
    end_time = time.perf_counter()

    # Measure memory
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    rss_memory = process.memory_info().rss

    metrics.time_seconds = end_time - start_time
    # Note: RSS is whole-process memory, so it usually dominates the tracemalloc peak
    metrics.peak_memory_bytes = max(peak, rss_memory)

    if instrument:
        metrics.comparisons = counters.get("comparison", 0)
        metrics.swaps = counters.get("swap", 0)

    return sorted_arr, metrics


def aggregate_metrics(metrics_list: List[Metrics]) -> Dict[str, Any]:
    """
    Aggregate metrics across multiple runs.

    Args:
        metrics_list: List of Metrics objects from multiple runs

    Returns:
        Dictionary with aggregated statistics
    """
    if not metrics_list:
        return {}

    times = [m.time_seconds for m in metrics_list]
    memories = [m.peak_memory_bytes for m in metrics_list]
    comparisons = [m.comparisons for m in metrics_list if m.comparisons > 0]
    swaps = [m.swaps for m in metrics_list if m.swaps > 0]

    result: Dict[str, Any] = {
        "time_mean_s": statistics.mean(times),
        "time_std_s": statistics.stdev(times) if len(times) > 1 else 0.0,
        "time_best_s": min(times),
        "time_worst_s": max(times),
        "memory_mean_bytes": statistics.mean(memories),
        "memory_std_bytes": statistics.stdev(memories) if len(memories) > 1 else 0.0,
        "memory_peak_bytes": max(memories),
        "runs": len(metrics_list),
    }

    if comparisons:
        result["comparisons_mean"] = statistics.mean(comparisons)
        result["comparisons_std"] = statistics.stdev(comparisons) if len(comparisons) > 1 else 0.0

    if swaps:
        result["swaps_mean"] = statistics.mean(swaps)
        result["swaps_std"] = statistics.stdev(swaps) if len(swaps) > 1 else 0.0

    return result
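The time/memory measurement pattern in `measure_sort_performance` reduces to a few lines: start `tracemalloc`, time the call with `time.perf_counter()`, then read the traced peak. A self-contained sketch measuring Python's built-in `sorted()` (the exact numbers vary by machine):

```python
import time
import tracemalloc

data = list(range(100_000, 0, -1))

tracemalloc.start()
start = time.perf_counter()
result = sorted(data)
elapsed = time.perf_counter() - start
_current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# sorted() allocates a fresh list, so the traced peak includes roughly
# 800 KB for 100k pointers on a 64-bit build (exact figure is platform-dependent).
print(f"time={elapsed:.4f}s peak={peak} bytes")
```

Unlike RSS, the tracemalloc peak covers only Python-level allocations made while tracing was active, which is why the module records both.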
tests/__init__.py (new file, 2 lines)
@@ -0,0 +1,2 @@
"""Tests for sorting algorithms."""
tests/test_sorts.py (new file, 161 lines)
@@ -0,0 +1,161 @@
"""Tests for sorting algorithms."""
import pytest
import random

from src.algorithms.merge_sort import merge_sort
from src.algorithms.quick_sort import quick_sort, PivotStrategy


class TestMergeSort:
    """Tests for merge sort algorithm."""

    def test_empty_array(self) -> None:
        """Test sorting empty array."""
        assert merge_sort([]) == []

    def test_single_element(self) -> None:
        """Test sorting array with single element."""
        assert merge_sort([42]) == [42]

    def test_already_sorted(self) -> None:
        """Test sorting already sorted array."""
        arr = [1, 2, 3, 4, 5]
        assert merge_sort(arr) == [1, 2, 3, 4, 5]
        # Original should not be modified
        assert arr == [1, 2, 3, 4, 5]

    def test_reverse_sorted(self) -> None:
        """Test sorting reverse sorted array."""
        arr = [5, 4, 3, 2, 1]
        assert merge_sort(arr) == [1, 2, 3, 4, 5]

    def test_random_array(self) -> None:
        """Test sorting random array."""
        arr = [3, 1, 4, 1, 5, 9, 2, 6, 5]
        assert merge_sort(arr) == [1, 1, 2, 3, 4, 5, 5, 6, 9]

    def test_duplicates(self) -> None:
        """Test sorting array with duplicates."""
        arr = [5, 5, 5, 3, 3, 1]
        assert merge_sort(arr) == [1, 3, 3, 5, 5, 5]

    def test_large_array(self) -> None:
        """Test sorting large array."""
        arr = list(range(1000, 0, -1))
        result = merge_sort(arr)
        assert result == list(range(1, 1001))

    def test_instrumentation(self) -> None:
        """Test instrumentation callback."""
        counters: dict = {"comparison": 0, "swap": 0}

        def instrument(op: str) -> None:
            if op in counters:
                counters[op] += 1

        arr = [3, 1, 4, 1, 5]
        result = merge_sort(arr, instrument=instrument)

        assert result == [1, 1, 3, 4, 5]
        assert counters["comparison"] > 0
        # Merge sort doesn't do swaps in the traditional sense
        assert counters["swap"] == 0


class TestQuickSort:
    """Tests for quick sort algorithm."""

    @pytest.mark.parametrize("pivot", ["first", "last", "median_of_three", "random"])
    def test_empty_array(self, pivot: PivotStrategy) -> None:
        """Test sorting empty array."""
        assert quick_sort([], pivot_strategy=pivot) == []

    @pytest.mark.parametrize("pivot", ["first", "last", "median_of_three", "random"])
    def test_single_element(self, pivot: PivotStrategy) -> None:
        """Test sorting array with single element."""
        assert quick_sort([42], pivot_strategy=pivot) == [42]

    @pytest.mark.parametrize("pivot", ["first", "last", "median_of_three", "random"])
    def test_already_sorted(self, pivot: PivotStrategy) -> None:
        """Test sorting already sorted array."""
        arr = [1, 2, 3, 4, 5]
        result = quick_sort(arr, pivot_strategy=pivot, seed=42)
        assert result == [1, 2, 3, 4, 5]
        # Original should not be modified
        assert arr == [1, 2, 3, 4, 5]

    @pytest.mark.parametrize("pivot", ["first", "last", "median_of_three", "random"])
    def test_reverse_sorted(self, pivot: PivotStrategy) -> None:
        """Test sorting reverse sorted array."""
        arr = [5, 4, 3, 2, 1]
        result = quick_sort(arr, pivot_strategy=pivot, seed=42)
        assert result == [1, 2, 3, 4, 5]

    @pytest.mark.parametrize("pivot", ["first", "last", "median_of_three", "random"])
    def test_random_array(self, pivot: PivotStrategy) -> None:
        """Test sorting random array."""
        arr = [3, 1, 4, 1, 5, 9, 2, 6, 5]
        result = quick_sort(arr, pivot_strategy=pivot, seed=42)
        assert result == [1, 1, 2, 3, 4, 5, 5, 6, 9]

    @pytest.mark.parametrize("pivot", ["first", "last", "median_of_three", "random"])
    def test_duplicates(self, pivot: PivotStrategy) -> None:
        """Test sorting array with duplicates."""
        arr = [5, 5, 5, 3, 3, 1]
        result = quick_sort(arr, pivot_strategy=pivot, seed=42)
        assert result == [1, 3, 3, 5, 5, 5]

    @pytest.mark.parametrize("pivot", ["first", "last", "median_of_three", "random"])
    def test_large_array(self, pivot: PivotStrategy) -> None:
        """Test sorting large array."""
        arr = list(range(1000, 0, -1))
        result = quick_sort(arr, pivot_strategy=pivot, seed=42)
        assert result == list(range(1, 1001))

    def test_instrumentation(self) -> None:
        """Test instrumentation callback."""
        counters: dict = {"comparison": 0, "swap": 0}

        def instrument(op: str) -> None:
            if op in counters:
                counters[op] += 1

        arr = [3, 1, 4, 1, 5]
        result = quick_sort(arr, pivot_strategy="first", instrument=instrument, seed=42)

        assert result == [1, 1, 3, 4, 5]
        assert counters["comparison"] > 0
        assert counters["swap"] > 0


class TestPropertyTests:
    """Property-based tests comparing to Python's sorted()."""

    @pytest.mark.parametrize("size", [10, 100, 1000])
    def test_merge_sort_property(self, size: int) -> None:
        """Property test: merge_sort should match sorted() for random arrays."""
        random.seed(42)
        arr = [random.randint(-1000, 1000) for _ in range(size)]

        result = merge_sort(arr)
        expected = sorted(arr)

        assert result == expected

    @pytest.mark.parametrize("pivot", ["first", "last", "median_of_three", "random"])
    @pytest.mark.parametrize("size", [10, 100, 1000])
    def test_quick_sort_property(self, pivot: PivotStrategy, size: int) -> None:
        """Property test: quick_sort should match sorted() for random arrays."""
        random.seed(42)
        arr = [random.randint(-1000, 1000) for _ in range(size)]

        result = quick_sort(arr, pivot_strategy=pivot, seed=42)
        expected = sorted(arr)

        assert result == expected


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
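The property tests above use Python's `sorted()` as an oracle; the same pattern works for any candidate sort. A self-contained sketch, with a deliberately naive insertion sort standing in as a hypothetical function under test:

```python
import random
from typing import List

def insertion_sort(arr: List[int]) -> List[int]:
    """Naive stand-in for the function under test."""
    out = list(arr)  # copy so the input is not mutated
    for i in range(1, len(out)):
        j = i
        while j > 0 and out[j - 1] > out[j]:
            out[j - 1], out[j] = out[j], out[j - 1]
            j -= 1
    return out

# Oracle comparison over several sizes, including the empty edge case.
random.seed(42)
for size in (0, 1, 10, 100):
    arr = [random.randint(-1000, 1000) for _ in range(size)]
    assert insertion_sort(arr) == sorted(arr)
print("all property checks passed")
```

Seeding the generator keeps failures reproducible, the same reason the suite above fixes `seed=42` for the randomized pivot strategy.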