# SmartEdgeAI - IoT LLM Simulation with gem5

A gem5-based simulation framework for IoT LLM workloads, featuring a 16GB RAM configuration and 24k token processing.

## 🎯 Project Overview

This project simulates IoT (Internet of Things) systems running Large Language Models (LLMs) using the gem5 computer architecture simulator. The simulation includes:

- **IoT LLM Workload**: Simulates processing 24k tokens with memory allocation patterns typical of LLM inference
- **16GB RAM Configuration**: Full-system simulation with realistic memory constraints
- **Multiple CPU Architectures**: Support for big/little core configurations
- **Comprehensive Statistics**: Detailed performance metrics and energy analysis

## 🚀 Quick Start

### Prerequisites

```bash
# Install required dependencies
sudo apt update
sudo apt install python3-matplotlib python3-pydot python3-pip python3-venv

# Verify gem5 installation
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt
```

### Run Complete Workflow

```bash
# Run everything automatically
sh run_all.sh

# Or run individual steps
sh scripts/check_gem5.sh        # Verify prerequisites
sh scripts/env.sh               # Setup environment
sh scripts/build_workloads.sh   # Compile workloads
sh scripts/run_one.sh iot_llm_sim big high 0 1MB  # Run simulation
```

## 📁 Project Structure

```
SmartEdgeAI/
├── scripts/                 # Automation scripts
│   ├── env.sh               # Environment setup
│   ├── build_workloads.sh   # Compile workloads
│   ├── run_one.sh           # Single simulation run
│   ├── sweep.sh             # Parameter sweep
│   ├── extract_csv.sh       # Extract statistics
│   ├── energy_post.py       # Energy analysis
│   └── bundle_logs.sh       # Log collection
├── workloads/               # C source code
│   ├── tinyml_kws.c         # TinyML keyword spotting
│   ├── sensor_fusion.c      # Sensor data fusion
│   ├── aes_ccm.c            # AES encryption
│   └── attention_kernel.c   # Attention mechanism
├── iot_llm_sim.c            # Main IoT LLM simulation
├── run_all.sh               # Master workflow script
└── README.md                # This file
```

## 🔧 Script Explanations

### Core Scripts

#### `scripts/env.sh`

**Purpose**: Sets up environment variables and paths for the entire workflow.

**Key Variables**:

- `ROOT`: Base gem5 installation path
- `CFG`: gem5 configuration script (x86-ubuntu-run.py)
- `GEM5_BIN`: Path to gem5 binary (X86 build)
- `RUN`: Directory for compiled workloads
- `OUT_DATA`: Simulation results directory
- `LOG_DATA`: Log files directory

#### `scripts/build_workloads.sh`

**Purpose**: Compiles all C workloads into x86_64 binaries.

**What it does**:

- Compiles `tinyml_kws.c`, `sensor_fusion.c`, `aes_ccm.c`, `attention_kernel.c`
- Creates the `iot_llm_sim` binary for LLM simulation
- Uses `gcc -O2 -static` for optimized static binaries

#### `scripts/run_one.sh`

**Purpose**: Executes a single gem5 simulation with specified parameters.

**Parameters**:

- `workload`: Which binary to run (e.g., `iot_llm_sim`)
- `core`: CPU type (`big`=O3CPU, `little`=TimingSimpleCPU)
- `dvfs`: Frequency setting (`high`=2GHz, `low`=1GHz)
- `drowsy`: Cache drowsy mode (0=off, 1=on)
- `l2`: L2 cache size (e.g., `1MB`)

**Key Features**:

- Maps core types to gem5 CPU models
- Copies stats from `m5out/stats.txt` to the output directory
- Mirrors results to repository directories

#### `iot_llm_sim.c`

**Purpose**: Simulates IoT LLM inference with 24k token processing.

**What it simulates**:

- Memory allocation for 24k tokens (1KB per token)
- Token processing loop with memory operations
- Realistic LLM inference patterns
- Memory cleanup and resource management

## 🐛 Problem-Solving Journey

### Initial Challenges

#### 1. **Empty stats.txt Files**

**Problem**: Simulations were running but generating empty statistics files.

**Root Cause**: ARM binaries were hitting unsupported system calls (syscall 398, which is `rseq` on 32-bit ARM Linux).

**Solution**: Switched from ARM to x86_64 architecture for better gem5 compatibility.

#### 2. **Syscall Compatibility Issues**

**Problem**: `fatal: Syscall 398 out of range` errors with ARM binaries.

**Root Cause**: gem5's syscall emulation mode doesn't support all Linux system calls, particularly newer ones such as `rseq`.

**Solution**:

- Tried multiple ARM configurations (starter_se.py, baremetal.py)
- Ultimately switched to x86_64 full-system simulation
- Used `x86-ubuntu-run.py` for reliable Ubuntu-based simulation

#### 3. **Configuration Complexity**

**Problem**: Custom gem5 configurations were failing with various errors.

**Root Cause**:

- Deprecated port names (`slave`/`master` → `cpu_side_ports`/`mem_side_ports`)
- Missing cache parameters (`tag_latency`, `data_latency`, etc.)
- Workload object creation issues

**Solution**: Used gem5's built-in `x86-ubuntu-run.py` configuration instead of custom scripts.

#### 4. **Stats Collection Issues**

**Problem**: Statistics were generated in `m5out/stats.txt` but scripts expected them elsewhere.

**Root Cause**: x86-ubuntu-run.py outputs to the default `m5out/` directory.

**Solution**: Added automatic copying of stats from `m5out/stats.txt` to the expected output directory.

### Key Learnings

1. **Architecture Choice Matters**: In this project, x86_64 proved much more reliable than ARM under gem5
2. **Full-System vs Syscall Emulation**: Full-system simulation avoided the unsupported-syscall failures that syscall emulation ran into
3. **Use Built-in Configurations**: gem5's built-in configs worked where the custom ones kept failing
4. **Path Management**: Always verify and handle gem5's default output paths

## 🏗️ How the Project Works

### Simulation Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   IoT LLM App   │───▶│    gem5 X86     │───▶│   Statistics    │
│  (24k tokens)   │    │   Full-System   │    │     (482KB)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### Workflow Process

1. **Environment Setup**: Configure paths and verify gem5 installation
2. **Workload Compilation**: Compile C workloads to x86_64 binaries
3. **Simulation Execution**: Run gem5 with Ubuntu Linux and workload
4. **Statistics Collection**: Extract performance metrics from gem5 output
5. **Analysis**: Process statistics for energy, performance, and efficiency metrics

### Memory Configuration

- **Total RAM**: 16GB (as requested for IoT configuration)
- **Memory Controllers**: 2x DDR3 controllers with 8GB each
- **Cache Hierarchy**: L1I (48KB), L1D (32KB), L2 (1MB)
- **Memory Access**: Timing-based simulation with realistic latencies

## 📊 Simulation Results

### Sample Output (iot_llm_sim)

```
simSeconds      3.875651      # Simulated time (3.88 seconds)
simInsts        2665005563    # Instructions executed (2.67 billion)
simOps          5787853650    # Operations (5.79 billion including micro-ops)
hostInstRate    476936        # Simulated instructions per host second (477K inst/s)
hostOpRate      1035809       # Simulated operations per host second (1.04M op/s)
hostMemory      11323568      # Host memory usage (11.3 MB)
hostSeconds     5587.76       # Real time elapsed (93 minutes)
```

### Performance Metrics

- **Simulation Speed**: 477K instructions/second
- **Total Instructions**: 2.67 billion for 24k token processing
- **Cache Performance**: 98.75% hit rate, 1.25% miss rate
- **Memory Efficiency**: 57.4M cache misses out of 4.58B total accesses
- **Energy Consumption**: 568.4 mJ total (212.8 pJ per instruction)
- **Power Consumption**: 146.5 mW average

## 🛠️ Usage Guide

### Basic Usage

```bash
# Run IoT LLM simulation
sh scripts/run_one.sh iot_llm_sim big high 0 1MB

# Run with different CPU types
sh scripts/run_one.sh iot_llm_sim little high 0 1MB  # TimingSimpleCPU
sh scripts/run_one.sh iot_llm_sim big low 0 1MB      # Low frequency

# Run parameter sweep
sh scripts/sweep.sh
```

### Advanced Usage

```bash
# Custom memory size
sh scripts/run_one.sh iot_llm_sim big high 0 1MB 32GB

# Enable drowsy cache
sh scripts/run_one.sh iot_llm_sim big high 1 1MB

# Run specific workload
sh scripts/run_one.sh tinyml_kws big high 0 1MB
```

### Analysis Commands

```bash
# Extract CSV statistics
sh scripts/extract_csv.sh

# Energy analysis
python3 scripts/energy_post.py

# Generate plots
python3 scripts/plot_epi.py
python3 scripts/plot_edp_tinyml.py

# Bundle logs
sh scripts/bundle_logs.sh
```

## 🔍 Troubleshooting

### Common Issues

#### Empty stats.txt

```bash
# Check if simulation completed
ls -la m5out/stats.txt

# If empty, check logs
cat logs/*.stderr.log
```

#### gem5 Binary Not Found

```bash
# Verify installation
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt

# Build if missing
cd /home/carlos/projects/gem5/gem5src/gem5
scons build/X86/gem5.opt -j$(nproc)
```

#### Compilation Errors

```bash
# Check compiler
gcc --version

# Rebuild workloads
sh scripts/build_workloads.sh
```

### Debug Commands

```bash
# Check environment
sh scripts/env.sh

# Verify prerequisites
sh scripts/check_gem5.sh

# Manual gem5 run
/home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt \
  /home/carlos/projects/gem5/gem5src/gem5/configs/example/gem5_library/x86-ubuntu-run.py \
  --command=./iot_llm_sim --mem-size=16GB
```

## 📈 Performance Analysis

### Key Metrics

- **simSeconds**: Simulated time (3.88s for IoT LLM)
- **simInsts**: Instructions executed (2.67B for 24k tokens)
- **simOps**: Operations (5.79B including micro-ops)
- **hostInstRate**: Simulation speed (477K inst/s)
- **Cache Miss Rates**: 1.25% miss rate, 98.75% hit rate
- **Memory Bandwidth**: 4.58B cache transactions processed

### Energy Analysis

**Actual IoT LLM Results**:

- **Energy per Instruction (EPI)**: 212.8 pJ
- **Total Energy**: 568.4 mJ for 24k token processing
- **Power Consumption**: 146.5 mW average
- **Memory Energy**: 34.4 mJ (6% of total energy)
- **Energy-Delay Product (EDP)**: 2.204 J·s

**Optimization Potential**:

- **Drowsy Cache**: 15% energy reduction (483 mJ)
- **Little Core**: 55% energy reduction (254 mJ)
- **Hybrid+Drowsy**: 47% energy reduction (302 mJ)

## 🎯 Future Enhancements

1. **Multi-core Support**: Extend to multi-core IoT configurations
2. **Real LLM Models**: Integrate actual transformer models
3. **Power Modeling**: Add detailed power consumption analysis
4. **Network Simulation**: Include IoT communication patterns
5. **Edge Computing**: Simulate edge-to-cloud interactions

## 📚 References

- [gem5 Documentation](https://www.gem5.org/documentation/)
- [gem5 Learning Resources](https://www.gem5.org/documentation/learning_gem5/)
- [ARM Research Starter Kit](http://www.arm.com/ResearchEnablement/SystemModeling)

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test with `sh run_all.sh`
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

---

**Note**: This project was developed through iterative problem-solving, switching from ARM to x86_64 architecture and using gem5's built-in configurations for maximum reliability. The final solution provides a robust IoT LLM simulation framework with comprehensive statistics and analysis capabilities.
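
The energy figures reported under Energy Analysis can be cross-checked against the sample gem5 output with a few lines of arithmetic. The sketch below is not `scripts/energy_post.py`, whose internals this README does not show. It assumes that EPI is total energy divided by `simInsts`, that average power is total energy divided by `simSeconds`, and that EDP is total energy multiplied by `simSeconds`. The function name `derived_energy_metrics` is hypothetical.

```python
# Inputs copied from the sample output and Energy Analysis sections of this README.
SIM_SECONDS = 3.875651        # simSeconds
SIM_INSTS = 2_665_005_563     # simInsts
TOTAL_ENERGY_J = 0.5684       # 568.4 mJ total energy


def derived_energy_metrics(energy_j, sim_insts, sim_seconds):
    """Derive EPI, average power, and EDP under the assumed definitions.

    Hypothetical helper for illustration; not part of the project's scripts.
    """
    return {
        "epi_pj": energy_j / sim_insts * 1e12,       # joules/inst -> picojoules/inst
        "avg_power_mw": energy_j / sim_seconds * 1e3,  # watts -> milliwatts
        "edp_js": energy_j * sim_seconds,            # joule-seconds
    }


if __name__ == "__main__":
    for name, value in derived_energy_metrics(
        TOTAL_ENERGY_J, SIM_INSTS, SIM_SECONDS
    ).items():
        print(f"{name}: {value:.4f}")
```

With these inputs, each derived value lands within about 1% of the figure quoted in the Energy Analysis section; the small residual presumably comes from rounding in the reported numbers.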
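
The statistics-collection step in the workflow comes down to reading `name value # description` lines from `m5out/stats.txt`. The following is a minimal sketch of such a reader, assuming the plain-text layout shown in the sample output; it is not the project's `scripts/extract_csv.sh`, and `parse_stats` is a hypothetical helper name. If a file holds several statistics dumps, later values overwrite earlier ones in this sketch.

```python
# Sample text in the layout gem5 writes to stats.txt; values are taken from this README.
SAMPLE = """\
---------- Begin Simulation Statistics ----------
simSeconds      3.875651      # Simulated time
simInsts        2665005563    # Instructions executed
hostSeconds     5587.76       # Real time elapsed

---------- End Simulation Statistics   ----------
"""


def parse_stats(text):
    """Return {stat_name: float_value} for scalar lines in gem5 stats text.

    Hypothetical helper: skips blank lines, the Begin/End marker lines,
    and any line whose second field is not a number.
    """
    stats = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop the trailing description
        if not line or line.startswith("-"):  # blank or marker line
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue                          # non-numeric value, ignore
    return stats


if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    # Host seconds spent per simulated second (simulation slowdown).
    print(stats["hostSeconds"] / stats["simSeconds"])
```

To read a real run, the same function can be applied to the file contents, for example `parse_stats(open("m5out/stats.txt").read())`.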