SmartEdgeAI - IoT LLM Simulation with gem5
A comprehensive gem5-based simulation framework for IoT LLM workloads, featuring a 16GB RAM configuration and 24k-token processing.
🎯 Project Overview
This project simulates IoT (Internet of Things) systems running Large Language Models (LLMs) using the gem5 computer architecture simulator. The simulation includes:
- IoT LLM Workload: Simulates processing 24k tokens with a simple per-token memory allocation pattern (1KB per token) meant to approximate LLM inference
- 16GB RAM Configuration: Full-system simulation with realistic memory constraints
- Multiple CPU Architectures: Support for big/little core configurations
- Comprehensive Statistics: Detailed performance metrics and energy analysis
🚀 Quick Start
Prerequisites
# Install required dependencies
sudo apt update
sudo apt install python3-matplotlib python3-pydot python3-pip python3-venv
# Verify gem5 installation (adjust the path to your own gem5 build)
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt
Run Complete Workflow
# Run everything automatically
sh run_all.sh
# Or run individual steps
sh scripts/check_gem5.sh # Verify prerequisites
sh scripts/env.sh # Set up environment
sh scripts/build_workloads.sh # Compile workloads
sh scripts/run_one.sh iot_llm_sim big high 0 1MB # Run simulation
📁 Project Structure
SmartEdgeAI/
├── scripts/ # Automation scripts
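│ ├── check_gem5.sh # Verify prerequisites
│ ├── env.sh # Environment setup
│ ├── build_workloads.sh # Compile workloads
│ ├── run_one.sh # Single simulation run
│ ├── sweep.sh # Parameter sweep
│ ├── extract_csv.sh # Extract statistics
│ ├── energy_post.py # Energy analysis
│ ├── plot_epi.py # EPI plots
│ ├── plot_edp_tinyml.py # EDP plots for TinyML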
│ └── bundle_logs.sh # Log collection
├── workloads/ # C source code
│ ├── tinyml_kws.c # TinyML keyword spotting
│ ├── sensor_fusion.c # Sensor data fusion
│ ├── aes_ccm.c # AES encryption
│ └── attention_kernel.c # Attention mechanism
├── iot_llm_sim.c # Main IoT LLM simulation
├── run_all.sh # Master workflow script
└── README.md # This file
🔧 Script Explanations
Core Scripts
scripts/env.sh
Purpose: Sets up environment variables and paths for the entire workflow.
Key Variables:
- ROOT: Base gem5 installation path
- CFG: gem5 configuration script (x86-ubuntu-run.py)
- GEM5_BIN: Path to the gem5 binary (X86 build)
- RUN: Directory for compiled workloads
- OUT_DATA: Simulation results directory
- LOG_DATA: Log files directory
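A minimal sketch of what scripts/env.sh might contain, assuming the gem5 checkout path used elsewhere in this README and hypothetical run/, data/ and logs/ directories under the project root. Because shell variables only live in the process that sets them, sourcing the file (`. scripts/env.sh`) is what makes them visible to later commands in your session; `sh scripts/env.sh` sets them in a child process that exits immediately.

```shell
#!/bin/sh
# Hypothetical sketch of scripts/env.sh; the run/, data/ and logs/ names are assumptions.
# Source it (". scripts/env.sh") so the exported variables reach the calling shell.

# Base gem5 checkout; override by setting ROOT before sourcing.
: "${ROOT:=/home/carlos/projects/gem5/gem5src/gem5}"

CFG="$ROOT/configs/example/gem5_library/x86-ubuntu-run.py"  # gem5 config script
GEM5_BIN="$ROOT/build/X86/gem5.opt"                         # X86 gem5 binary
RUN="${RUN:-$PWD/run}"                                      # compiled workloads
OUT_DATA="${OUT_DATA:-$PWD/data}"                           # simulation results
LOG_DATA="${LOG_DATA:-$PWD/logs}"                           # log files

export ROOT CFG GEM5_BIN RUN OUT_DATA LOG_DATA
mkdir -p "$RUN" "$OUT_DATA" "$LOG_DATA"
```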
scripts/build_workloads.sh
Purpose: Compiles all C workloads into x86_64 binaries.
What it does:
- Compiles tinyml_kws.c, sensor_fusion.c, aes_ccm.c, and attention_kernel.c
- Creates the iot_llm_sim binary for the LLM simulation
- Uses gcc -O2 -static for optimized static binaries
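The build step can be sketched as a loop over the sources listed above, assuming the repository layout from Project Structure, an x86_64 host (so gcc emits x86_64 code), and that RUN comes from env.sh with ./run as a fallback. CC is a hypothetical override hook: setting CC=echo turns the script into a dry run that prints each compiler invocation.

```shell
#!/bin/sh
# Hypothetical sketch of scripts/build_workloads.sh (layout assumed from this README).
RUN="${RUN:-./run}"   # output directory for binaries
CC="${CC:-gcc}"       # compiler; CC=echo prints the commands instead
mkdir -p "$RUN"

# build_one SRC: compile SRC into a static, optimized binary named after the file.
build_one() {
  name=$(basename "$1" .c)
  $CC -O2 -static -o "$RUN/$name" "$1"
}

for src in workloads/tinyml_kws.c workloads/sensor_fusion.c \
           workloads/aes_ccm.c workloads/attention_kernel.c iot_llm_sim.c; do
  if [ -f "$src" ]; then
    build_one "$src" || { echo "build failed: $src" >&2; exit 1; }
  else
    echo "skipping missing source: $src" >&2
  fi
done
```

Static linking means the binaries do not depend on whichever shared libraries the simulated Ubuntu image happens to ship.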
scripts/run_one.sh
Purpose: Executes a single gem5 simulation with specified parameters.
Parameters:
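- workload: Which binary to run (e.g., iot_llm_sim)
- core: CPU type (big = O3CPU, little = TimingSimpleCPU)
- dvfs: Frequency setting (high = 2GHz, low = 1GHz)
- drowsy: Cache drowsy mode (0 = off, 1 = on)
- l2: L2 cache size (e.g., 1MB)
- An optional sixth argument sets the memory size (e.g., 32GB), as shown under Advanced Usage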
Key Features:
- Maps core types to gem5 CPU models
- Copies stats from m5out/stats.txt to the output directory
- Mirrors results to repository directories
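The parameter mapping described above might look like the following POSIX sh functions. Only the mappings stated in this README are encoded (big/little to O3CPU/TimingSimpleCPU, high/low to 2GHz/1GHz, drowsy 0/1); the function names are hypothetical, and how the values are handed to the gem5 config script is left out because that depends on the script.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the run_one.sh parameter table in this README.

# map_core CORE: print the gem5 CPU model for a core type.
map_core() {
  case "$1" in
    big)    echo O3CPU ;;
    little) echo TimingSimpleCPU ;;
    *)      echo "unknown core type: $1" >&2; return 1 ;;
  esac
}

# map_dvfs LEVEL: print the CPU clock for a DVFS level.
map_dvfs() {
  case "$1" in
    high) echo 2GHz ;;
    low)  echo 1GHz ;;
    *)    echo "unknown dvfs level: $1" >&2; return 1 ;;
  esac
}

# check_drowsy FLAG: accept only 0 (off) or 1 (on).
check_drowsy() {
  case "$1" in
    0|1) return 0 ;;
    *)   echo "drowsy must be 0 or 1, got: $1" >&2; return 1 ;;
  esac
}
```

Inside run_one.sh these would be called as, for example, `cpu=$(map_core "$2") || exit 1`, so a typo in the core type stops the run before gem5 starts.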
iot_llm_sim.c
Purpose: Simulates IoT LLM inference with 24k token processing.
What it simulates:
- Memory allocation for 24k tokens (1KB per token)
- Token processing loop with memory operations
- A simplified approximation of LLM inference memory-access patterns
- Memory cleanup and resource management
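The size of that allocation follows from the per-token figure. Assuming "24k" means 24 × 1024 tokens and "1KB" means 1024 bytes, the total matches the 24MB reported under Performance Metrics:

```latex
$$
M = N_{\text{tok}} \cdot s_{\text{tok}}
  = (24 \times 1024) \times 1024\ \text{B}
  = 25\,165\,824\ \text{B}
  = 24\ \text{MiB}
$$
```

With decimal units (24,000 tokens × 1,000 B) the total is 24,000,000 B = 24 MB. Either way the working set is roughly 680 times smaller than the 16GB of configured RAM, so memory capacity itself is not a constraint for this workload.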
🐛 Problem-Solving Journey
Initial Challenges
1. Empty stats.txt Files
Problem: Simulations were running but generating empty statistics files.
Root Cause: The ARM binaries issued system calls that gem5's syscall-emulation (SE) mode does not implement (syscall 398, which on 32-bit ARM Linux is rseq rather than futex).
Solution: Switched from ARM to x86_64 architecture for better gem5 compatibility.
2. Syscall Compatibility Issues
Problem: fatal: Syscall 398 out of range errors with ARM binaries.
Root Cause: gem5's syscall emulation mode doesn't support all Linux system calls, particularly newer ones such as rseq, which recent glibc versions register at program start-up.
Solution:
- Tried multiple ARM configurations (starter_se.py, baremetal.py)
- Ultimately switched to x86_64 full-system simulation
- Used x86-ubuntu-run.py for reliable Ubuntu-based simulation
3. Configuration Complexity
Problem: Custom gem5 configurations were failing with various errors.
Root Cause:
- Deprecated port names (slave/master → cpu_side_ports/mem_side_ports)
- Missing cache parameters (tag_latency, data_latency, etc.)
- Workload object creation issues
Solution: Used gem5's built-in x86-ubuntu-run.py configuration instead of custom scripts.
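When porting an older custom config instead, a quick way to spot the deprecated port names is a text search for `.slave` and `.master` attribute accesses. This hypothetical helper assumes the configs are plain .py files in one directory; it is a rough search, not a Python parser, so treat its hits as candidates to review.

```shell
#!/bin/sh
# find_deprecated_ports DIR: print file:line for DIR/*.py lines using .slave or .master.
# Hypothetical helper; matches the names only when they end the identifier.
find_deprecated_ports() {
  for f in "$1"/*.py; do
    [ -f "$f" ] || continue
    awk -v f="$f" '
      /\.(slave|master)$/ || /\.(slave|master)[^A-Za-z0-9_]/ { print f ":" FNR ": " $0 }
    ' "$f"
  done
}
```

Usage: `find_deprecated_ports configs/` before switching to the cpu_side_ports/mem_side_ports names.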
4. Stats Collection Issues
Problem: Statistics were generated in m5out/stats.txt but scripts expected them elsewhere.
Root Cause: x86-ubuntu-run.py outputs to default m5out/ directory.
Solution: Added automatic copying of stats from m5out/stats.txt to expected output directory.
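The copying fix can be sketched as a small function that refuses to copy a missing or empty file, which also surfaces the empty-stats symptom from item 1 early. The function name and the per-run destination directory are assumptions, not the project's actual code.

```shell
#!/bin/sh
# copy_stats DEST_DIR [SRC]: copy gem5's stats file into DEST_DIR.
# Hypothetical helper; SRC defaults to gem5's default output m5out/stats.txt.
copy_stats() {
  dest="$1"
  src="${2:-m5out/stats.txt}"
  if [ ! -s "$src" ]; then
    echo "stats file missing or empty: $src" >&2
    return 1
  fi
  mkdir -p "$dest" && cp "$src" "$dest/stats.txt"
}
```

Usage: `copy_stats "$OUT_DATA/iot_llm_sim_big_high_0_1MB"` after gem5 exits. Alternatively, gem5's own `--outdir` (`-d`) option, given before the config script, writes stats.txt to a chosen directory in the first place.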
Key Learnings
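- Architecture Choice Matters: In this project, x86_64 proved more reliable than ARM, avoiding the unsupported-syscall failures described above
- Full-System vs Syscall Emulation: Full-system simulation runs a real Linux kernel, so the guest kernel handles every system call instead of gem5's emulation layer; the cost is a slower start because the OS must boot
- Use Built-in Configurations: Starting from gem5's built-in configs avoided the port-name and cache-parameter errors of the custom scripts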
- Path Management: Always verify and handle gem5's default output paths
🏗️ How the Project Works
Simulation Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ IoT LLM App │───▶│ gem5 X86 │───▶│ Statistics │
│ (24k tokens) │ │ Full-System │ │ (482KB) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Workflow Process
- Environment Setup: Configure paths and verify gem5 installation
- Workload Compilation: Compile C workloads to x86_64 binaries
- Simulation Execution: Run gem5 with Ubuntu Linux and workload
- Statistics Collection: Extract performance metrics from gem5 output
- Analysis: Process statistics for energy, performance, and efficiency metrics
Memory Configuration
- Total RAM: 16GB
- Memory Controllers: 2x DDR3 controllers with 8GB each
- Cache Hierarchy: L1I (48KB), L1D (32KB), L2 (1MB)
- Memory Access: Timing-based simulation with realistic latencies
📊 Simulation Results
Sample Output (iot_llm_sim)
simSeconds 3.875651 # Simulated time (seconds)
simInsts 2665005563 # Instructions executed
simOps 5787853650 # Operations (including micro-ops)
hostInstRate 474335 # Simulated instructions per host second
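A minimal sketch for reading these counters and deriving a few ratios, assuming the usual `name value # description` layout of stats.txt; the function names are hypothetical and only the first statistics dump in the file is read. In full-system runs the totals cover everything gem5 simulated, including the Ubuntu boot, unless statistics are reset around the region of interest (for example with the guest-side `m5 resetstats` utility).

```shell
#!/bin/sh
# stat_value NAME FILE: print the value from the first line whose first field is NAME.
stat_value() {
  awk -v n="$1" '$1 == n { print $2; exit }' "$2"
}

# summarize FILE: print micro-ops per instruction and estimated host wall-clock seconds.
summarize() {
  insts=$(stat_value simInsts "$1")
  ops=$(stat_value simOps "$1")
  rate=$(stat_value hostInstRate "$1")
  awk -v i="$insts" -v o="$ops" -v r="$rate" 'BEGIN {
    printf "ops_per_inst %.2f\n", o / i   # simOps / simInsts
    printf "host_seconds %.0f\n", i / r   # simInsts / hostInstRate
  }'
}
```

Applied to the sample above, this gives about 2.17 micro-ops per instruction and roughly 5,618 host seconds (about 94 minutes) of simulation wall time; gem5 also reports the latter directly as hostSeconds.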
Performance Metrics
- Simulation Speed: ~474K simulated instructions per host second (hostInstRate)
- Memory Usage: Successfully processes 24k tokens (24MB allocation)
- CPU Model: O3CPU (gem5's out-of-order CPU) with a detailed pipeline model
- Cache Performance: Detailed L1/L2 hit/miss statistics
🛠️ Usage Guide
Basic Usage
# Run IoT LLM simulation
sh scripts/run_one.sh iot_llm_sim big high 0 1MB
# Run with different CPU types
sh scripts/run_one.sh iot_llm_sim little high 0 1MB # TimingSimpleCPU
sh scripts/run_one.sh iot_llm_sim big low 0 1MB # Low frequency
# Run parameter sweep
sh scripts/sweep.sh
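A sweep in the spirit of scripts/sweep.sh can be sketched as a Cartesian product over the run_one.sh arguments. The value lists are assumptions (only 1MB appears as an L2 size in this README), and the sketch prints the commands by default because each run can take over an hour of host time.

```shell
#!/bin/sh
# Hypothetical sweep sketch; edit the value lists to match the experiments you need.
WORKLOADS="iot_llm_sim tinyml_kws"
CORES="big little"
DVFS_LEVELS="high low"
DROWSY_MODES="0 1"
L2_SIZES="1MB"

# list_configs: print one "workload core dvfs drowsy l2" line per combination.
list_configs() {
  for w in $WORKLOADS; do
    for c in $CORES; do
      for f in $DVFS_LEVELS; do
        for d in $DROWSY_MODES; do
          for l in $L2_SIZES; do
            echo "$w $c $f $d $l"
          done
        done
      done
    done
  done
}

# Print the commands by default; set DRY_RUN=0 to actually launch the runs.
list_configs | while read -r w c f d l; do
  if [ "${DRY_RUN:-1}" = 0 ]; then
    sh scripts/run_one.sh "$w" "$c" "$f" "$d" "$l"
  else
    echo "sh scripts/run_one.sh $w $c $f $d $l"
  fi
done
```

With these lists the sweep has 2 × 2 × 2 × 2 × 1 = 16 runs.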
Advanced Usage
# Custom memory size
sh scripts/run_one.sh iot_llm_sim big high 0 1MB 32GB
# Enable drowsy cache
sh scripts/run_one.sh iot_llm_sim big high 1 1MB
# Run specific workload
sh scripts/run_one.sh tinyml_kws big high 0 1MB
Analysis Commands
# Extract CSV statistics
sh scripts/extract_csv.sh
# Energy analysis
python3 scripts/energy_post.py
# Generate plots
python3 scripts/plot_epi.py
python3 scripts/plot_edp_tinyml.py
# Bundle logs
sh scripts/bundle_logs.sh
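The CSV extraction step can be sketched as a loop over per-run result directories, reusing the name-value layout of stats.txt. The directory layout ($OUT_DATA/<run>/stats.txt), the function name and the chosen columns are assumptions; the real extract_csv.sh may differ.

```shell
#!/bin/sh
# extract_csv DIR: print a header plus one CSV row per DIR/*/stats.txt.
# Hypothetical helper; keeps the first value of each stat (first dump only).
extract_csv() {
  echo "run,simSeconds,simInsts,simOps"
  for f in "$1"/*/stats.txt; do
    [ -f "$f" ] || continue
    run=$(basename "$(dirname "$f")")
    awk -v run="$run" '
      $1 == "simSeconds" && s == "" { s = $2 }
      $1 == "simInsts"   && i == "" { i = $2 }
      $1 == "simOps"     && o == "" { o = $2 }
      END { print run "," s "," i "," o }' "$f"
  done
}
```

Usage: `extract_csv "$OUT_DATA" > results.csv`.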
🔍 Troubleshooting
Common Issues
Empty stats.txt
# Check if simulation completed
ls -la m5out/stats.txt
# If empty, check logs
cat logs/*.stderr.log
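When stats.txt is empty, gem5 has usually stopped on a `fatal:` or `panic:` message, the prefixes it uses for user-facing and internal errors (sometimes preceded by a source file and line). A hypothetical helper that pulls those lines out of the logs, assuming they are written to logs/*.stderr.log as in the command above:

```shell
#!/bin/sh
# scan_logs DIR: print gem5 fatal/panic lines from DIR/*.stderr.log.
# Hypothetical helper; returns 1 when no such lines are found.
scan_logs() {
  found=1
  for f in "$1"/*.stderr.log; do
    [ -f "$f" ] || continue
    if awk -v f="$(basename "$f")" '
         /^(fatal|panic): / || /: (fatal|panic): / { print f ": " $0; hit = 1 }
         END { exit (hit ? 0 : 1) }' "$f"; then
      found=0
    fi
  done
  return $found
}
```

Usage: `scan_logs logs || echo "no fatal/panic lines; check that the run finished"`.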
gem5 Binary Not Found
# Verify installation
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt
# Build if missing
cd /home/carlos/projects/gem5/gem5src/gem5
scons build/X86/gem5.opt -j$(nproc)
Compilation Errors
# Check compiler
gcc --version
# Rebuild workloads
sh scripts/build_workloads.sh
Debug Commands
# Check environment
sh scripts/env.sh
# Verify prerequisites
sh scripts/check_gem5.sh
# Manual gem5 run
/home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt \
/home/carlos/projects/gem5/gem5src/gem5/configs/example/gem5_library/x86-ubuntu-run.py \
--command=./iot_llm_sim --mem-size=16GB
📈 Performance Analysis
Key Metrics
- simSeconds: Simulated time (guest seconds, not host wall-clock time)
- simInsts: Instructions executed
- simOps: Operations (including micro-ops)
- hostInstRate: Simulator speed (simulated instructions per host second)
- Cache Miss Rates: L1/L2 performance
- Memory Bandwidth: DRAM utilization
Energy Analysis
The project includes energy post-processing scripts that calculate:
- Energy per Instruction (EPI)
- Power consumption
- Energy-Delay Product (EDP)
- Drowsy vs Non-drowsy comparisons
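The first three metrics follow from a total energy estimate E (in joules, produced by the project's energy model), the instruction count simInsts, and the runtime simSeconds: EPI = E / simInsts, average power P = E / simSeconds, and EDP = E × simSeconds. A hypothetical helper, assuming E has already been computed:

```shell
#!/bin/sh
# energy_metrics E_J INSTS SECONDS: print EPI (nJ), average power (W) and EDP (J*s).
# Hypothetical helper; E_J is a total energy estimate supplied by an energy model.
energy_metrics() {
  awk -v e="$1" -v n="$2" -v t="$3" 'BEGIN {
    printf "epi_nJ %g\n", e / n * 1e9   # energy per instruction, nanojoules
    printf "power_W %g\n", e / t        # average power over the run
    printf "edp_Js %g\n", e * t         # energy-delay product
  }'
}
```

A drowsy versus non-drowsy comparison then amounts to calling the helper on both runs' energy estimates and comparing the outputs.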
🎯 Future Enhancements
- Multi-core Support: Extend to multi-core IoT configurations
- Real LLM Models: Integrate actual transformer models
- Power Modeling: Add detailed power consumption analysis
- Network Simulation: Include IoT communication patterns
- Edge Computing: Simulate edge-to-cloud interactions
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test with sh run_all.sh
- Submit a pull request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This project was developed through iterative problem-solving, switching from ARM to x86_64 architecture and using gem5's built-in configurations for maximum reliability. The final solution provides a robust IoT LLM simulation framework with comprehensive statistics and analysis capabilities.