# SmartEdgeAI - IoT LLM Simulation with gem5

A comprehensive gem5-based simulation framework for IoT LLM workloads, featuring a 16GB RAM configuration and 24k-token processing.

## 🎯 Project Overview

This project simulates IoT (Internet of Things) systems running Large Language Models (LLMs) using the gem5 computer architecture simulator. The simulation includes:

- **IoT LLM Workload**: Simulates processing 24k tokens with memory allocation patterns typical of LLM inference
- **16GB RAM Configuration**: Full-system simulation with realistic memory constraints
- **Multiple CPU Models**: Big (O3CPU) and little (TimingSimpleCPU) core configurations
- **Comprehensive Statistics**: Detailed performance metrics and energy analysis

## 🚀 Quick Start

### Prerequisites

```bash
# Install required dependencies
sudo apt update
sudo apt install python3-matplotlib python3-pydot python3-pip python3-venv

# Verify the gem5 installation (adjust the path to your own gem5 checkout)
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt
```

### Run Complete Workflow

```bash
# Run everything automatically
sh run_all.sh

# Or run individual steps
sh scripts/check_gem5.sh                            # Verify prerequisites
. scripts/env.sh                                    # Load environment variables into the current shell
sh scripts/build_workloads.sh                       # Compile workloads
sh scripts/run_one.sh iot_llm_sim big high 0 1MB    # Run simulation
```

## 📁 Project Structure

```
SmartEdgeAI/
├── scripts/                  # Automation scripts
│   ├── env.sh                # Environment setup
│   ├── build_workloads.sh    # Compile workloads
│   ├── run_one.sh            # Single simulation run
│   ├── sweep.sh              # Parameter sweep
│   ├── extract_csv.sh        # Extract statistics
│   ├── energy_post.py        # Energy analysis
│   └── bundle_logs.sh        # Log collection
├── workloads/                # C source code
│   ├── tinyml_kws.c          # TinyML keyword spotting
│   ├── sensor_fusion.c       # Sensor data fusion
│   ├── aes_ccm.c             # AES encryption
│   └── attention_kernel.c    # Attention mechanism
├── iot_llm_sim.c             # Main IoT LLM simulation
├── run_all.sh                # Master workflow script
└── README.md                 # This file
```

## 🔧 Script Explanations

### Core Scripts

#### `scripts/env.sh`

**Purpose**: Sets up environment variables and paths for the entire workflow.

**Key Variables**:

- `ROOT`: Base gem5 installation path
- `CFG`: gem5 configuration script (`x86-ubuntu-run.py`)
- `GEM5_BIN`: Path to the gem5 binary (X86 build)
- `RUN`: Directory for compiled workloads
- `OUT_DATA`: Simulation results directory
- `LOG_DATA`: Log files directory

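To make the relationship between these variables concrete, here is a minimal Python sketch (Python being the language of the project's analysis scripts) of how they might be derived from `ROOT`. `GEM5_BIN` and `CFG` follow the paths used elsewhere in this README; the `run/` and `results/` directory names and both helper functions are hypothetical, not taken from `env.sh`.

```python
from pathlib import Path


def make_env(root: str, repo: str) -> dict:
    """Derive the workflow variables from the gem5 checkout and the repository root.

    GEM5_BIN and CFG follow the paths shown in this README; the RUN and
    OUT_DATA directory names are hypothetical placeholders.
    """
    gem5 = Path(root)
    project = Path(repo)
    return {
        "ROOT": gem5,
        "CFG": gem5 / "configs" / "example" / "gem5_library" / "x86-ubuntu-run.py",
        "GEM5_BIN": gem5 / "build" / "X86" / "gem5.opt",
        "RUN": project / "run",           # compiled workloads (assumed name)
        "OUT_DATA": project / "results",  # simulation results (assumed name)
        "LOG_DATA": project / "logs",     # matches the logs/ directory used in Troubleshooting
    }


def missing_inputs(env: dict) -> list:
    """Return the names of required gem5 files that do not exist on disk."""
    return [name for name in ("GEM5_BIN", "CFG") if not env[name].is_file()]


if __name__ == "__main__":
    env = make_env("/home/carlos/projects/gem5/gem5src/gem5", ".")
    for name, value in env.items():
        print(f"{name}={value}")
    print("missing:", ", ".join(missing_inputs(env)) or "none")
```

Checking `GEM5_BIN` and `CFG` up front catches a wrong `ROOT` before a long simulation is launched.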
#### `scripts/build_workloads.sh`

**Purpose**: Compiles all C workloads into x86_64 binaries.

**What it does**:

- Compiles `tinyml_kws.c`, `sensor_fusion.c`, `aes_ccm.c`, `attention_kernel.c`
- Creates the `iot_llm_sim` binary for LLM simulation
- Uses `gcc -O2 -static` for optimized static binaries

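The build step amounts to one `gcc -O2 -static` invocation per source file. The Python sketch below only generates those command lines; the `run/` output directory and the `gcc_commands` helper are assumptions for illustration, and the real script may name its outputs differently.

```python
from pathlib import Path

# Workload sources named in this README; iot_llm_sim.c lives at the repository root.
WORKLOAD_SOURCES = [
    Path("workloads/tinyml_kws.c"),
    Path("workloads/sensor_fusion.c"),
    Path("workloads/aes_ccm.c"),
    Path("workloads/attention_kernel.c"),
    Path("iot_llm_sim.c"),
]


def gcc_commands(run_dir: str) -> list:
    """Return one gcc argv per workload: optimized (-O2) and statically linked."""
    out = Path(run_dir)
    return [
        ["gcc", "-O2", "-static", "-o", str(out / src.stem), str(src)]
        for src in WORKLOAD_SOURCES
    ]


if __name__ == "__main__":
    import shlex

    # Print the commands; passing each argv to subprocess.run(argv, check=True) would compile them.
    for argv in gcc_commands("run"):
        print(shlex.join(argv))
```

Static linking means the binaries do not depend on the shared libraries installed in the simulated Ubuntu image.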
#### `scripts/run_one.sh`

**Purpose**: Executes a single gem5 simulation with specified parameters.

**Parameters**:

- `workload`: Which binary to run (e.g., `iot_llm_sim`)
- `core`: CPU type (`big`=O3CPU, `little`=TimingSimpleCPU)
- `dvfs`: Frequency setting (`high`=2GHz, `low`=1GHz)
- `drowsy`: Cache drowsy mode (0=off, 1=on)
- `l2`: L2 cache size (e.g., `1MB`)

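The argument mapping above can be expressed as a small validation table. This Python sketch reproduces only the mappings documented here (core to CPU model, dvfs to clock); the function name and the L2 size check are hypothetical and not part of `run_one.sh`.

```python
CPU_MODELS = {"big": "O3CPU", "little": "TimingSimpleCPU"}
FREQUENCIES = {"high": "2GHz", "low": "1GHz"}


def parse_run_args(workload, core, dvfs, drowsy, l2):
    """Validate run_one.sh-style positional arguments and map them to gem5 settings."""
    if core not in CPU_MODELS:
        raise ValueError(f"core must be one of {sorted(CPU_MODELS)}, got {core!r}")
    if dvfs not in FREQUENCIES:
        raise ValueError(f"dvfs must be one of {sorted(FREQUENCIES)}, got {dvfs!r}")
    if drowsy not in ("0", "1"):
        raise ValueError(f"drowsy must be 0 or 1, got {drowsy!r}")
    if not l2.endswith(("kB", "KB", "MB")):
        raise ValueError(f"l2 must be a size such as 1MB, got {l2!r}")
    return {
        "workload": workload,
        "cpu_model": CPU_MODELS[core],
        "clock": FREQUENCIES[dvfs],
        "drowsy": drowsy == "1",
        "l2_size": l2,
    }
```

For example, `parse_run_args("iot_llm_sim", "big", "high", "0", "1MB")` selects O3CPU at 2GHz with drowsy mode off. Rejecting unknown values early is cheaper than discovering a typo after a multi-hour run.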
**Key Features**:

- Maps core types to gem5 CPU models
- Copies stats from `m5out/stats.txt` to the output directory
- Mirrors results to repository directories

#### `iot_llm_sim.c`

**Purpose**: Simulates IoT LLM inference with 24k-token processing.

**What it simulates**:

- Memory allocation for 24k tokens (1KB per token, roughly 24MB in total)
- Token processing loop with memory operations
- Memory access patterns intended to approximate LLM inference
- Memory cleanup and resource management

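The allocate / process / free pattern described above can be modeled in a few lines of Python. This is not the C source: the exact token count (24,000 here) and the fill pattern are assumptions, and the function name is hypothetical.

```python
TOKENS = 24_000         # "24k tokens"; the exact count used by iot_llm_sim.c is an assumption
BYTES_PER_TOKEN = 1024  # 1KB per token, as described above


def simulate_tokens(n_tokens=TOKENS, bytes_per_token=BYTES_PER_TOKEN):
    """Allocate one buffer per token, write and read each one, then release them all.

    Returns (bytes allocated, checksum) so the work has an observable result.
    """
    # Allocation phase: one zero-initialized buffer per token.
    buffers = [bytearray(bytes_per_token) for _ in range(n_tokens)]
    checksum = 0
    # Processing loop: write a token-dependent byte pattern, then read it back.
    for i, buf in enumerate(buffers):
        buf[:] = bytes([i & 0xFF]) * bytes_per_token
        checksum += buf[0] + buf[-1]
    allocated = sum(len(b) for b in buffers)
    # Cleanup phase: drop all references so the memory can be reclaimed.
    buffers.clear()
    return allocated, checksum
```

With the defaults the working set is 24,000 × 1,024 = 24,576,000 bytes, far larger than the 1MB L2, so the processing loop is expected to stream through main memory rather than stay cache-resident.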
## 🐛 Problem-Solving Journey

### Initial Challenges

#### 1. **Empty stats.txt Files**

**Problem**: Simulations were running but generating empty statistics files.

**Root Cause**: ARM binaries were hitting system calls that gem5's syscall emulation mode does not support (syscall 398, which is `rseq` on 32-bit ARM Linux).

**Solution**: Switched from ARM to x86_64 for better gem5 compatibility.

#### 2. **Syscall Compatibility Issues**

**Problem**: `fatal: Syscall 398 out of range` errors with ARM binaries.

**Root Cause**: gem5's syscall emulation (SE) mode doesn't implement every Linux system call, particularly newer ones such as `rseq`.

**Solution**:

- Tried multiple ARM configurations (`starter_se.py`, `baremetal.py`)
- Ultimately switched to x86_64 full-system simulation
- Used gem5's `x86-ubuntu-run.py`, which boots Ubuntu in full-system mode, instead of relying on syscall emulation

#### 3. **Configuration Complexity**

**Problem**: Custom gem5 configurations were failing with various errors.

**Root Cause**:

- Deprecated port names (`slave`/`master` → `cpu_side_ports`/`mem_side_ports`)
- Missing cache parameters (`tag_latency`, `data_latency`, etc.)
- Workload object creation issues

**Solution**: Used gem5's built-in `x86-ubuntu-run.py` configuration instead of custom scripts.

#### 4. **Stats Collection Issues**

**Problem**: Statistics were generated in `m5out/stats.txt`, but the scripts expected them elsewhere.

**Root Cause**: gem5 writes its output to `m5out/` in the working directory unless the `--outdir` (`-d`) option is given, and the `x86-ubuntu-run.py` runs used that default.

**Solution**: Added automatic copying of stats from `m5out/stats.txt` to the expected output directory.

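A minimal Python sketch of that copy step, assuming one output directory per run. It refuses to copy a missing or empty `stats.txt`, so the empty-file symptom from issue 1 surfaces as an error instead of an empty result. The `collect_stats` name and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def collect_stats(out_dir, m5out="m5out"):
    """Copy m5out/stats.txt into out_dir; reject a missing or empty stats file."""
    src = Path(m5out) / "stats.txt"
    if not src.is_file() or src.stat().st_size == 0:
        raise FileNotFoundError(f"{src} is missing or empty; check the gem5 logs")
    dest_dir = Path(out_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "stats.txt"
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest
```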
### Key Learnings

1. **Architecture Choice Matters**: In this project, x86_64 proved much more reliable than ARM under gem5
2. **Full-System vs Syscall Emulation**: Full-system simulation avoids the gaps in syscall coverage that break syscall emulation
3. **Use Built-in Configurations**: gem5's maintained example configs were more reliable than custom ones
4. **Path Management**: Always verify where gem5 writes its output (`m5out/` by default) and handle it explicitly

## 🏗️ How the Project Works

### Simulation Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   IoT LLM App   │───▶│    gem5 X86     │───▶│   Statistics    │
│  (24k tokens)   │    │   Full-System   │    │    (482KB)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### Workflow Process

1. **Environment Setup**: Configure paths and verify the gem5 installation
2. **Workload Compilation**: Compile C workloads to x86_64 binaries
3. **Simulation Execution**: Run gem5 with Ubuntu Linux and the workload
4. **Statistics Collection**: Extract performance metrics from gem5 output
5. **Analysis**: Process statistics for energy, performance, and efficiency metrics

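The same sequence can be driven from Python with fail-fast semantics. This sketch uses the step order from Quick Start and the Analysis Commands; it is not a transcription of `run_all.sh`, whose actual order may differ. `env.sh` is omitted because sourcing it in a child process would not affect later steps.

```python
import subprocess

# Step order assumed from Quick Start and Analysis Commands; run_all.sh may differ.
STEPS = [
    ["sh", "scripts/check_gem5.sh"],
    ["sh", "scripts/build_workloads.sh"],
    ["sh", "scripts/run_one.sh", "iot_llm_sim", "big", "high", "0", "1MB"],
    ["sh", "scripts/extract_csv.sh"],
    ["python3", "scripts/energy_post.py"],
]


def run_workflow(steps=STEPS, dry_run=False):
    """Run each step in order; check=True raises CalledProcessError on the first failure."""
    executed = []
    for argv in steps:
        if not dry_run:
            subprocess.run(argv, check=True)
        executed.append(" ".join(argv))
    return executed
```

`run_workflow(dry_run=True)` lists the commands without executing anything, which is a cheap way to review a sequence before committing to a long simulation.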
### Memory Configuration

- **Total RAM**: 16GB (the target IoT configuration)
- **Memory Controllers**: 2x DDR3 controllers with 8GB each
- **Cache Hierarchy**: L1I (48KB), L1D (32KB), L2 (1MB)
- **Memory Access**: Timing-based simulation with realistic latencies

## 📊 Simulation Results

### Sample Output (iot_llm_sim)

```
simSeconds      3.875651     # Simulated time (3.88 seconds)
simInsts        2665005563   # Instructions executed (2.67 billion)
simOps          5787853650   # Operations, including micro-ops (5.79 billion)
hostInstRate    476936       # Host simulation speed (477K inst/s)
hostOpRate      1035809      # Host simulation speed (1.04M op/s)
hostMemory      11323568     # Host memory usage (11.3 MB)
hostSeconds     5587.76      # Host wall-clock time (93 minutes)
```

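Each line of `stats.txt` has the form `name  value  # description`, so extracting metrics is mostly a matter of splitting on whitespace. The Python sketch below groups stats by the `Begin`/`End Simulation Statistics` markers gem5 writes around each dump. The `parse_stats` name is hypothetical, and multi-value (vector or distribution) lines keep only their first value.

```python
def parse_stats(text):
    """Parse gem5 stats.txt text into one {name: float} dict per statistics dump."""
    dumps = []
    current = None
    for line in text.splitlines():
        if "Begin Simulation Statistics" in line:
            current = {}
        elif "End Simulation Statistics" in line:
            if current:
                dumps.append(current)
            current = None
        else:
            fields = line.split("#", 1)[0].split()  # drop the trailing description
            if len(fields) < 2:
                continue
            try:
                value = float(fields[1])
            except ValueError:
                continue  # skip lines whose second field is not numeric
            if current is None:  # tolerate excerpts that lack the Begin marker
                current = {}
            current[fields[0]] = value
    if current:
        dumps.append(current)
    return dumps
```

Keeping dumps separate matters for full-system runs, where a single `stats.txt` may contain more than one dump.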
### Performance Metrics

- **Simulation Speed**: 477K simulated instructions per host second
- **Total Instructions**: 2.67 billion for 24k-token processing
- **Cache Performance**: 98.75% hit rate, 1.25% miss rate
- **Cache Misses**: 57.4M misses out of 4.58B total cache accesses
- **Energy Consumption**: 568.4 mJ total (212.8 pJ per instruction)
- **Power Consumption**: 146.5 mW average

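The rates above follow directly from the raw counters. In gem5, `hostInstRate` is `simInsts / hostSeconds`, and the cache miss rate is misses divided by accesses. In the sketch below the cache counters are passed in explicitly because the stat names that hold them depend on the cache configuration. The helper name is hypothetical.

```python
def derived_metrics(stats, cache_misses, cache_accesses):
    """Compute headline rates from raw gem5 counters and cache totals."""
    miss_rate = cache_misses / cache_accesses
    return {
        "host_inst_rate": stats["simInsts"] / stats["hostSeconds"],  # simulated inst per host second
        "host_op_rate": stats["simOps"] / stats["hostSeconds"],      # simulated ops per host second
        "ops_per_inst": stats["simOps"] / stats["simInsts"],         # micro-ops per instruction
        "miss_rate": miss_rate,
        "hit_rate": 1 - miss_rate,
    }


if __name__ == "__main__":
    sample = {"simInsts": 2665005563, "simOps": 5787853650, "hostSeconds": 5587.76}
    for name, value in derived_metrics(sample, 57.4e6, 4.58e9).items():
        print(f"{name}: {value:.4f}")
```

With the sample counters this reproduces the reported 476936 inst/s, 1035809 op/s, and a miss rate of about 1.25%.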
## 🛠️ Usage Guide

### Basic Usage

```bash
# Run IoT LLM simulation
sh scripts/run_one.sh iot_llm_sim big high 0 1MB

# Run with different CPU types
sh scripts/run_one.sh iot_llm_sim little high 0 1MB   # TimingSimpleCPU
sh scripts/run_one.sh iot_llm_sim big low 0 1MB       # Low frequency

# Run parameter sweep
sh scripts/sweep.sh
```

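A parameter sweep is the Cartesian product of the `run_one.sh` options. The grid in this Python sketch is built from the values documented above (big/little, high/low, drowsy off/on, 1MB L2). It is a hypothetical grid, and `scripts/sweep.sh` may use different values.

```python
from itertools import product

# Hypothetical grid built from the run_one.sh options documented in this README.
WORKLOADS = ["iot_llm_sim"]
CORES = ["big", "little"]
DVFS = ["high", "low"]
DROWSY = ["0", "1"]
L2_SIZES = ["1MB"]


def sweep_commands():
    """Yield one run_one.sh invocation per parameter combination."""
    for combo in product(WORKLOADS, CORES, DVFS, DROWSY, L2_SIZES):
        yield ["sh", "scripts/run_one.sh", *combo]


if __name__ == "__main__":
    for argv in sweep_commands():
        print(" ".join(argv))
```

This grid yields 2 × 2 × 2 = 8 runs. At roughly 93 host minutes per big-core run (see Sample Output), the grid size is worth checking before launching a sweep.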
### Advanced Usage

```bash
# Custom memory size
sh scripts/run_one.sh iot_llm_sim big high 0 1MB 32GB

# Enable drowsy cache
sh scripts/run_one.sh iot_llm_sim big high 1 1MB

# Run specific workload
sh scripts/run_one.sh tinyml_kws big high 0 1MB
```

### Analysis Commands

```bash
# Extract CSV statistics
sh scripts/extract_csv.sh

# Energy analysis
python3 scripts/energy_post.py

# Generate plots
python3 scripts/plot_epi.py
python3 scripts/plot_edp_tinyml.py

# Bundle logs
sh scripts/bundle_logs.sh
```

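CSV extraction collects a few headline stats from each run into one row. The Python sketch below assumes a hypothetical layout with one `<results_dir>/<run>/stats.txt` per run. It does not describe the actual format of `extract_csv.sh`, and the column choice is illustrative.

```python
import csv
from pathlib import Path

COLUMNS = ["simSeconds", "simInsts", "simOps", "hostSeconds"]


def read_stats(path):
    """Return {name: value string} for the stats in COLUMNS, taking the first occurrence."""
    found = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] in COLUMNS and fields[0] not in found:
            found[fields[0]] = fields[1]
    return found


def write_csv(results_dir, csv_path):
    """Write one CSV row per <results_dir>/<run>/stats.txt and return the row count."""
    rows = 0
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["run", *COLUMNS])  # missing stats become ""
        writer.writeheader()
        for stats_file in sorted(Path(results_dir).glob("*/stats.txt")):
            writer.writerow({"run": stats_file.parent.name, **read_stats(stats_file)})
            rows += 1
    return rows
```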
## 🔍 Troubleshooting

### Common Issues

#### Empty stats.txt

```bash
# Check if simulation completed
ls -la m5out/stats.txt

# If empty, check logs
cat logs/*.stderr.log
```

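The two manual checks above can be combined into one report. This Python sketch checks whether `m5out/stats.txt` has content and lists every `fatal:` line found in `logs/*.stderr.log`, for example the `fatal: Syscall 398 out of range` error described earlier. The `diagnose` helper and its `base` parameter are hypothetical.

```python
from pathlib import Path


def diagnose(base=".", stats_path="m5out/stats.txt", log_glob="logs/*.stderr.log"):
    """Report whether stats.txt has content and collect 'fatal:' lines from stderr logs."""
    stats = Path(base) / stats_path
    report = {"stats_ok": stats.is_file() and stats.stat().st_size > 0, "fatal": []}
    for log in sorted(Path(base).glob(log_glob)):
        for line in log.read_text(errors="replace").splitlines():
            if "fatal:" in line:
                report["fatal"].append(f"{log.name}: {line.strip()}")
    return report
```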
#### gem5 Binary Not Found

```bash
# Verify installation
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt

# Build if missing
cd /home/carlos/projects/gem5/gem5src/gem5
scons build/X86/gem5.opt -j$(nproc)
```

#### Compilation Errors

```bash
# Check compiler
gcc --version

# Rebuild workloads
sh scripts/build_workloads.sh
```

### Debug Commands

```bash
# Check environment
sh scripts/env.sh

# Verify prerequisites
sh scripts/check_gem5.sh

# Manual gem5 run
/home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt \
    /home/carlos/projects/gem5/gem5src/gem5/configs/example/gem5_library/x86-ubuntu-run.py \
    --command=./iot_llm_sim --mem-size=16GB
```

## 📈 Performance Analysis

### Key Metrics

- **simSeconds**: Total simulated time (3.88s for IoT LLM)
- **simInsts**: Instructions executed (2.67B for 24k tokens)
- **simOps**: Operations (5.79B including micro-ops)
- **hostInstRate**: Simulation speed (477K inst/s)
- **Cache Miss Rates**: 1.25% miss rate, 98.75% hit rate
- **Cache Accesses**: 4.58B cache transactions processed

### Energy Analysis

**Actual IoT LLM Results**:

- **Energy per Instruction (EPI)**: 212.8 pJ
- **Total Energy**: 568.4 mJ for 24k-token processing
- **Power Consumption**: 146.5 mW average
- **Memory Energy**: 34.4 mJ (6% of total energy)
- **Energy-Delay Product (EDP)**: 2.204 J·s

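These quantities are related by standard definitions: EPI = E / simInsts, average power P = E / simSeconds, and EDP = E × simSeconds. The Python sketch below assumes `energy_post.py` uses the same definitions; that is not confirmed here. With E = 568.4 mJ and the sample counters, the results agree with the reported EPI, power, and EDP to within about 0.5%.

```python
def energy_metrics(total_energy_j, sim_insts, sim_seconds):
    """Derive EPI, average power, and EDP from total energy, instruction count, and simulated time."""
    return {
        "epi_pj": total_energy_j / sim_insts * 1e12,         # energy per instruction, pJ
        "avg_power_mw": total_energy_j / sim_seconds * 1e3,  # average power, mW
        "edp_js": total_energy_j * sim_seconds,              # energy-delay product, J·s
    }


if __name__ == "__main__":
    for name, value in energy_metrics(0.5684, 2665005563, 3.875651).items():
        print(f"{name}: {value:.4f}")
```

EDP weights energy and runtime equally, so a configuration that saves energy by running much longer can still score worse.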
**Optimization Potential**:

- **Drowsy Cache**: 15% energy reduction (483 mJ)
- **Little Core**: 55% energy reduction (254 mJ)
- **Hybrid+Drowsy**: 47% energy reduction (302 mJ)

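Each percentage is the saving relative to the 568.4 mJ total reported above: reduction = (1 − E_variant / E_baseline) × 100. This small Python check shows that the listed percentages match the listed energies.

```python
BASELINE_MJ = 568.4  # total energy reported in Energy Analysis


def reduction_pct(baseline_mj, variant_mj):
    """Percentage of energy saved by a variant relative to the baseline."""
    return (1 - variant_mj / baseline_mj) * 100


if __name__ == "__main__":
    for name, mj in [("Drowsy Cache", 483), ("Little Core", 254), ("Hybrid+Drowsy", 302)]:
        print(f"{name}: {reduction_pct(BASELINE_MJ, mj):.1f}% lower energy")
```

This gives 15.0%, 55.3%, and 46.9%, which round to the figures listed.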
## 🎯 Future Enhancements

1. **Multi-core Support**: Extend to multi-core IoT configurations
2. **Real LLM Models**: Integrate actual transformer models
3. **Power Modeling**: Add detailed power consumption analysis
4. **Network Simulation**: Include IoT communication patterns
5. **Edge Computing**: Simulate edge-to-cloud interactions

## 📚 References

- [gem5 Documentation](https://www.gem5.org/documentation/)
- [gem5 Learning Resources](https://www.gem5.org/documentation/learning_gem5/)
- [ARM Research Starter Kit](http://www.arm.com/ResearchEnablement/SystemModeling)

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test with `sh run_all.sh`
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

---

**Note**: This project was developed through iterative problem-solving, switching from ARM to x86_64 and using gem5's built-in configurations for maximum reliability. The final solution provides a robust IoT LLM simulation framework with comprehensive statistics and analysis capabilities.