Files
SmartEdgeAI/README.md
Carlos Gutierrez 0fb21fd408 updating
2025-10-05 17:19:12 -04:00

349 lines
12 KiB
Markdown

# SmartEdgeAI - IoT LLM Simulation with gem5
A comprehensive gem5-based simulation framework for IoT LLM workloads, featuring 16GB RAM configuration and 24k token processing capabilities.
## 🎯 Project Overview
This project simulates IoT (Internet of Things) systems running Large Language Models (LLMs) using the gem5 computer architecture simulator. The simulation includes:
- **IoT LLM Workload**: Simulates processing 24k tokens with memory allocation patterns typical of LLM inference
- **16GB RAM Configuration**: Full-system simulation with realistic memory constraints
- **Multiple CPU Architectures**: Support for big/little core configurations
- **Comprehensive Statistics**: Detailed performance metrics and energy analysis
## 🚀 Quick Start
### Prerequisites
```bash
# Install required dependencies
sudo apt update
sudo apt install python3-matplotlib python3-pydot python3-pip python3-venv
# Verify gem5 installation
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt
```
### Run Complete Workflow
```bash
# Run everything automatically
sh run_all.sh
# Or run individual steps
sh scripts/check_gem5.sh # Verify prerequisites
sh scripts/env.sh # Setup environment
sh scripts/build_workloads.sh # Compile workloads
sh scripts/run_one.sh iot_llm_sim big high 0 1MB # Run simulation
```
## 📁 Project Structure
```
SmartEdgeAI/
├── scripts/ # Automation scripts
│ ├── env.sh # Environment setup
│ ├── build_workloads.sh # Compile workloads
│ ├── run_one.sh # Single simulation run
│ ├── sweep.sh # Parameter sweep
│ ├── extract_csv.sh # Extract statistics
│ ├── energy_post.py # Energy analysis
│ └── bundle_logs.sh # Log collection
├── workloads/ # C source code
│ ├── tinyml_kws.c # TinyML keyword spotting
│ ├── sensor_fusion.c # Sensor data fusion
│ ├── aes_ccm.c # AES encryption
│ └── attention_kernel.c # Attention mechanism
├── iot_llm_sim.c # Main IoT LLM simulation
├── run_all.sh # Master workflow script
└── README.md # This file
```
## 🔧 Script Explanations
### Core Scripts
#### `scripts/env.sh`
**Purpose**: Sets up environment variables and paths for the entire workflow.
**Key Variables**:
- `ROOT`: Base gem5 installation path
- `CFG`: gem5 configuration script (x86-ubuntu-run.py)
- `GEM5_BIN`: Path to gem5 binary (X86 build)
- `RUN`: Directory for compiled workloads
- `OUT_DATA`: Simulation results directory
- `LOG_DATA`: Log files directory
#### `scripts/build_workloads.sh`
**Purpose**: Compiles all C workloads into x86_64 binaries.
**What it does**:
- Compiles `tinyml_kws.c`, `sensor_fusion.c`, `aes_ccm.c`, `attention_kernel.c`
- Creates `iot_llm_sim` binary for LLM simulation
- Uses `gcc -O2 -static` for optimized static binaries
#### `scripts/run_one.sh`
**Purpose**: Executes a single gem5 simulation with specified parameters.
**Parameters**:
- `workload`: Which binary to run (e.g., `iot_llm_sim`)
- `core`: CPU type (`big`=O3CPU, `little`=TimingSimpleCPU)
- `dvfs`: Frequency setting (`high`=2GHz, `low`=1GHz)
- `drowsy`: Cache drowsy mode (0=off, 1=on)
- `l2`: L2 cache size (e.g., `1MB`)
**Key Features**:
- Maps core types to gem5 CPU models
- Copies stats from `m5out/stats.txt` to output directory
- Mirrors results to repository directories
#### `iot_llm_sim.c`
**Purpose**: Simulates IoT LLM inference with 24k token processing.
**What it simulates**:
- Memory allocation for 24k tokens (1KB per token)
- Token processing loop with memory operations
- Realistic LLM inference patterns
- Memory cleanup and resource management
## 🐛 Problem-Solving Journey
### Initial Challenges
#### 1. **Empty stats.txt Files**
**Problem**: Simulations were running but generating empty statistics files.
**Root Cause**: ARM binaries were hitting unsupported system calls (syscall 398 = futex).
**Solution**: Switched from ARM to x86_64 architecture for better gem5 compatibility.
#### 2. **Syscall Compatibility Issues**
**Problem**: `fatal: Syscall 398 out of range` errors with ARM binaries.
**Root Cause**: gem5's syscall emulation mode doesn't support all Linux system calls, particularly newer ones like futex.
**Solution**:
- Tried multiple ARM configurations (starter_se.py, baremetal.py)
- Ultimately switched to x86_64 full-system simulation
- Used `x86-ubuntu-run.py` for reliable Ubuntu-based simulation
#### 3. **Configuration Complexity**
**Problem**: Custom gem5 configurations were failing with various errors.
**Root Cause**:
- Deprecated port names (`slave`/`master``cpu_side_ports`/`mem_side_ports`)
- Missing cache parameters (`tag_latency`, `data_latency`, etc.)
- Workload object creation issues
**Solution**: Used gem5's built-in `x86-ubuntu-run.py` configuration instead of custom scripts.
#### 4. **Stats Collection Issues**
**Problem**: Statistics were generated in `m5out/stats.txt` but scripts expected them elsewhere.
**Root Cause**: x86-ubuntu-run.py outputs to default `m5out/` directory.
**Solution**: Added automatic copying of stats from `m5out/stats.txt` to expected output directory.
### Key Learnings
1. **Architecture Choice Matters**: x86_64 is much more reliable than ARM for gem5 simulations
2. **Full-System vs Syscall Emulation**: Full-system simulation is more robust than syscall emulation
3. **Use Built-in Configurations**: gem5's built-in configs are more reliable than custom ones
4. **Path Management**: Always verify and handle gem5's default output paths
## 🏗️ How the Project Works
### Simulation Architecture
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ IoT LLM App │───▶│ gem5 X86 │───▶│ Statistics │
│ (24k tokens) │ │ Full-System │ │ (482KB) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
### Workflow Process
1. **Environment Setup**: Configure paths and verify gem5 installation
2. **Workload Compilation**: Compile C workloads to x86_64 binaries
3. **Simulation Execution**: Run gem5 with Ubuntu Linux and workload
4. **Statistics Collection**: Extract performance metrics from gem5 output
5. **Analysis**: Process statistics for energy, performance, and efficiency metrics
### Memory Configuration
- **Total RAM**: 16GB (as requested for IoT configuration)
- **Memory Controllers**: 2x DDR3 controllers with 8GB each
- **Cache Hierarchy**: L1I (48KB), L1D (32KB), L2 (1MB)
- **Memory Access**: Timing-based simulation with realistic latencies
## 📊 Simulation Results
### Sample Output (iot_llm_sim)
```
simSeconds 3.875651 # Simulation time (3.88 seconds)
simInsts 2665005563 # Instructions executed (2.67 billion)
simOps 5787853650 # Operations (5.79 billion including micro-ops)
hostInstRate 476936 # Instructions per second (477K inst/s)
hostOpRate 1035809 # Operations per second (1.04M op/s)
hostMemory 11323568 # Host memory usage (11.3 MB)
hostSeconds 5587.76 # Real time elapsed (93 minutes)
```
### Performance Metrics
- **Simulation Speed**: 477K instructions/second
- **Total Instructions**: 2.67 billion for 24k token processing
- **Cache Performance**: 98.75% hit rate, 1.25% miss rate
- **Memory Efficiency**: 57.4M cache misses out of 4.58B total accesses
- **Energy Consumption**: 568.4 mJ total (212.8 pJ per instruction)
- **Power Consumption**: 146.5 mW average
## 🛠️ Usage Guide
### Basic Usage
```bash
# Run IoT LLM simulation
sh scripts/run_one.sh iot_llm_sim big high 0 1MB
# Run with different CPU types
sh scripts/run_one.sh iot_llm_sim little high 0 1MB # TimingSimpleCPU
sh scripts/run_one.sh iot_llm_sim big low 0 1MB # Low frequency
# Run parameter sweep
sh scripts/sweep.sh
```
### Advanced Usage
```bash
# Custom memory size
sh scripts/run_one.sh iot_llm_sim big high 0 1MB 32GB
# Enable drowsy cache
sh scripts/run_one.sh iot_llm_sim big high 1 1MB
# Run specific workload
sh scripts/run_one.sh tinyml_kws big high 0 1MB
```
### Analysis Commands
```bash
# Extract CSV statistics
sh scripts/extract_csv.sh
# Energy analysis
python3 scripts/energy_post.py
# Generate plots
python3 scripts/plot_epi.py
python3 scripts/plot_edp_tinyml.py
# Bundle logs
sh scripts/bundle_logs.sh
```
## 🔍 Troubleshooting
### Common Issues
#### Empty stats.txt
```bash
# Check if simulation completed
ls -la m5out/stats.txt
# If empty, check logs
cat logs/*.stderr.log
```
#### gem5 Binary Not Found
```bash
# Verify installation
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt
# Build if missing
cd /home/carlos/projects/gem5/gem5src/gem5
scons build/X86/gem5.opt -j$(nproc)
```
#### Compilation Errors
```bash
# Check compiler
gcc --version
# Rebuild workloads
sh scripts/build_workloads.sh
```
### Debug Commands
```bash
# Check environment
sh scripts/env.sh
# Verify prerequisites
sh scripts/check_gem5.sh
# Manual gem5 run
/home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt \
/home/carlos/projects/gem5/gem5src/gem5/configs/example/gem5_library/x86-ubuntu-run.py \
--command=./iot_llm_sim --mem-size=16GB
```
## 📈 Performance Analysis
### Key Metrics
- **simSeconds**: Total simulation time (3.88s for IoT LLM)
- **simInsts**: Instructions executed (2.67B for 24k tokens)
- **simOps**: Operations (5.79B including micro-ops)
- **hostInstRate**: Simulation speed (477K inst/s)
- **Cache Miss Rates**: 1.25% miss rate, 98.75% hit rate
- **Memory Bandwidth**: 4.58B cache transactions processed
### Energy Analysis
**Actual IoT LLM Results**:
- **Energy per Instruction (EPI)**: 212.8 pJ
- **Total Energy**: 568.4 mJ for 24k token processing
- **Power Consumption**: 146.5 mW average
- **Memory Energy**: 34.4 mJ (6% of total energy)
- **Energy-Delay Product (EDP)**: 2.204 J·s
**Optimization Potential**:
- **Drowsy Cache**: 15% energy reduction (483 mJ)
- **Little Core**: 55% energy reduction (254 mJ)
- **Hybrid+Drowsy**: 47% energy reduction (302 mJ)
## 🎯 Future Enhancements
1. **Multi-core Support**: Extend to multi-core IoT configurations
2. **Real LLM Models**: Integrate actual transformer models
3. **Power Modeling**: Add detailed power consumption analysis
4. **Network Simulation**: Include IoT communication patterns
5. **Edge Computing**: Simulate edge-to-cloud interactions
## 📚 References
- [gem5 Documentation](https://www.gem5.org/documentation/)
- [gem5 Learning Resources](https://www.gem5.org/documentation/learning_gem5/)
- [ARM Research Starter Kit](http://www.arm.com/ResearchEnablement/SystemModeling)
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test with `sh run_all.sh`
5. Submit a pull request
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
---
**Note**: This project was developed through iterative problem-solving, switching from ARM to x86_64 architecture and using gem5's built-in configurations for maximum reliability. The final solution provides a robust IoT LLM simulation framework with comprehensive statistics and analysis capabilities.