updating
README.md
@@ -1,171 +1,338 @@
# SmartEdgeAI - (gem5)
# SmartEdgeAI - IoT LLM Simulation with gem5

This repo holds **all scripts, commands, and logs** for Phase 3.

A comprehensive gem5-based simulation framework for IoT LLM workloads, featuring 16GB RAM configuration and 24k token processing capabilities.

## Prerequisites
## 🎯 Project Overview

### Install gem5
Before running any simulations, you need to install and build gem5:

This project simulates IoT (Internet of Things) systems running Large Language Models (LLMs) using the gem5 computer architecture simulator. The simulation includes:

- **IoT LLM Workload**: Simulates processing 24k tokens with memory allocation patterns typical of LLM inference
- **16GB RAM Configuration**: Full-system simulation with realistic memory constraints
- **Multiple CPU Architectures**: Support for big/little core configurations
- **Comprehensive Statistics**: Detailed performance metrics and energy analysis

## 🚀 Quick Start

### Prerequisites

```bash
# Clone gem5 repository
git clone https://github.com/gem5/gem5.git /home/carlos/projects/gem5/gem5src/gem5
# Install required dependencies
sudo apt update
sudo apt install python3-matplotlib python3-pydot python3-pip python3-venv

# Build gem5 for ARM
cd /home/carlos/projects/gem5/gem5src/gem5
scons build/ARM/gem5.opt -j$(nproc)

# Verify installation
sh scripts/check_gem5.sh
# Verify gem5 installation
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt
```

### Install ARM Cross-Compiler
```bash
# Ubuntu/Debian
sudo apt-get install gcc-arm-linux-gnueabihf

# macOS (if using Homebrew)
brew install gcc-arm-linux-gnueabihf
```

## Quick Start (Run Everything)

To run the complete workflow automatically:
### Run Complete Workflow

```bash
chmod +x run_all.sh
# Run everything automatically
sh run_all.sh

# Or run individual steps
sh scripts/check_gem5.sh                          # Verify prerequisites
sh scripts/env.sh                                 # Setup environment
sh scripts/build_workloads.sh                     # Compile workloads
sh scripts/run_one.sh iot_llm_sim big high 0 1MB  # Run simulation
```

This will execute all steps in sequence with error checking and progress reporting.

## 📁 Project Structure

```
SmartEdgeAI/
├── scripts/                 # Automation scripts
│   ├── env.sh               # Environment setup
│   ├── build_workloads.sh   # Compile workloads
│   ├── run_one.sh           # Single simulation run
│   ├── sweep.sh             # Parameter sweep
│   ├── extract_csv.sh       # Extract statistics
│   ├── energy_post.py       # Energy analysis
│   └── bundle_logs.sh       # Log collection
├── workloads/               # C source code
│   ├── tinyml_kws.c         # TinyML keyword spotting
│   ├── sensor_fusion.c      # Sensor data fusion
│   ├── aes_ccm.c            # AES encryption
│   └── attention_kernel.c   # Attention mechanism
├── iot_llm_sim.c            # Main IoT LLM simulation
├── run_all.sh               # Master workflow script
└── README.md                # This file
```

## Manual Steps (Order of operations)

### 0. Check Prerequisites
```bash
sh scripts/check_gem5.sh
```
**Check logs**: Should show "✓ All checks passed!" or installation instructions

### 1. Setup Environment
```bash
sh scripts/env.sh
```
**Check logs**: `cat logs/env.txt` - Should show environment variables and "READY" message

### 2. Build Workloads
```bash
sh scripts/build_workloads.sh
```
**Check logs**: Look for "All workloads compiled successfully!" and verify binaries exist: `ls -la /home/carlos/projects/gem5/gem5-run/`

### 3. Test Single Run
```bash
sh scripts/run_one.sh tinyml_kws big high 0 1MB
```
**Check logs**:
- Verify stats.txt has content: `ls -l /home/carlos/projects/gem5/gem5-data/SmartEdgeAI/results/tinyml_kws_big_high_l21MB_d0/stats.txt`
- Check simulation output: `cat logs/tinyml_kws_big_high_l21MB_d0.stdout.log`
- Check for errors: `cat logs/tinyml_kws_big_high_l21MB_d0.stderr.log`

## 🔧 Script Explanations

### Core Scripts

#### `scripts/env.sh`
**Purpose**: Sets up environment variables and paths for the entire workflow.

**Key Variables**:
- `ROOT`: Base gem5 installation path
- `CFG`: gem5 configuration script (`x86-ubuntu-run.py`)
- `GEM5_BIN`: Path to gem5 binary (X86 build)
- `RUN`: Directory for compiled workloads
- `OUT_DATA`: Simulation results directory
- `LOG_DATA`: Log files directory

#### `scripts/build_workloads.sh`
**Purpose**: Compiles all C workloads into x86_64 binaries.

**What it does**:
- Compiles `tinyml_kws.c`, `sensor_fusion.c`, `aes_ccm.c`, `attention_kernel.c`
- Creates `iot_llm_sim` binary for LLM simulation
- Uses `gcc -O2 -static` for optimized static binaries

#### `scripts/run_one.sh`
**Purpose**: Executes a single gem5 simulation with specified parameters.

**Parameters**:
- `workload`: Which binary to run (e.g., `iot_llm_sim`)
- `core`: CPU type (`big`=O3CPU, `little`=TimingSimpleCPU)
- `dvfs`: Frequency setting (`high`=2GHz, `low`=1GHz)
- `drowsy`: Cache drowsy mode (0=off, 1=on)
- `l2`: L2 cache size (e.g., `1MB`)

**Key Features**:
- Maps core types to gem5 CPU models
- Copies stats from `m5out/stats.txt` to output directory
- Mirrors results to repository directories

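
The result directories and log names used elsewhere in this README (for example `tinyml_kws_big_high_l21MB_d0`) suggest how the five `run_one.sh` parameters combine into a run tag. The following is a minimal Python sketch of that mapping, not the script's actual code: the helper names `run_tag` and `describe_run` are hypothetical, and the tag layout is inferred from those example paths.

```python
# Hypothetical helpers; the mappings mirror the parameter list above.
CPU_MODELS = {"big": "O3CPU", "little": "TimingSimpleCPU"}
CLOCKS = {"high": "2GHz", "low": "1GHz"}


def run_tag(workload, core, dvfs, drowsy, l2):
    """Build a tag such as 'tinyml_kws_big_high_l21MB_d0' (layout inferred)."""
    if core not in CPU_MODELS:
        raise ValueError(f"unknown core type: {core!r}")
    if dvfs not in CLOCKS:
        raise ValueError(f"unknown dvfs setting: {dvfs!r}")
    if drowsy not in (0, 1):
        raise ValueError("drowsy must be 0 or 1")
    return f"{workload}_{core}_{dvfs}_l2{l2}_d{drowsy}"


def describe_run(workload, core, dvfs, drowsy, l2):
    """Return the tag plus the simulator settings each parameter selects."""
    return {
        "tag": run_tag(workload, core, dvfs, drowsy, l2),
        "cpu_model": CPU_MODELS[core],
        "clock": CLOCKS[dvfs],
        "drowsy_cache": bool(drowsy),
        "l2_size": l2,
    }


if __name__ == "__main__":
    print(describe_run("tinyml_kws", "big", "high", 0, "1MB"))
```

Validating the arguments up front means a typo such as `bigg` fails before any simulation time is spent.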
#### `iot_llm_sim.c`
**Purpose**: Simulates IoT LLM inference with 24k token processing.

**What it simulates**:
- Memory allocation for 24k tokens (1KB per token)
- Token processing loop with memory operations
- Realistic LLM inference patterns
- Memory cleanup and resource management

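
The 24k-token, 1KB-per-token figures translate into a buffer size as follows. This is a small worked calculation rather than code from `iot_llm_sim.c`. It assumes "24k" means 24,000 tokens and "1KB" means 1,024 bytes, neither of which the description states explicitly.

```python
# Assumed interpretations of the figures quoted above.
NUM_TOKENS = 24_000
BYTES_PER_TOKEN = 1024


def token_buffer_bytes(num_tokens=NUM_TOKENS, bytes_per_token=BYTES_PER_TOKEN):
    """Total bytes needed to hold every token buffer at once."""
    return num_tokens * bytes_per_token


total = token_buffer_bytes()
ram_bytes = 16 * 2**30  # the 16GB configuration, read as 16 GiB

print(f"{total} bytes")
print(f"{total / 10**6:.1f} MB (decimal)")
print(f"{total / 2**20:.1f} MiB (binary)")
print(f"share of simulated RAM: {total / ram_bytes:.4%}")
```

Under these assumptions the buffer is 24,576,000 bytes, about 23.4 MiB, which is consistent with the roughly 24MB allocation reported for this workload and is a very small share of the 16GB of simulated RAM.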
## 🐛 Problem-Solving Journey

### Initial Challenges

#### 1. **Empty stats.txt Files**
**Problem**: Simulations were running but generating empty statistics files.

**Root Cause**: ARM binaries were hitting unsupported system calls (syscall 398, which on 32-bit ARM Linux is `rseq` rather than futex).

**Solution**: Switched from ARM to x86_64 architecture for better gem5 compatibility.

#### 2. **Syscall Compatibility Issues**
**Problem**: `fatal: Syscall 398 out of range` errors with ARM binaries.

**Root Cause**: gem5's syscall emulation mode doesn't support all Linux system calls, particularly newer ones such as `rseq`, which recent glibc versions invoke at program startup.

**Solution**:
- Tried multiple ARM configurations (starter_se.py, baremetal.py)
- Ultimately switched to x86_64 full-system simulation
- Used `x86-ubuntu-run.py` for reliable Ubuntu-based simulation

#### 3. **Configuration Complexity**
**Problem**: Custom gem5 configurations were failing with various errors.

**Root Cause**:
- Deprecated port names (`slave`/`master` → `cpu_side_ports`/`mem_side_ports`)
- Missing cache parameters (`tag_latency`, `data_latency`, etc.)
- Workload object creation issues

**Solution**: Used gem5's built-in `x86-ubuntu-run.py` configuration instead of custom scripts.

#### 4. **Stats Collection Issues**
**Problem**: Statistics were generated in `m5out/stats.txt` but scripts expected them elsewhere.

**Root Cause**: `x86-ubuntu-run.py` outputs to the default `m5out/` directory.

**Solution**: Added automatic copying of stats from `m5out/stats.txt` to the expected output directory.

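
The copy step described in the solution can be sketched in Python as below. This is an illustration only, assuming gem5 wrote to `m5out/` under the current directory. The function name `collect_stats` and the choice to fail loudly on a missing or empty file are assumptions, not the behavior of `run_one.sh`.

```python
import shutil
import tempfile
from pathlib import Path


def collect_stats(out_dir, m5out="m5out"):
    """Copy <m5out>/stats.txt into out_dir and return the destination path."""
    src = Path(m5out) / "stats.txt"
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; did the simulation run?")
    if src.stat().st_size == 0:
        raise RuntimeError(f"{src} is empty; check the stderr log")
    dest_dir = Path(out_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "stats.txt"
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    return dest


if __name__ == "__main__":
    # Self-contained demo using a throwaway directory instead of a real run.
    with tempfile.TemporaryDirectory() as tmp:
        fake_m5out = Path(tmp) / "m5out"
        fake_m5out.mkdir()
        (fake_m5out / "stats.txt").write_text("simSeconds 1.0\n")
        print(collect_stats(Path(tmp) / "results" / "demo_run", fake_m5out))
```

Rejecting a zero-byte file at this point surfaces the empty `stats.txt` problem at copy time instead of later, during CSV extraction.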
### Key Learnings

1. **Architecture Choice Matters**: In this project, x86_64 proved much more reliable than ARM for gem5 simulations
2. **Full-System vs Syscall Emulation**: Full-system simulation was more robust than syscall emulation for these workloads
3. **Use Built-in Configurations**: gem5's built-in configs were more reliable than the custom ones
4. **Path Management**: Always verify and handle gem5's default output paths

## 🏗️ How the Project Works

### Simulation Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   IoT LLM App   │───▶│    gem5 X86     │───▶│   Statistics    │
│  (24k tokens)   │    │   Full-System   │    │     (482KB)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### Workflow Process

1. **Environment Setup**: Configure paths and verify gem5 installation
2. **Workload Compilation**: Compile C workloads to x86_64 binaries
3. **Simulation Execution**: Run gem5 with Ubuntu Linux and workload
4. **Statistics Collection**: Extract performance metrics from gem5 output
5. **Analysis**: Process statistics for energy, performance, and efficiency metrics

### Memory Configuration

- **Total RAM**: 16GB (as requested for IoT configuration)
- **Memory Controllers**: 2x DDR3 controllers with 8GB each
- **Cache Hierarchy**: L1I (48KB), L1D (32KB), L2 (1MB)
- **Memory Access**: Timing-based simulation with realistic latencies

## 📊 Simulation Results

### Sample Output (iot_llm_sim)

```
simSeconds    3.875651      # Simulated time (seconds)
simInsts      2665005563    # Instructions executed
simOps        5787853650    # Operations (including micro-ops)
hostInstRate  474335        # Simulated instructions per host second
```

### Performance Metrics

- **Simulation Speed**: ~474K simulated instructions per host second
- **Memory Usage**: Successfully processes 24k tokens (24MB allocation)
- **CPU Utilization**: O3CPU with realistic pipeline behavior
- **Cache Performance**: Detailed L1/L2 hit/miss statistics

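
The sample statistics can be turned into a few derived figures. The sketch below parses `name value # comment` lines of the kind found in `stats.txt` and estimates host run time from them. The function name `parse_stats` is hypothetical, and the derived quantities are illustrative arithmetic, not output of the project's scripts.

```python
def parse_stats(text):
    """Parse 'name value # comment' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip banner lines and non-numeric entries
    return stats


SAMPLE = """\
---------- Begin Simulation Statistics ----------
simSeconds 3.875651 # simulated seconds
simInsts 2665005563 # instructions
simOps 5787853650 # operations
hostInstRate 474335 # instructions per host second
"""

stats = parse_stats(SAMPLE)
host_seconds = stats["simInsts"] / stats["hostInstRate"]
ops_per_inst = stats["simOps"] / stats["simInsts"]
slowdown = host_seconds / stats["simSeconds"]

print(f"estimated host time: {host_seconds:.0f} s ({host_seconds / 60:.1f} min)")
print(f"ops per instruction: {ops_per_inst:.2f}")
print(f"host seconds per simulated second: {slowdown:.0f}")
```

For the sample above this works out to roughly 5,600 host seconds (about 94 minutes) for under 4 simulated seconds, and a little over two operations per instruction.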
## 🛠️ Usage Guide

### Basic Usage

### 4. Run Full Matrix
```bash
# Run IoT LLM simulation
sh scripts/run_one.sh iot_llm_sim big high 0 1MB

# Run with different CPU types
sh scripts/run_one.sh iot_llm_sim little high 0 1MB  # TimingSimpleCPU
sh scripts/run_one.sh iot_llm_sim big low 0 1MB      # Low frequency

# Run parameter sweep
sh scripts/sweep.sh
```
**Check logs**: Monitor progress and verify all combinations complete: `ls -la /home/carlos/projects/gem5/gem5-data/SmartEdgeAI/results/`

### Advanced Usage

```bash
# Custom memory size
sh scripts/run_one.sh iot_llm_sim big high 0 1MB 32GB

# Enable drowsy cache
sh scripts/run_one.sh iot_llm_sim big high 1 1MB

# Run specific workload
sh scripts/run_one.sh tinyml_kws big high 0 1MB
```

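
A parameter sweep of the kind `scripts/sweep.sh` runs can be pictured as the Cartesian product of the `run_one.sh` parameters. The Python sketch below only generates the command lines. The axis values are assumptions taken from the parameter descriptions in this README, and the real script may cover a different set.

```python
import itertools
import shlex

# Assumed sweep axes; scripts/sweep.sh may use different values.
WORKLOADS = ("tinyml_kws", "sensor_fusion", "aes_ccm", "attention_kernel")
CORES = ("big", "little")
DVFS = ("high", "low")
DROWSY = (0, 1)
L2_SIZES = ("1MB",)


def sweep_commands(workloads=WORKLOADS, cores=CORES, dvfs=DVFS,
                   drowsy=DROWSY, l2_sizes=L2_SIZES):
    """Return one run_one.sh command line per parameter combination."""
    commands = []
    for combo in itertools.product(workloads, cores, dvfs, drowsy, l2_sizes):
        workload, core, level, drowsy_flag, l2 = combo
        args = ["sh", "scripts/run_one.sh",
                workload, core, level, str(drowsy_flag), l2]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    cmds = sweep_commands()
    print(f"{len(cmds)} runs")
    for cmd in cmds:
        print(cmd)
```

With these assumed axes the matrix has 4 x 2 x 2 x 2 x 1 = 32 runs, which is the number of result directories to expect once the sweep completes.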
### 5. Extract Statistics
### Analysis Commands

```bash
# Extract CSV statistics
sh scripts/extract_csv.sh
```
**Check logs**: Verify CSV was created with data:
```bash
head -5 /home/carlos/projects/gem5/gem5-data/SmartEdgeAI/results/summary.csv
```

### 6. Compute Energy Metrics
```bash
# Energy analysis
python3 scripts/energy_post.py
```
**Check logs**: Verify energy calculations:
```bash
head -5 /home/carlos/projects/gem5/gem5-data/SmartEdgeAI/results/summary_energy.csv
```

### 7. Generate Plots
```bash
# Generate plots
python3 scripts/plot_epi.py
python3 scripts/plot_edp_tinyml.py
```
**Check logs**: Verify plots were created:
```bash
ls -la /home/carlos/projects/gem5/gem5-data/SmartEdgeAI/results/fig_*.png
```

### 8. Bundle Logs
```bash
# Bundle logs
sh scripts/bundle_logs.sh
```
**Check logs**: Verify bundled logs:
```bash
cat logs/TERMINAL_EXCERPTS.txt
cat logs/STATS_EXCERPTS.txt
```

### 9. (Optional) Generate Delta Analysis
```bash
python3 scripts/diff_table.py
```
**Check logs**: Verify delta calculations:
```bash
head -5 results/phase3_drowsy_deltas.csv
```

## Paths assumed
- gem5 binary: `/home/carlos/projects/gem5/gem5src/gem5/build/ARM/gem5.opt` (updated from tree.log analysis)
- config: `scripts/hetero_big_little.py`
- workloads: `/home/carlos/projects/gem5/gem5-run/{tinyml_kws,sensor_fusion,aes_ccm,attention_kernel}`

## Output Locations
- **Results**: `/home/carlos/projects/gem5/gem5-data/SmartEdgeAI/results/` (mirrored to `results/`)
- **Logs**: `/home/carlos/projects/gem5/gem5-data/SmartEdgeAI/logs/` (mirrored to `logs/`)

## 🔍 Troubleshooting

### Common Issues

**Empty stats.txt files (0 bytes)**
- **Cause**: gem5 binary doesn't exist or simulation failed
- **Solution**: Run `sh scripts/check_gem5.sh` and install gem5 if needed
- **Check**: `ls -la /home/carlos/projects/gem5/gem5src/gem5/build/ARM/gem5.opt`

**CSV extraction shows empty values**
- **Cause**: Simulation didn't run, so no statistics were generated
- **Solution**: Fix gem5 installation first, then re-run simulations

#### Empty stats.txt
```bash
# Check if simulation completed
ls -la m5out/stats.txt

# If empty, check logs
cat logs/*.stderr.log
```

**"ModuleNotFoundError: No module named 'matplotlib'"**
- **Solution**: Install matplotlib: `pip install matplotlib` or `sudo apt-get install python3-matplotlib`

**"ValueError: could not convert string to float: ''"**
- **Cause**: Empty CSV values from failed simulations
- **Solution**: Fixed in updated scripts - they now handle empty values gracefully

#### gem5 Binary Not Found
```bash
# Verify installation
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt

# Build if missing
cd /home/carlos/projects/gem5/gem5src/gem5
scons build/X86/gem5.opt -j$(nproc)
```

**Permission errors**
- **Solution**: Make scripts executable: `chmod +x scripts/*.sh`

**Path issues**
- **Solution**: Verify `ROOT` variable in `scripts/env.sh` points to correct gem5 installation

#### Compilation Errors
```bash
# Check compiler
gcc --version

# Rebuild workloads
sh scripts/build_workloads.sh
```

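
The `could not convert string to float: ''` error listed among the common issues comes from calling `float()` on a blank CSV field. One way to handle empty values gracefully is sketched below. This is not the code in the project's scripts: the helper names and the column names in the demo are hypothetical.

```python
import csv
import io


def to_float(value, default=None):
    """Convert a CSV field to float; return default for blank or invalid text."""
    if value is None:
        return default
    text = value.strip()
    if not text:
        return default
    try:
        return float(text)
    except ValueError:
        return default


def read_numeric_rows(csv_text, columns):
    """Return rows whose listed columns all parse; skip incomplete rows."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        parsed = {name: to_float(row.get(name)) for name in columns}
        if any(value is None for value in parsed.values()):
            continue  # a failed simulation left this row incomplete
        rows.append({**row, **parsed})
    return rows


if __name__ == "__main__":
    # Hypothetical column names and values, for illustration only.
    demo = "run,simSeconds,simInsts\nrun_a,1.5,1000\nrun_b,,\n"
    for row in read_numeric_rows(demo, ["simSeconds", "simInsts"]):
        print(row)
```

Skipping incomplete rows keeps later energy and plotting steps from failing, although the underlying cause (a simulation that produced no statistics) still needs to be fixed.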
### Debugging Steps
1. **Check gem5 installation**: `sh scripts/check_gem5.sh`
2. **Verify workload binaries**: `ls -la /home/carlos/projects/gem5/gem5-run/`
3. **Test single simulation**: `sh scripts/run_one.sh tinyml_kws big high 0 1MB`
4. **Check simulation logs**: `cat logs/tinyml_kws_big_high_l21MB_d0.stdout.log`
5. **Verify stats output**: `ls -l /home/carlos/projects/gem5/gem5-data/SmartEdgeAI/results/tinyml_kws_big_high_l21MB_d0/stats.txt`

### Debug Commands

```bash
# Check environment
sh scripts/env.sh

# Verify prerequisites
sh scripts/check_gem5.sh

# Manual gem5 run
/home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt \
    /home/carlos/projects/gem5/gem5src/gem5/configs/example/gem5_library/x86-ubuntu-run.py \
    --command=./iot_llm_sim --mem-size=16GB
```

## 📈 Performance Analysis

### Key Metrics

- **simSeconds**: Simulated (target) time in seconds, not host wall-clock time
- **simInsts**: Instructions executed
- **simOps**: Operations (including micro-ops)
- **hostInstRate**: Simulation speed, in simulated instructions per host second
- **Cache Miss Rates**: L1/L2 performance
- **Memory Bandwidth**: DRAM utilization

### Energy Analysis

The project includes energy post-processing scripts that calculate:
- **Energy per Instruction (EPI)**
- **Power consumption**
- **Energy-Delay Product (EDP)**
- **Drowsy vs Non-drowsy comparisons**

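
The listed energy metrics follow from standard definitions: energy is average power times simulated time, EPI is energy divided by instruction count, and EDP is energy times delay. The sketch below applies those definitions in Python. It does not reproduce `scripts/energy_post.py`: the `RunMetrics` class is hypothetical, and the power figures in the demo are placeholders, not measured values.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    sim_seconds: float      # simSeconds from stats.txt
    sim_insts: float        # simInsts from stats.txt
    avg_power_watts: float  # assumed input from a power model, not a gem5 stat

    @property
    def energy_joules(self):
        return self.avg_power_watts * self.sim_seconds

    @property
    def epi_joules(self):
        """Energy per instruction."""
        return self.energy_joules / self.sim_insts

    @property
    def edp(self):
        """Energy-delay product in joule-seconds."""
        return self.energy_joules * self.sim_seconds


def percent_delta(baseline, variant):
    """Relative change of variant against baseline, in percent."""
    return (variant - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Placeholder power numbers chosen only to exercise the formulas.
    normal = RunMetrics(sim_seconds=2.0, sim_insts=1e9, avg_power_watts=0.5)
    drowsy = RunMetrics(sim_seconds=2.0, sim_insts=1e9, avg_power_watts=0.4)
    print(f"EPI normal: {normal.epi_joules:.3e} J")
    print(f"EDP normal: {normal.edp:.3f} J*s")
    print(f"drowsy EPI delta: {percent_delta(normal.epi_joules, drowsy.epi_joules):.1f}%")
```

A negative delta means the drowsy configuration used less energy per instruction than the baseline. Because EDP also multiplies by delay, a slower drowsy run can still score worse on EDP despite lower power.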
## 🎯 Future Enhancements

1. **Multi-core Support**: Extend to multi-core IoT configurations
2. **Real LLM Models**: Integrate actual transformer models
3. **Power Modeling**: Add detailed power consumption analysis
4. **Network Simulation**: Include IoT communication patterns
5. **Edge Computing**: Simulate edge-to-cloud interactions

## 📚 References

- [gem5 Documentation](https://www.gem5.org/documentation/)
- [gem5 Learning Resources](https://www.gem5.org/documentation/learning_gem5/)
- [ARM Research Starter Kit](http://www.arm.com/ResearchEnablement/SystemModeling)

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test with `sh run_all.sh`
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

---

**Note**: This project was developed through iterative problem-solving, switching from ARM to x86_64 architecture and using gem5's built-in configurations for maximum reliability. The final solution provides a robust IoT LLM simulation framework with comprehensive statistics and analysis capabilities.