This project simulates IoT (Internet of Things) systems running Large Language Models (LLMs) using the gem5 computer architecture simulator. The simulation includes:

IoT LLM Workload: Simulates processing 24k tokens with memory allocation patterns typical of LLM inference
16GB RAM Configuration: Full-system simulation with realistic memory constraints
Multiple CPU Architectures: Support for big/little core configurations
Comprehensive Statistics: Detailed performance metrics and energy analysis

🚀 Quick Start

Prerequisites

# Install required dependencies
sudo apt update
sudo apt install python3-matplotlib python3-pydot python3-pip python3-venv

# Verify gem5 installation (adjust the path to match your gem5 checkout)
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt

Run Complete Workflow

# Run everything automatically
sh run_all.sh

# Or run individual steps
sh scripts/check_gem5.sh      # Verify prerequisites
sh scripts/env.sh             # Setup environment
sh scripts/build_workloads.sh # Compile workloads
sh scripts/run_one.sh iot_llm_sim big high 0 1MB  # Run simulation

📁 Project Structure

SmartEdgeAI/
├── scripts/                    # Automation scripts
│   ├── env.sh                 # Environment setup
│   ├── build_workloads.sh     # Compile workloads
│   ├── run_one.sh            # Single simulation run
│   ├── sweep.sh              # Parameter sweep
│   ├── extract_csv.sh        # Extract statistics
│   ├── energy_post.py        # Energy analysis
│   └── bundle_logs.sh        # Log collection
├── workloads/                 # C source code
│   ├── tinyml_kws.c          # TinyML keyword spotting
│   ├── sensor_fusion.c       # Sensor data fusion
│   ├── aes_ccm.c            # AES encryption
│   └── attention_kernel.c   # Attention mechanism
├── iot_llm_sim.c             # Main IoT LLM simulation
├── run_all.sh                # Master workflow script
└── README.md                 # This file

🔧 Script Explanations

Core Scripts

`scripts/env.sh`

Purpose: Sets up environment variables and paths for the entire workflow.

Key Variables:

ROOT: Base gem5 installation path
CFG: gem5 configuration script (x86-ubuntu-run.py)
GEM5_BIN: Path to gem5 binary (X86 build)
RUN: Directory for compiled workloads
OUT_DATA: Simulation results directory
LOG_DATA: Log files directory

`scripts/build_workloads.sh`

Purpose: Compiles all C workloads into x86_64 binaries.

What it does:

Compiles tinyml_kws.c, sensor_fusion.c, aes_ccm.c, attention_kernel.c
Creates iot_llm_sim binary for LLM simulation
Uses gcc -O2 -static for optimized static binaries

`scripts/run_one.sh`

Purpose: Executes a single gem5 simulation with specified parameters.

Parameters:

workload: Which binary to run (e.g., iot_llm_sim)
core: CPU type (big=O3CPU, little=TimingSimpleCPU)
dvfs: Frequency setting (high=2GHz, low=1GHz)
drowsy: Cache drowsy mode (0=off, 1=on)
l2: L2 cache size (e.g., 1MB)

Key Features:

Maps core types to gem5 CPU models
Copies stats from m5out/stats.txt to output directory
Mirrors results to repository directories
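
The parameter mapping described above can be sketched in Python. This is an illustration only: the function name `resolve_run_config` and the dictionary keys are hypothetical and are not taken from `scripts/run_one.sh`; only the value mappings (big/little, high/low, 0/1) come from the parameter list.

```python
# Hypothetical mirror of the argument mapping described for scripts/run_one.sh.
# Only the value mappings come from the README; all names are illustrative.

CORE_TO_CPU = {"big": "O3CPU", "little": "TimingSimpleCPU"}
DVFS_TO_CLOCK = {"high": "2GHz", "low": "1GHz"}


def resolve_run_config(workload, core, dvfs, drowsy, l2):
    """Validate the five positional arguments and return the resolved settings."""
    if core not in CORE_TO_CPU:
        raise ValueError(f"unknown core type: {core!r}")
    if dvfs not in DVFS_TO_CLOCK:
        raise ValueError(f"unknown dvfs setting: {dvfs!r}")
    if str(drowsy) not in ("0", "1"):
        raise ValueError(f"drowsy must be 0 or 1, got {drowsy!r}")
    return {
        "workload": workload,
        "cpu_model": CORE_TO_CPU[core],
        "clock": DVFS_TO_CLOCK[dvfs],
        "drowsy": str(drowsy) == "1",
        "l2_size": l2,
    }


if __name__ == "__main__":
    print(resolve_run_config("iot_llm_sim", "big", "high", 0, "1MB"))
```

Rejecting unknown values up front avoids launching a long gem5 run with a mistyped argument.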

`iot_llm_sim.c`

Purpose: Simulates IoT LLM inference with 24k token processing.

What it simulates:

Memory allocation for 24k tokens (1KB per token)
Token processing loop with memory operations
Realistic LLM inference patterns
Memory cleanup and resource management
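
The allocate/process/free pattern listed above can be illustrated with a short Python sketch. It is not a port of `iot_llm_sim.c`: the function name `run_token_pass` and the read/write pattern inside the loop are assumptions, and "24k" is taken to mean 24,000 tokens. Under that assumption the footprint is 24,000 × 1,024 = 24,576,000 bytes, which is the roughly 24MB allocation quoted elsewhere in this README.

```python
# Illustrative sketch of the allocation pattern; not the actual C workload.
TOKEN_BYTES = 1024  # 1KB per token, as stated above


def run_token_pass(num_tokens, token_bytes=TOKEN_BYTES):
    """Allocate one buffer per token, touch every buffer, then release them."""
    # Allocation phase: one zero-filled buffer per token.
    buffers = [bytearray(token_bytes) for _ in range(num_tokens)]

    # Token processing loop: write to each buffer, then read the data back.
    checksum = 0
    for index, buf in enumerate(buffers):
        buf[0] = index % 256
        buf[-1] = buf[0]
        checksum += buf[0] + buf[-1]

    total_bytes = sum(len(buf) for buf in buffers)

    # Cleanup phase: drop all references so the memory can be reclaimed.
    buffers.clear()
    return total_bytes, checksum


if __name__ == "__main__":
    total, _ = run_token_pass(24_000)  # assumes "24k" means 24,000 tokens
    print(f"allocated {total} bytes")
```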

🐛 Problem-Solving Journey

Initial Challenges

1. Empty stats.txt Files

Problem: Simulations were running but generating empty statistics files.

Root Cause: The ARM binaries invoked a system call that gem5's syscall emulation does not implement. On 32-bit ARM Linux, syscall 398 is rseq (restartable sequences), not futex.

Solution: Switched from ARM to x86_64 architecture for better gem5 compatibility.

2. Syscall Compatibility Issues

Problem: ARM binaries aborted with `fatal: Syscall 398 out of range`.

Root Cause: gem5's syscall emulation mode implements only a subset of Linux system calls. Newer calls such as rseq (number 398 on 32-bit ARM) fall outside its syscall table.

Solution:

Tried multiple ARM configurations (starter_se.py, baremetal.py)
Ultimately switched to x86_64 full-system simulation
Used x86-ubuntu-run.py for reliable Ubuntu-based simulation

3. Configuration Complexity

Problem: Custom gem5 configurations were failing with various errors.

Root Cause:

Deprecated port names (slave/master → cpu_side_ports/mem_side_ports)
Missing cache parameters (tag_latency, data_latency, etc.)
Workload object creation issues

Solution: Used gem5's built-in x86-ubuntu-run.py configuration instead of custom scripts.

4. Stats Collection Issues

Problem: Statistics were generated in m5out/stats.txt but scripts expected them elsewhere.

Root Cause: Runs launched with x86-ubuntu-run.py used gem5's default m5out/ output directory; gem5 only writes elsewhere when --outdir is passed.

Solution: Added automatic copying of stats from m5out/stats.txt to expected output directory.
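
A minimal Python sketch of that copy step follows. The project does this inside its shell scripts, so the function name `collect_stats` and the destination layout are hypothetical; the sketch also refuses to copy an empty file, which makes the failure described in issue 1 visible immediately.

```python
# Hypothetical helper; the real copy happens in the project's shell scripts.
import shutil
from pathlib import Path


def collect_stats(out_dir, m5out_dir="m5out"):
    """Copy m5out/stats.txt into out_dir, failing loudly if it is missing or empty."""
    src = Path(m5out_dir) / "stats.txt"
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; did the simulation run?")
    if src.stat().st_size == 0:
        raise RuntimeError(f"{src} is empty; check the gem5 stderr log")

    dest_dir = Path(out_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "stats.txt"
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest
```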

Key Learnings

Architecture Choice Matters: In this project, x86_64 binaries ran reliably under gem5 where the ARM builds did not
Full-System vs Syscall Emulation: Full-system simulation avoided the unimplemented-syscall failures seen in syscall emulation
Use Built-in Configurations: gem5's built-in configs worked where the custom scripts failed on deprecated ports and missing cache parameters
Path Management: Always verify and handle gem5's default output paths

🏗️ How the Project Works

Simulation Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   IoT LLM App   │───▶│   gem5 X86     │───▶│   Statistics    │
│   (24k tokens)  │    │   Full-System   │    │   (482KB)       │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Workflow Process

Environment Setup: Configure paths and verify gem5 installation
Workload Compilation: Compile C workloads to x86_64 binaries
Simulation Execution: Run gem5 with Ubuntu Linux and workload
Statistics Collection: Extract performance metrics from gem5 output
Analysis: Process statistics for energy, performance, and efficiency metrics

Memory Configuration

Total RAM: 16GB
Memory Controllers: 2x DDR3 controllers with 8GB each
Cache Hierarchy: L1I (48KB), L1D (32KB), L2 (1MB)
Memory Access: Timing-based simulation with realistic latencies

📊 Simulation Results

Sample Output (iot_llm_sim)

simSeconds                                   3.875651  # Simulation time
simInsts                                   2665005563  # Instructions executed
simOps                                     5787853650  # Operations (including micro-ops)
hostInstRate                                   474335  # Instructions per second
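
The sample output above can be post-processed with a few lines of Python. The parser below is a sketch that assumes the `name value # comment` layout shown in the sample; the function name `parse_stats` is hypothetical and is not part of the project's scripts. The two derived values are the micro-op expansion (simOps / simInsts) and an estimate of host wall-clock time (simInsts / hostInstRate), which is only approximate because hostInstRate is an average over the run.

```python
# Sketch of a stats.txt reader; assumes "name value # comment" lines.
def parse_stats(text):
    """Parse 'name value # comment' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip banner lines and non-numeric entries
    return stats


SAMPLE = """\
simSeconds                                   3.875651  # Simulation time
simInsts                                   2665005563  # Instructions executed
simOps                                     5787853650  # Operations (including micro-ops)
hostInstRate                                   474335  # Instructions per second
"""

stats = parse_stats(SAMPLE)
ops_per_inst = stats["simOps"] / stats["simInsts"]        # micro-ops per instruction
host_seconds = stats["simInsts"] / stats["hostInstRate"]  # approximate host run time
print(f"ops/inst = {ops_per_inst:.2f}, host time = {host_seconds / 60:.0f} min")
```

For this sample the ratio is a little over two micro-ops per instruction, and the estimated host time is on the order of an hour and a half for under four seconds of simulated time.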

Performance Metrics

Simulation Speed: ~474K simulated instructions per host second
Memory Usage: Successfully processes 24k tokens (24MB allocation)
CPU Utilization: O3CPU with realistic pipeline behavior
Cache Performance: Detailed L1/L2 hit/miss statistics

🛠️ Usage Guide

Basic Usage

# Run IoT LLM simulation
sh scripts/run_one.sh iot_llm_sim big high 0 1MB

# Run with different CPU types
sh scripts/run_one.sh iot_llm_sim little high 0 1MB  # TimingSimpleCPU
sh scripts/run_one.sh iot_llm_sim big low 0 1MB     # Low frequency

# Run parameter sweep
sh scripts/sweep.sh
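
For readers who want to see what a sweep covers, the sketch below enumerates `run_one.sh` invocations over a parameter grid. The grid is an assumption built from the values that appear in this README; the actual contents of `scripts/sweep.sh` are not reproduced here, and `sweep_commands` is a hypothetical name.

```python
# Hypothetical sweep enumeration; the real grid lives in scripts/sweep.sh.
from itertools import product

CORES = ("big", "little")
DVFS = ("high", "low")
DROWSY = (0, 1)
L2_SIZES = ("1MB",)  # only size shown in this README; extend as needed


def sweep_commands(workloads):
    """Return one run_one.sh argument list per parameter combination."""
    commands = []
    for workload, core, dvfs, drowsy, l2 in product(
        workloads, CORES, DVFS, DROWSY, L2_SIZES
    ):
        commands.append(
            ["sh", "scripts/run_one.sh", workload, core, dvfs, str(drowsy), l2]
        )
    return commands


if __name__ == "__main__":
    for command in sweep_commands(["iot_llm_sim"]):
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run` as-is. With one workload and the grid above, this yields eight runs.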

Advanced Usage

# Custom memory size
sh scripts/run_one.sh iot_llm_sim big high 0 1MB 32GB

# Enable drowsy cache
sh scripts/run_one.sh iot_llm_sim big high 1 1MB

# Run specific workload
sh scripts/run_one.sh tinyml_kws big high 0 1MB

Analysis Commands

# Extract CSV statistics
sh scripts/extract_csv.sh

# Energy analysis
python3 scripts/energy_post.py

# Generate plots
python3 scripts/plot_epi.py
python3 scripts/plot_edp_tinyml.py

# Bundle logs
sh scripts/bundle_logs.sh

🔍 Troubleshooting

Common Issues

Empty stats.txt

# Check if simulation completed
ls -la m5out/stats.txt

# If empty, check logs
cat logs/*.stderr.log

gem5 Binary Not Found

# Verify installation
ls /home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt

# Build if missing
cd /home/carlos/projects/gem5/gem5src/gem5
scons build/X86/gem5.opt -j$(nproc)

Compilation Errors

# Check compiler
gcc --version

# Rebuild workloads
sh scripts/build_workloads.sh

Debug Commands

# Check environment
sh scripts/env.sh

# Verify prerequisites
sh scripts/check_gem5.sh

# Manual gem5 run
/home/carlos/projects/gem5/gem5src/gem5/build/X86/gem5.opt \
  /home/carlos/projects/gem5/gem5src/gem5/configs/example/gem5_library/x86-ubuntu-run.py \
  --command=./iot_llm_sim --mem-size=16GB

📈 Performance Analysis

Key Metrics

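simSeconds: Simulated time in seconds (time elapsed inside the simulated system, not host wall-clock time)
simInsts: Number of instructions simulated
simOps: Number of operations simulated (including micro-ops)
hostInstRate: Simulation speed, in simulated instructions per host second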
Cache Miss Rates: L1/L2 performance
Memory Bandwidth: DRAM utilization

Energy Analysis

The project includes energy post-processing scripts that calculate:

Energy per Instruction (EPI)
Power consumption
Energy-Delay Product (EDP)
Drowsy vs Non-drowsy comparisons
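
The metrics listed above follow from standard definitions: energy is average power times simulated time, EPI is energy divided by instruction count, and EDP is energy times delay. The sketch below applies those definitions; it is not the code in `scripts/energy_post.py`, the function names are hypothetical, and the 1.0 W average power in the usage lines is a placeholder rather than a measured value.

```python
# Standard-definition sketch; not the project's energy_post.py.
def energy_metrics(avg_power_watts, sim_seconds, sim_insts):
    """Compute energy, energy per instruction, and energy-delay product."""
    energy_j = avg_power_watts * sim_seconds  # E = P * t
    epi_j = energy_j / sim_insts              # joules per instruction
    edp_js = energy_j * sim_seconds           # joule-seconds
    return {
        "energy_j": energy_j,
        "epi_nj": epi_j * 1e9,                # reported in nanojoules
        "edp_js": edp_js,
    }


def drowsy_savings(baseline_energy_j, drowsy_energy_j):
    """Fractional energy saved by the drowsy configuration (0.1 means 10%)."""
    return 1.0 - drowsy_energy_j / baseline_energy_j


if __name__ == "__main__":
    # simSeconds and simInsts are from the sample output; 1.0 W is a placeholder.
    metrics = energy_metrics(1.0, 3.875651, 2665005563)
    print(metrics)
```

EDP penalizes configurations that save energy only by running longer, which is why it is useful alongside EPI when comparing high and low frequency settings.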

🎯 Future Enhancements

Multi-core Support: Extend to multi-core IoT configurations
Real LLM Models: Integrate actual transformer models
Power Modeling: Add detailed power consumption analysis
Network Simulation: Include IoT communication patterns
Edge Computing: Simulate edge-to-cloud interactions

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Test with sh run_all.sh
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Note: This project reached its current form iteratively. The original ARM setup was replaced with x86_64 and gem5's built-in x86-ubuntu-run.py configuration after the issues described in the Problem-Solving Journey section.