Carlos Gutierrez cd69096346 initial commit
2025-09-21 01:17:26 -04:00

Pipeline Simulation

Setup and configuration in Gem5.

To begin the project, a script was created to configure and launch the baseline pipeline in Gem5. The simulation was executed on the X86 DerivO3CPU model with a 2 GHz CPU and system clock, 32 KB L1 instruction and data caches, and a unified 1 MB L2 cache. The benchmark program was memtouch, run in syscall emulation (SE) mode. The script defined output directories, cache parameters, and execution limits, ensuring a reproducible setup.
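The configuration script itself is not reproduced here; the following is a minimal sketch of what such a script looks like in gem5's classic Python API. Only the clocks and cache sizes come from the report — the benchmark path, cache latencies, associativities, and MSHR counts are illustrative assumptions, and DerivO3CPU is the pre-v21 name of the out-of-order model (X86O3CPU in recent releases).

```python
# Sketch of the baseline SE-mode setup (assumptions noted above).
import m5
from m5.objects import *

system = System()
system.clk_domain = SrcClockDomain(clock="2GHz",
                                   voltage_domain=VoltageDomain())
system.mem_mode = "timing"               # caches require timing accesses
system.mem_ranges = [AddrRange("512MB")]

system.cpu = DerivO3CPU()                # out-of-order x86 pipeline model

def l1(size):
    # Latencies/MSHR counts are placeholder values, not the report's.
    return Cache(size=size, assoc=2, tag_latency=2, data_latency=2,
                 response_latency=2, mshrs=4, tgts_per_mshr=20)

system.cpu.icache, system.cpu.dcache = l1("32kB"), l1("32kB")
system.cpu.icache.cpu_side = system.cpu.icache_port
system.cpu.dcache.cpu_side = system.cpu.dcache_port

system.l2bus = L2XBar()
system.cpu.icache.mem_side = system.l2bus.cpu_side_ports
system.cpu.dcache.mem_side = system.l2bus.cpu_side_ports
system.l2cache = Cache(size="1MB", assoc=8, tag_latency=20,
                       data_latency=20, response_latency=20,
                       mshrs=20, tgts_per_mshr=12)
system.l2cache.cpu_side = system.l2bus.mem_side_ports
# ... membus, DRAM controller, and X86 interrupt-port wiring omitted

binary = "tests/memtouch"                # placeholder path to the benchmark
system.workload = SEWorkload.init_compatible(binary)
system.cpu.workload = Process(cmd=[binary])
system.cpu.createThreads()

root = Root(full_system=False, system=system)
m5.instantiate()
print("Exiting @ tick", m5.simulate().getCause())
```

The script is launched with the gem5 binary (e.g. `build/X86/gem5.opt`), which also takes the `--outdir` option used to direct stats.txt to the output directory mentioned above.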

Output after running the command

The initial run produced a baseline performance snapshot. The measured IPC was ~0.05 (CPI ≈ 19.7), indicating extremely low throughput. Nearly 97% of cycles retired no instructions, showing that the pipeline was heavily stalled. Analysis of memory system statistics revealed an L1D miss rate close to 50%, with an average miss latency of ~78,000 ticks. These misses frequently propagated through the pipeline, creating bubbles and stalling progress. In contrast, branch prediction worked effectively, with a misprediction rate below 0.05%.
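The headline numbers follow directly from the raw counters in stats.txt. A quick check, assuming gem5's default tick of 1 ps (so a 2 GHz clock is 500 ticks per cycle):

```python
# Recompute the headline metrics from the reported raw counters.
sim_insts = 25_297_289       # simInsts
num_cycles = 498_254_810     # numCycles

ipc = sim_insts / num_cycles
cpi = num_cycles / sim_insts
print(f"IPC = {ipc:.3f}, CPI = {cpi:.2f}")   # IPC = 0.051, CPI = 19.70

# Assumption: default 1 ps tick, so one 2 GHz cycle = 500 ticks.
ticks_per_cycle = 500
miss_latency_cycles = 78_000 / ticks_per_cycle
print(f"Avg. L1D miss latency = {miss_latency_cycles:.0f} cycles")  # 156 cycles
```

Converted to cycles, the ~78,000-tick miss latency corresponds to roughly 156 CPU cycles per miss — long enough that back-to-back L1D misses dominate the pipeline's behavior.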

Cycle-by-cycle analysis of pipeline stages.

Cycle-level tracing (via --debug-flags=O3CPU,Fetch,Decode,Rename,IEW,Commit,Branch) revealed the interactions of the pipeline stages. The traces show frequent stalls at IEW and Commit, triggered by long-latency load misses that blocked dependent instructions. Fetch and Decode also experienced backpressure as the backend filled, illustrating how memory bottlenecks propagate through the pipeline. Overall, the cycle-by-cycle view confirmed that the pipeline's performance limit was not due to execution width or branch handling, but to structural and data hazards in the memory system.

Metric                          Value                        Interpretation
Total Instructions (simInsts)   25,297,289                   Benchmark retired ~25M instructions
Total Cycles (numCycles)        498,254,810                  Simulation ran ~498M cycles
IPC                             0.051                        Very low throughput (pipeline mostly stalled)
CPI                             19.70                        Each instruction took ~20 cycles on average
Commit Histogram                ~97% of cycles at 0 commits  Pipeline idle most of the time (waiting on memory)
L1I Miss Rate                   ~0%                          Instruction cache well-behaved
L1D Miss Rate                   49.8%                        Half of all data accesses missed L1
L1D MPKI                        ~124                         Very high miss intensity (memory-bound workload)
Avg. L1D Miss Latency           ~78,000 ticks                Memory stalls extremely long
L2 Hit Ratio                    66%                          ~1/3 of L2 accesses miss and go to DRAM
Branch Pred. Mispredict Rate    0.03%                        Branching handled very well (not a bottleneck)
Instruction Mix                 73% IntAlu, 27% memory ops   Heavy integer and memory workload
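As a back-of-envelope sanity check (not part of the original run), the table's miss statistics alone nearly account for the measured CPI, assuming the ~78,000-tick miss latency equals ~156 cycles at 2 GHz (gem5's default 1 ps tick) and that misses largely serialize:

```python
# Stall model built only from the table's figures.
insts = 25_297_289
mpki = 124                   # L1D misses per kilo-instruction
miss_latency_cycles = 156    # ~78,000 ticks / 500 ticks per 2 GHz cycle

# If L1D misses fully stall commit, they alone contribute:
stall_cpi = (mpki / 1000) * miss_latency_cycles
print(f"CPI from L1D miss stalls alone = {stall_cpi:.1f}")  # 19.3 of ~19.7 measured

# Cross-check against the instruction mix: implied data-access fraction
misses = mpki * insts / 1000
accesses = misses / 0.498    # 49.8% L1D miss rate
print(f"Implied memory-op fraction = {accesses / insts:.0%}")  # 25%, near the 27% reported
```

Both checks are internally consistent: L1D miss stalls explain essentially all of the measured CPI, reinforcing that the workload is memory-bound rather than limited by execution resources.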

·      The baseline run shows severe memory bottlenecks: nearly half of L1D accesses miss, with an average miss penalty of ~78,000 ticks (roughly 150 CPU cycles at 2 GHz, assuming gem5's default 1 ps tick).

·      This results in IPC ≈ 0.05, with the pipeline committing zero instructions in ~97% of cycles.

·      Branch prediction is highly accurate and does not contribute to stalls.

·      Instruction mix is dominated by integer arithmetic and memory operations, with stores forming a large share.

·      The cycle-by-cycle analysis confirms that structural and data hazards in the memory subsystem are the main performance limiter, not branch or execution resources.