sheepOp/docs/NEURAL_NETWORK_EXPLAINED.md
Carlos Gutierrez 3d2da94ce2 Initial commit: SheepOp LLM - Transformer-based language model implementation
- Complete transformer implementation from scratch
- Training pipeline with gradient accumulation and mixed precision
- Optimized inference with KV caching
- Multi-format data processing (PDFs, images, code, text)
- Comprehensive documentation
- Apache 2.0 license
- Example training plots included in docs/images/
2025-11-06 22:07:41 -05:00


What is a Neural Network? Step-by-Step Explanation

Complete step-by-step explanation of neural networks: what neurons are, what weights are, how calculations work, why they're important, with mathematical derivations and solved exercises.

Table of Contents

  6.1 What is a Neural Network?
  6.2 What is a Neuron?
  6.3 What are Weights?
  6.4 How Neurons Calculate
  6.5 Why Weights are Important
  6.6 Complete Mathematical Formulation
  6.7 Multi-Layer Neural Networks
  6.8 Exercise 1: Single Neuron Calculation
  6.9 Exercise 2: Multi-Layer Network
  6.10 Exercise 3: Learning Weights
  6.11 Key Takeaways

6.1 What is a Neural Network?

Simple Definition

A neural network is a computational model inspired by biological neurons that processes information through interconnected nodes (neurons) to make predictions or decisions.

Visual Analogy

Think of a neural network like a factory:

Input → Worker 1 → Worker 2 → Worker 3 → Output

Neural Network:

Input → Neuron 1 → Neuron 2 → Neuron 3 → Output

Each worker (neuron) does a specific job, and they work together to produce the final result.

Basic Structure

Input Layer      Hidden Layer      Output Layer
     ●               ●                 ●
     ●               ●                 ●
     ●               ●                 ●
     ●               ●

Key Components:

  • Input Layer: Receives data
  • Hidden Layers: Process information
  • Output Layer: Produces predictions
  • Connections: Weights between neurons

6.2 What is a Neuron?

Simple Definition

A neuron (also called a node or unit) is the basic processing unit of a neural network. It receives inputs, performs calculations, and produces an output.

Biological Inspiration

Biological Neuron:

Dendrites → Cell Body → Axon → Synapses
(inputs)    (process)   (output) (connections)

Artificial Neuron:

Inputs → Weighted Sum → Activation → Output

Structure of a Neuron

Input 1 (x₁) ────┐
                 │
Input 2 (x₂) ────┼──→ [Σ] ─→ [f] ─→ Output (y)
                 │
Input 3 (x₃) ────┘

Components:

  1. Inputs: Values fed into the neuron
  2. Weights: Strength of connections
  3. Weighted Sum: Sum of inputs × weights
  4. Bias: Added constant
  5. Activation Function: Applies nonlinearity
  6. Output: Final result

Visual Representation

Neuron:
    ┌─────────────────────┐
    │  Inputs: x₁, x₂, x₃ │
    │  Weights: w₁, w₂, w₃│
    │                     │
    │  z = Σ(xᵢ × wᵢ) + b │
    │  y = f(z)           │
    │                     │
    │  Output: y          │
    └─────────────────────┘

Where:

  • z = weighted sum (before activation)
  • f = activation function
  • y = output (after activation)

6.3 What are Weights?

Simple Definition

Weights are numerical values that determine the strength of connections between neurons. They control how much each input contributes to the output.

Visual Analogy

Think of weights like volume controls:

Music Source 1 ──[Volume: 0.8]──→ Speakers
Music Source 2 ──[Volume: 0.3]──→ Speakers
Music Source 3 ──[Volume: 0.5]──→ Speakers

Higher weight = Louder contribution

Neural Network:

Input 1 ──[Weight: 0.8]──→ Neuron
Input 2 ──[Weight: 0.3]──→ Neuron
Input 3 ──[Weight: 0.5]──→ Neuron

Higher weight = Stronger influence

What Weights Do

Weights determine:

  1. How much each input matters
  2. The relationship between inputs and outputs
  3. What patterns the neuron learns

Example:

Weight = 0.1:

  • Input has small influence
  • Weak connection

Weight = 5.0:

  • Input has large influence
  • Strong connection

Weight = -2.0:

  • Input has negative influence
  • Inverts the relationship

Weight = 0.0:

  • Input has no influence
  • Connection is cut

Weight Matrix

In a layer with multiple neurons:

Input Layer          Weights Matrix      Output Layer
x₁ ───────────────────┐
                      │   w₁₁  w₁₂       y₁
x₂ ───────────────────┼─  w₂₁  w₂₂  ──── y₂
                      │   w₃₁  w₃₂
x₃ ───────────────────┘

Weight Matrix:

W = [w₁₁  w₁₂]
    [w₂₁  w₂₂]
    [w₃₁  w₃₂]

Each row: Connections from one input
Each column: Connections to one output


6.4 How Neurons Calculate

Step-by-Step Calculation

Step 1: Weighted Sum

Multiply each input by its weight:

z = x_1 \times w_1 + x_2 \times w_2 + \dots + x_n \times w_n + b

Or in vector form:

z = \mathbf{x} \cdot \mathbf{w} + b = \sum_{i=1}^{n} x_i w_i + b

Where:

  • x_i = input value
  • w_i = weight for input i
  • b = bias (constant)
  • n = number of inputs

Step 2: Add Bias

Bias shifts the activation:

z = \sum_{i=1}^{n} x_i w_i + b

Bias allows the neuron to:

  • Shift activation threshold
  • Learn patterns independent of inputs
  • Adjust baseline output

Step 3: Apply Activation Function

Apply nonlinear function:

y = f(z)

Common activation functions:

ReLU (Rectified Linear Unit):

f(z) = \max(0, z)

Sigmoid:

f(z) = \frac{1}{1 + e^{-z}}

Tanh:

f(z) = \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}

GELU (used in transformers):

f(z) = z \cdot \Phi(z)

Where \Phi(z) is the CDF of the standard normal distribution
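These four activations can be written directly in code. Below is a minimal standard-library sketch (the function names are ours):

```python
import math

def relu(z):
    # max(0, z): passes positives through, zeroes out negatives
    return max(0.0, z)

def sigmoid(z):
    # 1 / (1 + e^{-z}): squashes any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # (e^z - e^{-z}) / (e^z + e^{-z}): squashes into (-1, 1), zero-centered
    return math.tanh(z)

def gelu(z):
    # z * Phi(z), with Phi the standard normal CDF expressed via erf
    return z * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(relu(-3.0), sigmoid(0.0), gelu(0.0))  # 0.0 0.5 0.0
```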

Complete Example

Given:

  • Inputs: x_1 = 0.5, x_2 = 0.3, x_3 = 0.8
  • Weights: w_1 = 0.6, w_2 = 0.4, w_3 = 0.2
  • Bias: b = 0.1
  • Activation: ReLU

Step 1: Weighted Sum

z = (0.5 × 0.6) + (0.3 × 0.4) + (0.8 × 0.2) + 0.1
  = 0.3 + 0.12 + 0.16 + 0.1
  = 0.68

Step 2: Apply Activation

y = ReLU(0.68)
  = max(0, 0.68)
  = 0.68

Result: Output = 0.68
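The same worked example in code (a small sketch; the helper name `neuron` is ours):

```python
def neuron(inputs, weights, bias):
    # Step 1: weighted sum z = sum(x_i * w_i) + b
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step 2: ReLU activation y = max(0, z)
    return max(0.0, z)

y = neuron([0.5, 0.3, 0.8], [0.6, 0.4, 0.2], 0.1)
print(round(y, 2))  # 0.68
```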


6.5 Why Weights are Important

Reason 1: They Determine What the Neuron Learns

Different weights = Different patterns:

Pattern 1: Emphasis on Input 1

w₁ = 5.0, w₂ = 0.1, w₃ = 0.1
→ Neuron cares mostly about input 1

Pattern 2: Balanced Weights

w₁ = 0.5, w₂ = 0.5, w₃ = 0.5
→ Neuron treats all inputs equally

Pattern 3: Inverted Relationship

w₁ = -2.0, w₂ = 1.0, w₃ = 1.0
→ Neuron inverts input 1's effect

Reason 2: They Enable Learning

Training adjusts weights:

Before Training:

Weights: Random values
→ Random predictions

After Training:

Weights: Learned values
→ Accurate predictions

Weights are what the model learns!

Reason 3: They Control Information Flow

High weights: Information flows easily
Low weights: Information flows weakly
Zero weights: Information blocked
Negative weights: Information inverted

Reason 4: They Enable Complex Patterns

Multiple neurons with different weights:

Neuron 1: w₁ = 1.0, w₂ = 0.0 → Detects pattern A
Neuron 2: w₁ = 0.0, w₂ = 1.0 → Detects pattern B
Neuron 3: w₁ = 0.5, w₂ = 0.5 → Detects pattern C

Together: Model learns complex relationships!


6.6 Complete Mathematical Formulation

Single Neuron Formula

Complete neuron calculation:

z = \sum_{i=1}^{n} x_i w_i + b
y = f(z)

Where:

  • \mathbf{x} = [x_1, x_2, ..., x_n] = input vector
  • \mathbf{w} = [w_1, w_2, ..., w_n] = weight vector
  • b = bias (scalar)
  • f = activation function
  • z = weighted sum (before activation)
  • y = output (after activation)

Matrix Formulation

For multiple neurons:

\mathbf{z} = \mathbf{X} \mathbf{W} + \mathbf{b}
\mathbf{Y} = f(\mathbf{z})

Where:

  • \mathbf{X} \in \mathbb{R}^{B \times n} = input matrix (B samples, n features)
  • \mathbf{W} \in \mathbb{R}^{n \times m} = weight matrix (n inputs, m neurons)
  • \mathbf{b} \in \mathbb{R}^{1 \times m} = bias vector
  • \mathbf{z} \in \mathbb{R}^{B \times m} = weighted sums
  • \mathbf{Y} \in \mathbb{R}^{B \times m} = outputs

Example:

Input Matrix:

X = [x₁₁  x₁₂]  (2 samples, 2 features)
    [x₂₁  x₂₂]

Weight Matrix:

W = [w₁₁  w₁₂]  (2 inputs, 2 neurons)
    [w₂₁  w₂₂]

Bias Vector:

b = [b₁  b₂]  (2 neurons)

Calculation:

z = X × W + b

z₁₁ = x₁₁×w₁₁ + x₁₂×w₂₁ + b₁
z₁₂ = x₁₁×w₁₂ + x₁₂×w₂₂ + b₂
z₂₁ = x₂₁×w₁₁ + x₂₂×w₂₁ + b₁
z₂₂ = x₂₁×w₁₂ + x₂₂×w₂₂ + b₂
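The batched calculation above can be checked with a small matrix-multiply sketch in plain Python (the sample values are ours; in practice a library such as NumPy does this far more efficiently):

```python
def linear(X, W, b):
    # z = X W + b: X is B x n, W is n x m, b has m entries
    n, m = len(W), len(W[0])
    return [[sum(row[k] * W[k][j] for k in range(n)) + b[j]
             for j in range(m)] for row in X]

X = [[1.0, 2.0], [3.0, 4.0]]   # 2 samples, 2 features (made-up values)
W = [[0.1, 0.2], [0.3, 0.4]]   # 2 inputs, 2 neurons
b = [0.5, 0.5]

Z = linear(X, W, b)
print([[round(v, 2) for v in row] for row in Z])  # [[1.2, 1.5], [2.0, 2.7]]
```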

6.7 Multi-Layer Neural Networks

Structure

Input Layer → Hidden Layer 1 → Hidden Layer 2 → Output Layer
    x₁            h₁₁               h₂₁             y₁
    x₂            h₁₂               h₂₂             y₂
    x₃            h₁₃               h₂₃

Forward Pass

Layer 1:

\mathbf{h}_1 = f_1(\mathbf{X} \mathbf{W}_1 + \mathbf{b}_1)

Layer 2:

\mathbf{h}_2 = f_2(\mathbf{h}_1 \mathbf{W}_2 + \mathbf{b}_2)

Output Layer:

\mathbf{Y} = f_3(\mathbf{h}_2 \mathbf{W}_3 + \mathbf{b}_3)

Chained together:

\mathbf{Y} = f_3(f_2(f_1(\mathbf{X} \mathbf{W}_1 + \mathbf{b}_1) \mathbf{W}_2 + \mathbf{b}_2) \mathbf{W}_3 + \mathbf{b}_3)

Each layer transforms the input!
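Chaining layers in code looks like repeated application of the same operation (a generic sketch; the `layer` helper and the example shapes are ours):

```python
def relu(z):
    return max(0.0, z)

def layer(x, W, b, f):
    # One layer for a single sample: y_j = f(sum_i x_i * W[i][j] + b[j])
    cols = list(zip(*W))              # column j = weights into neuron j
    return [f(sum(xi * wij for xi, wij in zip(x, col)) + bj)
            for col, bj in zip(cols, b)]

x = [1.0, 2.0, 3.0]                                        # made-up input
W1, b1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0.0, 0.0]  # 3 inputs -> 2 neurons
W2, b2 = [[1.0], [1.0]], [0.5]                             # 2 inputs -> 1 neuron

h1 = layer(x, W1, b1, relu)   # hidden layer
y = layer(h1, W2, b2, relu)   # output layer
print(h1, y)
```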


6.8 Exercise 1: Single Neuron Calculation

Problem

Given a single neuron with:

  • Inputs: x_1 = 2.0, x_2 = -1.0, x_3 = 0.5
  • Weights: w_1 = 0.5, w_2 = -0.3, w_3 = 0.8
  • Bias: b = 0.2
  • Activation function: ReLU f(z) = \max(0, z)

Calculate the output of this neuron.

Step-by-Step Solution

Step 1: Weighted Sum

Compute:

z = \sum_{i=1}^{3} x_i w_i + b

Substitute values:

z = (2.0 \times 0.5) + (-1.0 \times -0.3) + (0.5 \times 0.8) + 0.2

Calculate each term:

z = (1.0) + (0.3) + (0.4) + 0.2

Sum:

z = 1.0 + 0.3 + 0.4 + 0.2 = 1.9

Step 2: Apply Activation Function

Apply ReLU:

y = \text{ReLU}(z) = \max(0, z) = \max(0, 1.9) = 1.9

Answer

The output of the neuron is y = 1.9.

Verification

Check calculation:

  • Input contribution 1: 2.0 \times 0.5 = 1.0
  • Input contribution 2: -1.0 \times -0.3 = 0.3
  • Input contribution 3: 0.5 \times 0.8 = 0.4
  • Bias: 0.2
  • Total: 1.0 + 0.3 + 0.4 + 0.2 = 1.9
  • ReLU(1.9) = 1.9 ✓
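The check above can be reproduced with a short script (a sketch; the variable names are ours):

```python
# Exercise 1: ReLU neuron with the given inputs, weights, and bias
x = [2.0, -1.0, 0.5]
w = [0.5, -0.3, 0.8]
b = 0.2

z = sum(xi * wi for xi, wi in zip(x, w)) + b   # weighted sum
y = max(0.0, z)                                # ReLU activation
print(round(z, 2), round(y, 2))  # 1.9 1.9
```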

6.9 Exercise 2: Multi-Layer Network

Problem

Given a neural network with 2 layers:

Layer 1:

  • Inputs: x_1 = 1.0, x_2 = 0.5
  • Weights: W_1 = \begin{bmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix}
  • Bias: b_1 = [0.1, -0.1]
  • Activation: ReLU

Layer 2:

  • Inputs: Outputs from Layer 1
  • Weights: W_2 = \begin{bmatrix} 0.5 \\ 0.7 \end{bmatrix}
  • Bias: b_2 = 0.2
  • Activation: ReLU

Calculate the final output.

Step-by-Step Solution

Step 1: Layer 1 - Weighted Sum

Input vector:

\mathbf{x} = [1.0, 0.5]

Weight matrix:

\mathbf{W}_1 = \begin{bmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix}

Bias vector:

\mathbf{b}_1 = [0.1, -0.1]

Calculate:

\mathbf{z}_1 = \mathbf{x} \mathbf{W}_1 + \mathbf{b}_1

Matrix multiplication:

\mathbf{z}_1 = [1.0, 0.5] \begin{bmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix} + [0.1, -0.1]

Compute:

z_{1,1} = 1.0 \times 0.6 + 0.5 \times 0.2 + 0.1 = 0.6 + 0.1 + 0.1 = 0.8
z_{1,2} = 1.0 \times 0.4 + 0.5 \times 0.8 + (-0.1) = 0.4 + 0.4 - 0.1 = 0.7
\mathbf{z}_1 = [0.8, 0.7]

Step 2: Layer 1 - Apply Activation

Apply ReLU:

\mathbf{h}_1 = \text{ReLU}(\mathbf{z}_1) = [\max(0, 0.8), \max(0, 0.7)] = [0.8, 0.7]

Step 3: Layer 2 - Weighted Sum

Input (from Layer 1):

\mathbf{h}_1 = [0.8, 0.7]

Weight matrix:

\mathbf{W}_2 = \begin{bmatrix} 0.5 \\ 0.7 \end{bmatrix}

Bias:

b_2 = 0.2

Calculate:

z_2 = \mathbf{h}_1 \mathbf{W}_2 + b_2

Matrix multiplication:

z_2 = [0.8, 0.7] \begin{bmatrix} 0.5 \\ 0.7 \end{bmatrix} + 0.2

Compute:

z_2 = 0.8 \times 0.5 + 0.7 \times 0.7 + 0.2 = 0.4 + 0.49 + 0.2 = 1.09

Step 4: Layer 2 - Apply Activation

Apply ReLU:

y = \text{ReLU}(z_2) = \max(0, 1.09) = 1.09

Answer

The final output is y = 1.09.

Summary Table

| Layer | Input | Weights | Bias | Weighted Sum | Activation | Output |
|-------|-------|---------|------|--------------|------------|--------|
| 1 | [1.0, 0.5] | $$\begin{bmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix}$$ | [0.1, -0.1] | [0.8, 0.7] | ReLU | [0.8, 0.7] |
| 2 | [0.8, 0.7] | $$\begin{bmatrix} 0.5 \\ 0.7 \end{bmatrix}$$ | 0.2 | 1.09 | ReLU | 1.09 |
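Both layers can be verified with a short script (a sketch; the variable names are ours):

```python
def relu(z):
    return max(0.0, z)

# Layer 1: x (1 x 2) times W1 (2 x 2) plus b1, then ReLU
x = [1.0, 0.5]
W1, b1 = [[0.6, 0.4], [0.2, 0.8]], [0.1, -0.1]
z1 = [sum(xi * W1[i][j] for i, xi in enumerate(x)) + b1[j] for j in range(2)]
h1 = [relu(v) for v in z1]

# Layer 2: h1 (1 x 2) times W2 (2 x 1) plus b2, then ReLU
W2, b2 = [[0.5], [0.7]], 0.2
z2 = h1[0] * W2[0][0] + h1[1] * W2[1][0] + b2
y = relu(z2)

print([round(v, 2) for v in h1], round(y, 2))  # [0.8, 0.7] 1.09
```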
---

6.10 Exercise 3: Learning Weights

Problem

Given a neuron that should output 1.0 when inputs are [1.0, 1.0] and output 0.0 when inputs are [0.0, 0.0], find appropriate weights and bias.

Use:

  • Activation: Sigmoid f(z) = \frac{1}{1 + e^{-z}}
  • Desired behavior: AND gate (output 1 only when both inputs are 1)

Step-by-Step Solution

Step 1: Set Up Equations

For input [1.0, 1.0], desired output ≈ 1.0:

f(w_1 \times 1.0 + w_2 \times 1.0 + b) \approx 1.0

For input [0.0, 0.0], desired output ≈ 0.0:

f(w_1 \times 0.0 + w_2 \times 0.0 + b) \approx 0.0

Note: Sigmoid outputs range from 0 to 1, so:

  • f(z) \approx 1.0 when z \gg 0 (e.g., z > 5)
  • f(z) \approx 0.0 when z \ll 0 (e.g., z < -5)

Step 2: Solve for Bias

From equation 2:

f(b) \approx 0.0

For sigmoid to output ≈ 0:

b < -5

Let's use:

b = -10

Step 3: Solve for Weights

From equation 1:

f(w_1 + w_2 - 10) \approx 1.0

For sigmoid to output ≈ 1:

w_1 + w_2 - 10 > 5
w_1 + w_2 > 15

Let's use equal weights:

w_1 = w_2 = 8.0

Check:

w_1 + w_2 = 8.0 + 8.0 = 16.0 > 15 \quad ✓

Step 4: Verify Solution

Test Case 1: Input [1.0, 1.0]

z = 1.0 \times 8.0 + 1.0 \times 8.0 + (-10) = 8.0 + 8.0 - 10 = 6.0
y = \frac{1}{1 + e^{-6.0}} = \frac{1}{1 + 0.0025} \approx 0.9975 \approx 1.0 \quad ✓

Test Case 2: Input [0.0, 0.0]

z = 0.0 \times 8.0 + 0.0 \times 8.0 + (-10) = -10
y = \frac{1}{1 + e^{10}} = \frac{1}{1 + 22026} \approx 0.00005 \approx 0.0 \quad ✓

Test Case 3: Input [1.0, 0.0]

z = 1.0 \times 8.0 + 0.0 \times 8.0 + (-10) = 8.0 - 10 = -2.0
y = \frac{1}{1 + e^{2.0}} = \frac{1}{1 + 7.39} \approx 0.12 < 0.5 \quad ✓

Test Case 4: Input [0.0, 1.0]

z = 0.0 \times 8.0 + 1.0 \times 8.0 + (-10) = 8.0 - 10 = -2.0
y = \frac{1}{1 + e^{2.0}} \approx 0.12 < 0.5 \quad ✓

Answer

Appropriate weights and bias:

  • w_1 = 8.0
  • w_2 = 8.0
  • b = -10.0

The neuron implements an AND gate correctly!
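All four input combinations can be checked in code (a sketch using the weights found above):

```python
import math

def and_neuron(x1, x2):
    # Sigmoid neuron with w1 = w2 = 8.0, b = -10.0 (the solution above)
    z = 8.0 * x1 + 8.0 * x2 - 10.0
    return 1.0 / (1.0 + math.exp(-z))

for a in (0.0, 1.0):
    for c in (0.0, 1.0):
        y = and_neuron(a, c)
        print(int(a), int(c), "->", round(y, 4), "fires" if y > 0.5 else "off")
```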

Key Insight

This demonstrates learning:

  • Training finds weights that produce desired behavior
  • Different weights = Different logic functions
  • Learning algorithms (like backpropagation) automatically find these weights from data!
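As a toy illustration of that last point, gradient descent on a single sigmoid neuron can discover AND-gate weights from the four examples on its own (a sketch; the learning rate and epoch count are arbitrary choices of ours):

```python
import math

data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0),
        ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]   # the AND truth table

w1, w2, b = 0.0, 0.0, 0.0   # start from zero weights
lr = 1.0                    # learning rate (arbitrary)

for _ in range(2000):       # epoch count (arbitrary)
    for (x1, x2), target in data:
        y = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
        err = y - target    # gradient of cross-entropy loss w.r.t. z
        w1 -= lr * err * x1 # step each parameter against its gradient
        w2 -= lr * err * x2
        b  -= lr * err

print(round(w1, 2), round(w2, 2), round(b, 2))
```

The learned values will differ from the hand-picked 8.0 / 8.0 / -10.0, but they classify all four cases the same way: any weights in the right region implement the gate.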

6.11 Key Takeaways

Neurons

Neurons are the basic processing units
Receive inputs, compute weighted sum, apply activation
Output is the result of activation function

Weights

Weights control connection strength
Determine what patterns neurons learn
Are what the model learns during training
Enable complex pattern recognition

Calculation

Weighted sum: $z = \sum x_i w_i + b$
Activation: $y = f(z)$
Matrix form enables efficient computation

Importance

Weights enable learning
Control information flow
Enable complex pattern recognition
Are adjusted during training to minimize error

Neural Networks

Multiple neurons form layers
Multiple layers form networks
Each layer transforms the input
Deep networks learn hierarchical features


Mathematical Summary

Single Neuron

z = \sum_{i=1}^{n} x_i w_i + b
y = f(z)

Multiple Neurons (Matrix Form)

\mathbf{z} = \mathbf{X} \mathbf{W} + \mathbf{b}
\mathbf{Y} = f(\mathbf{z})

Multi-Layer Network

\mathbf{h}_1 = f_1(\mathbf{X} \mathbf{W}_1 + \mathbf{b}_1)
\mathbf{h}_2 = f_2(\mathbf{h}_1 \mathbf{W}_2 + \mathbf{b}_2)
\mathbf{Y} = f_3(\mathbf{h}_2 \mathbf{W}_3 + \mathbf{b}_3)

This document provides a comprehensive explanation of neural networks, neurons, weights, and calculations with mathematical derivations and solved exercises.