In 1957, Frank Rosenblatt built a machine called the Mark I Perceptron. It had 400 photocells as inputs, a bunch of potentiometers as weights, and it could — for the first time ever — learn to classify images by adjusting its own weights.
The New York Times ran a headline: “New Navy Device Learns By Doing.”
Rosenblatt claimed it would one day be conscious. It wasn’t. But the idea it introduced — a unit that takes inputs, computes a weighted sum, and fires or doesn’t based on a threshold — is still the atom every neural network is built from.
Let’s build one.
## The Biological Inspiration
A real neuron looks like this:
- Dendrites receive signals from other neurons
- The soma (cell body) adds them all up
- If the total exceeds a threshold, the neuron fires — sending a signal down the axon to the next neuron
- If not, silence
```
dendrites → [cell body: sum up] → threshold check → axon → next neuron
```
The artificial version doesn’t try to be biologically accurate. It just steals the core idea: weighted inputs, sum, threshold.
## The Artificial Perceptron
A perceptron takes $n$ inputs $x_1, x_2, \ldots, x_n$, multiplies each by a weight $w_i$, adds a bias $b$, and passes the result through a step function.
$$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b = \mathbf{w} \cdot \mathbf{x} + b$$
$$\hat{y} = \text{step}(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$$
That’s it. The entire perceptron in two equations.
### What each piece does
Weights $\mathbf{w}$ — how much each input matters. A large positive weight means “this input strongly pushes toward 1.” A large negative weight means “this input strongly pushes toward 0.”
Bias $b$ — shifts the threshold. Without it, the decision boundary always passes through the origin. With it, we can shift it anywhere.
The step function — the simplest possible activation. Above zero → fire (output 1). Below zero → don’t fire (output 0).
In Python:
```python
import numpy as np

def step(z):
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    z = np.dot(w, x) + b
    return step(z)
```
## The Decision Boundary
Here’s something beautiful. For a 2-input perceptron, the condition $\mathbf{w} \cdot \mathbf{x} + b = 0$ defines a line in 2D space — the boundary between “fire” and “don’t fire.”
$$w_1 x_1 + w_2 x_2 + b = 0 \implies x_2 = \frac{-w_1 x_1 - b}{w_2}$$
Everything on or above this line → output 1. Everything below → output 0. (That's with $w_2 > 0$; if $w_2 < 0$, the two sides swap.)
This means a perceptron can only learn to separate things that are linearly separable — things you can split with a straight line. We’ll come back to this. It matters a lot.
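To make this concrete, here is a quick sketch with hand-picked, purely illustrative weights $\mathbf{w} = (1, 1)$ and bias $b = -1.5$ (not values from the demo):

```python
import numpy as np

# Illustrative weights and bias, chosen by hand
w = np.array([1.0, 1.0])
b = -1.5

def boundary_x2(x1, w=w, b=b):
    # Solve w1*x1 + w2*x2 + b = 0 for x2
    return (-w[0] * x1 - b) / w[1]

def fires(x, w=w, b=b):
    return 1 if np.dot(w, x) + b >= 0 else 0

# Two points on the boundary line
print(boundary_x2(0.0))   # 1.5
print(boundary_x2(1.5))   # 0.0

# Points on either side of the line
print(fires([1.0, 1.0]))  # 1, since 1 + 1 - 1.5 = 0.5 >= 0
print(fires([0.0, 0.0]))  # 0, since -1.5 < 0
```

Sliding any weight or the bias moves the line, which is all learning will turn out to be.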
## The Perceptron Learning Rule
This is where the magic happens. The perceptron can adjust its own weights when it makes a mistake.
For each training example $({\mathbf{x}}, y)$:
- Make a prediction: $\hat{y} = \text{step}(\mathbf{w} \cdot \mathbf{x} + b)$
- Compute the error: $e = y - \hat{y}$
- Update every weight: $w_i \leftarrow w_i + \eta \cdot e \cdot x_i$
- Update the bias: $b \leftarrow b + \eta \cdot e$
Where $\eta$ (eta) is the learning rate — how big a step to take each update. Usually a small number like $0.1$.
Notice what happens:
- If the prediction is correct ($e = 0$) → nothing changes
- If prediction is 0 but should be 1 ($e = +1$) → weights increase in the direction of $\mathbf{x}$
- If prediction is 1 but should be 0 ($e = -1$) → weights decrease
```python
def train_perceptron(X, y, lr=0.1, epochs=100):
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = perceptron(xi, w, b)
            error = yi - y_hat
            w += lr * error * xi
            b += lr * error
    return w, b
```
The Perceptron Convergence Theorem guarantees this loop reaches a perfect solution after a finite number of updates — if one exists. That "if" is doing a lot of work. More on that in a moment.
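A quick way to watch the theorem in action: train on AND and count passes until a whole epoch goes by with zero mistakes. This is a self-contained sketch; the learning rate of 1 keeps every weight a whole number, so the run is easy to follow by hand.

```python
import numpy as np

# AND data: linearly separable, so the theorem applies
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b, lr = np.zeros(2), 0.0, 1.0
for epoch in range(1, 101):
    mistakes = 0
    for xi, yi in zip(X, y):
        error = yi - (1 if np.dot(w, xi) + b >= 0 else 0)
        if error != 0:
            mistakes += 1
            w += lr * error * xi
            b += lr * error
    if mistakes == 0:   # one clean pass over the data: converged
        print(f"converged after {epoch} epochs: w={w}, b={b}")
        break
```

With these settings the loop reports convergence on the sixth pass, having landed on $w = (1, 2)$, $b = -3$, one of infinitely many valid separating lines.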
## Watch It Learn
Here’s a perceptron learning to classify 2D points in real time. Pick a preset, hit Train, and watch the decision boundary rotate into place.
Perceptron — Live Training
Pick a gate, click train step-by-step or let it run. Place your own points by clicking the canvas.
Custom mode: switch to “Custom”, click to place red dots (label 0), shift+click for blue dots (label 1). Then train.
## AND and OR — It Works
Try AND and OR above. The perceptron finds a clean separating line every time.
Why? Because AND and OR are linearly separable:
```
AND truth table:        OR truth table:
(0,0) → 0               (0,0) → 0
(1,0) → 0               (1,0) → 1
(0,1) → 0               (0,1) → 1
(1,1) → 1               (1,1) → 1
```
You can literally draw a straight line between the 0s and 1s.
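You can check this without any training at all. The weights below are hand-picked (one valid choice among infinitely many), but they show that a single line is enough for each gate:

```python
import numpy as np

def step_unit(x, w, b):
    # Fire iff w·x + b >= 0
    return 1 if np.dot(w, x) + b >= 0 else 0

inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]

# AND: fires only when both inputs are 1 (threshold 1.5)
print([step_unit(x, np.array([1.0, 1.0]), -1.5) for x in inputs])  # [0, 0, 0, 1]

# OR: fires when at least one input is 1 (threshold 0.5)
print([step_unit(x, np.array([1.0, 1.0]), -0.5) for x in inputs])  # [0, 1, 1, 1]
```

Same weights, different bias: the line just slides between the corners it needs to separate.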
## XOR — It Breaks
Now try XOR. Hit train. Watch it spin forever.
```
XOR truth table:
(0,0) → 0
(1,0) → 1   ← diagonal from the other 1
(0,1) → 1
(1,1) → 0   ← diagonal from the other 0
```
There is no straight line that separates the 0s from the 1s. The 1s are at opposite corners of a square. This is called not linearly separable, and a single perceptron is mathematically incapable of solving it.
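If you want more than the demo's word for it, here is the algebraic core: XOR demands $b < 0$, $w_1 + b \geq 0$, $w_2 + b \geq 0$, and $w_1 + w_2 + b < 0$. Adding the middle two gives $w_1 + w_2 + 2b \geq 0$, i.e. $w_1 + w_2 + b \geq -b > 0$, contradicting the last inequality. A brute-force sketch over a coarse grid of candidate weights tells the same story:

```python
import numpy as np
from itertools import product

X = [(0, 0), (1, 0), (0, 1), (1, 1)]
y_xor = [0, 1, 1, 0]

# Try every (w1, w2, b) on a coarse grid around the data
found = False
grid = np.linspace(-2, 2, 41)
for w1, w2, b in product(grid, grid, grid):
    preds = [1 if w1 * x1 + w2 * x2 + b >= 0 else 0 for x1, x2 in X]
    if preds == y_xor:
        found = True
        break

print(found)  # False: nothing on the grid solves XOR
```

The grid is only a probe, of course; the inequalities above are the actual proof that no weights anywhere will do it.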
In 1969, Minsky and Papert published Perceptrons, a book proving exactly this limitation. It helped dry up funding for neural network research for over a decade — the first "AI winter."
The fix? Stack multiple perceptrons into layers. One layer draws one line. Two layers can draw two lines. Enough layers can carve up any shape.
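Here is that fix in miniature: a hand-wired two-layer network that computes XOR as "OR and not AND." The weights are chosen by hand; learning them automatically is what backpropagation is for, later in the series.

```python
import numpy as np

def step_unit(x, w, b):
    return 1 if np.dot(w, x) + b >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer: two perceptrons, each drawing one line
    h_or  = step_unit([x1, x2], np.array([1.0, 1.0]), -0.5)   # OR
    h_and = step_unit([x1, x2], np.array([1.0, 1.0]), -1.5)   # AND
    # Output layer: fire iff OR fired and AND did not
    return step_unit([h_or, h_and], np.array([1.0, -2.0]), -0.5)

print([xor_net(a, b) for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]])  # [0, 1, 1, 0]
```

Two lines carve the plane into the stripe where XOR is 1, which one line never could.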
That’s what the rest of this series is building toward.
## One Perceptron = One Neuron
What we built today is one artificial neuron. It:
- Takes inputs (numbers)
- Multiplies each by a learned weight
- Adds a bias
- Fires or doesn’t based on a threshold
A neural network is just thousands of these, wired together, each learning its own weights.
The perceptron learning rule we used here — nudge the weights when wrong — is the ancestor of backpropagation. It’s cruder, but the spirit is identical.
```python
# Full perceptron from scratch
import numpy as np

class Perceptron:
    def __init__(self, lr=0.1, epochs=100):
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                error = yi - self.predict_single(xi)
                self.w += self.lr * error * xi
                self.b += self.lr * error

    def predict_single(self, x):
        return 1 if np.dot(self.w, x) + self.b >= 0 else 0

    def predict(self, X):
        return np.array([self.predict_single(x) for x in X])

# Train on AND
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([0, 0, 0, 1])

p = Perceptron()
p.fit(X, y)
print(p.predict(X))  # [0 0 0 1] ✓
```
## Before You Go — Try These
- In the demo, train on AND. Note the weights and bias. Verify by hand: does $w_1(1) + w_2(1) + b \geq 0$? Does $w_1(0) + w_2(0) + b < 0$?
- Why does the bias matter? Try mentally removing it ($b = 0$) — what constraint does that put on the decision boundary?
- What happens if the learning rate $\eta$ is very large? Very small? Try it in the code.
- Can you find any assignment of weights and bias that solves XOR? Try to prove to yourself it's impossible.
Next up → Lesson 03: The Switch That Isn’t Really a Switch — we replace the hard step function with smooth activations, and suddenly gradients become possible.