Machine Learning

Linear Regression, Explained Through CrossFit

March 2026

You track your lifts. You log your WOD times. You notice patterns — sleep more and your clean feels better, eat less and your Fran time suffers. That intuition is, roughly, what linear regression is doing. It just does it with math instead of gut feeling.

This is a ground-up explanation of linear regression for someone who knows what a PR is but has never written a line of machine learning code.

The question linear regression answers

Say you want to predict your 1-rep max back squat from your training data. You have a logbook: weeks of squatting, each entry recording how many reps you did and at what weight.

The question is: given those numbers, can you build a formula that predicts your 1RM?

That is exactly what linear regression does. It takes some inputs (things you measure) and learns a formula to predict an output (the number you care about).


Features and the target

In ML terms:

  • Features (also called inputs or X) — the things you feed into the model. In our example: reps completed, weight on the bar, weeks of training, hours of sleep the night before.
  • Target (also called the label or y) — the thing you're trying to predict. Here: your 1RM.

Think of features as the whiteboard your coach writes on before a strength cycle. The target is the number you hope to hit on test day.

You collect many rows of data — one per training session. Each row pairs a set of features with a known outcome. That collection is your training set.

Here's what a few rows might look like:

reps | weight_lifted (kg) | weeks_trained | hours_of_sleep | actual 1RM (kg)
5    | 110                | 4             | 8              | 140
3    | 120                | 6             | 6              | 148
5    | 115                | 8             | 7              | 155
1    | 130                | 10            | 5              | 152
5    | 120                | 12            | 9              | 165

Each row is one training session. The first four columns are the features — what went into that session. The last column is the target — what 1RM you actually hit when you tested it. The model learns from all of these together.
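In code, that training set is nothing more than rows of feature values paired with known outcomes. A minimal sketch in Python (the variable names `X` and `y` follow ML convention; nothing here is a library API):

```python
# Each row is one session: [reps, weight_lifted_kg, weeks_trained, hours_of_sleep]
X = [
    [5, 110,  4, 8],
    [3, 120,  6, 6],
    [5, 115,  8, 7],
    [1, 130, 10, 5],
    [5, 120, 12, 9],
]

# The target: the 1RM (kg) actually hit after each of those sessions
y = [140, 148, 155, 152, 165]
```

Every feature row in `X` lines up by index with its known outcome in `y`; that pairing is what makes it a supervised training set.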


The formula

Linear regression assumes the target is a weighted sum of the features, plus a baseline:

predicted 1RM = (w₁ × reps) + (w₂ × weight_lifted) + (w₃ × weeks_trained) + (w₄ × hours_of_sleep) + b

Here's what each term actually means:

Term           | What it is                                                            | Example value
reps           | How many reps you completed at a given weight that session            | 5
weight_lifted  | The weight on the bar in kg                                           | 110 kg
weeks_trained  | How many weeks into your current strength cycle                       | 8
hours_of_sleep | Hours slept the night before                                          | 7
b              | Your baseline — the starting prediction before any features are considered | 45 kg

Say the model has learned these weights after training: w₁ = 1.0, w₂ = 0.7, w₃ = 2.0, w₄ = 1.0, b = 45.

Plug in a session where you hit 5 reps at 110 kg, 8 weeks in, after 7 hours of sleep:

predicted 1RM = (1.0 × 5) + (0.7 × 110) + (2.0 × 8) + (1.0 × 7) + 45
             = 5 + 77 + 16 + 7 + 45
             = 150 kg

The bias (b = 45) is what the model predicts when every feature is zero. No real session looks like that, so think of it less as a beginner's number and more as the floor the weights build on. The weights on top of it reflect how much each feature nudges the prediction up or down.

The model doesn't know the weights ahead of time. It learns them from your logbook.

A large positive weight on weeks_trained means the model has figured out that more training time strongly predicts a higher 1RM. A small weight on hours_of_sleep means sleep matters a little but isn't the main driver.
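The whole prediction step fits in a few lines. Here is a sketch of that weighted sum in Python; the weight values are illustrative, not fitted to real data:

```python
def predict_1rm(reps, weight_lifted, weeks_trained, hours_of_sleep, w, b):
    """Weighted sum of the features, plus the bias."""
    return (w[0] * reps
            + w[1] * weight_lifted
            + w[2] * weeks_trained
            + w[3] * hours_of_sleep
            + b)

w = [1.0, 0.7, 2.0, 1.0]  # illustrative weights
b = 45.0                  # illustrative bias

print(predict_1rm(5, 110, 8, 7, w, b))  # 150.0
```

Swap in different weights and the same function produces a different prediction; training is nothing more than the search for the best values of `w` and `b`.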


The error: how wrong is the prediction?

Before the model can learn anything, it needs a way to measure how bad its current guess is.

Suppose the model predicts your 1RM is 140 kg, but you actually hit 152 kg. The error on that data point is 12 kg.

The standard way to measure total error across all your training sessions is Mean Squared Error (MSE):

MSE = average of (predicted − actual)²

Squaring does two things: it makes all errors positive (a miss by −12 is as bad as a miss by +12), and it punishes large misses more than small ones. A model that's 20 kg off on one lift and spot-on for ten others will score worse than one that's consistently 5 kg off everywhere.

MSE is called the loss function — the single number that summarizes how wrong the model is right now. Lower is better. The goal of training is to drive it down.
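As a sketch, here is MSE in Python, along with the one-big-miss-versus-many-small-misses comparison (the kilogram numbers are made up for illustration):

```python
def mse(predicted, actual):
    """Mean squared error: the average squared miss."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# Spot-on for ten lifts, then 20 kg off on one:
spiky = mse([150] * 10 + [170], [150] * 11)   # 400 / 11, about 36.4

# Consistently 5 kg off on all eleven:
steady = mse([155] * 11, [150] * 11)          # 25.0

print(spiky > steady)  # True: one big miss costs more than many small ones
```

Because the miss is squared before averaging, a single 20 kg error contributes 400 to the sum, while eleven 5 kg errors contribute only 25 each.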


Gradient descent: learning from each mistake

Now the model has a loss function. How does it improve?

By adjusting the weights. But it can't try every possible combination — there are infinitely many. Instead it uses a method called gradient descent.

Here's an analogy: imagine you're blindfolded on a hilly field and you need to walk to the lowest valley. You can't see, but you can feel which direction the ground slopes downward beneath your feet. So you take a step in that direction, feel again, take another step, and repeat.

Gradient descent does the same thing mathematically. At each step it asks: if I nudge each weight slightly, which direction reduces the loss? Then it moves the weights a small amount in that direction.

The size of each step is controlled by the learning rate — a small number (say 0.01). Too large and you overshoot the valley; too small and training takes forever. In CrossFit terms: the learning rate is how aggressively you adjust your technique after each failed rep. Wild overcorrections break the pattern; no adjustment at all means you never improve.

After many passes through the training data, the weights settle at values that minimize the loss. The model has been trained.
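The whole loop can be sketched in plain Python. The learning rate and epoch count below are hand-picked for this tiny example, and the data is a small sample logbook like the one earlier:

```python
# Sample logbook: [reps, weight_lifted, weeks_trained, hours_of_sleep] -> 1RM (kg)
X = [[5, 110, 4, 8], [3, 120, 6, 6], [5, 115, 8, 7],
     [1, 130, 10, 5], [5, 120, 12, 9]]
y = [140, 148, 155, 152, 165]

def train(X, y, lr=1e-5, epochs=20_000):
    """Minimize MSE by gradient descent; returns learned weights and bias."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # How wrong is the current formula on each session?
        errors = [sum(wi * xi for wi, xi in zip(w, row)) + b - t
                  for row, t in zip(X, y)]
        # Slope of the loss with respect to each weight, then a step downhill
        for j in range(n_features):
            grad = 2 / n * sum(e * row[j] for e, row in zip(errors, X))
            w[j] -= lr * grad
        b -= lr * (2 / n * sum(errors))
    return w, b

w, b = train(X, y)
```

With a learning rate this small the loss shrinks steadily; make it a hundred times larger and the updates overshoot the valley, the failure mode described above.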


What the trained model looks like

After training, you have a set of numbers — the weights and bias. On a new session you've never seen:

predicted 1RM = (1.0 × reps) + (0.7 × weight_lifted) + (2.0 × weeks_trained) + (1.0 × hours_of_sleep) + 45

Plug in the numbers from any training session and you get a predicted 1RM in seconds. The model has distilled your entire logbook into a formula.


Underfitting and overfitting

Two failure modes worth knowing.

Underfitting is when the model is too simple to capture the real pattern. Imagine trying to predict WOD performance using only body weight, ignoring all training data. The prediction will be off for almost everyone. The model hasn't captured enough signal.

Overfitting is the opposite. The model memorizes your specific logbook so precisely that it fails on anything new. It "learned" that your 1RM was high the week you happened to drink two coffees — not because caffeine matters, but because that pattern appeared once in your data.

A good model generalizes. It finds the signal (training improves performance) and ignores the noise (what you had for breakfast on a random Tuesday).

You test for this by holding back some data — say, your last month of sessions — and checking whether the model's predictions hold up on data it never saw during training. This held-back set is called the test set.
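A holdout split is a couple of lines of code. This sketch keeps the most recent sessions aside, matching the "last month of sessions" idea (the helper name and sample data are mine):

```python
def holdout_split(X, y, test_fraction=0.2):
    """Hold back the most recent sessions as a test set."""
    cut = int(len(X) * (1 - test_fraction))
    return X[:cut], y[:cut], X[cut:], y[cut:]

sessions = [[5, 110, 4, 8], [3, 120, 6, 6], [5, 115, 8, 7],
            [1, 130, 10, 5], [5, 120, 12, 9]]
maxes = [140, 148, 155, 152, 165]

X_train, y_train, X_test, y_test = holdout_split(sessions, maxes)
# Fit the weights on (X_train, y_train); judge the model only on (X_test, y_test).
```

Splitting by time rather than at random matters here: training logs are sequential, and you want to know whether the model predicts your future, not your shuffled past.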


Why the "linear" part matters

Linear regression can only model straight-line relationships. If your performance improves quickly at first and then plateaus (which it does — welcome to intermediate lifting), a straight line won't fit that curve well.

That's not a bug in linear regression; it's the honest limit of the model. More complex models (polynomial regression, neural networks) can capture curves — but they're also more expensive and more prone to overfitting.

For a first pass at a problem, linear regression is often the right starting point: fast, interpretable, and clear about what it can and can't do.


The one-sentence summary

Linear regression finds the weights that make a weighted sum of your inputs as close as possible to your outputs, measured by squared error, adjusted step by step using gradient descent.

Everything else in supervised machine learning is, in some sense, an elaboration of that idea.


Next up in this series: logistic regression — what happens when the output isn't a number, but a category (like "did I make the lift or not").