# All Posts

## Introduction to Algorithms and Machine Learning: Preamble

This book was written to support Eurisko, an advanced math and computer science elective course sequence within the Math Academy program at Pasadena High School. During its operation from 2020 to 2023, Eurisko was one of the most advanced high school math/CS sequences in the country. Read more...

## Reimplementing Blondie24: Convolutional Version

Fogel and Chellapilla followed up their 1999 Blondie24 paper with another paper, Evolving an Expert Checkers Playing Program without Using Human Expertise, published in 2001. Read more...

## Reimplementing Blondie24

Fogel and Chellapilla’s Blondie24 was published over the course of two papers. Here we shall address the first paper, Evolving Neural Networks to Play Checkers without Relying on Expert Knowledge, published in 1999. Read more...

## Reimplementing Fogel’s Tic-Tac-Toe Paper

The goal of this chapter is to reimplement the paper Using Evolutionary Programming to Create Neural Networks that are Capable of Playing Tic-Tac-Toe by David Fogel in 1993. This paper preceded Blondie24, and many of the principles introduced in this paper were extended in Blondie24. As such, reimplementing the paper provides good scaffolding as we work our way up to reimplement Blondie24. Read more...

## Introduction to Blondie24 and Neuroevolution

Previously, we built strategic connect four players by constructing a pruned game tree, using heuristics to rate terminal states, and then applying the minimax algorithm. This was a combination of game-specific human intelligence (heuristics) and generalizable artificial intelligence (minimax on a game tree). Read more...

## Reduced Search Depth and Heuristic Evaluation for Connect Four

In theory, we could solve any game by building a big game tree, labeling the terminal states as wins, losses, or ties, and then working backwards from that information to identify the minimax strategy. But in practice, game trees get so big so quickly that for all but the most simple games, game trees are too expensive to store in memory and take too long to traverse. Read more...

## Minimax Strategy

The minimax strategy is a powerful game-playing strategy that operates on game trees. It works by choosing the move whose worst-case outcome is best, i.e. maximizing the minimum payoff that could potentially arise from each move. Read more...
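
The idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the book's implementation: the game tree is a hand-built nested list whose leaves hold scores from the maximizing player's perspective.

```python
# Hypothetical illustration: minimax over a hand-built game tree.
# Leaves are integers holding scores from the maximizing player's view;
# internal nodes are lists of child subtrees.
def minimax(node, maximizing):
    if isinstance(node, int):   # terminal state: return its score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Depth-2 tree: the maximizer picks a branch, then the minimizer
# picks the worst leaf inside it.
tree = [[3, 12], [2, 8]]
print(minimax(tree, True))  # 3: the branch [3, 12] guarantees at least 3
```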

## Canonical and Reduced Game Trees for Tic-Tac-Toe

A game tree is a data structure that represents all the possible outcomes of a game. It is a graph where the nodes correspond to the states of the game, and the edges correspond to actions that cause the game to transition from one state to another. Game trees are commonly used when coding up strategies for autonomous game-playing agents. Read more...

## Backpropagation

The most common method used to fit neural networks to data is gradient descent, just as we have done previously for simpler models. The computations are significantly more involved for neural networks, but an algorithm called backpropagation provides a convenient framework for computing gradients. Read more...

## Introduction to Neural Network Regressors

It’s common to represent models via computational graphs. For example, consider the following multiple logistic regression model: Read more...

## Decision Trees

A decision tree is a graphical flowchart that represents a sequence of nested “if-then” decision rules. To illustrate, first recall the following cookie data set that was introduced during the discussion of k-nearest neighbors: Read more...

## Dijkstra’s Algorithm for Distance and Shortest Paths in Weighted Graphs

In a weighted graph, every edge is assigned a value called a weight. Read more...

## Distance and Shortest Paths in Unweighted Graphs

The distance between two nodes in a graph is the fewest number of edges that must be crossed to travel from one node to the other. A path between two nodes is a sequence of nodes that can be traversed to get from one node to the other, traveling along edges. The shortest path is the path with the shortest distance. Read more...
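
As a rough sketch of how distance can be computed, here is a breadth-first search over a small hypothetical adjacency dictionary (the graph and function name are illustrative, not from the post):

```python
from collections import deque

# Sketch: BFS finds the distance between two nodes in an unweighted graph
# because it explores nodes in order of increasing distance from the start.
def distance(graph, start, end):
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == end:
            return d
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, d + 1))
    return None  # no path exists

graph = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['a'], 'd': ['b']}
print(distance(graph, 'a', 'd'))  # 2: a -> b -> d
```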

## Breadth-First and Depth-First Traversals

Graphs show up all the time in computer science, so it’s important to know how to work with them. For example, consider the following graph: Read more...

## Naive Bayes

Naive Bayes is a classification algorithm that is grounded in Bayesian probability. It involves choosing the class with the highest conditional probability given the corresponding data point, under the simplifying assumption that all the variables in the data set are independent of each other. Read more...

## K-Nearest Neighbors

Until now, we have been focused on regression problems, in which we predict an output quantity (that is often continuous). However, in the real world, it’s even more common to encounter classification problems, in which we predict an output class (i.e. category). For example, predicting how much money a person will spend at a store is a regression problem, whereas predicting which items the person will buy is a classification problem. Read more...

## Multiple Regression and Interaction Terms

In many real-life situations, there is more than one factor that controls the quantity we’re trying to predict. That is to say, there is more than one input variable that controls the output variable. Read more...

## Regression via Gradient Descent

Gradient descent can be applied to fit regression models. In particular, it can help us avoid the pitfalls that we’ve experienced when we attempt to fit nonlinear models using the pseudoinverse. Read more...

## Overfitting, Underfitting, Cross-Validation, and the Bias-Variance Tradeoff

We have previously described a model as “accurate” when it appears to match closely with points in the data set. However, there are issues with this definition that we will need to remedy. In this chapter, we will expose these issues and develop a more nuanced understanding of model accuracy by way of a concrete example. Read more...

## Power, Exponential, and Logistic Regression via Pseudoinverse

Previously, we learned that we can use the pseudoinverse to fit any regression model that can be expressed as a linear combination of functions. Unfortunately, there are a handful of useful models that cannot be expressed as a linear combination of functions. Here, we will explore $3$ of these models in particular. Read more...

## Regressing a Linear Combination of Nonlinear Functions via Pseudoinverse

Previously, we learned how to use the pseudoinverse to fit linear and polynomial models to data sets consisting of one or more input variables. However, the pseudoinverse is even more general than that. Read more...

## Linear, Polynomial, and Multiple Linear Regression via Pseudoinverse

Supervised learning consists of fitting functions to data sets. The idea is that we have a sample of inputs and outputs from some process, and we want to come up with a mathematical model that predicts what the outputs would be for other inputs. Supervised machine learning involves programming computers to compute such models. Read more...

## Simplex Method

The simplex method is a technique for maximizing linear expressions subject to linear constraints. Many applied problems can be framed like this – for example, a business may have a variety of raw materials and need to determine the most profitable inventory of products that they can produce from those materials. This would ultimately amount to maximizing a linear expression for profit, subject to a linear inequality for each type of raw material (the total amount of raw material used in the products produced must be less than or equal to the amount of raw material available). Read more...

## Hash Tables

Under the hood, dictionaries are hash tables. Read more...

## Hodgkin-Huxley Model of Action Potentials in Neurons

In 1952, Alan Lloyd Hodgkin and Andrew Fielding Huxley published a differential equation model of “spikes” (i.e. “action potentials”) in the voltage of neurons. For this work, they received the 1963 Nobel Prize in Physiology or Medicine (shared with Sir John Carew Eccles). Read more...

## SIR Model For the Spread of Disease

One of the simplest ways to model the spread of disease using differential equations is the SIR model. Let’s construct a SIR model and use our multivariable Euler estimator to plot the solution curves. Read more...

## Euler Estimation

Arrays can be used to implement more than just matrices. We can also implement other mathematical procedures like Euler estimation. We will jump straight to exercises – it is assumed that you’re already familiar with Euler estimation from calculus. Read more...

## Tic-Tac-Toe and Connect Four

One of the best ways to get practice with object-oriented programming is implementing games. Tic-tac-toe and connect four are great for learning the ropes. Later on, we will also use these implementations for developing AI players. Read more...

## K-Means Clustering

Clustering is the act of finding groups of similar records within data. Some examples of clusters in real-life data include users who buy similar items, songs in a similar genre, and patients with similar health conditions. Read more...

## Reduced Row Echelon Form and Applications to Matrix Arithmetic

Recall from linear algebra the following procedure for converting a matrix to reduced row echelon form (RREF): Read more...

## Basic Matrix Arithmetic

Let’s use arrays to implement matrices and their associated mathematical operations. We will jump straight to exercises – it is assumed that you are already familiar with matrix arithmetic and reduced row echelon form. Read more...

## The Ultimate High School Computer Science Sequence: 9 Months In

(Cross-post from eurisko.us) Read more...

## Merge Sort and Quicksort

Merge sort is a sorting algorithm that works by breaking an array in half, recursively calling merge sort on each half separately, and then merging the two sorted halves. The base case is that empty arrays and arrays of 1 item are already sorted (so if you call merge sort on such an array, it leaves the array as-is). Read more...
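
The description above can be sketched directly in Python (an illustrative implementation, not necessarily the book's):

```python
# Sketch of merge sort as described above: split the array in half,
# recursively sort each half, then merge the two sorted halves.
def merge_sort(arr):
    if len(arr) <= 1:          # base case: already sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]  # append whichever half remains

print(merge_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```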

## Selection, Bubble, Insertion, and Counting Sort

There are many different methods for sorting items in arrays. Let’s go over some of the simplest methods. Read more...

## Multivariable Gradient Descent

Multivariable gradient descent is similar to single-variable gradient descent, except that we replace the derivative $f'(x)$ with the gradient $\nabla f(\vec x)$ as follows: Read more...
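
In standard notation (the post's symbols may differ), the update step becomes

$$\vec x_{n+1} = \vec x_n - \alpha \, \nabla f(\vec x_n),$$

where $\alpha$ is the learning rate.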

## Single-Variable Gradient Descent

Gradient descent is a technique for minimizing differentiable functions. The idea is that we take an initial guess as to what the minimum is, and then repeatedly use the gradient to nudge that guess further and further “downhill” into an actual minimum. Read more...
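
A minimal sketch of the idea in Python, using an illustrative function $f(x)=(x-3)^2$ with derivative $f'(x)=2(x-3)$; the learning rate and step count are arbitrary choices:

```python
# Minimal sketch of single-variable gradient descent: repeatedly nudge the
# guess downhill by a step proportional to the derivative.
def gradient_descent(f_prime, x, learning_rate=0.1, num_steps=100):
    for _ in range(num_steps):
        x -= learning_rate * f_prime(x)
    return x

# f(x) = (x - 3)^2 has its minimum at x = 3, so the guess should converge there.
minimum = gradient_descent(lambda x: 2 * (x - 3), x=0.0)
print(round(minimum, 4))  # 3.0
```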

## Estimating Roots via Bisection Search and Newton-Raphson Method

Two of the simplest methods for estimating roots of functions are bisection search and the Newton-Raphson method. We will cover both of these here. Read more...

## Solving Magic Squares via Backtracking

Brute force search can often be slow or infeasible when there are lots of possibilities that must be checked. Read more...

## Brute Force Search with Linear-Encoding Cryptography

An encoding function maps a string to a sequence of numbers. For example, you have already implemented the trivial encoding function, defined as follows: Read more...

## Cartesian Product

Implementing the Cartesian product provides good practice working with arrays. Recall that the Cartesian product constructs all the points whose elements occupy the given ranges. To illustrate: Read more...
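
As an illustration (a sketch, not the book's implementation), the Cartesian product of a list of arrays can be built by extending partial points one coordinate at a time:

```python
# Sketch: build the Cartesian product iteratively. Start with one empty
# point, then extend every partial point by every value in the next array.
def cartesian_product(arrays):
    points = [[]]
    for array in arrays:
        points = [point + [value] for point in points for value in array]
    return points

print(cartesian_product([[1, 2], ['a', 'b']]))
# [[1, 'a'], [1, 'b'], [2, 'a'], [2, 'b']]
```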

## Roulette Wheel Selection

As we saw previously, it’s easy to simulate a random coin flip by generating a random decimal $r$ between $0$ and $1$ and applying the following function: Read more...
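
One common way to extend this idea to weighted choices, sketched here as an illustration (the function, items, and weights are hypothetical):

```python
import random

# Hypothetical sketch of roulette wheel selection: choose an item with
# probability proportional to its weight by seeing where a random decimal
# lands among the cumulative weights.
def roulette_select(items, weights):
    r = random.random() * sum(weights)
    cumulative = 0
    for item, weight in zip(items, weights):
        cumulative += weight
        if r < cumulative:
            return item
    return items[-1]  # guard against floating-point edge cases

random.seed(1)  # arbitrary seed, for reproducible runs
picks = [roulette_select(['a', 'b'], [3, 1]) for _ in range(10_000)]
fraction = picks.count('a') / len(picks)
print(fraction)  # close to 0.75, since 'a' carries 3 of the 4 total weight
```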

## Simulating Coin Flips

It’s often possible to estimate probabilities of events by simulating a large number of random experiments and measuring the proportion of trials in which the desired outcome occurred. This technique is known as Monte Carlo simulation, named after a famous casino town. Read more...
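
A minimal Monte Carlo sketch, with an illustrative experiment (the event, trial count, and seed are arbitrary choices, not from the post):

```python
import random

# Illustrative Monte Carlo experiment: estimate the probability of getting
# exactly 2 heads in 3 fair coin flips (true value 3/8 = 0.375).
random.seed(0)  # arbitrary seed, for reproducible runs

def simulate(num_trials=100_000):
    successes = 0
    for _ in range(num_trials):
        heads = sum(random.random() < 0.5 for _ in range(3))
        if heads == 2:
            successes += 1
    return successes / num_trials

result = simulate()
print(result)  # close to 0.375
```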

## Recursive Sequences

A recursive sequence is a sequence where each term is a function of the previous terms. Read more...
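
For instance, the recurrence $a_n = a_{n-1} + a_{n-2}$ with $a_1 = a_2 = 1$ (the Fibonacci sequence) can be computed with a short sketch like this (an illustrative example, not the post's exercises):

```python
# Illustrative example: generate terms of the recurrence
# a(n) = a(n-1) + a(n-2) with a(1) = a(2) = 1.
def sequence_terms(n):
    terms = [1, 1]
    while len(terms) < n:
        terms.append(terms[-1] + terms[-2])  # each term depends on the previous two
    return terms[:n]

print(sequence_terms(8))  # [1, 1, 2, 3, 5, 8, 13, 21]
```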

## Converting Between Binary, Decimal, and Hexadecimal

We are used to representing numbers using ten characters: $0,$ $1,$ $2,$ $3,$ $4,$ $5,$ $6,$ $7,$ $8,$ $9.$ This is called the decimal number system. Read more...
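
As a quick illustration (a sketch, not the post's exercises), a decimal number can be converted to binary by repeated division by $2$:

```python
# Sketch: convert a decimal number to binary by repeatedly dividing by 2
# and reading the remainders in reverse; check against Python's built-ins.
def decimal_to_binary(n):
    if n == 0:
        return '0'
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next binary digit
        n //= 2
    return ''.join(reversed(digits))

print(decimal_to_binary(13))  # '1101'
print(int('1101', 2))         # 13, converting back to decimal
print(hex(13))                # '0xd', the hexadecimal representation
```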

## Some Short Introductory Coding Exercises

Let’s get started on some introductory coding exercises. It’s assumed that you’ve had some basic exposure to programming and that you know about variables, functions, if statements, for loops, while statements, arrays, strings, and dictionaries. Ideally, you’ve learned some object-oriented programming (classes, attributes, methods) as well. We will speak in terms of Python, but the general ideas are applicable to any programming language. Read more...

## But WHERE do the Taylor Series and Lagrange Error Bound even come from?!

See below for the video version of this post: Read more...

## Trick to Apply the Chain Rule FAST - Peeling the Onion

See below for the video version of this post: Read more...

## Intuition Behind Completing the Square

See below for the video version of this post: Read more...

## Matrix Exponential and Systems of Linear Differential Equations

In this chapter, we will learn how to solve systems of linear differential equations. These systems take the form shown below. Read more...

## Generalized Eigenvectors and Jordan Form

As we saw previously, not every matrix is diagonalizable, even when allowing complex eigenvalues/eigenvectors. The matrix below was given as an example of a non-diagonalizable matrix. Read more...

## Recursive Sequence Formulas via Diagonalization

In this chapter, we introduce an interesting application of matrix diagonalization: constructing closed-form expressions for recursive sequences. Read more...

## Eigenvalues, Eigenvectors, and Diagonalization

Suppose we want to compute a matrix raised to a large power, i.e. multiplied by itself many times. Read more...

## Inverse Matrices

In this chapter, we introduce the idea of the inverse of a matrix, which undoes the transformation of that matrix. Read more...

## Rescaling, Shearing, and the Determinant

The key insight in this chapter is that every square matrix can be decomposed into a product of rescalings and shears. Before we elaborate on that, though, let’s discuss what rescalings and shears are, in terms of matrices. Read more...

## Linear Systems as Transformations of Vectors by Matrices

Let’s create a compact notation for expressing systems of linear equations like the one shown below. Read more...

## Higher-Order Variation of Parameters

Until this point, we have been working exclusively with linear systems. However, solving linear systems can sometimes be a necessary component of solving nonlinear systems. Read more...

## Shearing, Cramer’s Rule, and Volume by Reduction

Not only can a nonzero determinant tell us that a linear system has exactly one solution – the nonzero determinant can also help us quickly find that solution through a process known as Cramer’s rule. Read more...

## Volume as the Determinant of a Square Linear System

We have seen that the space of linear equations is actually a vector space, and that the linear equations in any particular system span a subspace of this vector space. Read more...

## N-Dimensional Volume Formula

N-dimensional volume generalizes the idea of the space occupied by an object:

- 1-dimensional volume refers to the space occupied by a 1-dimensional object, such as the length of a line segment.
- 2-dimensional volume refers to the space occupied by a 2-dimensional object, such as the area of a square.
- 3-dimensional volume is what we normally mean by the word “volume” – the amount of space occupied by a 3-dimensional object, such as the volume of a cube.

## Elimination as Vector Reduction

Recall that systems of linear equations can be solved through elimination, multiplying equations by constants and adding equations to each other to cancel variables. Read more...

## Span, Subspaces, and Reduction

The span of a set of vectors consists of all vectors that can be made by adding multiples of vectors in the set. Read more...

## Lines and Planes

A line starts at an initial point and proceeds straight in a constant direction. Thus, we can write the equation of a line as Read more...

## Dot Product and Cross Product

We know how to multiply a vector by a scalar, but what does it mean to multiply a vector by another vector? Read more...

## N-Dimensional Space

N-dimensional space consists of points that have N components. For example, $(0,0)$ is the origin in 2-dimensional space, $(0,0,0)$ is the origin in 3-dimensional space, and $(0,0,0,0)$ is the origin in 4-dimensional space. Read more...

## Solving Differential Equations with Taylor Series

Many differential equations don’t have solutions that can be expressed in terms of finite combinations of familiar functions. However, we can often solve for the Taylor series of the solution. Read more...

## Manipulating Taylor Series

To find the Taylor series of complicated functions, it’s often easiest to manipulate the Taylor series of simpler functions, such as those given below. Read more...

## Taylor Series

The sum formula for a geometric series is an example of representing a non-polynomial function as an infinite polynomial within a particular range of inputs. Read more...

## Tests for Convergence

Previously, we saw that sum formulas are only valid for series that converge. But how can we tell whether a series converges or diverges, in the first place? Read more...

## Geometric Series

A geometric series is a sum of the form $r+r^2+r^3+\cdots$ for some number $r$. Read more...
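
For $|r| < 1$, this sum converges to a simple closed form:

$$r + r^2 + r^3 + \cdots = \frac{r}{1-r}.$$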

## Variation of Parameters

When we know the zero solutions $y_0$ of a differential equation $y''+a_1(x)y'+a_0(x)y=f(x)$, we can use a method called variation of parameters to find the particular solution. This method is especially useful in cases where we are unable to guess the particular solution through undetermined coefficients. Read more...

## Integrating Factors

We know how to solve differential equations of the form Read more...

## Undetermined Coefficients

In the previous post, we learned how to solve differential equations of the form Read more...

## Characteristic Polynomial of a Differential Equation

In this chapter, we learn a technique for solving differential equations of the form Read more...

## Solving Differential Equations by Substitution

Sometimes, non-separable differential equations can be converted into separable differential equations by way of substitution. Read more...

## Slope Fields and Euler Approximation

When faced with a differential equation that we don’t know how to solve, we can sometimes still approximate the solution by simpler methods. If we just want to get an idea of what the solutions of the differential equation look like on a graph, we can construct a slope field. Read more...

## Separation of Variables

In differential equations, we are given an equation in terms of the derivative(s) of some function, and we need to solve for the function that makes the equation true. Read more...

## Improper Integrals

Improper integrals have bounds or function values that extend to positive or negative infinity. Read more...

## Integration by Parts

Integration by parts is another technique for simplifying integrals. We can apply integration by parts whenever an integral would be made simpler by differentiating some expression within the integral, at the cost of anti-differentiating another expression within the integral. The formula for integration by parts is given below: Read more...
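
In standard notation (the post's symbols may differ), the formula reads

$$\int u \, dv = uv - \int v \, du.$$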

## Integration by Substitution

Complicated integrals can sometimes be made simpler through the method of substitution. Substitution involves condensing an expression of $x$ into a single variable, say $u$, and then expressing the integral in terms of $u$ instead of $x$. Read more...

## Finding Area Using Integrals

In the last post, we learned how to evaluate integrals of the form $\int f(x) \, dx$, which are also known as indefinite integrals. Read more...

## Antiderivatives

An antiderivative of a function $f(x)$ is a function $F(x)$ whose derivative is $f(x)$, i.e. $F'(x)=f(x)$. Read more...

## L’Hôpital’s Rule

L’Hôpital’s rule provides a way to evaluate limits that take the indeterminate forms of $\frac{0}{0}$ or $\frac{\infty}{\infty}$. It says that, for such limits, we can differentiate the numerator and denominator separately, without changing the actual value of the limit. Read more...
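
In symbols, provided the right-hand limit exists,

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}.$$

For example, $\lim\limits_{x\to 0} \frac{\sin x}{x} = \lim\limits_{x\to 0} \frac{\cos x}{1} = 1.$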

## Differentials and Approximation

The chain rule tells us that we can treat the derivative $\frac{df}{dx}$ like a fraction when multiplying by other derivatives. In this chapter, we continue the idea of interpreting the derivative as a fraction, extending it to an even more literal sense. Read more...

## Finding Extrema

Derivatives can be used to find a function’s local extreme values, its peaks and valleys. At its peaks and valleys, a function’s derivative is either $0$ (a smooth, rounded peak/valley) or undefined (a sharp, pointy peak/valley). Read more...

## Derivatives of Non-Polynomial Functions

In this chapter, we introduce rules for the derivatives of exponential, logarithmic, trigonometric, and inverse trigonometric functions. Although it’s possible to compute each derivative using the difference quotient, it will take a long time to compute derivatives during calculus problems if we have to start from scratch with the difference quotient process every time – so it’s advantageous to remember the derivative rules. The derivative rules are to calculus what the multiplication table is to arithmetic. Read more...

## Properties of Derivatives

We know that when differentiating polynomials, we can differentiate each term individually. But why are we able to do this? Does multiplication work the same way? What about division? We answer these questions in this chapter. Read more...

## Chain Rule

The chain rule tells us how to take derivatives of compositions of functions. Informally, it says that we can “forget” about the inside of a function when we take the derivative, as long as we multiply by the derivative of the inside afterwards. Read more...

## Power Rule for Derivatives

It can be a pain to evaluate the difference quotient every time we want to take the derivative of a function. Luckily, there are some patterns in derivatives that allow us to compute derivatives without having to go through all the steps of computing the limit of the difference quotient. Read more...

## Derivatives and the Difference Quotient

The derivative of a function is the function’s slope at a particular point. We can approximate the derivative at a point $(x,f(x))$ by choosing another nearby point on the function, and computing the slope. If we increase the input $x$ by a small amount $\Delta x$, then we reach an x-coordinate of $x+\Delta x$, and the corresponding point on the function is $(x+\Delta x, f(x+\Delta x))$. Read more...
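
Taking the limit of this slope as $\Delta x \to 0$ yields the standard difference quotient definition of the derivative:

$$f'(x) = \lim_{\Delta x \to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x}.$$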

## Limits by Logarithms, Squeeze Theorem, and Euler’s Constant

A useful property of limits is that they can be brought inside continuous functions, i.e. the limit of a continuous function is the function of the limit. Read more...

## Evaluating Limits

The limit of a function $f(x)$, as $x$ approaches some value $a$, is the value we would expect for $f(a)$ if we saw only the portion of the graph around (but not including) $x=a$. If the resulting value is $L$, then we denote the limit as follows: Read more...

## Compositions of Functions

Compositions of functions consist of multiple functions linked together, where the output of one function becomes the input of another function. Read more...

## Inverse Functions

Inverting a function entails reversing the outputs and inputs of the function. Read more...

## Reflections of Functions

When a function is reflected, it flips across one of the axes to become its mirror image. Read more...

## Rescalings of Functions

When a function is rescaled, it is stretched or compressed along one of the axes, like a slinky. The function’s general shape is preserved, but it might look a bit thinner or fatter afterwards. Read more...

## Shifts of Functions

When a function is shifted, all of its points move vertically and/or horizontally by the same amount. The function’s size and shape are preserved – it is just slid in some direction, like sliding a book across a table. Read more...

## Piecewise Functions

A piecewise function is pieced together from multiple different functions. Read more...

## Trigonometric Functions

Trigonometric functions represent the relationship between sides and angles in right triangles. Read more...

## Absolute Value

An absolute value function represents the magnitude of a number, i.e. its distance from $0$. Read more...

## Exponential and Logarithmic Functions

Exponential functions have variables as exponents, e.g. $f(x)=2^x$. Read more...

## Radical Functions

A radical function is a function that involves roots: square roots, cube roots, or any kind of fractional exponent in general. We can often infer what their graphs look like by sandwiching them between polynomial functions. Read more...

## Graphing Rational Functions with Slant and Polynomial Asymptotes

A horizontal asymptote is a horizontal line that arises from a constant whole number term in the proper form of a rational function. Read more...

## Graphing Rational Functions with Horizontal and Vertical Asymptotes

The horizontal and vertical asymptotes of a rational function can give us insight into the shape of its graph. Read more...

## Vertical Asymptotes of Rational Functions

Unlike polynomials, rational functions can “blow up” to positive or negative infinity even for relatively small input values. Such input values are called vertical asymptotes, because they represent vertical lines that the function approaches but never quite reaches. Read more...

## Horizontal Asymptotes of Rational Functions

Like polynomials, rational functions can have end behavior that goes to positive or negative infinity. However, rational functions can also have another form of end behavior in which they become flat, approaching (but never quite reaching) a horizontal line known as a horizontal asymptote. Read more...

## Polynomial Long Division

A rational function is a fraction whose numerator and denominator are both polynomials. Rational functions are usually written in proper form, where the numerator is of a smaller degree than the denominator. (The degree of a polynomial is its highest exponent.) Read more...

## Sketching Graphs of Polynomials

In the previous posts, we learned how to find end behavior, zeros, and factored forms of polynomials. In this chapter, we will put all this information together to sketch graphs of polynomials. Read more...

## Rational Roots and Synthetic Division

In the previous post, we learned how to find the remaining zeros of a polynomial if we are given some zeros to start with. But how do we get those initial zeros in the first place, if they’re not given to us and aren’t obvious from the equation? Read more...

## Zeros of Polynomials

The zeros of a polynomial are the inputs that cause it to evaluate to zero. Read more...

## Standard Form and End Behavior of Polynomials

Polynomials include linear expressions and quadratic expressions, as well as expressions adding multiples of higher exponents of the variable. Read more...

## Systems of Inequalities

To solve a system of inequalities, we need to solve each individual inequality and find where all their solutions overlap. For example, to solve the system Read more...

## Quadratic Inequalities

Quadratic inequalities are best visualized in the plane. For example, to solve a quadratic inequality $-x^2+x+2>0$, we can find the values of $x$ where the parabola $y=-x^2+x+2$ is positive. Read more...

## Linear Inequalities in the Plane

When a linear equation has one variable, the solution covers a section of the number line: if our solution is $x > \text{some number}$, then the solution covers the section of the number line that lies right of that number; if our solution is $x<\text{some number}$, then the solution covers the section of the number line that lies left of that number. Read more...

## Linear Inequalities in the Number Line

An inequality is similar to an equation, but instead of saying two quantities are equal, it says that one quantity is greater than or less than another. Read more...

## Systems of Quadratic Equations

Systems of quadratic equations can be solved via substitution. After substituting, the resulting equation can itself be reduced down to a quadratic equation and solved by techniques covered in this chapter. Read more...

## Vertex Form

To easily graph a quadratic equation, we can convert it to vertex form: Read more...

## Completing the Square

Completing the square is another method for solving quadratic equations. Although we can use the quadratic formula to solve any quadratic equation, completing the square helps us gain a better intuition for quadratic equations and understand where the quadratic formula comes from. Read more...

## The Quadratic Formula

Some quadratic equations cannot be factored easily. For example, in the equation $x^2+3x+1=0$, we need to find two factors of $1$ that add to $3$. But the only integer factors of $1$ are $1$ and $1$, and they definitely don’t add to $3$! Read more...

## Factoring Quadratic Equations

Factoring is a method for solving quadratic equations. It involves converting the quadratic equation to standard form, then factoring it into a product of two linear terms (called factors), and finally solving for the variable values that make either factor equal to $0$. Read more...

## Standard Form of a Quadratic Equation

Quadratic equations are similar to linear equations, except that they contain squares of a single variable. Read more...

## Linear Systems

A linear system consists of multiple linear equations, and the solution of a linear system consists of the pairs that satisfy all of the equations. Read more...

## Standard Form of a Line

The standard form of a linear equation is $ax+by=c$, where $a$, $b$, and $c$ are all integers and $a$ is nonnegative. Read more...

## Point-Slope Form

Suppose we want to write the equation of a line with a given slope $m=2$, through a particular point $(3,5)$. In the previous post, we substituted the given information into a slope-intercept equation form $y=mx+b$, solved for $b$, and rewrote the slope-intercept form with $m$ and $b$ substituted so that $x$ and $y$ were the only variables. Read more...

## Slope-Intercept Form

In the previous post, we were solving linear equations in one variable. Now, let’s consider linear equations in two variables. A few examples are shown below: Read more...

## Solving Linear Equations

Loosely speaking, a linear equation is an equality statement containing only addition, subtraction, multiplication, and division. It does not need to include all of these operations, but it cannot include operations beyond them, such as exponentiation. Read more...