Computing the determinant recursively

In my never-ending effort to translate all of that linear algebra I toiled over in college into functioning Java code (make fun of it on github), I came across one operation that gave me a perfect opportunity to use recursion: computing the determinant of a square matrix.

First, we’re going to talk a little bit about the determinant, so we know what computation we’re dealing with. After that, we’ll implement it recursively in Java in the Matrix class; this version is readable, but computationally expensive for large matrices. If you want to cut to the chase and see the code, feel free…
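For a rough idea of what a recursive determinant can look like, here is a minimal sketch using cofactor expansion along the first row. It is not the article's own Matrix class: the class name `DeterminantSketch`, the helper `minor`, and the choice of a plain `double[][]` are all assumptions made for illustration.

```java
// Hypothetical illustration only; not the Matrix class from the article.
final class DeterminantSketch {

    // Determinant by cofactor (Laplace) expansion along the first row.
    // Readable, but the number of recursive calls grows factorially with n.
    static double determinant(double[][] m) {
        int n = m.length;
        if (n == 0) {
            throw new IllegalArgumentException("matrix must not be empty");
        }
        for (double[] row : m) {
            if (row.length != n) {
                throw new IllegalArgumentException("matrix must be square");
            }
        }
        if (n == 1) {
            return m[0][0]; // base case: a 1x1 matrix is its own determinant
        }
        double total = 0.0;
        double sign = 1.0; // cofactor signs alternate +, -, +, ... along the row
        for (int col = 0; col < n; col++) {
            total += sign * m[0][col] * determinant(minor(m, 0, col));
            sign = -sign;
        }
        return total;
    }

    // Copy of m with one row and one column removed.
    static double[][] minor(double[][] m, int skipRow, int skipCol) {
        int n = m.length;
        double[][] result = new double[n - 1][n - 1];
        int r = 0;
        for (int i = 0; i < n; i++) {
            if (i == skipRow) {
                continue;
            }
            int c = 0;
            for (int j = 0; j < n; j++) {
                if (j == skipCol) {
                    continue;
                }
                result[r][c++] = m[i][j];
            }
            r++;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        // 1*4 - 2*3
        System.out.println(determinant(a));
    }
}
```

Each call on an n×n matrix makes n recursive calls on (n−1)×(n−1) minors, which is why this style is fine for small matrices and expensive for large ones.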


Spoiler alert: It’s just a bunch of dot products

The way I’m currently twiddling my thumbs and waiting for the world to end is building my very own linear algebra package in Java (make fun of it on github). It’s a good excuse to revisit Euclidean vector spaces through a programming lens, to figure out if I really understand it as well as I like to pretend I do.

I always had to double and triple check my computations when multiplying matrices on timed linear algebra exams (because I was prone to making a mistake while trying to take a shortcut), so I had built it up as…


A look at Java through the lens of a simple Vector class

It’s really easy for me to take linear algebra for granted when doing data science. Whether I’m overlooking the specifics of eigendecomposition in Principal Component Analysis, completely forgetting that every weighted sum is actually a dot product between a vector of inputs and a vector of coefficients, or sticking my fingers in my ears to avoid hearing the “tensor” in TensorFlow, I feel like I’ve been shortchanging my degree in math by not getting into the weeds and looking at the implementations.

Although there’s nothing in the definition of a vector space that requires vectors to be represented as lists…


It might be a “toy,” but building it is a serious project.

Photo by Wexor Tmg on Unsplash

It’s easy to understand why toy datasets are so popular with shiny new data scientists. Whether you’re implementing KNN from scratch on the classic Iris dataset, getting your feet wet on Kaggle with the famous Titanic survivor dataset, or using the MNIST dataset to dive into PyTorch, having a single spreadsheet with a clean, well-studied collection of observations can allow you to focus your efforts on understanding the mechanics of the algorithm, model, or library, instead of on menial tasks like merging and cleaning your data or dealing with a substantial amount of missing values.

That said, there’s one big problem…


Seven conceptual hurdles you might face when learning a new programming language

Confession: my personal experience is almost the complete opposite of the title of this article. I actually started with C++ in college, moved to Java to teach AP Computer Science A, and then entered Python territory to work with all of the snazzy data science libraries available. Now that I’m working on a Java project again (a neural network package that is really pushing my understanding of the underlying math to its limits), I’m really paying attention to the little differences in how the languages work. …


A simple tool for AP Statistics teachers

AP Statistics seems to love dotplots. They’re easy to make by hand, they quickly give you an idea of what your distribution looks like, and they don’t require any real planning or number-crunching before diving in––compare this to histograms, which require you to know how high the bar is going to be before you draw it, and how many bins you want to use. There’s a simple one-to-one correspondence with observations and dots on the plot, so they’re easy to understand and easy to produce by hand in a testing environment.

When I was teaching the early units in AP…


Checking for Unique Decodability in Variable-Length Codes

Image by S. Hermann & F. Richter from Pixabay

Two fields that often get left on the sidelines in conversations about data science are Information Theory, which studies the quantification, storage, and communication of information, and Coding Theory, which studies the properties of codes and their respective fitness for specific applications. A wonderful introduction to the two fields is Information and Coding Theory by Gareth A. Jones and J. Mary Jones. …


“Can you help me for a second?”

How to take this advice.

Every new programmer has gone through it. You get an error and all progress screeches to a halt. Maybe you go out on a limb and ask another programmer for help; maybe you post a question on a message board. At some point, you’ll get the slap in the face: “Just f***ing Google it.”

Photo by Sebastian Herrmann on Unsplash

Before diving into strategies about effectively utilizing Google to find a solution for your problem, let’s take a moment to unpack this advice — because it is advice, and not just a dismissal.

If you don’t know what the “it” represents in, “Just f***ing Google it,”…


Level up your statistics and your programming at the same time.

Photo by Augustine Wong on Unsplash

In my last year of teaching, I was teaching three AP STEM courses at the same time — both AP Statistics and AP Computer Science Principles were new to me that year, but I had taught AP Computer Science A the previous year. I took a risk and invested time into learning how to do statistics with Python, in an attempt to create an overlap between two of my preps. In the end, it definitely paid off for me: I found a new passion and am now pursuing a career in data science.

This blog is the first in a…


When a shiny new student of data science or statistics first wanders into the land of hypothesis testing, one of the first conceptual hurdles they’ll have to grapple with is the difference between the population distribution, the sample distribution, and the sampling distribution (of the statistic of interest). This post will explore these three concepts visually by looking at the distribution of heights in Rosner’s FEV dataset (obtained from http://biostat.mc.vanderbilt.edu/DataSets). As usual, the dataset and code can be found on my github.

Our question of interest is: What is the mean height of children involved in the study?

Before we…

Dan Hales

Data Scientist, Math Enthusiast, Programmer and General Nerd. www.linkedin.com/in/danhalesprogramming
