In my never-ending effort to translate all of that linear algebra I toiled over in college into functioning Java code (make fun of it on github), I came across one operation that gave me a perfect opportunity to use recursion: computing the determinant of a square matrix.
First, we’re going to talk a little bit about the determinant, so we know what computation we’re dealing with. After that, we’ll implement it recursively in Java in the Matrix class. This version is readable, but computationally expensive for large matrices. If you want to cut to the chase and see the code, feel free…
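To make the recursion concrete, here is a minimal sketch of a recursive determinant using cofactor (Laplace) expansion along the first row. It assumes a plain double[][] instead of the author's Matrix class, and the class name RecursiveDeterminant and its helper minor are hypothetical names chosen for illustration:

```java
// Hypothetical standalone class; not the Matrix class from the author's repo.
class RecursiveDeterminant {

    /** Returns det(a) for a square matrix, using cofactor expansion along row 0. */
    static double determinant(double[][] a) {
        int n = a.length;
        for (double[] row : a) {
            if (row.length != n) {
                throw new IllegalArgumentException("Matrix must be square");
            }
        }
        if (n == 0) {
            return 1.0; // By convention, the 0 x 0 matrix has determinant 1
        }
        if (n == 1) {
            return a[0][0]; // Base case: a 1 x 1 matrix is its own determinant
        }
        double det = 0.0;
        for (int col = 0; col < n; col++) {
            // Signs alternate +, -, +, ... across the first row
            double sign = (col % 2 == 0) ? 1.0 : -1.0;
            // Recursive step: each term needs the determinant of an (n-1) x (n-1) minor
            det += sign * a[0][col] * determinant(minor(a, 0, col));
        }
        return det;
    }

    /** Returns a copy of a with row r and column c deleted. */
    static double[][] minor(double[][] a, int r, int c) {
        int n = a.length;
        double[][] m = new double[n - 1][n - 1];
        int mi = 0;
        for (int i = 0; i < n; i++) {
            if (i == r) {
                continue;
            }
            int mj = 0;
            for (int j = 0; j < n; j++) {
                if (j != c) {
                    m[mi][mj++] = a[i][j];
                }
            }
            mi++;
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] a = {
            {2, 0, 1},
            {1, 3, 2},
            {1, 1, 2}
        };
        System.out.println(determinant(a)); // prints 6.0
    }
}
```

Each call on an n x n matrix makes n recursive calls on (n-1) x (n-1) minors, so the running time grows like n!. That is the price of readability here; row reduction (for example, LU decomposition) is the usual O(n^3) alternative for large matrices.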
The way I’m currently twiddling my thumbs and waiting for the world to end is building my very own linear algebra package in Java (make fun of it on github). It’s a good excuse to revisit Euclidean vector spaces through a programming lens, to figure out if I really understand it as well as I like to pretend I do.
I always had to double and triple check my computations when multiplying matrices on timed linear algebra exams (because I was prone to making a mistake while trying to take a shortcut), so I had built it up as…
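The no-shortcuts version of matrix multiplication is just three nested loops. This is a minimal sketch, assuming non-empty rectangular double[][] arrays; the class name MatrixProduct is hypothetical and not taken from the author's package:

```java
import java.util.Arrays;

// Hypothetical helper class for illustration
class MatrixProduct {

    /** Returns a * b, where a is m x n and b is n x p (both assumed non-empty and rectangular). */
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        int n = b.length;
        int p = b[0].length;
        if (a[0].length != n) {
            throw new IllegalArgumentException("Columns of a must equal rows of b");
        }
        double[][] c = new double[m][p];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < p; j++) {
                // Entry (i, j) is the dot product of row i of a with column j of b
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b)));
        // prints [[19.0, 22.0], [43.0, 50.0]]
    }
}
```

Swapping the arguments gives [[23.0, 34.0], [31.0, 46.0]], a quick reminder that matrix multiplication is not commutative.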
It’s really easy for me to take linear algebra for granted when doing data science. Whether I’m overlooking the specifics of eigendecomposition in Principal Component Analysis, completely forgetting that every weighted sum is actually a dot product between a vector of inputs and a vector of coefficients, or sticking my fingers in my ears to avoid hearing the “tensor” in TensorFlow, I feel like I’ve been shortchanging my degree in math by not getting into the weeds and looking at the implementations.
Although there’s nothing in the definition of a vector space that requires vectors to be represented as lists…
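To see that a weighted sum of inputs really is a dot product, here is a minimal sketch. It treats the inputs and coefficients as plain double[] arrays, and the class name WeightedSum is hypothetical:

```java
// Hypothetical helper class for illustration
class WeightedSum {

    /** Dot product of x and w: x[0]*w[0] + x[1]*w[1] + ... */
    static double dot(double[] x, double[] w) {
        if (x.length != w.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * w[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] inputs = {2.0, 3.0, 4.0};
        double[] coefficients = {0.5, 1.0, 2.0};
        // Written out by hand: 0.5*2 + 1.0*3 + 2.0*4 = 1 + 3 + 8 = 12
        System.out.println(dot(inputs, coefficients)); // prints 12.0
    }
}
```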
It’s easy to understand why toy datasets are so popular with shiny new data scientists. Whether you’re implementing KNN from scratch on the classic Iris dataset, getting your feet wet on Kaggle with the famous Titanic survivor dataset, or using the MNIST dataset to dive into PyTorch, working from a single spreadsheet containing a clean, well-studied collection of observations allows you to focus your efforts on understanding the mechanics of the algorithm, model, or library instead of on menial tasks like merging and cleaning your data or dealing with a substantial number of missing values.
Confession: my personal experience is almost the complete opposite of the title of this article. I actually started with C++ in college, moved to Java to teach AP Computer Science A, and then entered Python territory to work with all of the snazzy data science libraries available. Now that I’m working on a Java project again (a neural network package that is really pushing my understanding of the underlying math to its limits), I’m really paying attention to the little differences in how the languages work. …
AP Statistics seems to love dotplots. They’re easy to make by hand, they quickly give you an idea of what your distribution looks like, and they don’t require any real planning or number-crunching before diving in. Compare this to histograms, which require you to decide how many bins to use and to know how high each bar is going to be before you draw it. There’s a simple one-to-one correspondence between observations and dots on the plot, so dotplots are easy to understand and easy to produce by hand in a testing environment.
When I was teaching the early units in AP…
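The one-to-one correspondence between observations and dots is easy to see in code. Below is a minimal sketch of a sideways, text-only dotplot for integer data: one row per value on the number line and one `*` per observation. The class name TextDotplot and the sample quiz scores are hypothetical, and String.repeat assumes Java 11 or later:

```java
import java.util.TreeMap;

// Hypothetical helper: a sideways, text-only dotplot for integer data
class TextDotplot {

    /** One line per value from min to max, with one '*' per observation at that value. */
    static String dotplot(int[] data) {
        if (data.length == 0) {
            return "";
        }
        // TreeMap keeps the distinct values in sorted order
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int x : data) {
            counts.merge(x, 1, Integer::sum);
        }
        int lo = counts.firstKey();
        int hi = counts.lastKey();
        StringBuilder sb = new StringBuilder();
        // Walk every integer in the range so values with no observations still get a blank row;
        // a long counter avoids overflow if hi happens to be Integer.MAX_VALUE
        for (long v = lo; v <= hi; v++) {
            sb.append(String.format("%3d | ", v));
            sb.append("*".repeat(counts.getOrDefault((int) v, 0)));
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] quizScores = {7, 8, 8, 10, 7, 8, 9, 7, 8};
        System.out.print(dotplot(quizScores));
        // prints:
        //   7 | ***
        //   8 | ****
        //   9 | *
        //  10 | *
    }
}
```

Unlike a histogram, nothing has to be decided up front: each observation adds exactly one dot to its own row.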
Two fields that often get left on the sidelines in conversations about data science are Information Theory, which studies the quantification, storage, and communication of information, and Coding Theory, which studies the properties of codes and their respective fitness for specific applications. A wonderful introduction to the two fields is Information and Coding Theory by Gareth A. Jones and J. Mary Jones. …
Every new programmer has gone through it. You get an error and all progress screeches to a halt. Maybe you go out on a limb and ask another programmer for help; maybe you post a question on a message board. At some point, you’ll get the slap in the face: “Just f***ing Google it.”
Before diving into strategies for effectively using Google to find a solution to your problem, let’s take a moment to unpack this advice, because it is advice and not just a dismissal.
If you don’t know what the “it” represents in, “Just f***ing Google it,”…
In my last year of teaching, I was teaching three AP STEM courses at the same time. AP Statistics and AP Computer Science Principles were both new to me that year, but I had taught AP Computer Science A the previous year. I took a risk and invested time into learning how to do statistics with Python, in an attempt to create an overlap between two of my preps. In the end, it definitely paid off for me: I found a new passion and am now pursuing a career in data science.
This blog is the first in a…
When a shiny new student of data science or statistics first wanders into the land of hypothesis testing, one of the first conceptual hurdles they’ll have to grapple with is the difference between the population distribution, the sample distribution, and the sampling distribution (of the statistic of interest). This post will explore these three concepts visually by looking at the distribution of heights in Rosner’s FEV dataset (obtained from http://biostat.mc.vanderbilt.edu/DataSets). As usual, the dataset and code can be found on my github.
Our question of interest is: What is the mean height of children involved in the study?
Before we…
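The three distributions can be separated in a quick simulation. This is a minimal sketch under loudly stated assumptions: the population is a made-up normal distribution of heights (mean 60, SD 6), NOT the FEV data, samples are drawn with replacement, and the class and method names are hypothetical:

```java
import java.util.Random;

// Hypothetical simulation: the population below is made up, NOT the FEV heights
class SamplingDistributionDemo {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** SD dividing by n; a sample SD would usually divide by n - 1, kept simple here. */
    static double sd(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) {
            ss += (x - m) * (x - m);
        }
        return Math.sqrt(ss / xs.length);
    }

    /** Population distribution: made-up normal heights with mean mu and SD sigma. */
    static double[] hypotheticalPopulation(int size, double mu, double sigma, Random rng) {
        double[] pop = new double[size];
        for (int i = 0; i < size; i++) {
            pop[i] = mu + sigma * rng.nextGaussian();
        }
        return pop;
    }

    /** Sample distribution: one random sample of size n, drawn with replacement. */
    static double[] sample(double[] population, int n, Random rng) {
        double[] s = new double[n];
        for (int i = 0; i < n; i++) {
            s[i] = population[rng.nextInt(population.length)];
        }
        return s;
    }

    /** Sampling distribution of the mean: one sample mean per repetition. */
    static double[] sampleMeans(double[] population, int n, int reps, Random rng) {
        double[] means = new double[reps];
        for (int r = 0; r < reps; r++) {
            means[r] = mean(sample(population, n, rng));
        }
        return means;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] population = hypotheticalPopulation(5000, 60.0, 6.0, rng);
        double[] oneSample = sample(population, 30, rng);
        double[] means = sampleMeans(population, 30, 2000, rng);

        System.out.printf("Population:            mean %.2f, SD %.2f%n", mean(population), sd(population));
        System.out.printf("One sample (n = 30):   mean %.2f, SD %.2f%n", mean(oneSample), sd(oneSample));
        System.out.printf("Sample means (n = 30): mean %.2f, SD %.2f%n", mean(means), sd(means));
    }
}
```

The population and a single sample should have similar spreads, while the sample means should cluster around the population mean with an SD of roughly the population SD divided by the square root of 30.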
Data Scientist, Math Enthusiast, Programmer and General Nerd. www.linkedin.com/in/danhalesprogramming