Casual Inference
https://www.casualinf.com/
Recent content on Casual Inference, by John Lee.

woRdle Play
https://www.casualinf.com/post/2022-08-23-wordle-play/
Tue, 23 Aug 2022
Intro: After watching 3Blue1Brown's video on solving Wordle using information theory, I decided to try a similar method of my own using probability. His take on using word frequency, combined with expected information gain quantified in bits, to find the solution was interesting. This is a great approach, especially when playing against a person, who may choose to play a word that is not in the predefined list of the official Wordle website.
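A minimal sketch of "expected information gain quantified in bits": a guess partitions the remaining candidate answers by the feedback pattern it would produce, and the expected gain is the entropy of that pattern distribution. The patterns below are hypothetical, not taken from the post:

```r
# Hypothetical feedback patterns one guess produces against four equally
# likely candidate answers (G = green, Y = yellow, B = gray)
patterns <- c("BBYBB", "GBBBB", "BBYBB", "BBBBB")

# Probability of each pattern under a uniform prior over answers
p <- as.numeric(table(patterns)) / length(patterns)

# Expected information gain of the guess, in bits (Shannon entropy)
-sum(p * log2(p))  # 1.5 bits for this split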
Linear Regression on Coffee Rating Data
https://www.casualinf.com/post/2021-01-07-linear-regression-on-coffee-rating-data/
Thu, 07 Jan 2021
While reading Elements of Statistical Learning, I figured it would be a good idea to try the machine learning methods introduced in the book. I just finished a chapter on linear regression and learned more about it and the penalized methods (Ridge and Lasso). Since there is an abundance of resources available online, it would be redundant to get into the details. I'll quickly go over Ordinary Least Squares, Ridge, and Lasso regression, and then show an application of those methods in R.
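One common way to fit both penalized methods in R is the glmnet package (my choice here, not necessarily the post's). A sketch on a built-in dataset:

```r
library(glmnet)  # penalized regression; not part of base R

x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg

# alpha = 0 gives the ridge penalty, alpha = 1 the lasso penalty;
# cv.glmnet chooses the penalty strength lambda by cross-validation
ridge <- cv.glmnet(x, y, alpha = 0)
lasso <- cv.glmnet(x, y, alpha = 1)

coef(ridge, s = "lambda.min")  # coefficients shrunk toward zero
coef(lasso, s = "lambda.min")  # some coefficients may be exactly zero
```

The lasso's ability to zero out coefficients entirely is what makes it useful for variable selection, while ridge only shrinks them.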
UIUC Public GPA Dataset Exploration with Shiny
https://www.casualinf.com/post/2020-12-27-uiuc-public-gpa-dataset-exploration-with-shiny/
Mon, 28 Dec 2020
Last year, I thought it would be a good idea to dig through the GPA data set available from here. I started building a Shiny app that lets the user explore certain aspects of the data. Now it's been almost a year, and I hadn't had the chance or the will to work on it until now. I made it really simple so that I can quickly move on to other topics instead of dragging this on for another year with an unfinished product.
Grasping Power
https://www.casualinf.com/post/grasping-power/
Sun, 10 Nov 2019
I was reading a paper on the calculation of sample sizes, and I inevitably came across the topic of statistical power. Essentially, when you're designing an experiment, the sample size is an important factor to consider due to limited resources. You want a sample size that is neither too small (which could result in a high chance of failing to detect true differences) nor too big (a potential waste of resources, albeit yielding better estimation).
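Base R can make that too-small-versus-too-big trade-off concrete: power.t.test solves for whichever design quantity is left unspecified. For example, the per-group sample size needed to detect a half-standard-deviation difference with 80% power:

```r
# Leave n out and power.t.test solves for it: detect a difference of
# delta = 0.5 (in units of sd = 1) at alpha = 0.05 with 80% power
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
# roughly 64 subjects per group for a two-sample t-test
```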
The Phi Function
https://www.casualinf.com/post/the-phi-function/
Thu, 10 Oct 2019
I frequently encounter the \(\Phi\) and \(\Phi^{-1}\) functions in statistical texts. For some reason, the notation always throws me off guard, and I have to spend a few minutes visualizing. This post draws a definitive link between the functions and their corresponding graphs. This ought to help me save some time and build a more solid understanding of the concepts that make use of them.
The \(\Phi\) function is simply the cumulative distribution function, \(F\), of a standard normal distribution.
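In R, \(\Phi\) and \(\Phi^{-1}\) correspond directly to pnorm and qnorm:

```r
pnorm(0)      # Phi(0) = 0.5: half the standard normal mass lies below 0
pnorm(1.96)   # Phi(1.96), approximately 0.975
qnorm(0.975)  # Phi^{-1}(0.975), approximately 1.96: the 95% critical value
```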
Sorting Comparison Pt. 2
https://www.casualinf.com/post/sorting-comparison-pt-2/
Sun, 04 Aug 2019
Load all the datasets that I've saved from the previous benchmarks:
set.seed(12345)
library(microbenchmark)
library(tidyverse)
library(knitr)
library(kableExtra)
load("2019-03-01-sorting-comparison/sort_comparisons")
Blowing off the Dust
I see that in my environment, two variables, special_case_sort_time and trend_sort_time, are loaded. It's been a long time since I created these data, so I have an unclear memory as to what these objects are. Usually I use str and class to understand what they are. I also make use of head to quickly glance at the data, usually if it is a data.frame.
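A quick sketch of that inspection workflow on a stand-in object (the real special_case_sort_time and trend_sort_time aren't available here):

```r
# Stand-in for an object loaded from an old .RData file
mystery <- data.frame(algorithm = c("bubble", "merge"),
                      median_ms = c(152.3, 4.7))

class(mystery)    # what kind of object is it?
str(mystery)      # structure: column names, types, first values
head(mystery, 1)  # glance at the first row, if it is a data.frame
```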
Sorting Comparison
https://www.casualinf.com/post/2019-03-01-sorting-comparison/sorting-comparison/
Fri, 08 Mar 2019
As I'm self-studying algorithms and data structures with Python from here, I figured I could try some experiments with different sorting algorithms using my own implementations in R.
Types of sorting algorithms I will use:
Bubble Sort
Insertion Sort
Selection Sort
Shell Sort
Merge Sort
Quick Sort
I will be dealing with a vector of type double. It can be a collection of any positive real numbers.
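As a sketch of what one such implementation might look like in R (my illustration, not the post's code), here is insertion sort on a double vector:

```r
insertion_sort <- function(x) {
  # grow a sorted prefix; shift larger elements right to make room
  for (i in seq_along(x)[-1]) {
    key <- x[i]
    j <- i - 1
    while (j >= 1 && x[j] > key) {
      x[j + 1] <- x[j]
      j <- j - 1
    }
    x[j + 1] <- key
  }
  x
}

insertion_sort(c(3.2, 0.4, 9.9, 1.5))  # 0.4 1.5 3.2 9.9
```

Its O(n^2) worst case is exactly what a benchmark against merge sort or quick sort would expose as the vector grows.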
Two-Dimension LDA
https://www.casualinf.com/post/two-dimension-lda/
Mon, 04 Feb 2019
LDA, Linear Discriminant Analysis, is a classification method and a dimension reduction technique. I'll focus more on classification. LDA calculates a linear discriminant function (which arises from assuming a Gaussian distribution) for each class, and chooses the class that maximizes that function. The linear discriminant function therefore dictates a linear decision boundary for choosing a class. The decision boundary should be linear in the feature space; discriminant analysis itself isn't inherently linear.
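A minimal sketch of LDA as a classifier in R, using the MASS package on the built-in iris data (my example, not the post's):

```r
library(MASS)  # provides lda()

fit <- lda(Species ~ ., data = iris)  # one discriminant function per class
pred <- predict(fit, iris)

# the class with the largest discriminant value wins;
# check training accuracy and the confusion table
mean(pred$class == iris$Species)
table(predicted = pred$class, actual = iris$Species)
```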
Covariance Matrix
https://www.casualinf.com/post/covariance-matrix/
Mon, 16 Jul 2018
In my first machine learning class, in order to learn the theory behind PCA (Principal Component Analysis), we had to learn about the variance-covariance matrix. I was concurrently taking a basic course in theoretical probability and statistics, so even the idea of variance was still vague to me. Despite repeated attempts to understand covariance, I still had trouble fully capturing the intuition behind the covariance between two random variables. Even now, applying and verifying the correct usage of the mathematical properties of covariance requires intensive googling.
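One way to pin down that intuition is to check cov() against the definition by hand. A small sketch with made-up data:

```r
x <- c(1, 2, 3, 4)
y <- c(2, 4, 5, 4)

# Sample covariance from the definition:
# cov(x, y) = sum((x_i - xbar)(y_i - ybar)) / (n - 1)
manual <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

cov(x, y)         # 1.166667, matches `manual`
cov(cbind(x, y))  # the full 2x2 variance-covariance matrix
```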
My First Post
https://www.casualinf.com/post/first-post/
Sun, 17 Jun 2018
This is the first blog post of my life! I will be using this blog to post about anything that I want to share in statistics. For starters, I will run a linear regression with the iris dataset.
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
Let's predict Sepal.Length with Petal.Length and Petal.Width.
# separate into training and testing sets
set.seed(1234)
train_ind <- sample(nrow(iris), floor(0.8 * nrow(iris)))
iris_train <- iris[train_ind, ]
iris_test <- iris[-train_ind, ]
# run linear regression
iris_lm <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris_train)
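A natural next step (not shown in the excerpt) is to evaluate the fit on the held-out rows. A self-contained sketch that refits the model described above:

```r
set.seed(1234)
train_ind <- sample(nrow(iris), floor(0.8 * nrow(iris)))
iris_train <- iris[train_ind, ]
iris_test  <- iris[-train_ind, ]

iris_lm <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris_train)

# root mean squared error on the 20% held-out set
pred <- predict(iris_lm, newdata = iris_test)
sqrt(mean((iris_test$Sepal.Length - pred)^2))
```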
About me
https://www.casualinf.com/page/about/
My English name is John Lee and my Korean name is Lee SooHwan.
I was born in Gainesville, Florida, and spent most of my childhood in Seoul, South Korea, and Hamilton, Ontario, Canada. I attended Benicia High School in a quiet town called Benicia, which is located in the Bay Area.
I am currently attending the University of Illinois at Urbana-Champaign as a double major in Mathematics and Statistics.
This blog is my way of motivating myself to understand statistics more accurately and in more depth, to code, to learn and practice statistical models, and ultimately to become a better statistician by conveying these ideas in my own way to an imaginary audience.