Naturally, we want to maximize the right-hand side of the above statement, which happens to be our likelihood function. I like to think of the likelihood function as “the likelihood that our model will correctly predict any given \(y\) value, given its corresponding feature vector \(\hat{x}\)”. It is, however, important to distinguish between probability and likelihood: probability describes how plausible an outcome is under fixed parameters, whereas likelihood describes how plausible a set of parameters is given the observed data. Now, we expand our likelih
