are you me?


 being aligned



 생각 생각 생각 (Feat. 수민), 뮤지



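I was listening to Incognito as usual, and then... - Daily life

Out of nowhere, one of the lyrics sounded like "Chungnyeol-wang" (충렬왕, King Chungnyeol).
All day long, that unknown part of that unknown song kept circling in my head. Well, to be precise, the word "Chungnyeol-wang" did.

What song was it;

For the record, all I remember of Goryeo is Jeong Do-jeon and Ssanghwajeom;

It was a phenomenon that called for studying Korean history and English at the same time.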

written time : 2017-08-08 23:46:21.0

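Vacation - Daily life

It is vacation season, and given the circumstances as well, I deliberately spent half a day doing nothing at all
and then set off for Daegu.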

written time : 2017-08-05 17:43:29.0

Continuing ambiguity;;; - Computers

Generative vs. Discriminative; Bayesian vs. Frequentist

I was helping Boyi Xie get ready for his Ph.D. qualifying exams in computer science at Columbia and at one point I wrote the following diagram on the board to lay out the generative/discriminative and Bayesian/frequentist distinctions in what gets modeled.

To keep it in the NLP domain, let’s assume we have a simple categorization problem to predict a category z for an input consisting of a bag of words vector w and parameters β.

                 Frequentist      Bayesian
Discriminative   p(z ; w, β)      p(z, β ; w) = p(z | β ; w) * p(β)
Generative       p(z, w ; β)      p(z, w, β) = p(z, w | β) * p(β)

If you’re not familiar with frequentist notation, items to the right of the semicolon (;) are not modeled probabilistically.

Frequentists model the probability of observations given the parameters. This involves a likelihood function.

Bayesians model the joint probability of data and parameters, which, given the chain rule, amounts to a likelihood function times a prior.
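As a rough illustration of the two views, here is a minimal sketch for a single Bernoulli parameter β. The Beta(2, 2) prior and the function names are assumptions made up for this example; they do not come from the post.

```python
import math

def likelihood(ys, beta):
    """Frequentist view: p(y ; beta) for i.i.d. Bernoulli data, with beta a fixed unknown."""
    return math.prod(beta if y == 1 else 1 - beta for y in ys)

def beta_prior_density(beta, a=2.0, b=2.0):
    """Hypothetical Beta(a, b) prior density p(beta); a and b are illustrative choices."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * beta ** (a - 1) * (1 - beta) ** (b - 1)

def joint(ys, beta):
    """Bayesian view: p(y, beta) = p(y | beta) * p(beta), i.e. likelihood times prior."""
    return likelihood(ys, beta) * beta_prior_density(beta)
```

The only difference between `likelihood` and `joint` is the extra prior factor, which is the chain-rule step the sentence above describes.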

Generative models provide a probabilistic model of the predictors, here the words w, and the categories z, whereas discriminative models only provide a probabilistic model of the categories z given the words w.
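To make the distinction concrete, here is a small sketch of a generative model in the naive Bayes style. The categories, words, and probabilities are invented for illustration. The model assigns a probability to the pair (z, w); normalizing over z then recovers p(z | w ; β), the only quantity a discriminative model would describe.

```python
import math

# Hypothetical parameters beta: category prior p(z) and per-category word probabilities p(word | z).
PRIOR = {"sports": 0.5, "politics": 0.5}
WORD_PROBS = {
    "sports":   {"goal": 0.6, "vote": 0.1, "team": 0.3},
    "politics": {"goal": 0.1, "vote": 0.7, "team": 0.2},
}

def log_joint(z, words):
    """Generative: log p(z, w ; beta) = log p(z) + sum of log p(word | z).

    The bag of words is passed as a token list; the multinomial coefficient
    is omitted because it does not depend on z.
    """
    return math.log(PRIOR[z]) + sum(math.log(WORD_PROBS[z][w]) for w in words)

def conditional(words):
    """p(z | w ; beta), obtained by normalizing the joint over all categories z."""
    logs = {z: log_joint(z, words) for z in PRIOR}
    m = max(logs.values())  # subtract the max for numerical stability
    unnorm = {z: math.exp(v - m) for z, v in logs.items()}
    total = sum(unnorm.values())
    return {z: u / total for z, u in unnorm.items()}
```

A discriminative model such as logistic regression would parameterize `conditional` directly and never assign a probability to the words themselves.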

All Bayesian Models are Generative (in Theory)

I had a brief chat with Andrew Gelman about the topic of generative vs. discriminative models. It came up when I was asking him why he didn’t like the frequentist semicolon notation for variables that are not random. He said that in Bayesian analyses, all the variables are considered random. It just turns out we can sidestep having to model the predictors (i.e., covariates or features) in a regression. Andrew explains how this works in section 14.1, subsection “Formal Bayesian justification of conditional modeling” in Bayesian Data Analysis, 2nd Edition (the 3rd Edition is now complete and will be out in the foreseeable future). I’ll repeat the argument here.

Suppose we have predictor matrix X and outcome vector y. In regression modeling, we assume a coefficient vector \beta and explicitly model the outcome likelihood p(y|\beta,X) and coefficient prior p(\beta). But what about X? If we were modeling X, we’d have a full likelihood p(X,y|\beta,\psi), where \psi are the additional parameters involved in modeling X and the prior is now joint, p(\beta,\psi).

So how do we justify conditional modeling as Bayesians? We simply assume that \psi and \beta have independent priors, so that

p(\beta,\psi) = p(\beta) \times p(\psi).

The posterior then neatly factors as

p(\beta,\psi|y,X) = p(\psi|X) \times p(\beta|X,y).

Looking at just the inference for the regression coefficients \beta, we have the familiar expression

p(\beta|y,X) \propto p(\beta) \times p(y|\beta,X).
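The factorization also relies on the likelihood splitting as p(X,y|\beta,\psi) = p(X|\psi) \times p(y|\beta,X), which the argument above uses implicitly. Under that assumption, the intermediate steps can be sketched as:

```latex
\begin{align*}
p(\beta,\psi \mid y,X)
  &\propto p(\beta,\psi)\, p(X,y \mid \beta,\psi) \\
  &= p(\beta)\, p(\psi)\, p(X \mid \psi)\, p(y \mid \beta, X) \\
  &= \bigl[\, p(\psi)\, p(X \mid \psi) \,\bigr] \times \bigl[\, p(\beta)\, p(y \mid \beta, X) \,\bigr] \\
  &\propto p(\psi \mid X) \times p(\beta \mid X, y).
\end{align*}
```

Both sides are normalized densities over (\beta,\psi), so the final proportionality is an equality. Integrating out \psi leaves the expression for p(\beta|y,X).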

Therefore, we can think of everything as a joint model under the hood. Regression models involve an independence assumption so we can ignore the inference for \psi. To quote BDA,
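The claim can be checked numerically on a toy discrete model. In the sketch below, every value (the grids for β and ψ, the priors, the data) is hypothetical. The marginal posterior for β computed from the full joint model should match the one computed from the conditional model alone.

```python
from itertools import product

# Hypothetical discrete toy model; all numbers are made up for illustration.
BETAS = [0.2, 0.8]                      # candidate values of beta
PSIS = [0.3, 0.6]                       # candidate values of psi
PRIOR_BETA = {0.2: 0.5, 0.8: 0.5}       # p(beta)
PRIOR_PSI = {0.3: 0.7, 0.6: 0.3}        # p(psi), independent of beta

def p_x(x, psi):
    """p(X | psi): each binary predictor is Bernoulli(psi)."""
    return psi if x == 1 else 1 - psi

def p_y(y, x, beta):
    """p(y | X, beta): y is Bernoulli(beta) when x == 1, else Bernoulli(1 - beta)."""
    p1 = beta if x == 1 else 1 - beta
    return p1 if y == 1 else 1 - p1

def normalize(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def joint_posterior_beta(data):
    """Marginal posterior of beta from the full joint model over (beta, psi)."""
    weights = {b: 0.0 for b in BETAS}
    for b, s in product(BETAS, PSIS):
        w = PRIOR_BETA[b] * PRIOR_PSI[s]
        for x, y in data:
            w *= p_x(x, s) * p_y(y, x, b)
        weights[b] += w
    return normalize(weights)

def conditional_posterior_beta(data):
    """Posterior of beta from the conditional model alone: p(beta) * p(y | beta, X)."""
    weights = {}
    for b in BETAS:
        w = PRIOR_BETA[b]
        for x, y in data:
            w *= p_y(y, x, b)
        weights[b] = w
    return normalize(weights)

DATA = [(1, 1), (0, 0), (1, 1), (1, 0)]  # hypothetical (x, y) pairs
```

Comparing `joint_posterior_beta(DATA)` with `conditional_posterior_beta(DATA)` shows the ψ terms cancelling in the normalization, which is why the model for X can be ignored when only β is of interest.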

The practical advantage of using such a regression model is that it is much easier to specify a realistic conditional distribution of one variable given k others than a joint distribution on all k+1 variables.

We knew that all along.

The formulas will probably come out a bit broken, but for now..

written time : 2017-07-28 20:08:34.0