
## The confusion continues - Computer |

-------------------------------------------------------------------


https://lingpipe-blog.com/2013/04/12/generative-vs-discriminative-bayesian-vs-frequentist/

-------------------------------------------------------------------

Generative vs. Discriminative; Bayesian vs. Frequentist

I was helping Boyi Xie get ready for his Ph.D. qualifying exams in computer science at Columbia and at one point I wrote the following diagram on the board to lay out the generative/discriminative and Bayesian/frequentist distinctions in what gets modeled.

To keep it in the NLP domain, let’s assume we have a simple categorization problem to predict a category z for an input consisting of a bag of words vector w and parameters β.

|                | Frequentist | Bayesian                            |
|----------------|-------------|-------------------------------------|
| Discriminative | p(z ; w, β) | p(z, β ; w) = p(z \| β ; w) * p(β)  |
| Generative     | p(z, w ; β) | p(z, w, β) = p(z, w \| β) * p(β)    |

If you’re not familiar with frequentist notation, items to the right of the semicolon (;) are not modeled probabilistically.

Frequentists model the probability of observations given the parameters. This involves a likelihood function.

Bayesians model the joint probability of data and parameters, which, given the chain rule, amounts to a likelihood function times a prior.

Generative models provide a probabilistic model of the predictors, here the words w, and the categories z, whereas discriminative models only provide a probabilistic model of the categories z given the words w.
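The distinction can be seen on a toy corpus. The sketch below (hypothetical data, purely for illustration) estimates a generative model of the joint p(z, w) by counting, a discriminative model of the conditional p(z | w) directly, and checks that the generative model recovers the same conditional via the chain rule:

```python
from collections import Counter

# Toy labeled corpus of (category z, word w) pairs -- hypothetical data.
data = [("sports", "ball"), ("sports", "ball"), ("sports", "goal"),
        ("politics", "vote"), ("politics", "ball")]
N = len(data)

# Generative: model the joint p(z, w) = count(z, w) / N.
joint = Counter(data)
p_joint = {zw: c / N for zw, c in joint.items()}

# Discriminative: model only the conditional p(z | w) = count(z, w) / count(w).
word_counts = Counter(w for _, w in data)
p_cond = {(z, w): c / word_counts[w] for (z, w), c in joint.items()}

# The generative model also yields p(z | w) by the chain rule: p(z, w) / p(w).
p_w = {w: c / N for w, c in word_counts.items()}
for (z, w), p in p_joint.items():
    assert abs(p / p_w[w] - p_cond[(z, w)]) < 1e-12

print(p_cond[("sports", "ball")])  # 2/3: "ball" occurs 3 times, twice under "sports"
```

The generative model pays for the extra structure: it must also get p(w) right, whereas the discriminative model never models the words at all.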

----------------------------------------------------------------------------

https://lingpipe-blog.com/2013/05/23/all-bayesian-models-are-generative-in-theory/

----------------------------------------------------------------------------

All Bayesian Models are Generative (in Theory)

I had a brief chat with Andrew Gelman about the topic of generative vs. discriminative models. It came up when I was asking him why he didn’t like the frequentist semicolon notation for variables that are not random. He said that in Bayesian analyses, all the variables are considered random. It just turns out we can sidestep having to model the predictors (i.e., covariates or features) in a regression. Andrew explains how this works in section 14.1, subsection “Formal Bayesian justification of conditional modeling” in Bayesian Data Analysis, 2nd Edition (the 3rd Edition is now complete and will be out in the foreseeable future). I’ll repeat the argument here.

Suppose we have predictor matrix X and outcome vector y. In regression modeling, we assume a coefficient vector \beta and explicitly model the outcome likelihood p(y|\beta,X) and coefficient prior p(\beta). But what about X? If we were modeling X, we’d have a full likelihood p(X,y|\beta,\psi), where \psi are the additional parameters involved in modeling X and the prior is now joint, p(\beta,\psi).

So how do we justify conditional modeling as Bayesians? We simply assume that \psi and \beta have independent priors, so that

p(\beta,\psi) = p(\beta) \times p(\psi).

The posterior then neatly factors as

p(\beta,\psi|y,X) = p(\psi|X) \times p(\beta|X,y).
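Spelling out the intermediate step (assuming, as in the BDA setup, that the full likelihood factors as p(X,y|\beta,\psi) = p(X|\psi) \times p(y|X,\beta)):

p(\beta,\psi|y,X) \propto p(X|\psi) \times p(y|X,\beta) \times p(\beta) \times p(\psi)

= [p(X|\psi) \times p(\psi)] \times [p(y|X,\beta) \times p(\beta)]

\propto p(\psi|X) \times p(\beta|X,y).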

Looking at just the inference for the regression coefficients \beta, we have the familiar expression

p(\beta|y,X) \propto p(\beta) \times p(y|\beta,X).

Therefore, we can think of everything as a joint model under the hood. Regression models involve an independence assumption so we can ignore the inference for \psi. To quote BDA,

The practical advantage of using such a regression model is that it is much easier to specify a realistic conditional distribution of one variable given k others than a joint distribution on all k+1 variables.

We knew that all along.

----------------------------

The formulas may come out a bit garbled, but posting as-is for now..

written time : 2017-07-28 20:08:34.0

## Generative Model vs Discriminative Model - Computer |

A generative model uses the joint probability (the product of the prior and the conditional probability); a discriminative model uses only the conditional probability.


**Discriminative models**, also called conditional models, are a class of models used in machine learning for modeling the dependence of an unobserved variable y on an observed variable x. Within a probabilistic framework, this is done by modeling the conditional probability distribution P(y|x), which can be used for predicting y from x. [Source] wikipedia.org

In probability and statistics, a **generative model** is a model for randomly generating observable data values, typically given some hidden parameters. It specifies a joint probability distribution over observation and label sequences. Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes' rule. [Source] wikipedia.org

[Source] http://www.cs.ualberta.ca/~chihoon/ml/slides/gvd.pdf
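The last point above — forming a conditional distribution from a generative model via Bayes' rule — can be sketched in a few lines. The class-conditional probabilities and priors below are hypothetical numbers chosen only for illustration:

```python
# A generative model specifies p(x | y) and p(y); Bayes' rule yields p(y | x).
# Hypothetical class-conditional word probabilities and class priors.
p_x_given_y = {
    "spam": {"free": 0.8, "meeting": 0.2},
    "ham":  {"free": 0.1, "meeting": 0.9},
}
p_y = {"spam": 0.3, "ham": 0.7}

def posterior(x):
    """p(y | x) = p(x | y) p(y) / sum over y' of p(x | y') p(y')."""
    unnorm = {y: p_x_given_y[y][x] * p_y[y] for y in p_y}
    Z = sum(unnorm.values())  # the marginal p(x), used as the normalizer
    return {y: u / Z for y, u in unnorm.items()}

print(posterior("free"))  # "spam" gets most of the posterior mass
```

A purely discriminative model would specify p(y | x) directly and could not run this computation, since it never commits to a model of x.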

written time : 2017-07-26 19:46:15.0

## Studying python - Computer |

oov_checker.py

```python
#!/usr/bin/python
# Replace every out-of-vocabulary word in a text file with <unk>.
# Usage: ./oov_checker.py input.txt dictionary.txt

import sys

if len(sys.argv) != 3:  # "is not 3" compared identity, not value
    print("!!USAGE: ./oov_checker.py txt dic")
    sys.exit(1)

print("TEXT: ", sys.argv[1])

# The dictionary file holds one vocabulary word per line.
vocab = set(line.strip() for line in open(sys.argv[2]))
print("Dictionary: ", sys.argv[2], " Size: ", len(vocab))

txt = open(sys.argv[1])
unk_file = open(sys.argv[1] + ".unk", "w")

line_N = 0
for line in txt:
    for word in line.split():
        if word in vocab:
            unk_file.write(word)
        else:
            # "<unk>" was apparently stripped as an HTML tag in the pasted copy
            unk_file.write("<unk>")
        unk_file.write(" ")
    unk_file.write("\n")
    line_N = line_N + 1

print("Total ", line_N, " lines were processed and saved to ", sys.argv[1] + ".unk")
```


written time : 2017-07-21 10:47:44.0