ํƒœ๊ทธ ๋ณด๊ด€๋ฌผ: maximum-likelihood

maximum-likelihood

์ปค๋„ ๋ฐ€๋„ ์ถ”์ •๊ธฐ๋ฅผ 2D์— ํ†ตํ•ฉ ์˜๋ขฐ ๊ทธ๋ž˜์„œ F

์•„๋ฌด๋„์ด ๊ธธ์„ ๋”ฐ๋ผ ๊ฐ€๊ณ  ์‹ถ์–ดํ•˜๋Š” ๊ฒฝ์šฐ์— ๋Œ€๋น„ ํ•˜์—ฌ์ด ์งˆ๋ฌธ ์—์„œ ์™”์Šต๋‹ˆ๋‹ค .

๊ธฐ๋ณธ์ ์œผ๋กœ ๋‚˜๋Š” ๊ฐ ๊ฐ์ฒด์— ์ฃผ์–ด์ง„ ์ˆ˜์˜ ์ธก์ • ๊ฐ’์ด ๋ถ™์–ด์žˆ๋Š” N ๊ฐœ์˜ ๊ฐ์ฒด ๋กœ ๊ตฌ์„ฑ๋œ ๋ฐ์ดํ„ฐ ์„ธํŠธ ์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค (์ด ๊ฒฝ์šฐ 2 ๊ฐœ).

ฮฉ

N

ฮฉ=o1[x1,y1],o2[x2,y2],...,oN[xN,yN]

I๋Š” ํ™•๋ฅ ์„ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ํ•„์š” ์ƒˆ๋กœ์šด ์˜ค๋ธŒ์ ํŠธ ์— ์†ํ•˜๋Š” ฮฉ I๋Š” ํ™•๋ฅ  ๋ฐ€๋„ ์ˆ˜๋“ ์งˆ๋ฌธ์— ์˜๋ขฐ ๊ทธ๋ž˜์„œ F ๋‚ด๊ฐ€ ์ด๋ฏธ ํŒ๋‹จ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •๊ธฐ ํ†ตํ•ด์„ .

p[xp,yp]

ฮฉ

f^

์ œ ๋ชฉํ‘œ๋Š”์ด ์ƒˆ๋กœ์šด ์˜ค๋ธŒ์ ํŠธ์˜ ํ™•๋ฅ  (ํš๋“ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์„ค์ •์ด 2D ๋ฐ์ดํ„ฐ์— ์†ํ•˜๋Š”) ฮฉ์„ , I๋Š” PDF๋กœ ํ†ตํ•ฉ ๋“ค์—ˆ๋‹ค F๋ฅผ โ€œ์œ„์— ์ง€์ง€์ฒด์˜ ๊ฐ’๋˜๋Š” ๋ฐ€๋„ โ€ ๊ด€์ฐฐ ํ•œ ๊ฒƒ๋ณด๋‹ค ์ ์Šต๋‹ˆ๋‹ค โ€œ. ์€ โ€œ๊ด€์ฐฐโ€๋ฐ€๋„๊ฐ€ f๋ฅผ ์ƒˆ๋กœ์šด ๊ฐ์ฒด ํ‰๊ฐ€ (P) , ์ฆ‰ : F ( X์˜ P , Y , P ) . ๋”ฐ๋ผ์„œ ๋ฐฉ์ •์‹์„ ํ’€์–ด์•ผํ•ฉ๋‹ˆ๋‹ค.

p[xp,yp]

ฮฉ

f^

f^

p

f^(xp,yp)

โˆฌx,y:f^(x,y)<f^(xp,yp)f^(x,y)dxdy

๋‚ด 2D ๋ฐ์ดํ„ฐ ์„ธํŠธ (python์˜ stats.gaussian_kde ๋ชจ๋“ˆ์„ ํ†ตํ•ด ์–ป์€)์˜ PDF๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

์—ฌ๊ธฐ์— ์ด๋ฏธ์ง€ ์„ค๋ช…์„ ์ž…๋ ฅํ•˜์‹ญ์‹œ์˜ค

์—ฌ๊ธฐ์„œ ๋นจ๊ฐ„ ์  ์€ ๋‚ด ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ PDF ์œ„์— ๊ทธ๋ ค์ง„ ์ƒˆ ๊ฐ์ฒด ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค .

p[xp,yp]

๊ทธ๋ž˜์„œ ์งˆ๋ฌธ์€ : pdf๊ฐ€ ๊ทธ๋ ‡๊ฒŒ ๋ณด์ผ ๋•Œ ํ•œ๊ณ„ ๋Œ€ํ•ด ์œ„์˜ ์ ๋ถ„์„ ์–ด๋–ป๊ฒŒ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ?

x,y:f^(x,y)<f^(xp,yp)

๋”ํ•˜๋‹ค

์˜๊ฒฌ ์ค‘ ํ•˜๋‚˜์—์„œ ์–ธ๊ธ‰ ํ•œ Monte Carlo ๋ฐฉ๋ฒ•์ด ์–ผ๋งˆ๋‚˜ ์ž˜ ์ž‘๋™ํ•˜๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด ๋ช‡ ๊ฐ€์ง€ ํ…Œ์ŠคํŠธ๋ฅผ ์ˆ˜ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด๊ฒƒ์ด ๋‚ด๊ฐ€ ์–ป์€ ๊ฒƒ์ž…๋‹ˆ๋‹ค :

์ € ๋Œ€์—ญํญ ์˜์—ญ์˜ ๊ฒฝ์šฐ ๋‘ ๋Œ€์—ญํญ์ด ๊ฑฐ์˜ ๋™์ผํ•œ ๋ณ€๋™์„ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฐ’์ด ์•ฝ๊ฐ„ ๋” ํฌ๊ฒŒ ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค. ํ‘œ์—์„œ์˜ ๊ฐ€์žฅ ํฐ ๋ณ€ํ™”๋Š” ์ฐจ์ด ์ค€๋‹ค ์‹ค๋ฒ„์˜ VS 2,500 1,000 ์ƒ˜ํ”Œ ๊ฐ’์„ ๋น„๊ต ์  (x, y) = (2.4,1.5) ๋ฐœ์ƒ 0.0126๋˜๋Š” ~1.3%. ์ œ ๊ฒฝ์šฐ์—๋Š” ์ด๊ฒƒ์ด ๋Œ€๋ถ€๋ถ„ ํ—ˆ์šฉ๋ฉ๋‹ˆ๋‹ค.

ํŽธ์ง‘ : ๋ฐฉ๊ธˆ 2 ์ฐจ์›์—์„œ Scott์˜ ๊ทœ์น™์ด ์—ฌ๊ธฐ์— ์ฃผ์–ด์ง„ ์ •์˜์— ๋”ฐ๋ผ Silverman์˜ ๊ทœ์น™๊ณผ ๋™์ผํ•˜๋‹ค๋Š” ๊ฒƒ์„ ์•Œ์•˜ ์Šต๋‹ˆ๋‹ค .



๋‹ต๋ณ€

๊ฐ„๋‹จํ•œ ๋ฐฉ๋ฒ•์€ ์ ๋ถ„ ์˜์—ญ์„ ๋ž˜์Šคํ„ฐ ํ™”ํ•˜๊ณ  ์ ๋ถ„์— ๋Œ€ํ•œ ์ด์‚ฐ ๊ทผ์‚ฌ๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

์ฃผ์˜ํ•ด์•ผ ํ•  ์‚ฌํ•ญ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

  1. ํฌ์ธํŠธ์˜ ๋ฒ”์œ„๋ณด๋‹ค ๋” ๋งŽ์€ ๊ฒƒ์„ ํฌํ•จํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •์น˜๊ฐ€ ์ธ์‹ ๊ฐ€๋Šฅํ•œ ๊ฐ’์„ ๊ฐ€์งˆ ๋ชจ๋“  ์œ„์น˜๋ฅผ ํฌํ•จํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, ํฌ์ธํŠธ์˜ ๋ฒ”์œ„๋ฅผ ์ปค๋„ ๋Œ€์—ญํญ์˜ 3 ~ 4 ๋ฐฐ๋กœ ํ™•์žฅํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค (๊ฐ€์šฐ์Šค ์ปค๋„์˜ ๊ฒฝ์šฐ).

  2. ๋ž˜์Šคํ„ฐ์˜ ํ•ด์ƒ๋„์— ๋”ฐ๋ผ ๊ฒฐ๊ณผ๊ฐ€ ์•ฝ๊ฐ„ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•ด์ƒ๋„๋Š” ๋Œ€์—ญํญ์˜ ์ž‘์€ ๋ถ€๋ถ„์ด์–ด์•ผํ•ฉ๋‹ˆ๋‹ค. ๊ณ„์‚ฐ ์‹œ๊ฐ„์€ ๋ž˜์Šคํ„ฐ์˜ ์…€ ์ˆ˜์— ๋น„๋ก€ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์˜๋„ ํ•œ ๊ฒƒ๋ณด๋‹ค ๊ฑฐ์นœ ํ•ด์ƒ๋„๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ผ๋ จ์˜ ๊ณ„์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐ ๊ฑฐ์˜ ์‹œ๊ฐ„์ด ๊ฑธ๋ฆฌ์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ตœ๊ณ ์˜ ํ•ด์ƒ๋„. ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ ๋” ์ •๋ฐ€ํ•œ ํ•ด์ƒ๋„๊ฐ€ ํ•„์š”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‹ค์Œ์€ 256 ํฌ์ธํŠธ์˜ ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•œ ๊ทธ๋ฆผ์ž…๋‹ˆ๋‹ค.

ํฌ์ธํŠธ๋Š” ๋‘ ๊ฐœ์˜ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •์น˜์— ๊ฒน์ณ์ง„ ๊ฒ€์€ ์ ์œผ๋กœ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค. 6 ๊ฐœ์˜ ํฐ ์ ์ƒ‰ ์ ์€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ํ‰๊ฐ€๋˜๋Š” "ํ”„๋กœ๋ธŒ"์ž…๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ 1000 x 1000 ์…€์˜ ํ•ด์ƒ๋„์—์„œ 4 ๊ฐœ์˜ ๋Œ€์—ญํญ (๊ธฐ๋ณธ์ ์œผ๋กœ 1.8 (์ˆ˜์ง)๊ณผ 3 (์ˆ˜ํ‰), 1/2, 1, 5 ๋‹จ์œ„)์— ๋Œ€ํ•ด ์ˆ˜ํ–‰๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ ์‚ฐ์ ๋„ ๋งคํŠธ๋ฆญ์Šค๋Š” ๊ฒฐ๊ณผ๊ฐ€ 6 ๊ฐ€์ง€ ํ”„๋กœ๋ธŒ ํฌ์ธํŠธ์˜ ๋Œ€์—ญํญ์— ์–ผ๋งˆ๋‚˜ ๊ฐ•ํ•œ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š”์ง€๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

๋ณ€ํ˜•์€ ๋‘ ๊ฐ€์ง€ ์ด์œ ๋กœ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ๋ถ„๋ช…ํžˆ ๋ฐ€๋„ ์ถ”์ •์น˜๊ฐ€ ๋‹ค๋ฅด๋ฏ€๋กœ ํ•œ ๊ฐ€์ง€ ํ˜•ํƒœ์˜ ๋ณ€๋™์ด ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ๋” ์ค‘์š”ํ•œ ๊ฒƒ์€ ๋ฐ€๋„ ์ถ”์ •์น˜์˜ ์ฐจ์ด๊ฐ€ ๋‹จ์ผ ( "ํ”„๋กœ๋ธŒ") ์ง€์ ์—์„œ ํฐ ์ฐจ์ด๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ ์ž…๋‹ˆ๋‹ค. ํ›„์ž์˜ ๋ณ€ํ˜•์€ ์  ๊ตฐ์ง‘์˜ ์ค‘๊ฐ„ ๋ฐ€๋„ "์ •์ง€"์ฃผ์œ„์—์„œ ๊ฐ€์žฅ ํฝ๋‹ˆ๋‹ค.์ด ๊ณ„์‚ฐ์ด ๊ฐ€์žฅ ๋งŽ์ด ์‚ฌ์šฉ๋˜๋Š” ์œ„์น˜์ž…๋‹ˆ๋‹ค.

์ด๋Š” ์ด๋Ÿฌํ•œ ๊ณ„์‚ฐ ๊ฒฐ๊ณผ๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ํ•ด์„ ํ•  ๋•Œ ์ƒ๋‹นํ•œ์ฃผ์˜๊ฐ€ ํ•„์š”ํ•˜๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ์ด๋Š” ์ƒ๋Œ€์ ์œผ๋กœ ์ž„์˜์˜ ๊ฒฐ์ • (์‚ฌ์šฉํ•  ๋Œ€์—ญํญ)์— ๋งค์šฐ ๋ฏผ๊ฐ ํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.


R ์ฝ”๋“œ

์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์ฒซ ๋ฒˆ์งธ ํ•จ์ˆ˜์˜ 6 ์ค„์— ํฌํ•จ๋˜์–ด f์žˆ์Šต๋‹ˆ๋‹ค. ์‚ฌ์šฉ๋ฒ•์„ ์„ค๋ช…ํ•˜๊ธฐ ์œ„ํ•ด ๋‚˜๋จธ์ง€ ์ฝ”๋“œ๋Š” ์•ž์˜ ๊ทธ๋ฆผ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

library(MASS)     # kde2d
library(spatstat) # im class
f <- function(xy, n, x, y, ...) {
  #
  # Estimate the total where the density does not exceed that at (x,y).
  #
  # `xy` is a 2 by ... array of points.
  # `n`  specifies the numbers of rows and columns to use.
  # `x` and `y` are coordinates of "probe" points.
  # `...` is passed on to `kde2d`.
  #
  # Returns a list:
  #   image:    a raster of the kernel density
  #   integral: the estimates at the probe points.
  #   density:  the estimated densities at the probe points.
  #
  xy.kde <- kde2d(xy[1,], xy[2,], n=n, ...)
  xy.im <- im(t(xy.kde$z), xcol=xy.kde$x, yrow=xy.kde$y) # Allows interpolation $
  z <- interp.im(xy.im, x, y)                            # Densities at the probe points
  c.0 <- sum(xy.kde$z)                                   # Normalization factor $
  i <- sapply(z, function(a) sum(xy.kde$z[xy.kde$z < a])) / c.0
  return(list(image=xy.im, integral=i, density=z))
}
#
# Generate data.
#
n <- 256
set.seed(17)
xy <- matrix(c(rnorm(k <- ceiling(2*n * 0.8), mean=c(6,3), sd=c(3/2, 1)),
               rnorm(2*n-k, mean=c(2,6), sd=1/2)), nrow=2)
#
# Example of using `f`.
#
y.probe <- 1:6
x.probe <- rep(6, length(y.probe))
lims <- c(min(xy[1,])-15, max(xy[1,])+15, min(xy[2,])-15, max(xy[2,]+15))
ex <- f(xy, 200, x.probe, y.probe, lim=lims)
ex$density; ex$integral
#
# Compare the effects of raster resolution and bandwidth.
#
res <- c(8, 40, 200, 1000)
system.time(
  est.0 <- sapply(res,
           function(i) f(xy, i, x.probe, y.probe, lims=lims)$integral))
est.0
system.time(
  est.1 <- sapply(res,
           function(i) f(xy, i, x.probe, y.probe, h=1, lims=lims)$integral))
est.1
system.time(
  est.2 <- sapply(res,
           function(i) f(xy, i, x.probe, y.probe, h=1/2, lims=lims)$integral))
est.2
system.time(
  est.3 <- sapply(res,
           function(i) f(xy, i, x.probe, y.probe, h=5, lims=lims)$integral))
est.3
results <- data.frame(Default=est.0[,4], Hp5=est.2[,4],
                      H1=est.1[,4], H5=est.3[,4])
#
# Compare the integrals at the highest resolution.
#
par(mfrow=c(1,1))
panel <- function(x, y, ...) {
  points(x, y)
  abline(c(0,1), col="Red")
}
pairs(results, lower.panel=panel)
#
# Display two of the density estimates, the data, and the probe points.
#
par(mfrow=c(1,2))
xy.im <- f(xy, 200, x.probe, y.probe, h=0.5)$image
plot(xy.im, main="Bandwidth=1/2", col=terrain.colors(256))
points(t(xy), pch=".", col="Black")
points(x.probe, y.probe, pch=19, col="Red", cex=.5)

xy.im <- f(xy, 200, x.probe, y.probe, h=5)$image
plot(xy.im, main="Bandwidth=5", col=terrain.colors(256))
points(t(xy), pch=".", col="Black")
points(x.probe, y.probe, pch=19, col="Red", cex=.5)

๋‹ต๋ณ€

์ ์ ˆํ•œ ์ˆ˜์˜ ๊ด€์ธก์น˜๊ฐ€์žˆ๋Š” ๊ฒฝ์šฐ ํ†ตํ•ฉ์„ ์ „ํ˜€ ์ˆ˜ํ–‰ํ•˜์ง€ ์•Š์•„๋„๋ฉ๋‹ˆ๋‹ค. ์ƒˆ๋กœ์šด ํฌ์ธํŠธ๊ฐ€ . ๋‹น์‹ ์ด ๋ฐ€๋„ ์ถ”์ •์ด ๊ฐ€์ • ; ๋Œ€ํ•œ ๊ด€์ธก์น˜ ์ˆ˜ ๋ฅผ ์š”์•ฝ ํ•˜๊ณ  ํ‘œ๋ณธ ํฌ๊ธฐ๋กœ ๋‚˜๋ˆ•๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ํ•„์š”ํ•œ ํ™•๋ฅ ์— ๋Œ€ํ•œ ๊ทผ์‚ฌ์น˜๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

x0

f^

x

f^(x)<f^(x0)

์ด๊ฒƒ์€ ๊ฐ€ "๋„ˆ๋ฌด ์ž‘์ง€ ์•Š๋‹ค"๊ณ  ์ƒ˜ํ”Œ ํฌ๊ธฐ๊ฐ€ ์ €๋ฐ€๋„ ์ง€์—ญ์—์„œ ์ ์ ˆํ•œ ์ถ”์ •์น˜๋ฅผ ์ œ๊ณตํ•˜๊ธฐ์— ์ถฉ๋ถ„ํžˆ ํฌ๊ณ  (ํ™•์‚ฐ ๋  ์ •๋„๋กœ) ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ณ€ ๋Ÿ‰ ๊ฒฝ์šฐ 20000 ๊ฑด์ด ์ถฉ๋ถ„ํžˆ ์ปค ๋ณด์ž…๋‹ˆ๋‹ค .

f^(x0)

x

๋‹ต๋ณ€