Calculating the new standard deviation from the old standard deviation after a change in the data set

I have an array of $n$ real values, which has mean $\mu_{old}$ and standard deviation $\sigma_{old}$. If an element of the array $x_i$ is replaced by another element $x_j$, then the new mean will be

$\mu_{new}=\mu_{old}+\frac{x_j-x_i}{n}$

The advantage of this method is that it needs a constant amount of computation regardless of the value of $n$. Is there any approach for calculating $\sigma_{new}$ using $\sigma_{old}$, in the same way that $\mu_{new}$ is computed using $\mu_{old}$?
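A minimal Python sketch of the constant-time mean update described in the question. The function name `replace_mean` and the sample data are illustrative assumptions, not part of the question.

```python
from statistics import fmean


def replace_mean(mean_old: float, n: int, x_i: float, x_j: float) -> float:
    """Mean after replacing x_i by x_j in an array of n values, in O(1)."""
    return mean_old + (x_j - x_i) / n


data = [1.0, 2.0, 3.0]
mean_old = fmean(data)                                   # 2.0
mean_new = replace_mean(mean_old, len(data), 3.0, 5.0)   # replace 3.0 by 5.0
print(mean_new, fmean([1.0, 2.0, 5.0]))
```

The cost does not depend on `n`: only the old mean, the count and the two changed values are needed.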

Answer

The Wikipedia article "Algorithms for calculating variance" has a section showing how to compute the variance when an element is added to your observations. (Recall that the standard deviation is the square root of the variance.) Suppose you add $x_{n+1}$ to your array; then

$\sigma_{new}^2 = \sigma_{old}^2 + (x_{n+1} - \mu_{new})(x_{n+1} - \mu_{old}).$

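EDIT: The formula above seems to be wrong, see comment. The update holds for the sum of squared deviations $(n-1)\sigma^2$ rather than for $\sigma^2$ itself, so with $n$ values before the addition it should read $n\,\sigma_{new}^2 = (n-1)\,\sigma_{old}^2 + (x_{n+1} - \mu_{new})(x_{n+1} - \mu_{old})$.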
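A minimal Python sketch of adding one observation, assuming sample standard deviations (divisor $n-1$) as in the question. It applies the update to the sum of squared deviations $M_2 = (n-1)\sigma^2$, i.e. $n\,\sigma_{new}^2 = (n-1)\,\sigma_{old}^2 + (x_{n+1} - \mu_{new})(x_{n+1} - \mu_{old})$, and compares the result with Python's `statistics.stdev`. The function name `add_observation` is hypothetical.

```python
import math
from statistics import fmean, stdev


def add_observation(n, mean_old, sd_old, x):
    """Return (n + 1, mean_new, sd_new) after appending x to n >= 1 values.

    The update is applied to M2 = (n - 1) * sd**2, not to sd**2 directly:
        M2_new = M2_old + (x - mean_new) * (x - mean_old)
    """
    mean_new = mean_old + (x - mean_old) / (n + 1)
    m2_new = (n - 1) * sd_old ** 2 + (x - mean_new) * (x - mean_old)
    # Sample variance of the n + 1 values is M2_new / n
    return n + 1, mean_new, math.sqrt(m2_new / n)


data = [1.0, 2.0, 3.0]
n, mean, sd = add_observation(len(data), fmean(data), stdev(data), 7.0)
print(n, mean, sd, stdev(data + [7.0]))
```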

Now, replacing an element means adding an observation and removing another one; both can be computed with the formula above. However, keep in mind that problems of numerical stability may ensue; the quoted article also proposes numerically stable variants.
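A sketch of "remove one, then add one" in Python, again working on $M_2 = (n-1)\sigma^2$ with sample standard deviations. Removal is the add-one update run backwards: if adding $x$ to a set with mean $\mu_{rm}$ gives mean $\mu$, then $M_2$ grew by $(x - \mu_{rm})(x - \mu)$, so removing $x$ subtracts the same product. The function name `replace_observation` is hypothetical, and the numerical-stability caveat above still applies.

```python
import math
from statistics import fmean, stdev


def replace_observation(n, mean, sd, x_old, x_new):
    """Return (mean, sample sd) after replacing x_old by x_new among n >= 2 values.

    Works on M2 = (n - 1) * sd**2: remove x_old by running the add-one
    update backwards, then add x_new with the usual update.
    """
    m2 = (n - 1) * sd ** 2
    # Remove x_old: n values -> n - 1 values
    mean_rm = (n * mean - x_old) / (n - 1)
    m2 -= (x_old - mean_rm) * (x_old - mean)
    # Add x_new: n - 1 values -> n values
    mean_new = mean_rm + (x_new - mean_rm) / n
    m2 += (x_new - mean_rm) * (x_new - mean_new)
    # Clamp tiny negative values that rounding can produce
    return mean_new, math.sqrt(max(m2, 0.0) / (n - 1))


data = [1.0, 2.0, 3.0, 4.0]
mean_new, sd_new = replace_observation(len(data), fmean(data), stdev(data), 3.0, 10.0)
print(mean_new, sd_new, stdev([1.0, 2.0, 10.0, 4.0]))
```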

To derive the formula by yourself, compute $(n-1)(\sigma_{new}^2 - \sigma_{old}^2)$ using the definition of sample variance and substitute $\mu_{new}$ by the formula you gave when appropriate. This gives you $\sigma_{new}^2 - \sigma_{old}^2$ in the end, and thus a formula for $\sigma_{new}$ given $\sigma_{old}$ and $\mu_{old}$. In my notation, I assume you replace the element $x_n$ by $x_n'$:

$\begin{aligned}
\sigma^2 &= (n-1)^{-1} \sum_k (x_k - \mu)^2 \\
(n-1)(\sigma_{new}^2 - \sigma_{old}^2) &= \sum_{k=1}^{n-1} \left((x_k - \mu_{new})^2 - (x_k - \mu_{old})^2\right) \\
&\quad + \left((x_n' - \mu_{new})^2 - (x_n - \mu_{old})^2\right) \\
&= \sum_{k=1}^{n-1} \left((x_k - \mu_{old} - n^{-1}(x_n' - x_n))^2 - (x_k - \mu_{old})^2\right) \\
&\quad + \left((x_n' - \mu_{old} - n^{-1}(x_n' - x_n))^2 - (x_n - \mu_{old})^2\right)
\end{aligned}$

The $x_k$ in the sum can be eliminated using $\sum_{k=1}^{n} (x_k - \mu_{old}) = 0$, which leaves an expression in $\mu_{old}$, $x_n$ and $x_n'$ only, but you'll have to work the equation a little more to derive a neat result. This should give you the general idea.
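Carrying the calculation through, as a sketch in the same notation: write $d = n^{-1}(x_n' - x_n) = \mu_{new} - \mu_{old}$ and use $\sum_{k=1}^{n-1}(x_k - \mu_{old}) = -(x_n - \mu_{old})$. The sum then no longer depends on the individual $x_k$:

$\begin{aligned}
\sum_{k=1}^{n-1} \left((x_k - \mu_{old} - d)^2 - (x_k - \mu_{old})^2\right)
&= -2d \sum_{k=1}^{n-1} (x_k - \mu_{old}) + (n-1)d^2 \\
&= 2d(x_n - \mu_{old}) + (n-1)d^2 .
\end{aligned}$

Adding the last term and simplifying with $x_n' - x_n = nd$ gives

$\begin{aligned}
(n-1)(\sigma_{new}^2 - \sigma_{old}^2)
&= 2d(x_n - \mu_{old}) + (n-1)d^2 + (x_n' - \mu_{old} - d)^2 - (x_n - \mu_{old})^2 \\
&= (x_n' - x_n)\left((x_n' - \mu_{new}) + (x_n - \mu_{old})\right),
\end{aligned}$

so

$\sigma_{new} = \sqrt{\sigma_{old}^2 + \frac{(x_n' - x_n)\left((x_n' - \mu_{new}) + (x_n - \mu_{old})\right)}{n-1}},$

which needs only $n$, $\mu_{old}$, $\mu_{new}$, $\sigma_{old}$ and the two changed values. For example, replacing $3$ by $5$ in $\{1, 2, 3\}$ ($\mu_{old} = 2$, $\sigma_{old}^2 = 1$, $\mu_{new} = 8/3$) gives $\sigma_{new}^2 = 1 + 2\,(7/3 + 1)/2 = 13/3$, matching a direct computation on $\{1, 2, 5\}$.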

Answer

Based on what I think I'm reading in the linked Wikipedia article, you can maintain a "running" standard deviation:

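```csharp
using System;

static class RunningStats
{
    // Running state owned by the caller, starting from
    //   double sum = 0; int count = 0; double S = 0;
    // where S is the running sum of squared deviations from the mean.
    public static double GetRunningStandardDeviation(ref double sum, ref int count, ref double S, double x)
    {
        if (count >= 1)
        {
            double oldMean = sum / count;
            sum = sum + x;
            count = count + 1;
            double newMean = sum / count;

            S = S + (x - oldMean) * (x - newMean);
        }
        else
        {
            // First value: initialize the running state
            sum = x;
            count = 1;
            S = 0;
        }

        // estimated variance = S / (count - 1)
        // estimated standard deviation = sqrt(variance)
        if (count > 1)
            return Math.Sqrt(S / (count - 1));
        else
            return 0;
    }
}
```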

The article itself doesn't maintain a separate running sum and count; it keeps a single running mean instead. Since in what I'm doing today I keep a count anyway (for statistical purposes), it is more convenient to recompute the means each time.

Answer

Given the original $\bar x$, $s$, and $n$, as well as the change of a given element $x_n$ to $x_n'$, I believe your new standard deviation $s'$ will be the square root of

$s^2 + \frac{1}{n-1}\left(2n\Delta \bar x(x_n-\bar x) +n(n-1)(\Delta \bar x)^2\right),$

where $\Delta \bar x = \bar x' - \bar x$, with $\bar x'$ denoting the new mean.

Maybe there is a snazzier way of writing it?

I checked this against a small test case and it seemed to work.
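A small Python sketch of the same kind of check, comparing the formula with a direct recomputation via `statistics.stdev`. The function name `replaced_sd` and the data values are illustrative assumptions, not the test case used in the answer.

```python
import math
from statistics import fmean, stdev


def replaced_sd(n, mean, s, x_n, x_n_new):
    """New sample sd after changing one element x_n to x_n_new (n >= 2)."""
    delta = (x_n_new - x_n) / n  # change in the mean, x̄' - x̄
    var_new = s ** 2 + (2 * n * delta * (x_n - mean) + n * (n - 1) * delta ** 2) / (n - 1)
    # Clamp tiny negative values that rounding can produce
    return math.sqrt(max(var_new, 0.0))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s_new = replaced_sd(len(data), fmean(data), stdev(data), 9.0, 1.0)
print(s_new, stdev([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 1.0]))
```

Only `n`, the old mean, the old standard deviation and the two changed values enter the computation, so the cost is constant in `n`.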
