Calculating a new standard deviation from the old standard deviation after a change in the data set

I have an array of $n$ real values, which has mean $\mu_{old}$ and standard deviation $\sigma_{old}$. If an element $x_i$ of the array is replaced by another element $x_j$, then the new mean will be

$$\mu_{new} = \mu_{old} + \frac{x_j - x_i}{n}$$

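The advantage of this approach is that it requires a constant amount of computation regardless of the value of $n$. Is there an approach to calculate $\sigma_{new}$ using $\sigma_{old}$, like the computation of $\mu_{new}$ using $\mu_{old}$?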
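As a quick illustration of the constant-time mean update described in the question, here is a minimal Python sketch. The function name `replace_mean` and its parameter names are made up for this example; the only assumption is that the array length `n` stays the same after the replacement.

```python
def replace_mean(mu_old: float, n: int, x_old: float, x_new: float) -> float:
    """Return the mean after one element x_old is replaced by x_new.

    Runs in constant time: the other n - 1 elements are never read.
    """
    return mu_old + (x_new - x_old) / n


if __name__ == "__main__":
    data = [2.0, 4.0, 9.0]
    mu_old = sum(data) / len(data)
    # Replace 9.0 by 3.0 without touching the rest of the array.
    mu_new = replace_mean(mu_old, len(data), x_old=9.0, x_new=3.0)
    print(mu_new, sum([2.0, 4.0, 3.0]) / 3)  # both values should agree
```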


Answer

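A section of the Wikipedia article "Algorithms for calculating variance" shows how to compute the variance when elements are added to your observations. (Recall that the standard deviation is the square root of the variance.) Assume that you append $x_{n+1}$ to your array; then

$$\sigma_{new}^2 = \sigma_{old}^2 + (x_{n+1} - \mu_{new})(x_{n+1} - \mu_{old}).$$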

EDIT: The formula above seems to be wrong; see the comment.

Now, replacing an element means adding an observation and removing another one; both can be computed with the formula above. However, keep in mind that problems of numerical stability may ensue; the quoted article also proposes numerically stable variants.
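The "remove one observation, then add another" idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (`add_value`, `remove_value`, `replace_value`) are invented here, the standard deviation is taken to be the sample standard deviation (divisor $n-1$), and the product update is applied to the sum of squared deviations $M_2 = (n-1)\sigma^2$, which is the quantity the Wikipedia article updates, rather than to $\sigma^2$ directly.

```python
import math
import statistics


def add_value(n, mean, m2, x):
    """Add x to a data set summarised by (n, mean, m2)."""
    n_new = n + 1
    mean_new = mean + (x - mean) / n_new
    m2_new = m2 + (x - mean) * (x - mean_new)
    return n_new, mean_new, m2_new


def remove_value(n, mean, m2, x):
    """Remove x from a data set summarised by (n, mean, m2); needs n >= 2."""
    n_new = n - 1
    mean_new = (n * mean - x) / n_new
    m2_new = m2 - (x - mean) * (x - mean_new)
    return n_new, mean_new, m2_new


def replace_value(n, mean, std, x_old, x_new):
    """Return (mean, sample std) after x_old is replaced by x_new."""
    m2 = std * std * (n - 1)  # recover the sum of squared deviations
    state = remove_value(n, mean, m2, x_old)
    n, mean, m2 = add_value(*state, x_new)
    # Clamp at zero: the subtraction in remove_value can lose precision.
    return mean, math.sqrt(max(m2, 0.0) / (n - 1))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    mean, std = replace_value(len(data), statistics.mean(data),
                              statistics.stdev(data), x_old=3.0, x_new=6.0)
    print(mean, std, statistics.stdev([1.0, 2.0, 6.0]))
```

The removal step subtracts two nearly equal quantities when the data are tightly clustered, which is where the numerical stability concerns mentioned above tend to show up.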

To derive the formula by yourself, compute $(n-1)(\sigma_{new}^2 - \sigma_{old}^2)$ using the definition of sample variance and substitute $\mu_{new}$ by the formula you gave when appropriate. This gives you $\sigma_{new}^2 - \sigma_{old}^2$ in the end, and thus a formula for $\sigma_{new}$ given $\sigma_{old}$ and $\mu_{old}$. In my notation, I assume you replace the element $x_n$ by $x_n'$:

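$$\sigma^2 = (n-1)^{-1} \sum_k (x_k - \mu)^2$$

$$\begin{aligned}
(n-1)(\sigma_{new}^2 - \sigma_{old}^2)
&= \sum_{k=1}^{n-1} \left( (x_k - \mu_{new})^2 - (x_k - \mu_{old})^2 \right) + \left( (x_n' - \mu_{new})^2 - (x_n - \mu_{old})^2 \right) \\
&= \sum_{k=1}^{n-1} \left( \left(x_k - \mu_{old} - n^{-1}(x_n' - x_n)\right)^2 - (x_k - \mu_{old})^2 \right) + \left( \left(x_n' - \mu_{old} - n^{-1}(x_n' - x_n)\right)^2 - (x_n - \mu_{old})^2 \right)
\end{aligned}$$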

The $x_k$ in the sum transform into something dependent on $\mu_{old}$, but you'll have to work the equation a little bit more to derive a neat result. This should give you the general idea.
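For reference, one possible way to finish the computation is sketched below. It is not taken from the answer itself; it relies on the standard identity $\sum_k (x_k - \mu)^2 = \sum_k x_k^2 - n\mu^2$ and on $n(\mu_{new} - \mu_{old}) = x_n' - x_n$ from the question.

```latex
\begin{align}
(n-1)(\sigma_{new}^2 - \sigma_{old}^2)
&= \Bigl(\sum_{k=1}^{n-1} x_k^2 + {x_n'}^2 - n\mu_{new}^2\Bigr)
 - \Bigl(\sum_{k=1}^{n-1} x_k^2 + x_n^2 - n\mu_{old}^2\Bigr) \\
&= (x_n' - x_n)(x_n' + x_n) - n(\mu_{new} - \mu_{old})(\mu_{new} + \mu_{old}) \\
&= (x_n' - x_n)\,(x_n' - \mu_{new} + x_n - \mu_{old}),
\end{align}
% and therefore
\begin{equation}
\sigma_{new}^2 = \sigma_{old}^2
  + \frac{(x_n' - x_n)\,(x_n' - \mu_{new} + x_n - \mu_{old})}{n-1}.
\end{equation}
```

As a small sanity check, replacing $3$ by $6$ in $(1, 2, 3)$ gives $\mu_{old} = 2$, $\mu_{new} = 3$, and a right-hand side of $1 + \tfrac{3 \cdot 4}{2} = 7$, which matches the sample variance of $(1, 2, 6)$.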


Answer

Based on what I think I'm reading in the linked Wikipedia article, you can maintain a "running" standard deviation:

real sum = 0;
int count = 0;
real S = 0;
real variance = 0;

real GetRunningStandardDeviation(ref real sum, ref int count, ref real S, real x)
{
   if (count >= 1)
   {
       // mean before x is added
       real oldMean = sum / count;
       sum = sum + x;
       count = count + 1;
       real newMean = sum / count;

       // S accumulates the sum of squared deviations from the mean
       S = S + (x-oldMean)*(x-newMean);
   }
   else
   {
       sum = x;
       count = 1;
       S = 0;
   }

   //estimated Variance = (S / (count-1) )
   //estimated Standard Deviation = sqrt(variance)
   if (count > 1)
      return sqrt(S / (count-1) );
   else
      return 0;
}

In the article, though, they don't maintain a separate running sum and count, but instead keep just the mean. Since in the thing I'm doing today I keep a count anyway (for statistical purposes), it is more useful to calculate the means each time.


Answer

Given the original $\bar{x}$, $s$, and $n$, as well as the change of a given element $x_n$ to $x_n'$, I believe your new standard deviation $s'$ will be the square root of

$$s^2 + \frac{1}{n-1}\left(2n\,\Delta\bar{x}\,(x_n - \bar{x}) + n(n-1)(\Delta\bar{x})^2\right),$$

where $\Delta\bar{x} = \bar{x}' - \bar{x}$, with $\bar{x}'$ denoting the new mean.

Maybe there is a snazzier way of writing it?

I checked this against a small test case and it seemed to work.
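A check of this kind might look like the following Python sketch. The function name `replaced_std` and the test data are invented for this illustration, and $s$ is assumed to be the sample standard deviation (divisor $n-1$), which is what `statistics.stdev` computes.

```python
import math
import random
import statistics


def replaced_std(mean_old, s_old, n, x_old, x_new):
    """Sample std after x_old is replaced by x_new, using the formula above."""
    delta = (x_new - x_old) / n  # change in the mean
    var_new = s_old ** 2 + (2 * n * delta * (x_old - mean_old)
                            + n * (n - 1) * delta ** 2) / (n - 1)
    # Clamp at zero to guard against tiny negative rounding errors.
    return math.sqrt(max(var_new, 0.0))


if __name__ == "__main__":
    random.seed(0)
    data = [random.uniform(-10, 10) for _ in range(20)]
    mean_old, s_old = statistics.mean(data), statistics.stdev(data)
    x_new = 3.5
    fast = replaced_std(mean_old, s_old, len(data), data[-1], x_new)
    slow = statistics.stdev(data[:-1] + [x_new])  # recompute from scratch
    print(fast, slow)
```

The two printed values should agree up to floating-point rounding; the constant-time version only needs the old mean, the old standard deviation, $n$, and the two values involved in the replacement.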