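I have an array of $n$ real values, which has mean $\mu_{old}$ and standard deviation $\sigma_{old}$. If an element $x_i$ of the array is replaced by another element $x_j$, then the new mean will be

$$\mu_{new} = \mu_{old} + \frac{x_j - x_i}{n}$$

The advantage of this approach is that it requires a constant amount of computation regardless of the value of $n$. Is there any approach to calculate $\sigma_{new}$ from $\sigma_{old}$, like the computation of $\mu_{new}$ from $\mu_{old}$?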
Answer
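A section of the Wikipedia article "Algorithms for calculating variance" shows how to compute the variance when elements are added to your observations. (Note that the standard deviation is the square root of the variance.) Assume that you append $x_{n+1}$ to your array; the article then gives an update formula for the variance in terms of the old variance and the old and new means.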
EDIT: Above formula seems to be wrong, see comment.
Now, replacing an element means adding an observation and removing another one; both can be computed with the formula above. However, keep in mind that problems of numerical stability may ensue; the quoted article also proposes numerically stable variants.
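The "remove one observation, then add another" idea can be sketched as follows. This is a minimal sketch rather than the article's exact code: the function names `add_value`, `remove_value` and `replace_value` are made up for illustration, and the state is assumed to be the count, the mean, and `m2`, the sum of squared deviations from the mean (so the sample variance is `m2 / (n - 1)`).

```python
import math


def add_value(n, mean, m2, x):
    """Return (n, mean, m2) after adding observation x (Welford-style update)."""
    n_new = n + 1
    delta = x - mean
    mean_new = mean + delta / n_new
    m2_new = m2 + delta * (x - mean_new)
    return n_new, mean_new, m2_new


def remove_value(n, mean, m2, x):
    """Return (n, mean, m2) after removing observation x (inverse of add_value)."""
    if n < 2:
        raise ValueError("need at least two observations to remove one")
    n_new = n - 1
    mean_new = (n * mean - x) / n_new
    m2_new = m2 - (x - mean) * (x - mean_new)
    return n_new, mean_new, m2_new


def replace_value(n, mean, std, x_old, x_new):
    """Return (mean, sample std) after replacing x_old by x_new, in O(1)."""
    m2 = std ** 2 * (n - 1)  # recover the sum of squared deviations
    state = remove_value(n, mean, m2, x_old)
    n2, mean2, m2_new = add_value(*state, x_new)
    # Clamp at zero: rounding error can push m2 slightly negative.
    return mean2, math.sqrt(max(m2_new, 0.0) / (n2 - 1))


# The array [1, 2, 3] has mean 2 and sample standard deviation 1.
# Replacing 3 by 6 gives [1, 2, 6], whose sample variance is 7.
new_mean, new_std = replace_value(3, 2.0, 1.0, 3.0, 6.0)
```

The subtraction in `remove_value` is where the numerical stability concern shows up: if the removed value dominates the spread of the data, `m2` is a small difference of large numbers, so recomputing from scratch every so often is a reasonable safeguard.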
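To derive the formula yourself, compute $(n-1)(\sigma_{new}^2 - \sigma_{old}^2)$ using the definition of sample variance, and substitute $\mu_{new}$ by the formula you gave where appropriate. This gives you $\sigma_{new}^2 - \sigma_{old}^2$ in the end, and thus a formula for $\sigma_{new}$ given $\sigma_{old}$ and $\mu_{old}$. In my notation, I assume you replace the element $x_n$ by $x_n'$:

$$(n-1)(\sigma_{new}^2 - \sigma_{old}^2) = \sum_{k=1}^{n-1}\left[(x_k - \mu_{new})^2 - (x_k - \mu_{old})^2\right] + (x_n' - \mu_{new})^2 - (x_n - \mu_{old})^2$$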
The $x_k$ in the sum transform into something dependent on $\mu_{old}$, but you'll have to work the equation a little bit more to derive a neat result. This should give you the general idea.
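One way to finish the algebra is sketched below. It is a shortcut of my own rather than a term-by-term expansion of the sum: it uses the identity $(n-1)\sigma^2 = \sum_k x_k^2 - n\mu^2$, together with $n(\mu_{new} - \mu_{old}) = x_n' - x_n$ from the mean update.

$$
\begin{aligned}
(n-1)(\sigma_{new}^2 - \sigma_{old}^2) &= \left({x_n'}^2 - x_n^2\right) - n\left(\mu_{new}^2 - \mu_{old}^2\right) \\
&= (x_n' - x_n)(x_n' + x_n) - (x_n' - x_n)(\mu_{new} + \mu_{old}) \\
&= (x_n' - x_n)\left(x_n' - \mu_{new} + x_n - \mu_{old}\right)
\end{aligned}
$$

so that

$$\sigma_{new}^2 = \sigma_{old}^2 + \frac{(x_n' - x_n)\left(x_n' - \mu_{new} + x_n - \mu_{old}\right)}{n-1}$$

As a quick check, take the array $1, 2, 3$ (mean $2$, variance $1$) and replace $3$ by $6$: the new mean is $3$ and the formula gives $1 + \frac{3 \cdot (6 - 3 + 3 - 2)}{2} = 7$, which is the sample variance of $1, 2, 6$.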
Answer
Based on what I think I'm reading in the linked Wikipedia article, you can maintain a "running" standard deviation:
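// Running state, owned by the caller and passed in by reference.
double sum = 0;   // sum of all values seen so far
int count = 0;    // number of values seen so far
double S = 0;     // sum of squared deviations from the running mean

// Adds x to the running state and returns the updated sample standard deviation.
double GetRunningStandardDeviation(ref double sum, ref int count, ref double S, double x)
{
    if (count >= 1)
    {
        double oldMean = sum / count;
        sum = sum + x;
        count = count + 1;
        double newMean = sum / count;
        S = S + (x - oldMean) * (x - newMean);
    }
    else
    {
        // First value: the mean is x itself and there is no spread yet.
        sum = x;
        count = 1;
        S = 0;
    }
    // estimated variance = S / (count - 1)
    // estimated standard deviation = sqrt(variance)
    if (count > 1)
        return Math.Sqrt(S / (count - 1));
    else
        return 0;
}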
The article does not maintain a separate running sum and count, though; it keeps a single running mean instead. Since in the thing I'm doing today I keep a count anyway (for statistical purposes), it is more useful to calculate the means each time.
Answer
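Given the original $\bar{x}$, $s$, and $n$, as well as the change of a given element $x_n$ to $x_n'$, I believe your new standard deviation $s'$ will be the square root of

$$s^2 + \frac{1}{n-1}\left(2n\,\Delta\bar{x}\,(x_n - \bar{x}) + n(n-1)(\Delta\bar{x})^2\right)$$

where $\Delta\bar{x} = \bar{x}' - \bar{x}$, with $\bar{x}'$ denoting the new mean.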
Maybe there is a snazzier way of writing it?
I checked this against a small test case and it seemed to work.
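A small test case of this kind could look like the sketch below. The function name `replaced_std` and the sample data are made up for illustration, and the closed form, written in terms of the change in the mean, is an assumption about how the formula reads. The result is compared against a full recomputation with Python's `statistics.stdev`.

```python
import math
import statistics


def replaced_std(n, mean, std, x_old, x_new):
    """Sample standard deviation after replacing x_old by x_new, in O(1)."""
    delta_mean = (x_new - x_old) / n  # change in the mean
    var_new = std ** 2 + (
        2 * n * delta_mean * (x_old - mean) + n * (n - 1) * delta_mean ** 2
    ) / (n - 1)
    # Clamp at zero: rounding error can push the variance slightly negative.
    return math.sqrt(max(var_new, 0.0))


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # arbitrary sample values
n = len(data)
mean = statistics.mean(data)
std = statistics.stdev(data)

# Replace the last element and compare with recomputing from scratch.
updated = replaced_std(n, mean, std, data[-1], 1.0)
data[-1] = 1.0
recomputed = statistics.stdev(data)
assert math.isclose(updated, recomputed)
```

Agreement on one data set is only a sanity check; trying several random arrays and replacement values would give more confidence.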