I have an array of $n$ real values, which has mean $\mu_{old}$ and standard deviation $\sigma_{old}$. If an element of the array $x_i$ is replaced by another element $x_j$, then the new mean will be
$$\mu_{new} = \mu_{old} + \frac{x_j - x_i}{n}.$$
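The advantage of this approach is that it requires a constant amount of computation regardless of the value of $n$. Is there any approach to calculate $\sigma_{new}$ using $\sigma_{old}$, like the computation of $\mu_{new}$ using $\mu_{old}$?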
Answer
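A section of the Wikipedia article on “Algorithms for calculating variance” shows how to compute the variance when elements are added to your observations. (The standard deviation is the square root of the variance.) Suppose you append $x_{n+1}$ to your array; then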
EDIT: Above formula seems to be wrong, see comment.
Now, replacing an element means adding an observation and removing another one; both can be computed with the formula above. However, keep in mind that problems of numerical stability may ensue; the quoted article also proposes numerically stable variants.
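As an illustration of "replace = remove one observation, then add another", here is a minimal Python sketch. It assumes the sample variance (divisor $n-1$) and tracks the sum of squared deviations `m2` instead of the variance itself. The add step is the Welford-style update; the remove step is simply its algebraic inverse. The function names are hypothetical and not taken from the article.

```python
import math
import statistics


def add_value(n, mean, m2, x):
    """Return (n, mean, m2) after adding observation x."""
    n += 1
    delta = x - mean          # deviation from the old mean
    mean += delta / n         # new mean
    m2 += delta * (x - mean)  # (x - old mean) * (x - new mean)
    return n, mean, m2


def remove_value(n, mean, m2, x):
    """Return (n, mean, m2) after removing observation x (needs n >= 2)."""
    if n < 2:
        raise ValueError("need at least two observations to remove one")
    new_mean = (n * mean - x) / (n - 1)
    m2 -= (x - mean) * (x - new_mean)  # inverse of the add step
    return n - 1, new_mean, m2


def replace_value(n, mean, m2, old, new):
    """Replace observation `old` by `new` in constant time."""
    n, mean, m2 = remove_value(n, mean, m2, old)
    return add_value(n, mean, m2, new)


def sample_std(n, m2):
    """Sample standard deviation; clamps tiny negative round-off in m2."""
    return math.sqrt(max(m2, 0.0) / (n - 1)) if n > 1 else 0.0


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
state = (0, 0.0, 0.0)
for value in data:
    state = add_value(*state, value)
state = replace_value(*state, old=4.0, new=10.0)
print(sample_std(state[0], state[2]))
```

The subtraction in `remove_value` is where cancellation can bite when the values are large relative to their spread, which is the numerical stability caveat mentioned above.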
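To derive the formula yourself, compute $(n-1)(\sigma_{new}^2 - \sigma_{old}^2)$ using the definition of sample variance and substitute $\mu_{new}$ by the formula you gave where appropriate. This gives you $\sigma_{new}^2 - \sigma_{old}^2$ in the end, and thus a formula for $\sigma_{new}$ given $\sigma_{old}$ and $\mu_{old}$. In my notation, I assume you replace the element $x_n$ by $x_n'$:
$$\sigma^2 = \frac{1}{n-1}\sum_{k}(x_k - \mu)^2$$
$$(n-1)(\sigma_{new}^2 - \sigma_{old}^2) = \sum_{k=1}^{n-1}\left[(x_k - \mu_{new})^2 - (x_k - \mu_{old})^2\right] + (x_n' - \mu_{new})^2 - (x_n - \mu_{old})^2$$
The $x_k$ in the sum transform into something dependent on $\mu_{old}$, but you’ll have to work the equation a little bit more to derive a neat result. This should give you the general idea.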
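For reference, here is one way the algebra could be finished. This is a sketch assuming the sample variance with divisor $n-1$, the replacement of $x_n$ by $x_n'$, and the shorthand $d = x_n' - x_n$ (a symbol introduced only for this illustration), so that $\mu_{new} = \mu_{old} + d/n$. It uses the fact that $\sum_{k=1}^{n-1}(x_k - \mu_{old}) = -(x_n - \mu_{old})$.

```latex
\begin{aligned}
\sum_{k=1}^{n-1}\left[(x_k-\mu_{new})^2-(x_k-\mu_{old})^2\right]
  &= -\frac{d}{n}\sum_{k=1}^{n-1}\left[2(x_k-\mu_{old})-\frac{d}{n}\right]
   = \frac{2d}{n}(x_n-\mu_{old})+\frac{(n-1)d^2}{n^2},\\
(x_n'-\mu_{new})^2-(x_n-\mu_{old})^2
  &= \frac{(n-1)d}{n}\left[2(x_n-\mu_{old})+\frac{(n-1)d}{n}\right],\\
(n-1)\left(\sigma_{new}^2-\sigma_{old}^2\right)
  &= 2d\,(x_n-\mu_{old})+\frac{(n-1)d^2}{n},\\
\sigma_{new}^2
  &= \sigma_{old}^2+\frac{2d\,(x_n-\mu_{old})}{n-1}+\frac{d^2}{n}.
\end{aligned}
```

The last line needs only $\sigma_{old}$, $\mu_{old}$, $n$ and the two elements involved, so the update takes constant time.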
Answer
Based on what I think I’m reading in the linked Wikipedia article, you can maintain a “running” standard deviation:
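// C# sketch: place inside a class and add `using System;` for Math.Sqrt.
// Running state, owned by the caller and passed in by reference.
double sum = 0;   // sum of all values seen so far
int count = 0;    // number of values seen so far
double S = 0;     // sum of squared deviations from the current mean

static double GetRunningStandardDeviation(ref double sum, ref int count, ref double S, double x)
{
    if (count >= 1)
    {
        double oldMean = sum / count;
        sum += x;
        count += 1;
        double newMean = sum / count;
        S += (x - oldMean) * (x - newMean);
    }
    else
    {
        // First value: nothing to deviate from yet.
        sum = x;
        count = 1;
        S = 0;
    }

    // estimated variance = S / (count - 1)
    // estimated standard deviation = sqrt(variance)
    if (count > 1)
        return Math.Sqrt(S / (count - 1));
    else
        return 0;
}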
Note that the article does not maintain a separate running sum and count, but instead keeps a single running mean. Since in the thing I’m doing today I already keep a count (for statistical purposes), it is more useful to calculate the means each time.
Answer
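Given original $\bar{x}$, $s$, and $n$, as well as the change of a given element $x_n$ to $x_n'$, I believe your new standard deviation $s'$ will be the square root of
$$s^2 + \frac{1}{n-1}\left(2n\,\Delta\bar{x}\,(x_n - \bar{x}) + n(n-1)(\Delta\bar{x})^2\right),$$
where $\Delta\bar{x} = \bar{x}' - \bar{x}$, with $\bar{x}'$ denoting the new mean.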
Maybe there is a snazzier way of writing it?
I checked this against a small test case and it seemed to work.
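A small test of this kind can be sketched in Python. The sketch assumes the sample standard deviation (divisor $n-1$) and the update $s'^2 = s^2 + \frac{1}{n-1}\left(2n\,\Delta\bar{x}\,(x_n - \bar{x}) + n(n-1)(\Delta\bar{x})^2\right)$ with $\Delta\bar{x} = (x_n' - x_n)/n$. The function name `replaced_std` and the sample values are made up for illustration.

```python
import math
import statistics


def replaced_std(mean, std, n, old, new):
    """Sample standard deviation after replacing `old` by `new`, in O(1).

    `mean` and `std` describe the array before the replacement.
    """
    delta_mean = (new - old) / n  # change in the mean
    new_var = std ** 2 + (
        2 * n * delta_mean * (old - mean) + n * (n - 1) * delta_mean ** 2
    ) / (n - 1)
    return math.sqrt(max(new_var, 0.0))  # guard against tiny negative round-off


values = [3.0, 1.5, 8.0, 4.5, 6.0]
updated = values[:-1] + [2.0]  # replace the last element by 2.0

fast = replaced_std(statistics.mean(values), statistics.stdev(values),
                    len(values), old=values[-1], new=2.0)
slow = statistics.stdev(updated)  # full recomputation for comparison
print(fast, slow)
```

The two printed numbers should agree up to floating-point round-off.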