포드를 검사 할 때 다음과 같이 OOMKilled 이벤트에 대한 계측을 설정하고 싶습니다.
Name: pnovotnak-manhole-123456789-82l2h
Namespace: test
Node: test-cluster-cja8smaK-oQSR/10.x.x.x
Start Time: Fri, 03 Feb 2017 14:34:57 -0800
Labels: pod-template-hash=123456789
run=pnovotnak-manhole
Status: Running
IP: 10.x.x.x
Controllers: ReplicaSet/pnovotnak-manhole-123456789
Containers:
pnovotnak-manhole:
Container ID: docker://...
Image: pnovotnak/it
Image ID: docker://sha256:...
Port:
Limits:
cpu: 2
memory: 3Gi
Requests:
cpu: 200m
memory: 256Mi
State: Running
Started: Fri, 03 Feb 2017 14:41:12 -0800
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Fri, 03 Feb 2017 14:35:08 -0800
Finished: Fri, 03 Feb 2017 14:41:11 -0800
Ready: True
Restart Count: 1
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tder (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-46euo:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tder
QoS Class: Burstable
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
11m 11m 1 {default-scheduler } Normal Scheduled Successfully assigned pnovotnak-manhole-123456789-82l2h to test-cluster-cja8smaK-oQSR
10m 10m 1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Created Created container with docker id xxxxxxxxxxxx; Security:[seccomp=unconfined]
10m 10m 1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Started Started container with docker id xxxxxxxxxxxx
11m 4m 2 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Pulling pulling image "pnovotnak/it"
10m 4m 2 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Pulled Successfully pulled image "pnovotnak/it"
4m 4m 1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Created Created container with docker id yyyyyyyyyyyy; Security:[seccomp=unconfined]
4m 4m 1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Started Started container with docker id yyyyyyyyyyyy
포드 로그에서 얻는 것은 전부입니다.
{
textPayload: "shutting down, got signal: Terminated
"
insertId: "aaaaaaaaaaaaaaaa"
resource: {
type: "container"
labels: {
pod_id: "pnovotnak-manhole-123456789-82l2h"
...
}
}
timestamp: "2017-02-03T22:34:48Z"
severity: "ERROR"
labels: {
container.googleapis.com/container_name: "POD"
...
}
logName: "projects/myproj/logs/POD"
}
그리고 kublet 통나무;
{
insertId: "bbbbbbbbbbbbbb"
jsonPayload: {
_BOOT_ID: "ffffffffffffffffffffffffffffffff"
MESSAGE: "I0203 22:41:11.925928 1843 kubelet.go:1816] SyncLoop (PLEG): "pnovotnak-manhole-123456789-82l2h_test(a-uuid)", event: &pleg.PodLifecycleEvent{ID:"another-uuid", Type:"ContainerDied", Data:"..."}"
...
이것을 OOM 이벤트로 고유하게 식별하기에는 충분하지 않은 것 같습니다. 다른 아이디어가 있습니까?
답변
로그에 OOMKilled 이벤트가 없지만 포드가 종료 된 것을 감지 할 수 있으면 kubectl get pod -o go-template=... <pod-id>
이유를 판별하는 데 사용할 수 있습니다 . 문서 에서 직접 예를 들면 다음과 같습니다.
[13:59:01] $ ./cluster/kubectl.sh get pod -o go-template='{{range.status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\r\nLastState: "}}{{.lastState}}{{end}}' simmemleak-60xbc
Container Name: simmemleak
LastState: map[terminated:map[exitCode:137 reason:OOM Killed startedAt:2015-07-07T20:58:43Z finishedAt:2015-07-07T20:58:43Z containerID:docker://0e4095bba1feccdfe7ef9fb6ebffe972b4b14285d5acdec6f0d3ae8a22fad8b2]]
프로그래밍 방식 으로이 작업을 수행하는 경우 kubectl
출력 에 의존하는 더 좋은 대안 은 Kubernetes REST API GET /api/v1/pods
메소드 를 사용하는 것입니다. API에 액세스하는 방법도 설명서에 나와 있습니다 .