Recently I have been receiving frequent Zabbix alerts from one of our servers, like this:

[Figure: Zabbix alert for too many open file descriptors]

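The alert fires because the server has too many open file descriptors. This server runs quite a few programs, so working out which one has opened an excessive number of descriptors is not straightforward. Let's write a script to track down the culprit: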

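#!/bin/bash
# print-fds.sh: list the 10 processes with the most open file descriptors.
# Run as root so that /proc/<pid>/fd is readable for every process.
cd /proc || exit 1
for pid in [0-9]*
do
    # Each entry in /proc/<pid>/fd is one open descriptor; hide errors from
    # processes that exit while we are scanning.
    echo "PID = $pid with $(ls "/proc/$pid/fd/" 2>/dev/null | wc -l) file descriptors"
done | sort -rn -k5 | head | while read -r _ _ pid _ fdcount _ _
do
    # Look up the command name of each of the top PIDs.
    command=$(ps -o comm -p "$pid" --no-headers)
    printf "pid = %5d with %4d fds: %s\n" "$pid" "$fdcount" "$command"
done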
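The heart of the script is counting the entries under /proc/<pid>/fd, since each entry there is a symlink standing for one open descriptor. Below is a minimal sketch of just that step as a reusable bash function, assuming Linux procfs; `count_fds` is a hypothetical helper name, not part of the original script. It uses a glob instead of `ls | wc -l` and reports failure when the directory cannot be read (process gone, or owned by another user and not running as root).

```bash
#!/usr/bin/env bash
# count_fds (hypothetical helper): print the number of open file descriptors
# of the given PID, or return 1 if /proc/<pid>/fd cannot be read.
count_fds() {
    local pid=$1
    local fd_dir="/proc/$pid/fd"
    # The process may have exited, or belong to another user (needs root).
    [[ -d $fd_dir && -r $fd_dir ]] || return 1
    local restore
    restore=$(shopt -p nullglob)   # remember the caller's nullglob setting
    shopt -s nullglob              # empty dir -> empty array, not a literal '*'
    local -a fds=("$fd_dir"/*)
    eval "$restore"                # put nullglob back the way it was
    echo "${#fds[@]}"
}

# Usage: descriptors held by the current shell. When a process inspects its
# own fd directory, the count may include the handle used to read it.
count_fds "$$"
```

Calling it in a loop over /proc/[0-9]* gives the same ranking as the script, one process at a time.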

Output of the script:

[aneirin@host ~]$ sudo ./print-fds.sh
pid = 15837 with 100283 fds: java
pid = 2917 with 3097 fds: mysqld
pid = 5990 with 1246 fds: java
pid = 35294 with 487 fds: java
pid = 31002 with 145 fds: java
pid = 33494 with 69 fds: java
pid = 3279 with 65 fds: java
pid = 22108 with 62 fds: redis-server
pid = 3090 with 27 fds: memcached
pid = 2420 with 23 fds: modem-manager
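Once a suspect process is known, it can help to compare its count with the limits in force. A minimal sketch, assuming Linux procfs: /proc/<pid>/limits has a "Max open files" row with the soft and hard per-process limits, and /proc/sys/fs/file-nr holds three numbers (allocated handles, allocated but unused handles, system-wide maximum). `show_fd_limits` is a hypothetical helper; for the case above you would pass 15837.

```bash
#!/usr/bin/env bash
# show_fd_limits (hypothetical helper): print a process's open-files limits
# (RLIMIT_NOFILE) and the system-wide file handle usage.
show_fd_limits() {
    local pid=$1
    [[ -r /proc/$pid/limits ]] || { echo "no such process: $pid" >&2; return 1; }
    local soft hard
    # Row format: "Max open files   <soft>   <hard>   files"
    read -r soft hard < <(awk '/^Max open files/ {print $4, $5}' "/proc/$pid/limits")
    echo "pid $pid: soft limit $soft, hard limit $hard"
    # file-nr: allocated handles, unused handles (skipped), system maximum
    local allocated max
    read -r allocated _ max < /proc/sys/fs/file-nr
    echo "system: $allocated handles allocated, max $max"
}

show_fd_limits "$$"
```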

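The process with PID 15837 has clearly opened an excessive number of file descriptors (over 100,000). At this point you can either work with the program's author on a fix or, as an emergency measure, use systemd to cap the maximum number of file descriptors the program may open, which clears the alert.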
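With systemd, the per-service cap is the `LimitNOFILE=` directive in the `[Service]` section, which is usually placed in a drop-in file under `<unit>.d/` so the original unit file stays untouched. A minimal sketch, assuming the Java process runs as a systemd service; the unit name `myapp.service`, the value 65536 and the file name `limit-nofile.conf` are placeholders, not taken from the original. The hypothetical `write_nofile_dropin` function only writes the drop-in, and takes the base directory as an argument so it can be tried in a scratch directory first.

```bash
#!/usr/bin/env bash
# write_nofile_dropin (hypothetical helper): create a systemd drop-in that caps
# the number of open files for a service via LimitNOFILE=.
#   $1: unit name, e.g. myapp.service (placeholder)
#   $2: limit, e.g. 65536 (placeholder)
#   $3: base directory, defaults to /etc/systemd/system
write_nofile_dropin() {
    local unit=$1 limit=$2 base=${3:-/etc/systemd/system}
    local dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/limit-nofile.conf"
    echo "$dir/limit-nofile.conf"
}

# On the real host (as root), write the drop-in, then reload and restart:
#   write_nofile_dropin myapp.service 65536
#   systemctl daemon-reload
#   systemctl restart myapp.service
```

The limit is applied when the service's processes start, so the restart is required; `systemctl edit myapp.service` is an interactive way to create the same kind of drop-in.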
