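Recently I have been receiving frequent Zabbix alerts from one server, with the following content:
Alert: too many file descriptors
The cause is easy to see: the server has too many open file descriptors, which triggers the alert. This server runs quite a few programs, so working out which one has opened the excessive descriptors is not straightforward. The following script helps track down the offender: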
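#!/bin/bash
# List the ten processes holding the most open file descriptors.
# Run as root so that every /proc/<pid>/fd directory is readable.
cd /proc || exit 1
for pid in [0-9]*
do
    # A process may exit mid-scan, so discard errors from ls.
    echo "PID = $pid with $(ls "/proc/$pid/fd/" 2>/dev/null | wc -l) file descriptors"
done | sort -rn -k5 | head | while read -r _ _ pid _ fdcount _ _
do
    # Look up the command name for each of the top PIDs.
    command=$(ps -o comm -p "$pid" --no-headers)
    printf "pid = %5d with %4d fds: %s\n" "$pid" "$fdcount" "$command"
done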
Output of the script:
[aneirin@host ~]$ sudo ./print-fds.sh
pid = 15837 with 100283 fds: java
pid = 2917 with 3097 fds: mysqld
pid = 5990 with 1246 fds: java
pid = 35294 with 487 fds: java
pid = 31002 with 145 fds: java
pid = 33494 with 69 fds: java
pid = 3279 with 65 fds: java
pid = 22108 with 62 fds: redis-server
pid = 3090 with 27 fds: memcached
pid = 2420 with 23 fds: modem-manager
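The process with PID 15837 has clearly opened far too many file descriptors. At this point you can either work with the program's author on a fix or, as an emergency measure, use systemd to limit the maximum number of file descriptors the program may open, after which the alert can be cleared.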
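One way to apply such a limit is a systemd drop-in that sets `LimitNOFILE=` for the service. The following is only a minimal sketch: the helper name `write_nofile_dropin`, the unit name `myapp.service`, and the value 65536 are placeholders, since the post does not say which unit runs PID 15837 or what limit suits it.

```shell
#!/bin/bash
# Write a systemd drop-in that caps open file descriptors for one service.
# write_nofile_dropin is a hypothetical helper; the unit name and limit
# passed to it are illustrative and must be adapted to the real service.
write_nofile_dropin() {
    local unit=$1 limit=$2
    # The third argument lets the target root be overridden (e.g. for a dry run).
    local root=${3:-/etc/systemd/system}
    local dir="$root/$unit.d"
    mkdir -p "$dir"
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/limit-nofile.conf"
    # Print the path of the file that was written.
    echo "$dir/limit-nofile.conf"
}

# Example (needs root; myapp.service is a placeholder):
#   write_nofile_dropin myapp.service 65536
#   systemctl daemon-reload
#   systemctl restart myapp.service
```

The limit is applied when the service starts, so it only takes effect after a restart. Once the cap is reached, further attempts to open files or sockets in that process fail, so this contains the symptom rather than fixing the leak.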
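When handing the problem to the program's author, it may help to report what kind of descriptors are piling up. The sketch below groups the symlink targets under a process's fd directory into rough categories. `summarize_fds` is a hypothetical helper name, and the categories are a simplification chosen for illustration.

```shell
#!/bin/bash
# Count a process's open file descriptors by kind, most frequent first.
# summarize_fds is a hypothetical helper; it takes a directory of fd
# symlinks such as /proc/<pid>/fd.
summarize_fds() {
    local fd_dir=$1 link target
    for link in "$fd_dir"/*; do
        # readlink prints the symlink target, e.g. "socket:[12345]" or a file path.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            *)            echo file ;;
        esac
    done | sort | uniq -c | sort -rn
}

# When run with a PID argument, summarize that process (needs root for
# processes owned by other users), e.g.: sudo ./fd-summary.sh 15837
if [ "$#" -gt 0 ]; then
    summarize_fds "/proc/$1/fd"
fi
```

A count dominated by `socket` would point towards unclosed network connections, while a count dominated by `file` would point towards files that are opened and never closed.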