Hi everyone,
Another issue with Solaris 11.3 (Intel/AMD): users are getting the error "No space left on device" when they log in or open a new session.
The error seems to occur at random, but once it starts, it is persistent. When I look at the OS stats, I see the following:
Memory: 16G phys mem, 400M free mem, 2048M total swap, 1649M free swap
Filesystem      Size  Used  Available  Capacity  Mounted on
swap            382M   64M       318M       17%  /tmp
pool_01/home    7.8T  176M       1.8T        1%  /export/home
# swap -lh
swapfile                  dev    swaplo  blocks  free
/dev/zvol/dsk/rpool/swap  309,1  4K      2.0G    1.6G
I can't find any reason why memory would be constrained. The user's home directory and /tmp (swap) both have plenty of free space, and there are no full filesystems anywhere on the system.
This condition, of course, affects not only the user's shell but also their ability to run certain applications. Oddly, the root user does not suffer from this condition.
From what I can tell, it seems to be linked to the amount of space in /tmp, but /tmp has plenty of free space. /tmp is backed by swap, which shows 1.6 GB free, and there is 400 MB of free RAM.
The error presents itself as soon as the user logs in. For example:
Last login: Sat Feb 20 00:16:49 2016 from alpha1
Oracle Corporation SunOS 5.11 11.3 September 2015
[alpha1] adamh:/home/adamh> -ksh: line 1: write to 1 failed [No space left on device]
When I run truss on a new session, I see errors like these:
12074: 2.4353 write(1, "1B [ A", 3)    Err#28 ENOSPC
ksh: line 1: write to 1 failed [No space left on device]
12074: 2.4354 write(2, " k s h : l i n e 1 :".., 57)    = 57
and
12074: 1.8532 write(1, " x x a 0 1\n", 6)    Err#28 ENOSPC
/root/.kshrc: line 3: write to 1 failed [No space left on device]
12074: 1.8535 write(2, " / r o o t / . k s h r c".., 66)    = 66
It seems that the shell (ksh) gets into a state where, whenever it tries to write a value to the screen (e.g. PS1 or some other internal shell value), the write fails with ENOSPC. If I change the user's shell to bash, csh, or something else, the error goes away. So I thought the shell might be starting something that was causing the problem, but even if I remove the user's .profile and .kshrc, ksh still produces the error. The error also occurs with ksh via su, with or without the dash. Please note that the root user also has ksh as its shell and does not suffer from these errors; all non-root users with ksh are affected.
ksh looks like this:
$ ls -li /bin/ksh /usr/bin/ksh
12323 -r-xr-xr-x   8 root     bin      2558824 Oct  6 11:59 /bin/ksh
12323 -r-xr-xr-x   8 root     bin      2558824 Oct  6 11:59 /usr/bin/ksh
bc99c166dc1f95cd4287366acc8363d4 /usr/bin/ksh
I've run out of ideas as to where to look for the problem. If any of you have any suggestions, they would be greatly appreciated.
Thanks to you all in advance!