Got into a state where I get "su: no shell" when su-ing to any user
I've been given a Solaris 10/x86 VM on which to run a WebLogic 10.3.3 app I'm writing. I went through a series of configuration steps and got it into a bolloxed state that I don't understand. A handful of experienced Solaris admins have looked at it, and they've all given up.
After I got my login, I installed JDK 1.6 and WebLogic 10.3.3 under my own account, then installed and tested my app. No problems.
I then created a "service account" with "groupadd", "useradd", and "passwd -N" (prevents login). I was able to su to it.
Next, I installed JDK 1.6 and WebLogic 10.3.3 again, this time in /opt, and changed the environment variables and PATH references for both my login and the service account to point there.
I then started setting up two SMF services, running as the service account, so that the Node Manager and the Admin Server start at boot. I validated them up to a point, but I didn't fully test them.
It was at about this point that I discovered I could no longer log into the box as myself. When I logged in with PuTTY, the window simply closed after I entered my password. When I tried to "su" to myself from the root login, it just said "su: no shell". I still had one window logged in as myself, and in it every command except shell builtins failed. I don't remember the exact error, but it was something like "not found". I no longer have that window, and I can't get back into that state. It did give me the impression that the underlying cause of "su: no shell" was simply that the shell couldn't be executed.
We also tried to "su" to every account in /etc/passwd, and every attempt failed with the same error. The only user we can "su" to is root.
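Rather than su-ing to each account by hand, one quick check is whether every login shell named in /etc/passwd still exists. This is a minimal sketch, not a diagnosis: check_shells is a hypothetical helper name, it assumes a POSIX shell (on Solaris 10 that means something like /usr/xpg4/bin/sh, since the stock /bin/sh is an older Bourne shell without `read -r`), and treating an empty shell field as /usr/bin/sh follows the Solaris passwd(4) default. Run as root, the `-x` test reflects root's access rather than each user's, so this only rules out missing or mistyped shells; it will not reveal a permission problem that affects only non-root users.

```shell
# Sketch: report whether each account's login shell exists and is executable.
# Run with a POSIX shell (e.g. /usr/xpg4/bin/sh on Solaris 10).
# check_shells is a hypothetical helper; pass a passwd-format file as $1,
# or omit it to read /etc/passwd.

check_shells() {
    passwd_file=${1:-/etc/passwd}
    # passwd fields: name:password:uid:gid:gecos:home:shell
    # The "|| [ -n "$user" ]" also handles a last line with no trailing newline.
    while IFS=: read -r user pw uid gid gecos home shell || [ -n "$user" ]; do
        # Empty shell field: assume the Solaris default, /usr/bin/sh.
        [ -n "$shell" ] || shell=/usr/bin/sh
        if [ -x "$shell" ]; then
            echo "ok  $user $shell"
        else
            echo "BAD $user $shell"
        fi
        user=
    done < "$passwd_file"
}

check_shells "$@"
```

Any line starting with BAD points at an account whose shell is missing or not executable even for root.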
I'm sure you will first ask about the permissions on /bin/bash. That's the first thing everyone has asked.
% ls -lt /bin/bash
lrwxrwxrwx 1 root root 19 Nov 30 2010 /bin/bash -> /usr/local/bin/bash*
% ls -lt /usr/local/bin/bash
-rwxr-xr-x 1 bin bin 2099560 Feb 23 2009 /usr/local/bin/bash*
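The modes above only cover the symlink and the final binary. To execute /usr/local/bin/bash, a non-root user also needs search (x) permission on every directory on the way there, and on Solaris 10 /bin is itself a symbolic link into /usr/bin. The following is a minimal sketch for listing each of those directories; show_path_perms is a hypothetical helper, it lists symlinks rather than following them (so pass each link target as a separate argument), and it assumes a POSIX-compatible shell.

```shell
# Sketch: list the permissions of "/" and of every path prefix leading to a
# file, because a non-root user needs search (x) permission on each directory.
# show_path_perms is a hypothetical helper; symlinks are listed, not followed.

show_path_perms() {
    ls -ld /                        # the root directory itself
    prefix=
    old_ifs=$IFS
    IFS=/                           # split the path on "/"
    set -f                          # don't glob-expand path components
    for part in $1; do
        [ -n "$part" ] || continue  # skip the empty field before the leading "/"
        prefix="$prefix/$part"
        ls -ld "$prefix"
    done
    set +f
    IFS=$old_ifs
}

for f in "$@"; do
    show_path_perms "$f"
done
```

For example, running it with the arguments /bin/bash and /usr/local/bin/bash lists /, /bin, /bin/bash, /usr, /usr/local, /usr/local/bin and /usr/local/bin/bash. A directory whose mode lacks the execute bit for "other" (for example drwxr-x---) would stop users who are neither its owner nor in its group from executing anything beneath it.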
The experienced admins who looked at this tried many other ideas, but none of them found any clue as to why this is happening.
Fortunately, this is just a VM, so they can reimage it, and I'm sure repeating all my earlier steps will be easier the second time. Still, I'd really like to figure out what went wrong without having to redo everything, especially since I'm not certain this won't just happen again. If we don't figure it out, we'll probably take a couple of image snapshots along the way, just in case.