[Swan-dev] my test run hung last night
D. Hugh Redelmeier
hugh at mimosa.com
Sat Sep 5 20:17:56 EEST 2015
My test run from last night is hung in ikev1-impair-gx-01/OUTPUT. No
progress for 12 hours.
east.pluto.log is 17618 lines long. Most of it looks like this:
| expiring aged bare shunts
| event_schedule called for 20 seconds
| event_schedule_tv called for about 20 seconds and change
| inserting event EVENT_SHUNT_SCAN, timeout in 20.000000 seconds
| handling event EVENT_PENDING_DDNS
| event_schedule called for 60 seconds
| event_schedule_tv called for about 60 seconds and change
| inserting event EVENT_PENDING_DDNS, timeout in 60.000000 seconds
| elapsed time in connection_check_ddns for hostname lookup 0.000000
| handling event EVENT_SHUNT_SCAN
I don't know why. Certainly west was a problem: the script (west.init?)
failed to cd into the test directory and stopped.
At Paul's suggestion, I rebooted east and west, hoping that would get
the test run unstuck. It did not.
When I do a ps -laxgwf, I see:
0 1105 5868 4363 20 0 431820 19000 futex_ Sl+ pts/2 0:17 \_ swantest
4 1105 18674 5868 20 0 27796 7616 poll_s S+ pts/2 0:00 \_ /usr/sbin/tcpdump -w OUTPUT/swan12.pcap -i swan12 -s 0 -n not stp and not port 22
4 0 19041 5868 20 0 0 0 exit Zs ? 0:00 \_ [sudo] <defunct>
4 0 19044 5868 20 0 0 0 exit Zs ? 0:00 \_ [sudo] <defunct>
<defunct> should not happen. It means that the parent process isn't
reaping its exited children.
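
For anyone who wants to see the <defunct> state in isolation, here is a
minimal Python sketch (Linux only, since it reads /proc). It is not taken
from swantest; the function names proc_state and demo_zombie are made up
for illustration. It forks a child that exits at once, shows that the
child sits in state Z until the parent calls waitpid(), and then confirms
that the entry is gone after reaping.

```python
import os
import signal
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie(timeout=5.0):
    """Fork a child, leave it unreaped, then reap it.

    Returns (state_before_reaping, proc_entry_exists_after_reaping).
    """
    # Make sure SIGCHLD is not ignored; if it were, the kernel would
    # reap children automatically and no zombie would appear.
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)

    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Parent: deliberately do not wait() yet. Poll until the child
    # has actually exited and shows up as a zombie.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    os.waitpid(pid, 0)  # reap the child; this removes the zombie
    return state, os.path.exists(f"/proc/{pid}")


if __name__ == "__main__":
    before, still_there = demo_zombie()
    print("state before waitpid:", before)
    print("/proc entry after waitpid:", still_there)
```

A zombie holds no memory and cannot be killed; it goes away only when its
parent waits for it or when the parent itself exits and the child is
reparented.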
I tried to kill 18674 (tcpdump) but it would not die.
kill -9 and sudo kill -9 didn't work.
BTW, one new thing is that before the run, I changed the ownership and
capabilities of tcpdump, as per the wiki page on testing.
What's going on? What should I do at this point? I assume that there
are interesting forensics possible if I don't just kill everything.
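
As one possible starting point for such forensics, the sketch below reads
/proc/<pid>/status on Linux and pulls out the fields that say what state a
process is in, who its parent is, and which signals it blocks, ignores, or
catches. The helper names read_status and summarize are hypothetical, and
the PIDs in the usage note are simply the ones from the ps listing; nothing
here is part of the test harness.

```python
import os


def read_status(pid):
    """Parse /proc/<pid>/status into a dict of field name -> value."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def summarize(pid):
    """Return the fields most useful for a stuck or unkillable process."""
    status = read_status(pid)
    wanted = ("Name", "State", "PPid", "SigBlk", "SigIgn", "SigCgt")
    # .get() leaves None for any field this kernel does not report.
    return {name: status.get(name) for name in wanted}


if __name__ == "__main__":
    # Inspect this process itself as a harmless demonstration.
    for name, value in summarize(os.getpid()).items():
        print(f"{name}: {value}")
```

Calling summarize(18674) for tcpdump and summarize(5868) for swantest
would show their State lines and signal masks without disturbing either
process.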