View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0000623||tcsh||general||public||2017-07-03 08:58||2017-07-03 08:58|
|Target Version||Fixed in Version|
|Summary||0000623: Unexpected high CPU load after tcsh version 6.13|
In production we source a single file that sets many environment variables, historically under csh on Solaris (never had any problems there); we recently started deploying the same setup under tcsh on Linux.
With tcsh 6.13 all is "normal". From version 6.14 up to the latest 6.20, we see a very high CPU load just from sourcing the script (basically more than a full CPU consumed by a one-liner heartbeat script called at random intervals by twenty heartbeating processes). The attached xload screenshots compare the two cases.
We mitigated by removing the calls from $HOME/.cshrc, or by using #!/bin/csh -f where possible, but we still see an overall high CPU load in all our scripting (which is significant).
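As an illustration, a minimal heartbeat script using the -f mitigation might look like this (the script body and stamp file are hypothetical stand-ins, not the actual production one-liner):

```shell
#!/bin/csh -f
# -f makes (t)csh skip ~/.cshrc entirely, so any expensive
# "source ..." line there is never executed for this script.
# Hypothetical heartbeat action: refresh a stamp file and exit.
touch /tmp/heartbeat.stamp
```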
Using the "perf" kernel profiler, I think I have narrowed down the culprit to gconv().
If I recompile tcsh 6.14 or 6.20 with the HAVE_ICONV define removed from config.h and the WIDE_CHAR define (plus UTF_16 for 6.20) removed from config_f.h,
then CPU usage stays near zero as expected, even in the "worst" conditions (sourcing from $HOME/.cshrc and without using -f).
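For reference, the rebuild can be sketched roughly as follows (the tarball name and the sed pattern are assumptions; I edited the generated headers by hand):

```shell
# Sketch: rebuild tcsh without iconv / wide-character support.
tar xf tcsh-6.20.00.tar.gz
cd tcsh-6.20.00
./configure
# Disable iconv in the generated config.h; comment out the wide-char
# (and, for 6.20, UTF-16) defines in config_f.h the same way:
sed -i 's|^#define HAVE_ICONV|/* #undef HAVE_ICONV */|' config.h
make
```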
Of course I cannot say whether it is a gconv() bug, a bad configuration on all 5 machines (which is perfectly possible), or the way it is called.
I think that gconv() is part of glibc, so here are the glibc versions I have, depending on the machine. I did not check all of them with tcsh 6.20 + iconv deactivated.
libc-2.12.so (on two different RH like VMs)
libc-2.7.so (on one RH derivative hardware server)
libc-2.19.so (debian VM)
libc-2.24.so (ubuntu VM)
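In case it helps, the glibc versions above can be checked with something like this (assuming ldd is available on the machine):

```shell
# Print the glibc/ldd version string for this machine:
ldd --version | head -n 1
# Or see which libc the installed tcsh binary actually links against
# (tolerate tcsh being absent):
ldd "$(command -v tcsh)" 2>/dev/null | grep libc || true
```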
Direct compilation of tcsh 6.20 with iconv enabled (very high CPU); output of "perf report -g graph -i xyz.perf.data":
+ 22.86% heartbeat.csh ISO8859-15.so [.] gconv
+ 12.86% heartbeat.csh libc-2.12.so [.] __GI_rtwcomb
+ 12.38% heartbeat.csh [kernel.kallsyms] [k] 0xffffffff811807f9
8.81% heartbeat.csh libfreebl3.so [.] 0x000000000000a8bd
+ 4.52% heartbeat.csh tcsh [.] one_wctomb
+ 4.05% heartbeat.csh tcsh [.] short2str
3.33% prelink [kernel.kallsyms] [k] 0xffffffff811a9926
+ 3.33% heartbeat.csh tcsh [.] btell
2.86% id [kernel.kallsyms] [k] 0xffffffff811585c2
+ 2.14% heartbeat.csh libc-2.12.so [.] wctomb
With iconv disabled, there is no CPU problem:
+ 19.06% heartbeat.csh tcsh-620-noiconv [.] short2str
+ 13.38% heartbeat.csh [kernel.kallsyms] [k] 0xffffffff81297385
+ 11.37% heartbeat.csh libc-2.12.so [.] _int_malloc
+ 10.70% heartbeat.csh libfreebl3.so [.] 0x0000000000013887
+ 5.35% heartbeat.csh libc-2.12.so [.] memcpy
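A profile like the two above can be captured along these lines (the workload command here is an assumption; I actually profiled the running heartbeat daemons):

```shell
# Sketch: record a call-graph profile of one sourcing of the script,
# then browse it with the same report command shown above.
perf record -g -o xyz.perf.data -- tcsh -c 'source $HOME/HL_TCSH/logicals.csh'
perf report -g graph -i xyz.perf.data
```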
The file /usr/lib/locale/locale-archive on all machines was around 100 MB; shrinking it to 3 MB by keeping only the en_* and fr_* locales made no difference.
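For the record, the archive can be shrunk with glibc's localedef (a sketch; run as root, and keep a backup of /usr/lib/locale/locale-archive first):

```shell
# List every locale in the archive, then delete all but en_* and fr_*.
for loc in $(localedef --list-archive | grep -Ev '^(en_|fr_)'); do
    localedef --delete-from-archive "$loc"
done
localedef --list-archive   # verify what is left
```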
I have been able to reproduce on:
1) three physical servers, all of them Red Hat derivatives (RH6.5, Fedora Core, CentOS)
2) four VMs, including an Ubuntu and a Debian
Any help appreciated.
|Steps To Reproduce||I wrote a few shell scripts simulating the heartbeating daemons. The file HOW-TO-REPRODUCE-LOAD-PB.TXT explains the setup; basically, untar the scripts in your $HOME, which creates a HL_TCSH/ directory.|
Add the line "source $HOME/HL_TCSH/logicals.csh" to your $HOME/.cshrc, commented out for now.
Launch "poc_launch_daemons.csh 20" to simulate 20 heartbeating processes and wait for load to stabilize.
Then uncomment the "source $HOME/HL_TCSH/logicals.csh" line in your .cshrc and watch the load go up after about 30 seconds.
You might have to unset LS_COLORS to test with tcsh lower than 6.15.
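The launcher step can be approximated with a sketch like this (a hypothetical stand-in for poc_launch_daemons.csh, written in plain sh; the heartbeat command is an assumption):

```shell
#!/bin/sh
# Spawn N background loops; each one invokes a tiny tcsh "heartbeat"
# roughly once per second, mimicking the heartbeating daemons.
n=${1:-20}
i=0
while [ "$i" -lt "$n" ]; do
    ( while :; do
          tcsh -c 'touch /tmp/heartbeat.stamp'   # stand-in heartbeat
          sleep 1
      done ) &
    i=$((i + 1))
done
wait
```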
|Additional Information||1) the two load graphs compare the exact same run of "poc_launch_daemons.csh 20" when changing ONLY the symbolic link /bin/tcsh from the tcsh-6.13 binary to tcsh-6.18 or tcsh-6.20.|
2) All the variables in my environment are listed below. Trying different combinations of LANG and/or LC_ALL made no difference.
SSH_CLIENT=192.168.12.33 58840 22
SSH_CONNECTION=192.168.12.33 58840 192.168.4.83 22
|Tags||No tags attached.|