Purpose
The purpose of this article is to outline some of the options available as part of the GlancePlus (MeasureWare) package. Specifically, I intend to focus on glance's ability to gather real-time stats in the background and output the results to text files. This data can then be saved and used for historical analysis. Typically you would use perfview for this type of reporting, but glance allows a much smaller interval than perfview's five-minute minimum.
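As a rough sketch of what a background collection run looks like, the helper below builds the glance command line and redirects its output to a text file. Only the flags (-j, -adviser_only, -syntax, -iterations) come from the examples later in this article; the function names and file paths are hypothetical.

```python
import subprocess

def build_glance_command(syntax_file, interval_secs, iterations):
    """Build the argv for a glance collection run, using only the flags
    shown in the examples in this article."""
    return ["glance", "-j", str(interval_secs), "-adviser_only",
            "-syntax", syntax_file, "-iterations", str(iterations)]

def collect_to_file(syntax_file, interval_secs, iterations, outfile):
    """Launch glance in the background, appending its output to a text file
    for later historical analysis."""
    with open(outfile, "a") as out:
        return subprocess.Popen(
            build_glance_command(syntax_file, interval_secs, iterations),
            stdout=out)

# Example: one hour of disk stats at a 60-second interval (path is made up):
# collect_to_file("disk.cfg", 60, 60, "/var/tmp/disk.out")
```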
Another reason one might want to do this is as a feed to MRTG. MRTG will graph practically anything that can be presented as a number. Using glance scripts, it would be pretty easy to set up a number of MRTG graphs for various performance metrics. MRTG would then automatically handle the daily, weekly, monthly, and yearly views. Of course, this would require an MRTG implementation on dude, netcat for AIX on dude, copies of the client scripts on every host reporting to MRTG, and someone to decide what is important, write the scripts, and establish the MRTG reporting hierarchy. That is beyond the scope of this article.
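As a hint of what such a feed would involve: an MRTG external monitoring script conventionally prints four lines (first value, second value, uptime string, target name). A minimal sketch of formatting two glance-derived numbers that way follows; the function name, values, and target name are all made up for illustration.

```python
def mrtg_feed(value_in, value_out, uptime="unknown", target="myhost"):
    """Format two metric values in the four-line form MRTG expects from
    an external monitoring script: value 1, value 2, uptime, target name."""
    return "\n".join([str(int(value_in)), str(int(value_out)), uptime, target])

# e.g. feeding per-interface packet rates scraped from glance output:
print(mrtg_feed(173, 230, "42 days", "dude-lan3"))
```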
First off - Documentation
The following are links to various glance-specific documents:
-
-
-
-
Now for a few examples.
Simulating SAR for disks
This example will show you how you can use glance to simulate the output you would get with sar -d. There are some subtle differences in how the values are calculated that I will not try to explain here, mainly because I can't! :) Also, I personally would rather use sar than glance for capturing disk data, but it is an excellent example of what you can do.
Script
# The following glance adviser disk loop shows disk activity comparable
# to sar -d data.
# Note that values will differ between sar and glance because of differing
# data sources, calculation methods, and collection intervals.
headersprinted = 0
# For each disk, if there was activity, print a summary:
disk loop
{
    if BYDSK_PHYS_IO_RATE > 0 then
    {
        # print headers if this is the first active disk found this interval:
        if headersprinted == 0 then
        {
            print "-------- -------- device %util queue r+w/s blks/s secs-avserv"
            headersprinted = 1
        }
        # sar shows average service time in milliseconds:
        avserv = ( BYDSK_UTIL / 100 ) / BYDSK_PHYS_IO_RATE * 1000
        # sar blks/s is 512-byte blocks per second (KB rate times 2):
        blks = BYDSK_PHYS_BYTE_RATE * 2
        print GBL_STATDATE, " ", GBL_STATTIME, " ", BYDSK_DEVNAME|15,
              BYDSK_UTIL|7|2, BYDSK_REQUEST_QUEUE|8|2, BYDSK_PHYS_IO_RATE|8|0,
              blks|8|0, avserv|16|2
    }
}
if headersprinted == 0 then print GBL_STATTIME, " (no disk activity this interval)"
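The service-time and block-rate arithmetic in the script can be sanity-checked with a short Python sketch (function name mine; the formulas mirror the adviser script's):

```python
def sar_style_disk_metrics(util_pct, io_rate, kb_rate):
    """Mirror the adviser script's arithmetic:
    avserv: average service time in milliseconds -- the fraction of the
            second the disk was busy, spread over the I/Os it serviced;
    blks:   512-byte blocks per second (KB/s times 2)."""
    avserv = (util_pct / 100.0) / io_rate * 1000.0
    blks = kb_rate * 2
    return avserv, blks

# e.g. a disk 71.01% busy doing 110 I/Os/s at 370 KB/s, matching the
# first data row in the output below:
avserv, blks = sar_style_disk_metrics(71.01, 110, 370)  # ~6.46 ms, 740 blks/s
```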
Output
-------- -------- device          %util  queue  r+w/s  blks/s  secs-avserv
08/02/02 14:50:02 0/0/2/0.6.0     71.01   0.00    110     740         6.46
08/02/02 14:50:02 0/0/2/1.6.0     56.52   0.00    102     681         5.52
08/02/02 14:50:02 0/4/0/0.5.0      9.42   0.00     10      66         9.42
08/02/02 14:50:02 0/12/0/0.5.0     7.97   0.00      9      63         8.66
08/02/02 14:50:02 0/...29.0.2.1.3  1.44   0.00     24     443         0.61
08/02/02 14:50:02 0/...29.0.2.1.4  0.72   0.00     23     443         0.31
08/02/02 14:50:02 0/...29.0.2.2.0  0.72   0.00      1      25        10.29
08/02/02 14:50:02 0/...29.0.2.2.1  0.72   0.00      3      43         2.40
08/02/02 14:50:02 0/...29.0.2.2.3  0.72   0.00      2      74         3.13
08/02/02 14:50:02 0/...29.0.2.2.4  2.17   0.00      5     123         4.09
08/02/02 14:50:02 0/...29.0.2.2.5  0.72   0.00     20     591         0.36
08/02/02 14:50:02 0/...29.0.2.0.0  0.00   0.00      2      28         0.00
08/02/02 14:50:02 0/...29.0.2.3.4  0.00   0.00      1      98         0.00
-------- -------- device          %util  queue  r+w/s  blks/s  secs-avserv
08/02/02 14:50:03 0/0/2/0.6.0     23.61   0.00     34     151         6.90
08/02/02 14:50:03 0/0/2/1.6.0     18.05   0.00     31     126         5.75
08/02/02 14:50:03 0/4/0/0.5.0      2.77   0.00      4      17         6.60
08/02/02 14:50:03 0/12/0/0.5.0     2.77   0.00      4      17         6.60
08/02/02 14:50:03 0/...29.0.2.1.3  1.38   0.00     40     800         0.35
08/02/02 14:50:03 0/...29.0.2.1.4  1.38   0.00     40     800         0.35
08/02/02 14:50:03 0/...29.0.2.2.0  2.77   0.00      1      46        19.79
08/02/02 14:50:03 0/...29.0.2.2.3  0.00   0.00      1      46         0.00
08/02/02 14:50:03 0/...29.0.2.2.4  0.00   0.00      3      91         0.00
08/02/02 14:50:03 0/...29.0.2.2.5  1.38   0.00     11     366         1.21
CPU Utilization - Averaged over # of CPUs
The following script will return the average CPU utilization (averaged over the number of CPUs) for system, user, and total utilization. It could easily be modified to feed MRTG for graphing purposes.
Script
#
# Sample glance script showing average CPU utilization across all CPUs
#

headersprinted = 0
total_total = 0
total_sys = 0
total_user = 0
count = 0

# For each CPU:
cpu loop
{
    # print headers if this is the first row:
    if headersprinted == 0 then
    {
        print " Sys CPU User CPU Total CPU"
        headersprinted = 1
    }
    total_total = total_total + GBL_CPU_TOTAL_UTIL
    total_sys = total_sys + GBL_CPU_SYS_MODE_UTIL
    total_user = total_user + GBL_CPU_USER_MODE_UTIL
    count = count + 1
}
print total_sys/count, " ", total_user/count, " ", total_total/count
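The loop keeps running totals and divides by the CPU count at the end. The same reduction in Python, as a sketch (function name mine):

```python
def average_cpu(per_cpu_utils):
    """Average a list of per-CPU utilization percentages, the same way
    the adviser loop does with its running total and count."""
    return sum(per_cpu_utils) / len(per_cpu_utils)

# e.g. a 4-way box:
avg = average_cpu([10.0, 20.0, 30.0, 60.0])  # 30.0
```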
Output
# glance -j 5 -adviser_only -syntax cpu.cfg -iterations 3
Welcome to GlancePlus
Sys CPU User CPU Total CPU 8 30 38 Sys CPU User CPU Total CPU 5 31 35 Sys CPU User CPU Total CPU 4 24 28
Lan Statistics
This example will produce packet-level statistics (in, out, collisions, errors, ...) for every LAN interface in the server. I found it to be another useful example.
Script
# initialize variables:
netif_to_examine = ""   # "lan0" would report only on lan0, etc.
headers_printed = 0
netif loop
{
    # print information for the selected interface, or all if null:
    IF (BYNETIF_NAME == netif_to_examine) or (netif_to_examine == "") THEN
    {
        # print headers the first time through the loop:
        IF headers_printed == 0 THEN
        {
            print "Date Time Interface InPkts OutPkts OutQ Colls Errs"
            print " "
            headers_printed = 1
        }
        # print one line per interface reported:
        print GBL_STATDATE, " ", GBL_STATTIME, " ", BYNETIF_NAME|8,
              BYNETIF_IN_PACKET, BYNETIF_OUT_PACKET,
              BYNETIF_QUEUE, BYNETIF_COLLISION, BYNETIF_ERROR
        # (note that some interface types do not report collisions or errors)
    }
}
print " "
Output
# glance -j 5 -adviser_only -syntax lan.cfg -iterations 3
Welcome to GlancePlus

Date     Time     Interface InPkts OutPkts OutQ Colls Errs

04/28/04 15:01:00 lan0       3     2       0    0     0
04/28/04 15:01:00 lan3       35    39      0    0     0
04/28/04 15:01:00 lan6       31    32      0    0     0
04/28/04 15:01:00 lan7       50    31      0    0     0
04/28/04 15:01:00 lan8       0     0       0    0     0
04/28/04 15:01:00 lan4       0     0       0    0     0
04/28/04 15:01:00 lan9       0     0       0    0     0
04/28/04 15:01:00 lan10      0     0       0    0     0
04/28/04 15:01:00 lan11      0     0       0    0     0
04/28/04 15:01:00 lan5       3     2       0    0     0
04/28/04 15:01:00 lo0        26    26      0    na    0
04/28/04 15:01:05 lan0       13    7       0    0     0
04/28/04 15:01:05 lan3       173   230     0    0     0
04/28/04 15:01:05 lan6       121   129     0    0     0
04/28/04 15:01:05 lan7       197   142     0    0     0
04/28/04 15:01:05 lan8       1     1       0    0     0
04/28/04 15:01:05 lan4       1     1       0    0     0
04/28/04 15:01:05 lan9       1     1       0    0     0
04/28/04 15:01:05 lan10      1     1       0    0     0
04/28/04 15:01:05 lan11      1     1       0    0     0
04/28/04 15:01:05 lan5       13    7       0    0     0
04/28/04 15:01:05 lo0        3     3       0    na    0
04/28/04 15:01:10 lan0       12    7       0    0     0
04/28/04 15:01:10 lan3       151   221     0    0     0
04/28/04 15:01:10 lan6       97    105     0    0     0
04/28/04 15:01:10 lan7       165   126     0    0     0
04/28/04 15:01:10 lan8       1     1       0    0     0
04/28/04 15:01:10 lan4       1     1       0    0     0
04/28/04 15:01:10 lan9       1     1       0    0     0
04/28/04 15:01:10 lan10      1     1       0    0     0
04/28/04 15:01:10 lan11      1     1       0    0     0
04/28/04 15:01:10 lan5       12    7       0    0     0
04/28/04 15:01:10 lo0        0     0       0    na    0
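Each data line of this output is whitespace-delimited, which makes it easy to post-process, for example as input to an MRTG feed. A sketch of a parser (function name mine; the "na" collision count on lo0 is handled explicitly):

```python
def parse_lan_line(line):
    """Split one line of the adviser's LAN output into named fields.
    Column order follows the header the script prints; 'na' values
    (e.g. collisions on lo0) become None."""
    date, time, iface, inp, outp, outq, colls, errs = line.split()
    to_int = lambda s: None if s == "na" else int(s)
    return {"date": date, "time": time, "interface": iface,
            "in": to_int(inp), "out": to_int(outp), "outq": to_int(outq),
            "colls": to_int(colls), "errs": to_int(errs)}

rec = parse_lan_line("04/28/04 15:01:00 lan3 35 39 0 0 0")
```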
Detailed Process Gathering
This script proved very valuable when trying to identify Mobila processes running amok in the early days of Mobila. Perfview's five-minute minimum interval skewed things: a process that misbehaved for only a short period of time got lost in the averages. You must be careful with this example, as it will generate a lot of data very quickly, depending on the interval; listing all processes every second is a lot of lines in a short period of time.
It would be pretty easy to modify this script to simply count the processes if you wanted to report the number of processes back to MRTG.
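Alternatively, the count can be done in post-processing: since the script prints one pipe-delimited line per active process, counting processes is just counting non-empty lines. A sketch (function name mine):

```python
def count_active_processes(lines):
    """Count records in the adviser's pipe-delimited process output.
    Each printed line is one active process, so the count is simply
    the number of non-empty data lines."""
    return sum(1 for ln in lines if ln.strip())

# e.g. two records and a blank line:
n = count_active_processes(["08/02/02|14:50:00|...|supsched", "",
                            "08/02/02|14:50:00|...|strmem"])  # 2
```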
Script
process loop
{
    if ((proc_cpu_total_util > 0) or (proc_stop_reason != "SLEEP")) then
    {
        print gbl_statdate, "|", gbl_stattime, "|",
              proc_cpu_last_used, "|", proc_mem_virt, "|", proc_mem_res, "|",
              proc_cpu_total_util, "|", proc_stop_reason, "|",
              proc_disk_logl_io_rate, "|", proc_proc_id, "|",
              proc_parent_proc_id, "|", proc_user_name, "|", proc_proc_name, "|",
              proc_cache_wait_time, "|", proc_cdfs_wait_time, "|",
              proc_disk_subsystem_wait_time, "|", proc_disk_wait_time, "|",
              proc_graphics_wait_time, "|", proc_inode_wait_time, "|",
              proc_ipc_subsystem_wait_time, "|", proc_ipc_wait_time, "|",
              proc_jobctl_wait_time, "|", proc_lan_wait_time, "|",
              proc_mem_wait_time, "|", proc_msg_wait_time, "|",
              proc_nfs_wait_time, "|", proc_other_io_wait_time, "|",
              proc_other_wait_time, "|", proc_pipe_wait_time, "|",
              proc_pri_wait_time, "|", proc_rpc_wait_time, "|",
              proc_sem_wait_time, "|", proc_socket_wait_time, "|",
              proc_stream_wait_time, "|", proc_sys_wait_time, "|",
              proc_term_io_wait_time
    }
}
Output
gbl_statdate | gbl_stattime | proc_cpu_last_used | proc_mem_virt | proc_mem_res | proc_cpu_total_util | proc_stop_reason | proc_disk_logl_io_rate | proc_proc_id | proc_parent_proc_id | proc_user_name | proc_proc_name | proc_cache_wait_time | proc_cdfs_wait_time | proc_disk_subsystem_wait_time | proc_disk_wait_time | proc_graphics_wait_time | proc_inode_wait_time | proc_ipc_subsystem_wait_time | proc_ipc_wait_time | proc_jobctl_wait_time | proc_lan_wait_time | proc_mem_wait_time | proc_msg_wait_time | proc_nfs_wait_time | proc_other_io_wait_time | proc_other_wait_time | proc_pipe_wait_time | proc_pri_wait_time | proc_rpc_wait_time | proc_sem_wait_time | proc_socket_wait_time | proc_stream_wait_time | proc_sys_wait_time | proc_term_io_wait_time

08/02/02|14:50:00| 0| 32kb| 32kb| 0.0|OTHER | 0.0| 8| 0|root |supsched | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
08/02/02|14:50:00| 0| 32kb| 32kb| 0.0|OTHER | 0.0| 9| 0|root |strmem | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
08/02/02|14:50:00| 0| 32kb| 32kb| 0.0|OTHER | 0.0| 10| 0|root |strweld | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
08/02/02|14:50:00| 0| 32kb| 32kb| 0.0|OTHER | 0.0| 11| 0|root |strfreebd | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
08/02/02|14:50:00| 0| 32kb| 32kb| 0.0|OTHER | 0.0| 24| 0|root |lvmschedd | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
08/02/02|14:50:00| 2| 32kb| 32kb| 0.0|STRMS | 0.0| 25| 0|root |smpsched | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00
08/02/02|14:50:00| 0| 32kb| 32kb| 0.0|STRMS | 0.0| 26| 0|root |smpsched | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00
08/02/02|14:50:00| 0| 32kb| 32kb| 0.0|STRMS | 0.0| 27| 0|root |smpsched | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00
08/02/02|14:50:00| 1| 32kb| 32kb| 0.0|STRMS | 0.0| 28| 0|root |smpsched | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00
08/02/02|14:50:00| 1| 32kb| 32kb| 0.0|OTHER | 0.0| 29| 0|root |sblksched | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
08/02/02|14:50:00| 2| 32kb| 32kb| 0.0|OTHER | 0.0| 30| 0|root |sblksched | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
08/02/02|14:50:00| 2| 1.8mb| 88kb| 0.0|OTHER | 0.0| 575| 1|root |ptydaemon | 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.75| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
Not so pretty - cut back on the fields you report on and it cleans up nicely... :)