The text of this article was copied from: http://www.ibm.com/developerworks/aix/library/au-unix-perfmonsar.html
Users seem to remember performance problems some time after they occur. Ignoring the "If
it wasn't important then, why is it important now?" question that you long to ask, the question
then becomes, "What was the condition of the system at the time of the alleged problem?" By
periodically taking performance snapshots and reviewing the data, you're one step closer to
pinpointing the cause of the problem and creating a solution.
Collecting data
The SAR suite of utilities is bundled with your system (in fact, it
is installed on most flavors of UNIX®), but probably not enabled. To
enable SAR, you must run some utilities at periodic intervals through
the cron facility. Use the crontab -e command while running as the root user, and then provide the configuration shown in Listing 1.
Listing 1. Run crontab for the root user to enable the SAR collection
# Collect measurements at 10-minute intervals
0,10,20,30,40,50 * * * * /usr/lib/sa/sa1
# Create daily reports and purge old files
0 0 * * * /usr/lib/sa/sa2 -A
The first command, sa1, is a shell script that calls sadc to collect the performance data in a binary log file. The sa1 command also ensures that each day has its own file, which I explain in the Timing is everything section. Run this command every ten minutes, which is a good tradeoff between granularity and system impact.
The second command, sa2, is another shell script that
dumps all the data from the current day's binary log file into a text
file, and then purges any log files older than seven days. The -A
argument specifies what is extracted from the binary file into the text
file. Although you can read the text file to see the status of the
system for the day, I show you how to query the binary log files to be
more precise.
Extracting useful information
Data is being collected, but it must be queried to be useful. Running the sar command without options generates basic statistics about CPU usage for the current day. Listing 2 shows the output of sar without any parameters. The examples
here are from Sun Solaris 10. Other platforms are similar, but the column
names might differ slightly, and in some UNIX flavors sadc collects more or
less data based on what's available.
Listing 2. Default output of sar (showing CPU usage)
-bash-3.00$ sar
SunOS unknown 5.10 Generic_118822-23 sun4u 01/20/2006
00:00:01    %usr    %sys    %wio   %idle
00:10:00       0       0       0     100
... cut ...
09:30:00       4      47       0      49
Average        0       1       0      98
Each line in the output of sar is a single
measurement, with the timestamp in the left-most column. The other
columns hold the data. (These columns vary depending on the command-line
arguments you use.) In Listing 2, the CPU usage is broken into four categories:
- %usr: The percentage of time the CPU is spending on user processes, such
as applications, shell scripts, or interacting with the user.
- %sys: The percentage of time the CPU is spending executing kernel tasks. In
this example, the number is high, because I was pulling data from the kernel's random
number generator.
- %wio: The percentage of time the CPU is waiting for input or output from a
block device, such as a disk.
- %idle: The percentage of time the CPU isn't doing anything useful.
The last line is an average of all the data points. However, because most systems experience
busy periods followed by idle periods, the average doesn't tell the entire story.
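Because the average hides the busy periods, it can help to pull out the single busiest sample. The following is a minimal sketch, assuming the four-column Solaris layout shown in Listing 2 (other platforms print different columns, so the field positions would need adjusting). The function name sar_cpu_peak is hypothetical and is not part of the SAR suite.

```shell
# Hypothetical helper: print the sample with the lowest %idle from
# sar CPU output read on standard input.
# Assumes the Solaris column layout shown in Listing 2:
#   HH:MM:SS %usr %sys %wio %idle
sar_cpu_peak() {
    awk '
        # Data rows start with a timestamp and have exactly five fields.
        # The header row is skipped because its last field is not numeric,
        # and the Average row is skipped because it has no timestamp.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && NF == 5 && $5 ~ /^[0-9.]+$/ {
            if (!seen || $5 + 0 < min_idle) {
                min_idle = $5 + 0
                peak_time = $1
                seen = 1
            }
        }
        END {
            if (seen) print peak_time, "busy=" (100 - min_idle) "%"
        }
    '
}
```

A typical invocation would be sar | sar_cpu_peak, which reports the timestamp of the least idle sample and how busy the CPU was at that point.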
Watching disk activity
Disk activity is also monitored. High disk usage means that there
will be a greater chance that an application requesting data from disk
will block (pause) until the disk is ready for that process. The
solution typically involves splitting file systems across disks or
arrays; however, the first step is to know that you have a problem.
The output of sar -d shows various disk-related statistics for
one measurement period. For the sake of brevity, Listing 3 shows only hard disk drive activity.
Listing 3. Output of sar -d (showing disk activity)
$ sar -d
SunOS unknown 5.10 Generic_118822-23 sun4u 01/22/2006
00:00:01   device       %busy   avque   r+w/s  blks/s  avwait  avserv
... cut ...
14:00:02   dad0            31     0.6      78   16102     1.9     5.3
           dad0,c           0     0.0       0       0     0.0     0.0
           dad0,h          31     0.6      78   16102     1.9     5.3
           dad1             0     0.0       0       1     1.6     1.3
           dad1,a           0     0.0       0       1     1.6     1.3
           dad1,b           0     0.0       0       0     0.0     0.0
           dad1,c           0     0.0       0       0     0.0     0.0
As in the previous example, the time is along the left. The other columns are as follows:
- device: This is the disk, or disk partition, being
measured. In Sun Solaris, you must translate this disk into a physical
disk by looking up the reported name in /etc/path_to_inst, and then
cross-reference that information to the entries in /dev/dsk. In Linux®,
the major and minor numbers of the disk device are used.
- %busy: This is the percentage of time the device is being read from or written to.
- avque: This is the average depth of the queue that is
used to serialize disk activity. The higher the avque value, the more
blocking is occurring.
- r+w/s, blks/s: This is disk activity per second in terms of read or write operations and
disk blocks, respectively.
- avwait: This is the average time (in milliseconds) that a disk read or write operation
waits before it is performed.
- avserv: This is the average time (in milliseconds) that a disk read or write operation
takes to execute.
Some of these numbers, such as the avwait and avserv values, translate directly
into user experience. High wait times on the disk likely point to several people contending for the disk,
which should be confirmed by high avque numbers. High avserv values
point to slow disks.
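To spot the busy devices without reading every row, the output can be filtered by %busy. This is a rough sketch, assuming the Solaris layout shown in Listing 3, where only the first device row of each sample carries a timestamp. The function name sar_disk_busy and its default threshold of 20 percent are illustrative assumptions, not recommended values.

```shell
# Hypothetical helper: list devices whose %busy meets a threshold,
# reading sar -d output on standard input.
# Assumes the Solaris layout shown in Listing 3:
#   [HH:MM:SS] device %busy avque r+w/s blks/s avwait avserv
sar_disk_busy() {
    awk -v limit="${1:-20}" '
        # Remember the timestamp; continuation rows omit it.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/  { stamp = $1; first = 2 }
        $1 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { first = 1 }
        {
            busy = $(first + 1)
            # Skip headers, banners, Average rows, and anything else
            # that does not have exactly seven data columns.
            if (NF - first + 1 != 7 || busy !~ /^[0-9.]+$/) next
            if (busy + 0 >= limit + 0)
                print stamp, $first, "busy=" busy "%", "avwait=" $(first + 5), "avserv=" $(first + 6)
        }
    '
}
```

For example, sar -d | sar_disk_busy 30 would print one line for each device that was at least 30 percent busy in a sample, along with its wait and service times.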
Other metrics
Many other items are collected, with corresponding arguments to view them:
- The -b argument shows information on buffers and the efficiency of using a buffer versus having to go to disk.
- The -c argument shows system calls broken down into some of the popular calls, such as fork(), exec(), read(), and write(). High process creation can lead to poor performance and is a sign that you might need to move some applications to another computer.
- The -g, -p, and -w arguments show paging (swapping) activity. High paging is a sign of memory starvation. In particular, the -w argument shows the number of process switches: A high number can mean too many things are running on the computer, which is spending more time switching than working.
- The -q argument shows the size of the run queue, which is the same as the load average for the time.
- The -r argument shows free memory and swap space over time.
Each UNIX flavor implements its own set of measurements and command-line arguments for sar. Those I've shown are common and represent the elements that I find most useful.
Timing is everything
The examples thus far have shown the current day's data, which has its uses, but it also has two
problems:
- You're interested in an hour of data, but you get the whole day.
- You need to go back to a different day.
As you saw earlier, sa1 saves the data in a different file for each day. Looking at the sa1
script itself tells you which directory is used; in the case of Sun
Solaris 10, it is /var/adm/sa. Several files reside in this
directory, starting with either "sa" or "sar" followed by a number. The
number represents the day of the month, with the files beginning with
"sar" being text dumps of the data for that day (created by the nightly
run of sa2) and the files beginning with "sa" holding the
binary version. Indeed, the file for the current date is the file
that is read when you launch sar.
Specifying -f to the sar command selects
the file to read from. If today were the 23rd day of the month, I could
look at yesterday's data by reading from sa22 with the command sar -f /var/adm/sa/sa22. You can also pass the other arguments I showed you to access different types of data.
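The file naming rule is simple enough to wrap in a small function. This sketch assumes the Solaris 10 directory named above and two-digit, zero-padded day numbers; the function name sa_file and the SA_DIR override variable are hypothetical, introduced here only for illustration.

```shell
# Hypothetical helper: print the binary data file for a day of the month.
# SA_DIR defaults to the Solaris 10 location named in the text; other
# platforms use a different directory, so override it as needed.
sa_file() {
    day=$1
    case $day in
        [0-9]|[0-9][0-9]) ;;    # accept one or two digits only
        *) echo "usage: sa_file DAY_OF_MONTH" >&2; return 1 ;;
    esac
    if [ "$day" -lt 1 ] || [ "$day" -gt 31 ]; then
        echo "sa_file: day must be between 1 and 31" >&2
        return 1
    fi
    # Pad single digits so that day 5 becomes sa05.
    case $day in [0-9]) day=0$day ;; esac
    printf '%s/sa%s\n' "${SA_DIR:-/var/adm/sa}" "$day"
}
```

With this in place, yesterday's query from the example becomes sar -f "$(sa_file 22)".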
The second thing you can do to narrow the scope of the query is to specify the time by using the -s and -e arguments (think start and end). Note that -s is not inclusive, so you must subtract an extra ten minutes from the chosen start time. Continuing with the previous example, Listing 4 shows swapping activity and the run queue for the 22nd from 2:30 p.m. to 3:00 p.m.
Listing 4. A complex sar query specifying date, time, and multiple data sets
# sar -f /var/adm/sa/sa22 -s 14:20 -e 15:00 -w -q -i 4
SunOS unknown 5.10 Generic_118822-23 sun4u 01/22/2006
14:20:00 swpin/s bswin/s swpot/s bswot/s pswch/s
14:30:00    0.00     0.0    0.00     0.0     140
14:40:01    0.00     0.0    0.00     0.0     144
14:50:01    0.00     0.0    0.00     0.0     140
15:00:00    0.00     0.0    0.00     0.0     139
Average     0.00     0.0    0.00     0.0     140

14:20:00 runq-sz %runocc swpq-sz %swpocc
14:30:00    10.5     100     0.0       0
14:40:01    10.5     100     0.0       0
14:50:01    10.4     100     0.0       0
15:00:00    10.5     100     0.0       0
Average     10.5     100     0.0       0
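Because -s excludes the start time itself, the value passed to it has to be one collection interval earlier than the first sample you want. The arithmetic can be sketched as follows; the function name sar_start is hypothetical, and the default of ten minutes simply mirrors the cron schedule in Listing 1.

```shell
# Hypothetical helper: step an HH:MM time back by a number of minutes
# (default 10, the collection interval used in Listing 1) to get a
# value for -s that includes the first sample you care about.
sar_start() {
    echo "$1" | awk -F: -v back="${2:-10}" '{
        minutes = $1 * 60 + $2 - back
        if (minutes < 0) minutes = 0    # do not wrap into the previous day
        printf "%02d:%02d\n", int(minutes / 60), minutes % 60
    }'
}
```

The query for the half hour starting at 2:30 p.m. could then be written as sar -f /var/adm/sa/sa22 -s "$(sar_start 14:30)" -e 15:00 -q. Clamping at midnight is a design choice made for this sketch: samples from before midnight live in the previous day's file, so a single query cannot reach them anyway.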
Making sense of it all
A brief look at Listing 4
shows that swap activity was nil, approximately 140 process switches
per second occurred, and the load average was slightly more than ten.
Assuming that you were investigating a claim of poor performance at the
time, what does this tell you?
- Whatever process is running isn't memory intensive, because you don't see swapping.
- Chances are that this problem is caused by a long-running set of
processes, because the run queue and process switches are relatively
consistent. Had they not been, you could suspect application-level
problems, such as a busy Web server.
- Knowing that the output of Listing 3 shows part of the same time period, you can see that one of the disks was being used heavily (31 percent busy according to sar -d,
at roughly 16,000 blocks per second). This disk is the home directory
partition; depending on what the user was trying to do, he or she might
have experienced slow responses.
A quick look at the CPU usage for the time period shows that
system (kernel) time took up approximately 80 percent of the CPU; the rest was
consumed by user tasks. As the systems administrator, you can use this
information in three ways:
- Go back over previous days' logs. In this case, I found that the problem started at 1:00 p.m. and
ended the next morning.
- Try to correlate the activity to any cron jobs that might have been started that day.
- Try to find a trend. Looking at data from a couple of other days, I saw that the performance
was normal, which isn't indicative of a system that has reached its limits.
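Going back over previous days and looking for a trend both mean running the same query against several daily files. The loop below is a minimal sketch of that idea, assuming the Solaris 10 directory and the sa plus two-digit file naming described earlier. The function name sar_all_days and the SAR and SA_DIR override variables are hypothetical.

```shell
# Hypothetical helper: run the same sar query against every daily
# binary file in a directory, labeling each report with its file name.
# SAR and SA_DIR are overridable; the defaults assume Solaris 10.
sar_all_days() {
    for f in "${SA_DIR:-/var/adm/sa}"/sa[0-9][0-9]; do
        [ -f "$f" ] || continue    # the pattern matched no files
        echo "== $f"
        "${SAR:-sar}" -f "$f" "$@"
    done
}
```

For example, sar_all_days -q -s 13:00 -e 17:00 prints the afternoon run queue for each day that is still on disk. The pattern matches only the binary files, not the sar text dumps. Keep in mind that the files are named by day of the month, so the order in which they are listed is not necessarily chronological.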
In this case, the problem seemed to be isolated, and for good reason -- I was intentionally running the
disks with shell scripts to create some interesting sar reports!
However, had a trend appeared, such as busy home drives during working hours, it would have been a
call to do something about the problem. Possible solutions include splitting home directories off to
other disks, installing faster disks, or moving to something like Network Attached Storage (NAS).
Conclusion
Obtaining quantitative data about your system at periodic intervals is an effective way of finding
performance bottlenecks and determining whether further action is needed. SAR and related utilities do
just this -- snapshots are taken every ten minutes and a front end allows you to access this data.
Though tactical in nature, SAR provides a wealth of information that enables systems administrators to
discover just what aspect of the system is suffering and whether it requires further investigation.
Resources
Learn
- SAR runs on most flavors of UNIX, including AIX®, HP-UX, and Linux.
- The UNIX Insider Performance Q&A column has some valuable advice on performance-tuning Solaris, including more interpretation of sar results.
- If you liked sar, you might also like iostat and vmstat, which let you dig into current system activity in more depth. The Solaris System Administration Guide outlines these tools' use along with more information on sar. Like sar, most of this information applies to other flavors of UNIX.
- I've written about using vmstat to watch current activity for Linux, which also applies to systems such as AIX, Solaris, and HP-UX.
Get products and technologies
- SarCheck® has a commercial offering built around SAR that provides a graphical view of the data. A free evaluation is available.