This article looks into some of the steps necessary to
locate and tune DB2 application performance problems. This bulletin
will:
- Discuss the tools and structures needed for DB2 tuning.
- Address the identification of DB2 performance problems, and the
initial reports and processes used for this identification.
- Explain some of the detailed investigation required to alleviate
performance problems.
DB2 Tuning Information Sources.
Prior to any DB2 tuning, there must be a review of the performance
information that can pinpoint the problem. DB2 sub-systems have
facilities to gather information and produce DB2 TRACE records which are
written to either SMF or GTF destinations. The normal destination for
performance-related informational records is SMF. DB2 creates three
common SMF record identifiers: 100, 101 and 102. DB2 assigns multiple
classes to these record types. These classes are specified in DSNZPARM
or on the START TRACE command, and tell DB2 what level of detail to
accumulate.
SMF Records.
SMF 100 records are called Statistics records and contain DB2
system-wide information. Four classes are associated with statistics
records. Since very minimal overhead is associated with statistics
records, the Rule-of-Thumb is to turn on statistics classes 1, 3, 4 and
5. These classes contain the following information:
- Class 1 – These records are written every "N"
minutes (as specified in DSNZPARM) and show summary information
related to all DB2 activity during this time interval.
- Class 3 – These records are written for every deadlock or
timeout and contain information related to these incidents.
- Class 4 – These records are written during DB2 exceptional
conditions and contain diagnostic information.
- Class 5 – These records contain information related to Data
Sharing.
SMF 101 records are called Accounting records and contain the
individual DB2 thread-related information used to initiate any
performance tuning project. The Rule-of-Thumb for accounting trace
classes is to turn on 1, 2, and 3 (if you’re using DB2 packages and
would like package level accounting information, activate classes 7 and
8). The following explains the normal accounting classes:
- Class 1 – This record contains basic accounting information.
- Class 2 – This record contains "in-DB2" time.
Without this class it’s impossible to determine if the problem resides
in DB2 or in another resource or product.
- Class 3 – This record contains wait time. This is important if
the problem is related to buffer pools, CPU overload, DASD, or other
factors that contribute to wait time.
- Class 7 – This record contains Package/DBRM basic accounting
information.
- Class 8 – This record contains Package/DBRM level wait time
information.
SMF 102 records are called Performance records and contain detailed
performance-related information. These records are collected for very
detailed problem determination. Beware, some of the performance traces
seriously affect DB2 performance.
Other Performance Data.
Another important source of tuning information is DB2 EXPLAIN output.
This information describes the methods that DB2 uses to retrieve
answer-sets. It’s very important to always run EXPLAIN when binding
plans or packages. It’s also very important to retain historical data
of EXPLAIN output. This history is vital in determining if a performance
issue is related to data retrieval method changes.
The final base information that’s useful for DB2 performance tuning
is RMF data. The RMF information can be especially useful if the problem
is CPU resource or DASD contention. RMF helps to pinpoint the specific
problem for these types of constraints.
Identify and Isolate Performance Problems.
Where should you start if it appears that there are DB2 application
performance issues? The first thing to do is identify that there truly
is a problem. This may be recognized by:
- A user’s complaint that things are taking too long.
- A performance monitor.
- A proactive (normally user written) accounting information
analysis application.
Isolate the Cause.
Once a problem is recognized, the 80/20 rule should be used to
identify the small percentage of DB2 transactions that represent the
majority of the total workload. These transactions will give the
greatest payback when they’re tuned. During this phase of the
investigation ask questions such as: is this problem related to a single
transaction or job or user, or does this problem only appear at certain
times of the day? Getting the answers to questions like these will narrow
the list of "offending" transactions to a manageable number.
DB2PM Short Accounting Report.
During the tuning project’s fact-finding phase, the normal starting
point is to review the DB2PM Short Accounting Report. This report
provides indications of resources used. The short accounting report will
identify:
- DB2 response time (elapsed time).
- Resources used (processor and I/O).
- Lock suspensions.
- Application code changes (via the SQL used fields).
- Wait times (processor, I/O wait or lock wait).
These fields identify potential problems. There is one warning when
using the Short Accounting Report: this report is the average of all
executions of a plan or package for the reporting interval. Averages can
have the effect of hiding some very long running executions of the
plan/package if 1) there are many executions of the plan/package during
this period and 2) most of the executions are of very short duration.
Again, look for the plan/package(s) that appear to be using the majority
of resources during the reporting interval.
Detailed Performance Investigation.
Once the "problem" plan/package has been identified, it’s
time for a detailed review of the information related to this
plan/package. The first place to look is the EXPLAIN output. Run a
stand-alone EXPLAIN to determine the current access path that DB2 is
using. Then compare this information to the EXPLAIN output (accumulated
in your PLAN_TABLE) as it relates to the last bind that was run for this
plan/package. If the access path DB2 is using is incorrect based on how
you think DB2 should be retrieving information, check for things such as
missing statistics in the DB2 catalog or changes in SQL calls.
DB2PM Long Accounting Report.
If the DB2 access path review doesn’t reveal the solution, examine
the DB2PM Long Accounting Report. This report gives detailed information
regarding plan/package execution. Following are a few of the key pieces
of information presented in this report:
- Class 1 Elapse Time – Compare this value to CICS or IMS
transit times. They should be close, but not necessarily identical
since DB2 time doesn’t include:
- Time before the first SQL statement.
- DB2 create thread time.
- Time after the DB2 terminate thread.
This helps identify whether the problem is related to DB2 or some
"outside" cause.
- Not-in-DB2 Time – This is the calculated difference between
Class 1 and Class 2 elapse time. If time spent outside DB2 (but within
the DB2 accounting interval) is lengthy, the problem will be found in
the application, CICS, IMS, or the overall system, and not within DB2.
- Lock/Latch Suspension Time – This value shows contention for
DB2 resources. Check the "Locking Summary" section of this
report for additional information, then proceed to the Locking Reports
for help.
- Synchronous I/O Suspension Time – This is total application
wait time for DB2 synchronous I/Os. If the number of I/Os is high,
check for:
- A change in access path.
- Application code changes.
- System-wide DB2 bufferpool problems.
- RID pool failures.
- System-wide EDM pool problems.
If I/O time is greater than expected, check for I/O contention. A
Synchronous read should take from 15-25 milliseconds, depending on the
DASD device. If this value is longer, use RMF to check for DASD
contention.
- Asynchronous Read Suspensions – This is the accumulated
time for read I/O done under a thread other than this thread. It
includes time for Sequential prefetch, List prefetch, Sequential
detection or Synchronous read performed by another thread. The
Rule-of-Thumb for Sequential prefetch or Sequential detection
(asynchronous I/O) is 1 to 2 milliseconds per page. The
Rule-of-Thumb for List prefetch is 3-4 milliseconds per page. Check
"Other Read I/O" to locate value.
- Not-Accounted-For DB2 Time – This is accounting class 2
time that is not part of class 2 CPU or class 3 suspensions, and is
normally due to MVS paging, processor wait time or time spent
waiting for parallel tasks to complete. Check the "Not
Account" field for this value.
There are many places to look and many causes for DB2
performance-related issues. This article has been a good starting point
for tuning DB2. You will find that the better you understand how DB2
functions, the easier performance tuning becomes. For additional
information on DB2 application tuning, go to some of the following
information sources:
Disclaimer.
The opinions in this article are solely those of the author, and the
information herein is to be taken "as-is".
|