Once all the considerations on "Application side diagnostics" are applied, the next step is to look into the settings of ALTIBASE HDB.
Log file I/O contention
ALTIBASE HDB makes use of WAL (Write Ahead Logging) protocol which requires that all modifications are written to a transaction log (logfile) before they are applied to DBMS. If you want to enhance the performance of the application, you will need to check the state of log files.
There are two threads related to the state of log files. The first one is "logfile prepare thread" which allocates new memory in order to write a redo log. The second one is "log write thread" which applies all modifications into DBMS.
Once the speed of "log write thread" exceeds "logfile prepare thread", "log write thread" has to wait for "logfile prepare thread" to prepare log files. This symptom is called as "logfile IO contention".
ALTIBASE HDB provides a parameter called "PREPARE_LOG_FILE_COUNT"(default value is 5) to address this issue. Increasing this value sufficiently can prevent the problem in advance.
You can monitor the state of logfile as follows:
If the value of LF_PREPARE_WAIT_COUNT is greater than 1 from the result of the above query, you need to increase the value. The value can be estimated as following.
Double the parameter value of "PREPARE_LOG_FILE_COUNT", then see the value of LF_PREPARE_WAIT_COUNT is still larger than 1.
If so, double the value again and see the result until the value reaches 100.
If LF_PREPARE_WAIT_COUNT is still over 1, it means that the performance of disk is slow. Therefore, you have to consider changing the disk to resolve the problem.
The fuzzy and ping-pong based implementation of ALTIBASE HDB checkpointing does not impact transaction response times. However, if the log files and data files are physically on the same file system there may be some read and write I/O contention.
In another case, If buffered I/O mechanisms of the Operating System (OS) is used, most of I/O the traffic resides in the OS buffer. Once OS flushes all data onto disk area, the performance again may fluctuate at that time. If buffered I/O of the OS causes performance degradations, you may want to consider changing it to DIRECT I/O. Please, refer to the OS manual how to change this setting.
The best approach to deal with such I/O contentions is to separate the physical location of the log files and datafiles.
ALTIBASE HDB provides the following parameters to scatter I/O writes.
- CHECKPOINT_BULK_WRITE_PAGE_COUNT 0 => 100
- CHECKPOINT_BULK_WRITE_SLEEP_SEC 0 => 0
- CHECKPOINT_BULK_WRITE_SLEEP_USEC 0 => 5000
- CHECKPOINT_BULK_SYNC_PAGE_COUNT 3200 => 100
We highly recommend that the number of MULTIPLEXING_THREAD_COUNT be twice or four times bigger than the number of CPU cores. The best-suited value must be determined by testing with the application.
It is best to have the number of "DEDICATED" thread same as "# of IPC"
Having "DEDICATED" value greater than # of IPC means a long running query may exist, so you will need to locate and tune it.
The number of "SOCKET" thread might grow up to the value of MULTIPLEXING_MAX_THREAD_COUNT(default 1024). If the value of "SOCKET" is bigger than the value of MULTIPLEXING_THREAD_COUNT, again there may exist some long running queries.
A long running query slows down the performance and takes much CPU usage in the system.
Memory Ager (Garbage Collector - GC)
The Memory Ager (GC) reclaims garbage or memory occupied by objects that are no longer in use by the application - the committed old versions by transactions. The GC works independently formed by threads and is consisted with MEM_LOGICAL_AGER and MEM_DELTHR. The former mostly clears unused "index pages" while the latter does unused "data pages", respectively. If GC can not process the work fast enough, it affects the performance of transactions. The reason is that transactions may look up not only the current version of record and index pages, but also the old versions of them.
ALTIBASE HDB provides the performance view tables to monitor the state of GC.
If you get increasing values of GCGAP while executing the query above, there may exist some long running queries. You can locate such queries with the following SQL.
If there are no symptoms related to the cases above, ALTIBASE HDB provides the parameters below to take control.
- AGER_WAIT_MINIMUM=200000 (microseconds)
- AGER_WAIT_MAXIMUM=1000000 (microseconds)
To enhance GC's performance, you need to lower the parameter values. Then the next wake-up time of GC, it will be faster.
The Buffer Cache uses the LRU (least recently used) algorithm to keep track of what was used and when. The LRU allows HDB to determine which buffers have not been accessed for a long time so that they can be replaced first.In ALTIBASE HDB, the LRU list is separated into hot and cold zones, and can thus be called a “hot-cold LRU list”. Buffers that are accessed frequently are placed in the hot zone, whereas those that are not accessed frequently are placed in the cold zone. When a buffer needs to be replaced, only the cold zone is searched, meaning that hot buffers are not considered as replacement candidates.
When a page is first loaded into a buffer, it is inserted at the mid-point (LRU cold first) of the LRU list. When allocating a buffer to the new data page, if there are no free buffers in the prepare list, the end (LRU cold last) of this list is searched first, and a cold buffer is then replaced. The buffer that is replaced is called a “victim”. Buffers that are read frequently are moved to the “LRU hot first” position in the hot zone. Meanwhile, dirty buffers, that is, buffers containing pages that have been updated but haven't been flushed to disk, are moved to the flush list. Additionally, clean buffers, that is, buffers containing pages that haven't been updated, are designated as replacement buffers as long as they are not in the hot zone. The relative size of the hot zone can be set using the HOT_LIST_PCT property. The default is 50, which means that half of the LRU list is used as the hot zone. Users need to analyze access patterns in their application since HOT_LIST_PCT property values can influence the performance.
- Example SQL how to alter HOT_LIST_PCT value
SQL PLAN CACHE
The Plan Cache is a piece of memory that ALTIBASE HDB uses in order to store SQL plans. This feature has been introduced to HDB because the Prepare-Phase(Parse-Validate-Optimize) is expensive in terms of CPU cycles. It's functionality can be compared to the Library Cache of Oracle.
ALTIBASE HDB provide performance-views to find out about the state of the SQL plan cache.
Following parameters are used to tune the performance of the SQL plan cache.
- Check the hit_ratio of the plan cache in the time interval.
- If CACHE_HIT_COUNT is much bigger than CACHE_MISS_COUNT, that would be a desirable state.
We highly recommend you that the ratio of "CACHE_MISS_COUNT / CACHE_HIT_COUNT" be normally less than 1/10000. Please also note that this figure is not applicable to all cases.
- Check the number of statements and SQL_PLAN_CACHE_BUCKET_CNT
The plan cache is managed by a hash-table. SQL_PLAN_CACHE_BUCKET_CNT property inidicates the number of hash table buckets in a SQL plan cache. To reduce the scan time, we highly recommend that SQL_PLAN_CACHE_BUCKET_CNT would be 1/4 of total number of statements based on our experience. Set the SQL_PLAN_CACHE_BUCKET_CNT = total_number_of_statement/4
This property indicates the number of execution contexts that are initially created when plans are generated. The initial number of execution contexts is specified before plans are created, however, this only determines the initial number. The number of execution contexts increases or decreases automatically
as required during runtime. Increasing this value can help realize better performance when only one plan is executed at a time, however, in other cases the plan size is merely increased, without realizing improved performance.To avoid the concurrent contention, we recommend that SQL_PLAN_CACHE_PREPARED_EXECUTION_CONTEXT_CNT would be similar as the number of CPU's.
If all considerations above are applied, and still observing performance issues, the following procedure will be helpful to figure out what the problem is.
We highly recommend you to consecutively gather 3 times of pstack with sleep interval (for example: 10 seconds).