...


The above query shows the current memory usage inside Altibase. By logging its result periodically and comparing the logs (for example, the previous day against the same day), it is possible to analyze which module shows a large increase in memory.

Impact of bulk change operations or long-running queries

...

Altibase supports MVCC. MVCC is a technique that improves the performance of the DBMS by preventing read and change transactions from waiting on each other. (For more detailed information, please refer to the "Altibase MVCC & GC Guide".) As a consequence of how MVCC is implemented, there is data to be deleted, called garbage data. If there is a large amount of garbage data, or if there are long-running queries, the garbage data cannot be deleted until the corresponding transaction completes. As a result, the number of online log files or the physical memory usage may increase. The long-running queries that cause this phenomenon can be checked as follows.

 

SELECT *
FROM   V$STATEMENT
WHERE  TOTAL_TIME > 100000000
       AND EXECUTE_FLAG = 1
;

 

This query retrieves queries that are currently being executed and whose execution time exceeds 100 seconds (the TOTAL_TIME condition of 100000000 corresponds to 100 seconds expressed in microseconds).
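The threshold in the query above is easy to mistype because of its many zeros. The following is a minimal Python sketch that builds the same query text from a threshold given in seconds. It assumes TOTAL_TIME is expressed in microseconds, which is inferred from 100000000 corresponding to 100 seconds; the function name is hypothetical and is not part of any Altibase tool.

```python
# Assumption: V$STATEMENT.TOTAL_TIME is in microseconds
# (inferred from 100000000 == 100 seconds in this section).
MICROSECONDS_PER_SECOND = 1_000_000


def long_running_statement_query(threshold_seconds: int) -> str:
    """Return the long-running statement query for a threshold in seconds."""
    if threshold_seconds <= 0:
        raise ValueError("threshold_seconds must be positive")
    threshold_us = threshold_seconds * MICROSECONDS_PER_SECOND
    return (
        "SELECT *\n"
        "FROM   V$STATEMENT\n"
        f"WHERE  TOTAL_TIME > {threshold_us}\n"
        "       AND EXECUTE_FLAG = 1\n"
        ";"
    )


if __name__ == "__main__":
    # Prints the query used in this section (100-second threshold).
    print(long_running_statement_query(100))
```

The generated text can be pasted into an SQL client or written to a script file for periodic monitoring.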

 

SELECT SESSION_ID,
       ID,
       RPAD(QUERY, 150)
FROM   V$STATEMENT
WHERE  TX_ID = (SELECT ID
                FROM   V$TRANSACTION
                WHERE  MEMORY_VIEW_SCN IN (SELECT MINMEMSCNINTXS
                                           FROM   V$MEMGC
                                           LIMIT  1))
;

 

This query retrieves the long-running queries that are preventing the data to be deleted from being processed.

For more detailed information, please refer to the "Altibase Monitoring Query Guide".

Failure due to system problems

...

This section describes the types of errors caused by insufficient system resources.

Error type                 Description
----------------------------------------------------------------------------------------------------------
Out of memory              Insufficient memory
Resource busy              Temporarily unable to access system resources
Too many open files        The limit on the number of files that can be accessed at the same time is exceeded
No space left on device    Insufficient disk space

If one of the above error types is recorded as the cause in the Altibase trace log, the system error code is sometimes recorded together with the error message. The system error code can also be used to check whether system resources were insufficient.
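As an illustration of using the system error code, the following Python sketch extracts the errno value from a trace log line and maps it to one of the four error types listed above. It is a sketch under stated assumptions: the `(errno=N)` format is taken from the replication log samples in this document and is assumed to apply to other trace logs as well, the mapping from error type to errno constant (ENOMEM, EBUSY, EMFILE, ENOSPC) is an assumption, and the sample line with code `ERR-00000` is hypothetical.

```python
import errno
import re

# Assumed mapping from OS error codes to the error types in this section.
RESOURCE_ERRNOS = {
    errno.ENOMEM: "Out of memory",
    errno.EBUSY: "Resource busy",
    errno.EMFILE: "Too many open files",
    errno.ENOSPC: "No space left on device",
}

# Matches the "(errno=N)" part of a trace log message.
ERRNO_PATTERN = re.compile(r"\(errno=(\d+)\)")


def classify_trace_line(line):
    """Return the resource error type for a trace log line, or None."""
    match = ERRNO_PATTERN.search(line)
    if match is None:
        return None  # the line carries no system error code
    return RESOURCE_ERRNOS.get(int(match.group(1)))


if __name__ == "__main__":
    # Hypothetical sample line; the errno value depends on the platform.
    sample = f"ERR-00000(errno={errno.ENOSPC}) sample message"
    print(classify_trace_line(sample))
```

Symbolic constants from the `errno` module are used instead of literal numbers because the numeric values differ between operating systems.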

In addition, check the following logs to see if an error has occurred in the system.

OS       System log to check
------------------------------------------
SUN      /var/adm/messages file
HP       /var/adm/syslog/syslog.log file
AIX      errpt -a
Linux    /var/log/messages file

Replication Failure

...

Altibase provides data replication over a TCP/IP network for high availability. If a delay or other error occurs while a service is running with replication, take the following actions.

Type                                   Description
----------------------------------------------------------------------------------------------------------
Replication sender/receiver problem    The Sender/Receiver does not operate normally due to a network error or a replication setting error
Occurrence of data conflict            Data cannot be replicated because the data values in the databases at both ends are different

Check for replication Sender/Receiver problems as follows.

Type                         

Method

Existence of Sender                   

SELECT COUNT(*)
FROM   V$REPSENDER
;

 

If the Sender is running normally, the result of the above query should be displayed as "1" or higher. 

Existence of Receiver

SELECT COUNT(*)
FROM   V$REPRECEIVER
;

 

If the Receiver is running normally, the result of the above query should be displayed as "1" or higher.

(There must be as many as the number of replication objects.)


Among the Altibase trace logs, various messages related to replication are recorded in "altibase_rp.log".

Message when replication is running normally.

[Recovery Sender] Replication REP1 Start... at [6030857] (Log of the Server that started replication)                                                                                                          
[Receiver] Replication REP1 Started ... (Log of the Receiver that received the command to start replication)

If there is a problem with the Sender's connection attempt (a problem with the network or the other receiver)
ERR-61012(errno=111) [Sender] Failed to connect to the peer server

When the Receiver is shut down
ERR-6104b(errno=0) [Receiver] REP1 receiver is ended (by thr_exit)
Receive and process SR
When the other party normally stopped replication
RECEIVER:REPLICATION STOP MSG arrived!

When analyzing a replication Sender/Receiver problem, determine whether the message was recorded because a user intentionally executed a command at that time or because of a network failure. If the status does not change even after repeated replication restart commands, immediately request technical support from Altibase Technical Headquarters.
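To make this analysis easier, the error lines in "altibase_rp.log" can be split into their parts. The following Python sketch parses lines of the form shown in the samples above into an error code, a system error code, a component, and a message. The line format is inferred only from those samples, so real log lines (which may carry timestamps or other prefixes) may need a different pattern; the type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class RpLogError(NamedTuple):
    code: str                 # Altibase error code, e.g. "61012"
    errno: int                # system error code recorded with the message
    component: Optional[str]  # e.g. "Sender" or "Receiver", if present
    message: str


# Format inferred from samples such as:
# ERR-61012(errno=111) [Sender] Failed to connect to the peer server
RP_ERROR_PATTERN = re.compile(
    r"ERR-(?P<code>[0-9a-fA-F]+)\(errno=(?P<errno>\d+)\)\s*"
    r"(?:\[(?P<component>[^\]]+)\]\s*)?(?P<message>.*)"
)


def parse_rp_error(line: str) -> Optional[RpLogError]:
    """Parse one log line; return None if it is not an ERR- line."""
    match = RP_ERROR_PATTERN.search(line.strip())
    if match is None:
        return None
    return RpLogError(
        code=match.group("code"),
        errno=int(match.group("errno")),
        component=match.group("component"),
        message=match.group("message").strip(),
    )


if __name__ == "__main__":
    line = "ERR-61012(errno=111) [Sender] Failed to connect to the peer server"
    print(parse_rp_error(line))
```

Grouping the parsed records by component and time makes it easier to tell a one-off, user-initiated stop from a connection failure that repeats.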

The consequence of a replication Sender/Receiver problem is that the data that should be sent through replication cannot be sent while the data continues to change. Therefore, it is necessary to monitor how far the replication Sender has fallen behind, which is called the replication gap.

SELECT REP_NAME,
       REP_GAP
FROM   V$REPGAP
;
 
REP_NAME         REP_GAP                                                                                    
------------------------------------------------------------------
REP1             0
1 row selected.

                                                                                                                                                                                                                                                    

REP_NAME is the name of the replication object. REP_GAP is the size from the log record currently being sent to the last log record that has not been sent yet (in MB by default).

 

※ For Altibase version 6.5.1 or lower, REP_GAP is as follows.

It is calculated from the serial number of the online log, SN (Sequence Number), and XSN.

REP_GAP (Replication Gap) = [Latest SN of local server] - [Latest XSN of local server]

Normally this value stays close to zero (0) while changing continuously. However, if it increases continuously, a problem in the Sender/Receiver can be suspected, so the replication status of each server and the network must be checked.
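The distinction between a gap that fluctuates near zero and one that keeps increasing can be checked mechanically on periodically collected REP_GAP values. The following Python sketch flags a gap only when the most recent samples rise strictly at every step. The function name and the default window of five samples are assumptions chosen for illustration, not Altibase recommendations.

```python
from typing import Sequence


def gap_keeps_increasing(samples: Sequence[float], window: int = 5) -> bool:
    """Return True if the last `window` REP_GAP samples rise at every step.

    `samples` holds REP_GAP values collected at a fixed interval, oldest
    first. A value that merely fluctuates (goes up and down) is not flagged.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(samples) < window:
        return False  # not enough history to judge a trend
    recent = samples[-window:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


if __name__ == "__main__":
    print(gap_keeps_increasing([0, 1, 0, 2, 1]))   # fluctuating near zero
    print(gap_keeps_increasing([0, 3, 7, 12, 20])) # growing at every sample
```

A stricter or looser rule (for example, comparing averages of two time windows) may suit workloads whose gap is naturally bursty.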

Data conflict due to replication can be checked in "altibase_rp.log" or "altibase_rp_conflict.log" depending on the configuration.

 

When a PK conflict occurs due to INSERT DML (Dup Error)
ERR-11058(errno=0) The row already exists in a unique index.                                                                                                                                                    

When the data does not exist on the other side for DELETE DML (Not Found)
ERR-61036(errno=0) [Receiver] err_not found in deleteXlog()
ERR-61000(errno=0) The received record is not found in the database.

When the data does not exist on the other side for UPDATE DML (Not Found)
ERR-6103a(errno=0) [Receiver] err_not_found in updateXlog()
ERR-61000(errno=0) The received record is not found in the database.

When the data sent by UPDATE DML differs from the original data
ERR-61035(errno=0) [Receiver] An update conflict encountered.
ERR-61001(errno=0) A conflict has been occurred while executing the received statement.

Except for the INSERT type, two error messages are recorded for each conflict. In addition, the SQL statement that caused each data conflict is logged, which is important information for finding the corresponding data. Such conflicts occur because the original data is incorrect or because the replication servers change data with the same PK without any separation between them. Therefore, it is necessary to carefully review how the application executes its statements, based on the SQL information.
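Because most conflicts produce two log lines, counting raw error lines overstates the number of conflicts. The following Python sketch counts conflicts by type using only the first message of each pair. The error codes and their meanings are taken from the samples above; the labels and the function name are hypothetical, and the `ERR-<code>(errno=N)` line format is inferred from those samples.

```python
import re
from collections import Counter
from typing import Iterable

# First message of each conflict type, as shown in the samples above.
# The follow-up codes 61000 and 61001 are skipped so that a conflict
# recorded as two lines is counted once.
PRIMARY_CONFLICT_CODES = {
    "11058": "INSERT duplicate key",
    "61036": "DELETE target not found",
    "6103a": "UPDATE target not found",
    "61035": "UPDATE conflict",
}

CODE_PATTERN = re.compile(r"ERR-([0-9a-fA-F]+)\(errno=\d+\)")


def count_conflicts(lines: Iterable[str]) -> Counter:
    """Count data conflicts per type from replication log lines."""
    counts = Counter()
    for line in lines:
        match = CODE_PATTERN.search(line)
        if match is None:
            continue
        label = PRIMARY_CONFLICT_CODES.get(match.group(1).lower())
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "ERR-61035(errno=0) [Receiver] An update conflict encountered.",
        "ERR-61001(errno=0) A conflict has been occurred while executing the received statement.",
    ]
    print(count_conflicts(sample))
```

A per-type count gives a quick picture of which kind of application behavior to review first, before looking up the individual SQL statements in the log.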