文章目录
- 1.起因
- 2.查因过程
- 2.1 定位job
- 2.2 定位sql text
- 2.3 定位db_link
- 2.4 测试dblink
- 2.5 tnsping host
- 2.6 检查host信息
- 2.7检查网路状况
- 3.处置办法:
- 4.结论
1.起因
在巡查长事务时,有两个事务执行了很长时间没有完成
SELECT SE.SID,SE.SERIAL#,to_char(logon_time,'YYYY-MM-DD HH24:MI:SS') LOGON_TIME,SE.STATUS,SE.OSUSER,SE.MACHINE,SE.PROGRAM,SE.BLOCKING_SESSION,
SE.SQL_ID,SE.PREV_SQL_ID ,SE.EVENT,SE.P1TEXT,SE.P1,SE.P2TEXT,SE.P2,SE.P3TEXT,SE.P3,SE.SECONDS_IN_WAIT
FROM SYS.V_$SESSION SE , SYS.v_$session_wait SW
WHERE SE.SID=SW.SID AND SE.STATUS='ACTIVE' AND SE.USERNAME NOT IN ('SYS','SYSMAN','DBSNMP')
AND (SE.LAST_CALL_ET/86400) > 4/24
AND ( TO_CHAR(SYSDATE,'YYYYMMDD') <> TO_CHAR(SE.LOGON_TIME,'YYYYMMDD') OR SYSDATE-LOGON_TIME >= 4/24 );
Output:
SID SERIAL# LOGON_TIME STATUS OSUSER MACHINE PROGRAM BLOCKING_SESSION SQL_ID PREV_SQL_ID EVENT P1TEXT P1 P2TEXT P2 P3TEXT P3 SECONDS_IN_WAIT
----- ---------- ------------------- -------- ------- --------- ----------------------- ---------------- ------------- ------------- -------------------- -------------------- ---------- ------------------------------ ---------- ------------------------------ ---------- ---------------
184 61011 2024-06-05 23:59:58 ACTIVE oracle GSIAISDB oracle@GSIAISDB (J002) 59zjsg14auv2n fqrn289mq4kdd single-task message 0 0 0 55877
213 24855 2024-06-05 16:21:28 ACTIVE oracle GSIAISDB oracle@GSIAISDB (J000) apa8h0u6kkkcj fqrn289mq4kdd single-task message 0 0 0 83330
从上面的输出,导致这个长事务的event是single-task message,等待时间分别为55877、83330秒,根据program字段发现这是两个job,执行中的sql_id事分别是59zjsg14auv2n与apa8h0u6kkkcj
2.查因过程
2.1 定位job
涉事的job信息如下:
select jr.sid,j.job,j.log_user,j.schema_user,j.last_date,j.last_sec,j.this_date,j.this_sec,j.next_Date,j.next_sec,j.total_time,j.broken,j.interval,j.failures,j.what from dba_jobs_running jr,
dba_jobs j
where jr.job=j.job and jr.job in (261,81)
Output:
SID JOB LOG_USER SCHEMA_USER LAST_DATE LAST_SEC THIS_DATE THIS_SEC NEXT_DATE NEXT_SEC TOTAL_TIME B INTERVAL FAILURES WHAT
---------- ---------- -------------------- ------------------------------ ------------------ -------------------------------- ------------------ -------------------------------- ------------------ -------------------------------- ---------- - ------------------------------ ---------- ------------------------------
213 261 EDP EDP 05-JUN-24 15:22:26 05-JUN-24 16:22:27 05-JUN-24 16:22:26 223707 N sysdate + (1/24) 0 PRO_TRS_BATCH_ETL_GSCM_JOB;
184 81 EDP EDP 05-JUN-24 00:00:02 06-JUN-24 00:00:03 06-JUN-24 00:00:00 145656 N trunc(sysdate+1) 0 PRO_IMP_EXPM_TO_GSCM_JOB;
2.2 定位sql text
进一步确定sql text入下:
select sql_fulltext from v$sqlarea where sql_id in ('59zjsg14auv2n','apa8h0u6kkkcj');
Output:
SQL_FULLTEXT
--------------------------------------------------------------------------------
SELECT NVL(COUNT(*), 0) FROM VIE_EDI_DEBIT_M@AIS2MATDB WHERE RECEIVE_ID = :B1 AN
SELECT COUNT(*) FROM VIE_TW_ACC_LOCK@AIS2MATDB WHERE ACC_YM= :B1
从上面的输出来看,两个sql都有使用dblink
2.3 定位db_link
涉事dblink信息如下,link使用的host是DTC
select * from dba_db_links where db_link='AIS2MATDB'
Output:
OWNER DB_LINK USERNAME HOST CREATED
------------------------------ ------------------------------ ------------------------------ -------------------- ------------------
PUBLIC AIS2MATDB SCM_HQIN DTC 20-JAN-15
2.4 测试dblink
测试dblink,出现tns:connect timeout
2.5 tnsping host
使用tnsping,返回TNS-0305:Failed to resolve name
[oracle@GSIAISDB ~]$ tnsping DTC
TNS Ping Utility for Linux: Version 10.2.0.5.0 - Production on 06-JUN-2024 15:48:25
Copyright (c) 1997, 2010, Oracle. All rights reserved.
Used parameter files:
TNS-03505: Failed to resolve name
2.6 检查host信息
cat $ORACLE_HOME/network/admin/tnsnames.ora|grep -i dtc -A 5
DTC =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 172.19.2.69)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = DTC)
)
)
2.7检查网路状况
使用ping目标地址,返回timeout:
ping 172.19.2.69
Ping 172.19.2.69 (使用 32 位元組的資料):
要求等候逾時。
要求等候逾時。
要求等候逾時。
要求等候逾時。
3.处置办法:
1.kill涉事session
alter system disconnect session 180,61011' immediate
System DISCONNECT 已更改.
alter system disconnect session '213,24855' immediate
System DISCONNECT 已更改.
2.通知网管检查网路异常
4.结论
对于此事件,oracle官方解释如下
single-task message
When running single task, this event indicates that the session waits for the client side of the executable.
Wait Time: Total elapsed time that this session spent in the user application.
Parameters: none
很明显这是一个网络中断引起的事件