Wednesday, August 20, 2014

Removing Fatal NI connect error 12170 from Database Alert Log

In our one of the 11gR2 Production Database, there were so many entries w.r.t Fatal NI connect error 12170 + TNS-12535: TNS:operation timed outTNS-00505: Operation timed out (with different ports), found in both RAC Instances Alert Log file.

Problem:


Fatal NI connect error 12170.


  VERSION INFORMATION:

        TNS for Linux: Version 11.2.0.3.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
  Time: 20-AUG-2014 14:17:36
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out

    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out

    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=xxx.xxx.xxx.xxx)(PORT=44326))

Reason:


The same error message was repeating during whole day almost for every application server.

I’ve finally found what was causing the problem. Our database is behind a firewall. Firewall has an “idle session timeout” value. If a connection remains idle for more than the “idle session timeout” value it drops the connections. 

Application developers usually configure their connection pools to remain alive for a long time, which is more than the “idle session timeout” value of the firewall. Not all the connections in the pool are used. Some stays idle. After sometime the firewall drops them and I get those operation timed out problems.


Solution / Workaround: 


1) Add the following line to the sqlnet.ora file on the server.


SQLNET.EXPIRE_TIME=10

In this configuration database will probe the application servers every 10 minutes to see if they are up. In fact this is a mechanism to determine stale connections by database. Because it sends packets to application servers every ten minutes, the connections are detected as active by firewalls and they are not broken.


**In an installation that includes GRID, this parameter should be set in the RDBMS_HOME/network/admin/sqlnet.ora file. This would be the default location for sqlnet.ora file parameters referenced by the instance. 

2) One way to minimize the impact is by using the parameter SQLNET.INBOUND_CONNECT_TIMEOUT (default to 60 seconds on 10gR2 and 11g) but, sometimes, this value is not adequate.

Oracle also mention the occurrence of this error if you use DB Console or Enterprise Manager to monitor your databases and the em agent will try to connect to the target database repeatedly and, statistically, some will fail (frequency will depend on how busy your system is).
Most of the time (certainly for DB Console and Enterprise Manager Agent) the application will try to connect again and it will succeed.
To fix the problem you could increase the value of SQLNET.INBOUND_CONNECT_TIMEOUT (in Seconds) in the sqlnet.ora / CONNECT_TIMEOUT_<DB_Name> (in Minutes) in the listener.ora file located on the server side.
3) If you already have a value you have considered adequate, you might want to add the following line on your listener.ora file:
DIAG_ADR_ENABLED_<listener_name>=OFF  (in listener.ora)
DIAG_ADR_ENABLED=OFF in sqlnet.ora
This line will hide the error on the alert.log file and the error will be posted on the $ORACLE_HOME/network/log/sqlnet.log file

No comments:

Post a Comment