How to reduce the number of cached records in EventLog Analyzer

Issue description 

Cached record files are unprocessed log files that accumulate in the EventLog Analyzer local directory when the indexing process is disrupted. When the number of cached record files exceeds 100, an email notification is sent to the email address configured under Admin Settings. The message reads either "The folder ES\CachedRecord has crossed its threshold limit. This is not favorable for real-time log processing and alerts." or "ManageEngine EventLogAnalyzer: CachedRecord folder has crossed 50GB".
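The two thresholds mentioned in the notification can also be checked by hand. The following is a minimal Python sketch, not part of EventLog Analyzer: the function names are hypothetical, the folder path is supplied by you on the command line, and it assumes the cached record files sit directly inside that folder and that "50GB" means 50 GiB.

```python
import sys
from pathlib import Path

# Thresholds taken from the notification text in this article.
MAX_FILES = 100
MAX_BYTES = 50 * 1024**3  # assumption: "50GB" interpreted as 50 GiB


def cached_record_stats(folder):
    """Return (file_count, total_bytes) for regular files directly in folder."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def exceeds_threshold(file_count, total_bytes):
    """True if either the file-count or the size threshold is crossed."""
    return file_count > MAX_FILES or total_bytes > MAX_BYTES


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the cached record folder of your own installation as the argument.
    count, size = cached_record_stats(sys.argv[1])
    print(f"{count} files, {size / 1024**3:.2f} GiB")
    print("Threshold crossed" if exceeds_threshold(count, size) else "Within limits")
```

Run it with the cached record folder as the only argument. It only reads file metadata and does not modify the folder.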

Possible causes 

If the indexing process in EventLog Analyzer is disrupted, the log entries being collected will start to accumulate as cached record files.
This can be caused by:
  1. Hardware resource-related factors.
  2. EventLog Analyzer application-level factors.
 
Hardware resource-related factors:
Depending on the log flow rate for the log sources configured in EventLog Analyzer, sufficient system resources must be allocated to the EventLog Analyzer server.
For more details on calculating the log flow rate, refer to the System Resource Calculator.
  • CPU and RAM:
    • Ensure that the CPU and RAM resources allocated to EventLog Analyzer meet the recommendations.
    • For virtual machines, 100% RAM/CPU allocation to the virtual machine running EventLog Analyzer is required.
    • Sharing memory or CPU with other virtual machines on the same host may result in RAM/CPU starvation. This may negatively impact EventLog Analyzer's performance.
    • Employ thick provisioning, as thin provisioning increases I/O latency.
  • Storage disk:
    • Storage capacity should be allocated in accordance with the standard recommendations outlined in the system requirements page.
    • Based on the log flow rate value, adequate storage space allocation is required.
    • The storage disk should preferably be faster-performing solid-state drives or RAID/high-speed SAN drives.
    • The input/output operations per second (IOPS) value must be between 750 and 1,500. (To calculate the IOPS in Windows-based machines, you can use Windows Resource Monitor.) For more details, refer to this article.
  • Network bandwidth:
    • If you have configured network-attached storage (a shared path) for storing the Elasticsearch indices, EventLog Analyzer might have difficulty indexing whenever bandwidth is limited between the EventLog Analyzer server and the remote shared path.
    • We recommend using direct-attached storage with high IOPS for storing Elasticsearch indices.
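As a rough illustration of the IOPS recommendation above, the sketch below derives an average IOPS figure from two readings of a cumulative disk-operation counter and compares it with the 750 to 1,500 range. The function names are hypothetical, and where the counter readings come from (for example, a performance counter you sample yourself) is left as an assumption.

```python
# Recommended IOPS range quoted in this article.
RECOMMENDED_IOPS = (750, 1500)


def average_iops(ops_start, ops_end, seconds):
    """Average I/O operations per second between two cumulative counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (ops_end - ops_start) / seconds


def classify_iops(value, low=RECOMMENDED_IOPS[0], high=RECOMMENDED_IOPS[1]):
    """Label an IOPS value relative to the recommended range."""
    if value < low:
        return "below recommended range"
    if value > high:
        return "above recommended range"
    return "within recommended range"


# Example: 45,000 operations completed over a 60-second sample.
sample = average_iops(1_000, 46_000, 60)
print(sample, classify_iops(sample))
```

A single short sample can be misleading, so it is probably more useful to sample during periods of heavy disk utilization.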
 
EventLog Analyzer application-level factors
  • Log flow rate:
    • EventLog Analyzer has a finite log handling capacity. If the incoming logs from various log sources exceed this limit, EventLog Analyzer is likely to experience issues with processing or indexing the logs.
    • For example, EventLog Analyzer can handle around 3,000 Windows logs per second, or 20,000 Type 1 syslogs (Linux, HP, PfSense, Juniper), 12,000 Type 2 syslogs (Cisco, SonicWall, Huawei, NetScreen, Meraki, H3C), 7,000 Type 3 syslogs (Barracuda, Fortinet, Checkpoint), or 5,000 Type 4 syslogs (Palo Alto, Sophos, F5, Firepower, and other syslogs) per second.
    • A single-installation server can handle either a maximum of 3,000 Windows logs per second or any one of the syslog rates listed above. These figures are alternatives, not cumulative totals.
    • To calculate the log flow rate in EventLog Analyzer, refer to this article.
  • Correlation and activity rules:
    • Correlation and activity rules are resource-intensive. If too many rules are enabled in EventLog Analyzer, the tool can have difficulty processing incoming logs, which leads to cached record files piling up.
    • Therefore, it is recommended to enable only the necessary correlation and activity monitoring rules in EventLog Analyzer to ensure optimal performance.
  • Elasticsearch heap memory:
    • EventLog Analyzer's Elasticsearch module requires a certain amount of heap memory to function properly.
    • For example, if the overall size of the Elasticsearch data folder is around 700GB, then Elasticsearch would require over 11GB of allocated heap memory.
      Note: A maximum of 32GB of heap can be allocated.
    • This allocation normally happens automatically, subject to the resources available. If it does not, follow the steps to increase the heap manually.
    • Depending on the data folder size, you should raise the heap accordingly.
    • For more details, refer to this best practices guide.
  • Elasticsearch data holding capacity:
    • A single Elasticsearch node can handle around 1.5–1.9TB worth of indices.
    • You can find the size by checking the System Diagnostics in EventLog Analyzer.
    • If usage is found to be more than 1.9TB, we recommend you upgrade to EventLog Analyzer Distributed edition.
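The application-level limits above can be turned into a quick sanity check. The sketch below is only a heuristic and not a vendor formula: it assumes that a mixed load can be estimated by adding up each log type's share of its own capacity, and the dictionary keys and function names are hypothetical labels introduced for illustration.

```python
# Approximate per-second capacities quoted in this article.
CAPACITY = {
    "windows": 3000,
    "syslog_type1": 20000,
    "syslog_type2": 12000,
    "syslog_type3": 7000,
    "syslog_type4": 5000,
}

# Upper end of the single-node index size range quoted in this article.
MAX_NODE_INDEX_TB = 1.9


def capacity_utilization(rates):
    """Sum each log type's rate divided by its capacity.

    Assumption: shares add up linearly. A result above 1.0 suggests the
    incoming flow may exceed what a single installation can handle.
    """
    return sum(rate / CAPACITY[kind] for kind, rate in rates.items())


def exceeds_node_capacity(index_size_tb):
    """True if the index size is above the quoted single-node limit."""
    return index_size_tb > MAX_NODE_INDEX_TB


# Example: 1,500 Windows logs/s plus 10,000 Type 1 syslogs/s.
print(capacity_utilization({"windows": 1500, "syslog_type1": 10000}))
print(exceeds_node_capacity(2.0))
```

Treat the result as a first indication only. The log flow rate and index size reported by EventLog Analyzer itself remain the values to act on.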

Resolution steps

  1. Build number: Check whether your build number is above 12320. If not, upgrade EventLog Analyzer to the latest version.
  2. Spike in CPU/RAM utilization:
    1. If you receive an email notification about unprocessed cached record files, open Task Manager and check whether processes such as Zulu Platform x64 Architecture and Sysevtcol are using a significantly high level of CPU/RAM resources.
    2. If you find a process with high CPU/RAM usage, right-click the process and click Open File Location.
    3. If the Zulu process location is "Installation directory > ManageEngine > EventLog Analyzer > jre > bin", "Installation directory > ManageEngine > EventLog Analyzer > ES > Bin", or "Installation directory > ManageEngine > elasticsearch > ES > Jre", this indicates that EventLog Analyzer and its indexing process are actively running in the background.
    4. However, if other applications or tools are consuming high CPU/RAM, it is likely that EventLog Analyzer is being deprived of the required processing power.
    5. To resolve this issue, please check with the respective support teams of the third-party tools.
    6. If the log flow rate is within the limits mentioned above and the allocated resources meet the recommendations, it is highly likely that too many correlation rules are enabled in EventLog Analyzer.
    7. To resolve this, please disable the unnecessary correlation rules.
  3. Disk speed:
    1. Open Task Manager and check the disk utilization. If it has risen significantly, check the IOPS value of the disk as described in this article.
    2. If the IOPS value is lower than the recommended values (during times when disk utilization is over 90%), we advise moving EventLog Analyzer to a faster-performing drive.
  4. Check the count of cached record files:
    1. On the EventLog Analyzer server, open File Explorer and navigate to Installation Directory > ManageEngine > EventLog Analyzer > ES > Cached Records.
    2. If the number of files within the Cached Records folder is decreasing steadily, EventLog Analyzer is indexing logs faster than new logs are being ingested.
    3. A temporary backlog like this typically occurs when a sudden spike in log flow from log sources exceeds the handling capacity.
    4. The number of cached records will eventually come down to zero. This can take hours, depending on the count of files.
    5. However, if the number of files is steadily rising:
      1. Check the log flow rate value for various types of log sources.
      2. If the log flow rate value is higher than the handling capacity, we recommend you upgrade to EventLog Analyzer Distributed edition.
      3. You may also consider enabling log collection filters to filter out unnecessary logs or noisy events to bring down the overall log flow rate.
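Whether the file count is falling or rising can be judged by sampling the folder a few times instead of watching File Explorer. The following Python sketch is one possible way to do that. The function names, the sampling interval, and the tolerance parameter are all hypothetical choices, and it assumes the cached record files sit directly inside the folder you pass in.

```python
import time
from pathlib import Path


def count_files(folder):
    """Count regular files directly inside folder."""
    return sum(1 for p in Path(folder).iterdir() if p.is_file())


def sample_counts(folder, samples=5, interval_seconds=60):
    """Take several file-count readings, waiting between them."""
    counts = []
    for i in range(samples):
        counts.append(count_files(folder))
        if i < samples - 1:
            time.sleep(interval_seconds)
    return counts


def classify_trend(counts, tolerance=0):
    """Compare the first and last readings: 'falling', 'rising', or 'flat'."""
    if len(counts) < 2:
        raise ValueError("need at least two samples")
    change = counts[-1] - counts[0]
    if change < -tolerance:
        return "falling"
    if change > tolerance:
        return "rising"
    return "flat"
```

For example, `classify_trend(sample_counts(folder))` with your own cached record folder gives a first indication. A "falling" result corresponds to the backlog clearing on its own, while a "rising" result points to the log flow rate checks described above.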

How to reach support

If you have completed both the hardware-level and application-level troubleshooting steps above and the number of cached record files is still not reducing, please reach out to our support team.
 

 
