I am using the Splunk logging driver to send logs to Splunk with the following command line:
docker run -d -p 443:8443 \
  --log-driver=splunk \
  --log-opt splunk-token=REDACTED \
  --log-opt splunk-url=https://myloghost.example.net:8088 \
  --log-opt splunk-sourcetype=idp \
  --log-opt splunk-index=auth_idp \
  --log-opt splunk-insecureskipverify=1 \
  --log-opt splunk-format=raw \
  --log-opt splunk-gzip=true \
  --name shib --restart always \
  --health-cmd 'curl -k -f https://127.0.0.1:8443/idp/status || exit 1' \
  --health-interval=2m --health-timeout=30s
The container runs normally, and logs flow into Splunk. All is good. This is in a testing environment, so it is not always in use, but the container is left running. Sometimes, when I start using the service the container provides, nothing is logged to Splunk immediately. If I wait 10-15 minutes, the logs eventually show up with the correct time stamps, etc.
I've noticed that on the Docker host, running netstat -tpn | grep -e 8088 gives output similar to this:
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address        State        PID/Program name
tcp        0    947 xxx.xxx.x.xxx:49010     xxx.xxx.x.xx:8088      ESTABLISHED  12682/dockerd-curre
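The non-zero Send-Q on the Docker side suggests the driver has written data that the Splunk end never acknowledged. Something else that can be checked while the logs are stalled is whether the kernel is still retransmitting on that connection, e.g. with ss (assuming iproute2 is installed; the port filter and the -o timer column are standard ss options):

# Established connections to port 8088, showing retransmission/keepalive
# timers (-o) and the owning process (-p) for each socket.
ss -tnpo state established '( dport = :8088 )'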
On the Splunk host, the same netstat command shows zeroes in the Recv-Q and Send-Q columns, and the Splunk Distributed Management Console doesn't show any events received during the lag time. On the Docker host, a message from Docker appears in /var/log/messages at the same time the logs are finally sent to Splunk:
Jul 6 13:14:19 idpdock0-0 dockerd-current: time="2018-07-06T13:14:19.428396282-04:00" level=error msg="Post https://myloghost.example.net:8088/services/collector/event/1.0: read tcp xxx.xxx.x.xxx:49010->xxx.xxx.x.xx:8088: read: connection timed out"
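To rule out the collector itself being slow, a test event can be posted straight to the HEC endpoint from the Docker host with curl; this assumes the standard /services/collector/event endpoint and reuses the same token (redacted here):

# Manual test event against the HTTP Event Collector (token redacted):
curl -k https://myloghost.example.net:8088/services/collector/event \
  -H "Authorization: Splunk REDACTED" \
  -d '{"event": "manual test from docker host", "sourcetype": "idp", "index": "auth_idp"}'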
It seems to me like the logging driver gets stuck trying to do some I/O operation, and when it finally times out, it tries again and the logs are sent. However, I have no idea what condition causes it to get stuck, nor do I know of any way to adjust the timeout period.
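The only tuning I'm aware of that might be related is at the kernel level on the Docker host rather than in the logging driver itself, and I'm not sure it actually applies here. As I understand it, net.ipv4.tcp_retries2 controls how many times unacknowledged data is retransmitted before the kernel gives up and returns the "connection timed out" error above; the default of 15 works out to something on the order of 15-20 minutes, which is in the same ballpark as the delay I'm seeing. For reference (the relevance of these settings is an assumption on my part):

# Retransmission limit for unacknowledged data; the default of 15 is
# roughly a 15-20 minute effective timeout before ETIMEDOUT is returned.
sysctl net.ipv4.tcp_retries2

# TCP keepalive settings (only relevant if the driver's socket enables
# keepalives, which I haven't confirmed):
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes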
I'd like to know why the logs take so long to get to Splunk sometimes, and if there is anything I can do to avoid the delays.