High Availability - PostgreSQL fails to configure properly

We have been running a primary and a secondary server in HA for a long time, but HA seems to have broken recently.

After rebooting the servers, HA shows as running, then after about 5 minutes it suddenly goes red and shows "not running".

The PMP0 log showed the following a few minutes after startup:

[12:52:59:151]|[02-24-2021]|[com.adventnet.passtrix.utils.HAUtils]|[INFO]|[43]: Going to start rubyrep replication process...|
[12:54:03:284]|[02-24-2021]|[com.adventnet.passtrix.ProcessUtil]|[INFO]|[43]: rubyrep process error: Exception caught: no connection to 'right' database|
[12:54:03:284]|[02-24-2021]|[com.adventnet.passtrix.ProcessUtil]|[INFO]|[43]:  Going to kill process for ResourceId 8273|
[12:54:03:284]|[02-24-2021]|[com.adventnet.passtrix.ProcessUtil]|[INFO]|[43]: Going to kill Process: java.lang.ProcessImpl@265ab134 for id: 8,273|
[12:54:03:284]|[02-24-2021]|[com.adventnet.passtrix.ProcessUtil]|[INFO]|[43]: Process list is null during remove: 8,273|
[12:54:03:284]|[02-24-2021]|[com.adventnet.passtrix.utils.HAUtils]|[INFO]|[43]: rubyrep replication process stopped...|
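
For anyone who wants to pull the same lines out of their own log, here is a rough Python sketch that parses entries in the format shown above and keeps only the rubyrep errors. The field names (`time`, `date`, `logger`, `level`, `thread`, `message`) are my own labels, and treating the `[43]` field as a thread id is a guess; the helper names `parse_line` and `rubyrep_errors` are made up for illustration.

```python
import re

# Pattern for one log entry as it appears in the excerpt above.
# Field names are illustrative labels, not official PMP terminology.
LINE_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2}:\d{3})\]\|"
    r"\[(?P<date>\d{2}-\d{2}-\d{4})\]\|"
    r"\[(?P<logger>[^\]]+)\]\|"
    r"\[(?P<level>\w+)\]\|"
    r"\[(?P<thread>\d+)\]:\s*(?P<message>.*?)\|?$"  # "thread" is an assumption
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def rubyrep_errors(lines):
    """Collect parsed entries whose message reports a rubyrep process error."""
    errors = []
    for line in lines:
        record = parse_line(line)
        if record and "rubyrep process error" in record["message"]:
            errors.append(record)
    return errors


# Usage (the file name is an assumption; point it at your own log):
# with open("pmp0.txt", encoding="utf-8", errors="replace") as handle:
#     for err in rubyrep_errors(handle):
#         print(err["date"], err["time"], err["message"])
```

This only filters what is already in the log; it does not explain why the connection to the 'right' database fails.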


As I couldn't figure out what was wrong, I decided to just uninstall the secondary server and do the read-only install and HAPack setup again from scratch as the process doesn't really take that long.

Suffice it to say, I recreated the HA setup from scratch and I'm still getting the same error messages about no connection to the 'right' database.

Any troubleshooting assistance for the PG replication is welcome. I've checked basic connectivity between the servers by telnetting to port 2345 in both directions, so it doesn't look like a firewall issue. AFAIK nothing has changed in our environment.
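
For reference, the telnet check amounts to something like the Python sketch below, run from each server against the other. The helper name `port_open` and the host name in the usage comment are placeholders. Note that a successful TCP connect only shows the port is reachable; it says nothing about whether PostgreSQL then accepts the login (credentials, host-based access rules and so on), so this narrows things down rather than ruling the network out completely.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Usage (host name is a placeholder for the other HA node):
# print(port_open("pmp-secondary.example.local", 2345))
```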

I've attached the pmp0 log.

Thanks