I ran into a “fun” problem this morning, and I’m wondering what I should have done to prevent it. Something had gone off with the VM that hosts my RT instance and it had run out of memory. No big deal, I probably need to do some tuning. But…
- The OOM killer killed the MariaDB server.
- RT, when it found it couldn’t connect to the database server, dropped back to the install interface. Running publicly.
- A confused user went all the way through the configuration and saved it. I guess they thought it was … This means that RT overwrote RT_SiteConfig.pm. (Yes, it was mode 640, owned by apache. Oops.)
Note that I’m using the Fedora distro packages: rt-4.4.1-5.fc25.noarch. They do contain a few small patches, but nothing remotely significant.
So, obviously I didn’t realize that the install interface would actually allow anything bad to happen. Certainly my fault. It was no big deal to recover things, but I’m trying to make sure that I’ve done everything possible to prevent it. Obviously changing RT_SiteConfig.pm to not be writable by the web server was step 1. But what else should I do?
RT will fall back to the install interface whenever it can’t talk to the database server. I have looked but I’m not having any luck finding a configuration setting to disable that. Ideally I’d replace it with something that indicates downtime. I’m thinking that absent any configuration option, the best thing I can do is adjust the web server configuration to redirect accesses to things under “install” to some other page that I set up to say that things are offline.
It’s also possible that something in the Fedora packaging is set up in a way that makes this more problematic. I can at least attempt to get the packaging fixed if anyone has any suggestions. Is it smart at all to leave RT_SiteConfig.pm mode 640, apache:apache? Personally, I’d think mode 640, root:apache would be smarter, but then how would the configuration interface work at all? By root running a standalone server?
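
To make the ownership question concrete, here is a minimal sketch of how the two layouts (mode 640 apache:apache versus mode 640 root:apache) differ as far as the web server is concerned. It is only an illustration, not anything RT or the Fedora package ships: the function names are made up, "apache" as the web server account is an assumption, and it models only the classic Unix permission bits (no ACLs, no SELinux, no root override).

```python
import grp
import os
import pwd
import stat


def can_write(mode, file_uid, file_gid, uid, gids):
    """Classic Unix write check: owner bits if the uid owns the file,
    else group bits if one of the gids matches, else the 'other' bits.
    ACLs, SELinux and root's override are deliberately ignored."""
    if uid == file_uid:
        return bool(mode & stat.S_IWUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


def web_user_can_write(path, username="apache"):
    """Look up the (assumed) web server account and test a real file."""
    pw = pwd.getpwnam(username)
    # Primary group plus any supplementary groups listing the user.
    gids = {pw.pw_gid}
    gids.update(g.gr_gid for g in grp.getgrall() if username in g.gr_mem)
    st = os.stat(path)
    return can_write(st.st_mode, st.st_uid, st.st_gid, pw.pw_uid, gids)
```

Calling `web_user_can_write()` with the path to RT_SiteConfig.pm on the RT host should come back false once the file is owned by root with group apache and mode 640, since the group then has read access only. With apache as the owner, the owner write bit applies and the installer can overwrite the file.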