RT 4.2.15 nginx & system hanging and crashing every few hours

Hello, I am trying to wrap my head around why our RT system started crashing randomly every few hours during business hours. This is a relatively new development.

Below are the errors I am getting.

My platform is:
MySQL 5.5.61
perl, v5.10.1
nginx
RT 4.5.1

Is there a tool or command I can run to check that the database and the dependency requirements are valid and that there are no issues? Another individual upgraded the application to 4.5.1 not too long ago, and I am not sure whether they did everything required in the READMEs.

[31073] [Tue Oct 16 18:02:03 2018] [warning]: The __Bookmarks__ query syntax is deprecated, and will be removed in RT 4.4. You should use id = '__Bookmarked__' instead. Object: RT::Attribute #311. Call stack:
[/opt/rt4/share/html/Elements/MyRT:99]
[/opt/rt4/share/html/index.html:78]
[/opt/rt4/sbin/…/lib/RT/Interface/Web.pm:681]
[/opt/rt4/sbin/…/lib/RT/Interface/Web.pm:369]
[/opt/rt4/share/html/autohandler:53] (/opt/rt4/sbin/…/lib/RT.pm:959)
[31073] [Tue Oct 16 18:02:03 2018] [warning]: The __Bookmarks__ query syntax is deprecated, and will be removed in RT 4.4. You should use id = '__Bookmarked__' instead. Object: RT::Attribute #311. Call stack:
[/opt/rt4/share/html/Elements/MyRT:99]
[/opt/rt4/share/html/index.html:78]
[/opt/rt4/sbin/…/lib/RT/Interface/Web.pm:681]
[/opt/rt4/sbin/…/lib/RT/Interface/Web.pm:369]
[/opt/rt4/share/html/autohandler:53] (/opt/rt4/sbin/…/lib/RT.pm:959)
[31074] [Tue Oct 16 18:02:07 2018] [info]: rt-4.2.15-31074-1539712927-1999.118598-5-0@Help #118598/2822863 - Scrip 5 On Correspond Notify AdminCcs (/opt/rt4/sbin/…/lib/RT/Action/SendEmail.pm:284)
[31074] [Tue Oct 16 18:02:07 2018] [info]: <rt-4.2.15-31074-1539712927-1999.118598-5-0@He

I am also seeing strange errors as per below.

[31850] [Tue Oct 16 18:57:30 2018] [warning]: cannot remove path when cwd is /opt/rt4/var/qFHFNqGlwF for /opt/rt4/var/qFHFNqGlwF: at /usr/local/share/perl5/File/Temp.pm line 784. (/usr/local/share/perl5/Carp.pm:168)

RT version 4.5.1? You might want to check that as, unless you’re talking to us from the future, the most recent RT release is 4.4.3.

Have you looked for saved queries that have this “Bookmarks” query syntax in them?

Sorry, typo. It is 4.2.15.

I just checked the databases with MySQLTuner.

Below are the specs for the machine.
VM
5GB RAM
6 vCPUs


[root@help mysql]# free -h
             total       used       free     shared    buffers     cached
Mem:          4.9G       4.7G       122M        20K       6.1M       349M
-/+ buffers/cache:       4.4G       477M
Swap:         3.9G        43M       3.9G

---- my.cnf snip of innodb settings
#innodb_data_home_dir = /var/lib/mysql
innodb_log_files_in_group = 3
innodb_log_file_size = 341MB
innodb_log_buffer_size = 128M
innodb_flush_log_at_trx_commit = 1
innodb_buffer_pool_size = 3584MB
innodb_additional_mem_pool_size = 2M
innodb_file_io_threads = 4
innodb_lock_wait_timeout = 50
innodb_buffer_pool_instances = 3
#sort_buffer = 2M
innodb_flush_method = O_DIRECT
table_cache = 600
max_heap_table_size = 512M
tmp_table_size = 256M
thread_cache_size = 3
max_connections=20

Below are my results.


[OK] Currently running supported MySQL version 5.5.61
[OK] Operating on 64-bit architecture

-------- Database Metrics --------------------------------------------------------------------------

[!!] 3 different collations for database help_rt
[OK] 1 engine for help_rt database.
[!!] help_rt table column(s) has several charsets defined for all text like column(s).
[!!] help_rt table column(s) has several collations defined for all text like column(s).

-------- Storage Engine Statistics -----------------------------------------------------------------
[OK] Total fragmented tables: 0

-------- Analysis Performance Metrics --------------------------------------------------------------
[OK] No stat updates during querying INFORMATION_SCHEMA.

-------- Table Column Metrics ----------------------------------------------------------------------
[!!] Consider changing type for column id in table help_rt.ACL
[!!] Consider changing type for column PrincipalType in table help_rt.ACL
[!!] Consider changing type for column PrincipalId in table help_rt.ACL
[!!] Consider changing type for column RightName in table help_rt.ACL
[!!] Consider changing type for column ObjectType in table help_rt.ACL
[!!] Consider changing type for column ObjectId in table help_rt.ACL
[!!] Consider changing type for column Creator in table help_rt.ACL
[!!] Consider changing type for column Created in table help_rt.ACL
[!!] Consider changing type for column LastUpdatedBy in table help_rt.ACL
[!!] Consider changing type for column LastUpdated in table help_rt.ACL
[!!] Consider changing type for column id in table help_rt.Articles
[!!] Consider changing type for column Name in table help_rt.Articles
[!!] Consider changing type for column Summary in table help_rt.Articles
[!!] Consider changing type for column SortOrder in table help_rt.Articles
[!!] Consider changing type for column Class in table help_rt.Articles
[!!] Consider changing type for column Parent in table help_rt.Articles
[!!] Consider changing type for column URI in table help_rt.Articles
[!!] Consider changing type for column Creator in table help_rt.Articles
[!!] Consider changing type for column Created in table help_rt.Articles
[!!] Consider changing type for column LastUpdatedBy in table help_rt.Articles
[!!] Consider changing type for column LastUpdated in table help_rt.Articles
[!!] Consider changing type for column id in table help_rt.Attachments
[!!] Consider changing type for column TransactionId in table help_rt.Attachments
[!!] Consider changing type for column Parent in table help_rt.Attachments
[!!] Consider changing type for column MessageId in table help_rt.Attachments
[!!] Consider changing type for column Subject in table help_rt.Attachments
[!!] Consider changing type for column Filename in table help_rt.Attachments
[!!] Consider changing type for column ContentType in table help_rt.Attachments
[!!] Consider changing type for column ContentEncoding in table help_rt.Attachments

ALTER TABLE help_rt.Users MODIFY SMIMECertificate CHAR(0);
MySQL was started within the last 24 hours - recommendations may be inaccurate
Reduce your overall MySQL memory footprint for system stability
Enable the slow query log to troubleshoot bad queries
Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
Consider installing Sys schema from https://github.com/mysql/mysql-sys

Variables to adjust:
query_cache_size (=0)
query_cache_type (=0)
query_cache_limit (> 16M, or use smaller result sets)
innodb_buffer_pool_size (>= 11.6G) if possible.
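
The tuner's advice to reduce the MySQL memory footprint can be sanity-checked with some arithmetic on the my.cnf values posted above. The following is only a rough sketch: it counts the InnoDB buffers shown in the snippet and assumes, as a worst case, one in-memory temporary table per connection at the size cap. Per-thread sort, join and read buffers are not in the snippet and are left out, and the function names are made up for illustration.

```python
# Rough MySQL memory estimate from the my.cnf values posted above (all in MB).
# Assumption: only these settings are counted; anything not shown in the
# snippet (per-thread buffers, RT's own processes) is ignored.
RAM_MB = 5 * 1024  # the VM is described as having 5GB RAM

settings_mb = {
    "innodb_buffer_pool_size": 3584,
    "innodb_log_buffer_size": 128,
    "innodb_additional_mem_pool_size": 2,
    "tmp_table_size": 256,
    "max_heap_table_size": 512,
}
MAX_CONNECTIONS = 20


def global_buffers_mb(s):
    """Memory InnoDB is configured to use regardless of load."""
    return (s["innodb_buffer_pool_size"]
            + s["innodb_log_buffer_size"]
            + s["innodb_additional_mem_pool_size"])


def worst_case_temp_tables_mb(s, connections):
    """One in-memory temp table per connection, each at the size cap.

    The cap for an in-memory temp table is the smaller of tmp_table_size
    and max_heap_table_size.
    """
    return connections * min(s["tmp_table_size"], s["max_heap_table_size"])


fixed = global_buffers_mb(settings_mb)
temp = worst_case_temp_tables_mb(settings_mb, MAX_CONNECTIONS)

print(f"global InnoDB buffers: {fixed} MB ({fixed / RAM_MB:.0%} of RAM)")
print(f"worst-case temp tables: {temp} MB")
print(f"worst-case total: {fixed + temp} MB vs {RAM_MB} MB RAM")
```

With these numbers the InnoDB buffers alone come to 3714 MB, about 73% of the VM's RAM, before nginx and the RT processes on the same machine are counted. The tuner's suggested buffer pool of 11.6G or more would not fit on this VM at all.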

When you say RT is crashing, what do you see in the browser? Error page from the web server? Blank page? An error message from RT? Those log entries in your original posting appear to be warnings rather than errors (i.e. a feature still works in your version of RT, but will be going away in RT version 4.4, so you need to stop using it before you consider upgrading).

What we see is a blank page and RT is not responding.

Memory usage on the VM is in the high 90% range, most of it used by MySQL.

             total       used       free     shared    buffers     cached
Mem:          4.9G       4.5G       347M        36K       131M       519M
-/+ buffers/cache:       3.9G       998M
Swap:         3.9G        49M       3.9G

Output of database tuner is:

Below are some of the errors.

Oct 19 10:02:43 RT: [31136] Use of uninitialized value $Text::Template::GEN7::comment in substitution (s///) at template line 1.
.
       [/opt/rt4/share/html/Elements/MyRT:99]
        [/opt/rt4/share/html/index.html:78]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:681]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:369]
        [/opt/rt4/share/html/autohandler:53] (/opt/rt4/sbin/../lib/RT.pm:959)
[31265] [Fri Oct 19 14:04:01 2018] [warning]: The __Bookmarks__ query syntax is deprecated, and will be removed in RT 4.4.  You should use id = '__Bookmarked__' instead.  Object: RT::Attribute #311.  Call stack:
        [/opt/rt4/share/html/Elements/MyRT:99]
        [/opt/rt4/share/html/index.html:78]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:681]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:369]
        [/opt/rt4/share/html/autohandler:53] (/opt/rt4/sbin/../lib/RT.pm:959)

[31238] [Fri Oct 19 14:02:59 2018] [warning]: The __Bookmarks__ query syntax is deprecated, and will be removed in RT 4.4.  You should use id = '__Bookmarked__' instead.  Object: RT::Attribute #311.  Call stack:
        [/opt/rt4/share/html/Elements/MyRT:99]
        [/opt/rt4/share/html/index.html:78]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:681]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:369]
        [/opt/rt4/share/html/autohandler:53] (/opt/rt4/sbin/../lib/RT.pm:959)
[31238] [Fri Oct 19 14:02:59 2018] [warning]: The __Bookmarks__ query syntax is deprecated, and will be removed in RT 4.4.  You should use id = '__Bookmarked__' instead.  Object: RT::Attribute #311.  Call stack:
        [/opt/rt4/share/html/Elements/MyRT:99]
        [/opt/rt4/share/html/index.html:78]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:681]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:369]
        [/opt/rt4/share/html/autohandler:53] (/opt/rt4/sbin/../lib/RT.pm:959)
[31147] [Fri Oct 19 14:03:06 2018] [notice]: More than 50 possible Owners found for Ticket 118858; switching to autocompleter.  See the $AutocompleteOwners configuration option (/opt/rt4/share/html/Elements/SelectOwnerDropdown:88)
[31147] [Fri Oct 19 14:03:06 2018] [warning]: Use of uninitialized value $Text::Template::GEN21::comment in substitution (s///) at template line 1. (template:1)
[31147] [Fri Oct 19 14:03:06 2018] [warning]: Use of uninitialized value $Text::Template::GEN23::comment in substitution (s///) at template line 1. (template:1)
Child[0] died, respawn
[31238] [Fri Oct 19 14:03:09 2018] [notice]: More than 50 possible Owners found for Queue 9; switching to autocompleter.  See the $AutocompleteOwners configuration option (/opt/rt4/share/html/Elements/SelectOwnerDropdown:88)
[31265] [Fri Oct 19 14:03:25 2018] [warning]: The __Bookmarks__ query syntax is deprecated, and will be removed in RT 4.4.  You should use id = '__Bookmarked__' instead.  Object: RT::Attribute #311.  Call stack:
        [/opt/rt4/share/html/Elements/MyRT:99]
        [/opt/rt4/share/html/index.html:78]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:681]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:369]
        [/opt/rt4/share/html/autohandler:53] (/opt/rt4/sbin/../lib/RT.pm:959)
[31265] [Fri Oct 19 14:03:25 2018] [warning]: The __Bookmarks__ query syntax is deprecated, and will be removed in RT 4.4.  You should use id = '__Bookmarked__' instead.  Object: RT::Attribute #311.  Call stack:
        [/opt/rt4/share/html/Elements/MyRT:99]
        [/opt/rt4/share/html/index.html:78]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:681]
        [/opt/rt4/sbin/../lib/RT/Interface/Web.pm:369]
        [/opt/rt4/share/html/autohandler:53] (/opt/rt4/sbin/../lib/RT.pm:959)
[31147] [Fri Oct 19 14:03:35 2018] [info]: <rt-4.2.15-31147-1539957815-1333.118562-16-0@Help>

I’m not a sysadmin so unfortunately can’t really suggest anything to do, but it looks like you’re running out of memory. Your RAM is almost completely used, although swap is barely touched. Look in the system logs for hints (/var/log/messages maybe?)
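
One way to follow up on the system-log suggestion is to search the log for signs that the kernel's out-of-memory killer ran. The sketch below is a starting point only: the two patterns are wordings the Linux kernel commonly uses, but the exact text differs between kernel versions, and the function names are made up for illustration.

```python
import re

# Wordings the Linux kernel commonly logs when the OOM killer fires.
# Assumption: the exact text varies by kernel version, so this list is
# not exhaustive.
OOM_PATTERNS = [
    re.compile(r"invoked oom-killer", re.IGNORECASE),
    re.compile(r"Out of memory: Kill(ed)? process \d+"),
]


def find_oom_events(lines):
    """Return (line_number, line) pairs that look like OOM-killer activity."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in OOM_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path="/var/log/messages"):
    """Scan a syslog file; the default path is the one suggested above."""
    with open(path, errors="replace") as handle:
        return find_oom_events(handle)
```

Calling `scan_file()` needs read access to the log, which usually means root. If it finds that mysqld or an RT process was killed around the time of a hang, that would support the memory theory; an empty result points elsewhere.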

I know nginx is all the rage, but web servers are finicky, and I’ve had the most success with Apache using mod_perl and mod_fcgid:

https://docs.bestpractical.com/rt/4.4.3/web_deployment.html#mod_fcgid

The log you’re showing here isn’t relevant; those aren’t errors, they are warnings and notices.
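
To separate real errors from the deprecation noise, the RT log can be tallied by severity. This is a minimal sketch that assumes the line layout seen in the excerpts above, `[pid] [timestamp] [level]: message`, and assumes RT uses the usual Log::Dispatch level names. It also collects lines starting with `Child[`, since those come from outside RT's logger. `summarize` is a made-up helper name.

```python
import re
from collections import Counter

# Matches lines such as:
#   [31265] [Fri Oct 19 14:04:01 2018] [warning]: The __Bookmarks__ query ...
RT_LINE = re.compile(
    r"^\[(?P<pid>\d+)\] \[(?P<when>[^\]]+)\] \[(?P<level>\w+)\]: (?P<msg>.*)"
)

# Levels at or above "error".
# Assumption: RT uses the standard Log::Dispatch level names.
SERIOUS = {"error", "critical", "alert", "emergency"}


def summarize(lines):
    """Count RT log lines per level and collect the ones to read first."""
    counts = Counter()
    worth_reading = []
    for line in lines:
        match = RT_LINE.match(line)
        if match is None:
            # Stack-trace continuation lines are skipped; process-manager
            # output such as "Child[0] died, respawn" is kept.
            if line.startswith("Child["):
                worth_reading.append(line.rstrip())
            continue
        level = match.group("level")
        counts[level] += 1
        if level in SERIOUS:
            worth_reading.append(line.rstrip())
    return counts, worth_reading
```

Run over the excerpt posted above, this would report only warning, notice and info lines from RT, plus the single `Child[0] died, respawn` line.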

Anyways, something is sucking up all your RAM (usually a memory leak), and my initial reaction would be to look somewhere other than RT. Is this server dedicated to RT?

Your RT database name (help_rt) is not standard, so that also leads me to believe there are some customizations going on here. Is there custom functionality in this RT?

We are using this setup.

https://docs.bestpractical.com/rt/4.4.3/web_deployment.html#nginx

The only thing I am noticing is that the Attachments table is 12GB. Is that too large for the RT application? Would compression fix it or make it worse?

[root@help help_rt]# for f in $(find . -type f); do du -h $f; done;
12K     ./ACL.frm
12M     ./Attributes.ibd
12K     ./CustomFields.frm
96K     ./ObjectTopics.ibd
96K     ./ObjectCustomFields.ibd
12K     ./Links.frm
96K     ./Topics.ibd
12G     ./Attachments.ibd
12K     ./sessions.frm
12K     ./Articles.frm
12K     ./ObjectTopics.frm
15M     ./Users.ibd
104M    ./Groups.ibd
12K     ./ObjectScrips.frm
12K     ./GroupMembers.frm
12K     ./Scrips.frm
12K     ./Queues.frm
21M     ./sessions.ibd
16K     ./Users.frm
12K     ./Topics.frm
12K     ./Classes.frm
68M     ./ObjectCustomFieldValues.ibd
12K     ./Templates.frm
12K     ./Transactions.frm
96K     ./ScripActions.ibd
128K    ./Templates.ibd
12K     ./ObjectCustomFieldValues.frm
12K     ./ScripConditions.frm
20M     ./Links.ibd
96K     ./ObjectClasses.ibd
112K    ./ObjectScrips.ibd
12K     ./ScripActions.frm
341M    ./Transactions.ibd
96K     ./Articles.ibd
12K     ./ObjectCustomFields.frm
144K    ./CustomFieldValues.ibd
12K     ./Tickets.frm
96K     ./CustomFields.ibd
12K     ./Attributes.frm
176M    ./CachedGroupMembers.ibd
12K     ./CustomFieldValues.frm
96K     ./Scrips.ibd
12K     ./Attachments.frm
36M     ./Principals.ibd
12K     ./Groups.frm
112K    ./ACL.ibd
12K     ./ObjectClasses.frm
44M     ./GroupMembers.ibd
4.0K    ./db.opt
96K     ./ScripConditions.ibd
12K     ./CachedGroupMembers.frm
48M     ./Tickets.ibd
96K     ./Classes.ibd
128K    ./Queues.ibd
12K     ./Principals.frm
mysql> desc select * from Attachments;
+----+-------------+-------------+------+---------------+------+---------+------+---------+-------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref  | rows    | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+---------+-------+
|  1 | SIMPLE      | Attachments | ALL  | NULL          | NULL | NULL    | NULL | 2212640 |       |
+----+-------------+-------------+------+---------------+------+---------+------+---------+-------+
1 row in set (0.00 sec)
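
The `du` listing above is unsorted, which makes the large tables easy to miss. A small sketch like the following lists the biggest files first. Note that it reports apparent file size rather than disk usage, so the figures can differ slightly from `du`; both helper names are made up for illustration.

```python
import os


def largest_files(directory, top=10):
    """Return (size_in_bytes, path) for the biggest files under `directory`."""
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    sizes.sort(reverse=True)
    return sizes[:top]


def human(size):
    """Format a byte count with a compact B/K/M/G/T suffix."""
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.0f}{unit}"
        size /= 1024
    return f"{size:.0f}T"


if __name__ == "__main__":
    # Run from inside the database directory, as in the shell session above.
    for size, path in largest_files("."):
        print(f"{human(size):>6}  {path}")
```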

It’s not so much the size of the table as the size of the individual attachments in it. If you’ve got hundreds of thousands of small attachments, that would be of less concern to me than one or two huge ones.

You might want to see if you’ve got one or two really large attachments that are causing you memory issues. Try a MySQL query such as:

select Attachments.Filename as Filename,
       Transactions.ObjectType as ObjectType,
       Transactions.ObjectId as ObjectId,
       length(Attachments.Content)
from Attachments, Transactions
where Attachments.TransactionId = Transactions.id
order by length(Attachments.Content) desc
limit 10;

This should give you your top ten largest attachments (and hopefully the IDs of the tickets they are attached to). If you’ve got any really large ones, they might be causing resource exhaustion whilst being pulled from the database, through RT and the web server, to the client. You may then want to check whether they really need to be attached to a ticket (or whether the ticket is even still open).

Thanks, I tried that. There were a few tickets with attachments larger than 20 MB and 15 MB. We do get desktop support sending in tickets with log files from desktops attached, etc.

I removed them. No luck. I will look around to see if I can find tickets sent in the last month that have strange attachments included in them.

System RAM is as shown below, and it’s still hiccuping.

            total       used       free     shared    buffers     cached
Mem:          4.9G       1.6G       3.2G        32K       142M       596M
-/+ buffers/cache:       914M       4.0G
Swap:         3.9G        19M       3.9G

Well, that doesn’t appear to show exhaustion of the RAM. Does nginx log anything when it crashes? The logs you sent above don’t really show anything other than the warnings and notices RT is sending out (which are fine), except for this line:

Child[0] died, respawn

Now I don’t run nginx, so I don’t know if that’s a normal part of the webserver running or not.

When RT stops responding, is there anything common to what is being accessed in RT at the time? I.e. is it always when someone is interacting with one (or a limited number of) tickets, or when a member of a particular group is using it, or is it completely random?

One last-ditch option you have is to try replacing nginx with Apache, as lots of folk are running that with big RT instances, so it’s known to work.