Externalizing Attachments?

3 questions about externalizing…

1. Where to put the Cutoff?

According to the doc there is a line in the sand for what to put into external storage and what to keep in the db. This line is adjusted with $ExternalStorageCutoffSize. So there is a default value but how is that selected. There is of course plenty of things to consider when selecting where to put this line. But thinking about this I consider moving virtually everything out of the db and into regular files. What are the penalties of this?

Without knowing exactly how the size is calculated I assumed that it was the size of Content and produced the following query to get an idea about where to put the limit:

mysql> SELECT
    ->     digits,
    ->     cnt,
    ->     FLOOR(POW(10,digits+0.5)*cnt) AS sum_bytes,
    ->     FLOOR(POW(10,digits+0.5)*cnt/(1024*1024)) AS sum_Mb,
    ->     FLOOR(POW(10,digits+0.5)*cnt/(1024*1024*1024)) AS sum_Gb
    -> FROM (
    ->     SELECT
    ->         FLOOR(LOG10(CHAR_LENGTH(Content))) as digits,
    ->         count(*) AS cnt
    ->     FROM Attachments
    ->     GROUP BY 1 
    ->     ORDER BY 1) foo;
+--------+--------+-------------+--------+--------+
| digits | cnt    | sum_bytes   | sum_Mb | sum_Gb |
+--------+--------+-------------+--------+--------+
|   NULL | 375821 |        NULL |   NULL |   NULL |
|      0 |  19620 |       62043 |      0 |      0 |
|      1 |  55758 |     1763222 |      1 |      0 |
|      2 | 837271 |   264768337 |    252 |      0 |
|      3 | 612762 |  1937723583 |   1847 |      1 |
|      4 |  41616 |  1316013471 |   1255 |      1 |
|      5 |  25654 |  8112507109 |   7736 |      7 |
|      6 |   7971 | 25206515229 |  24038 |     23 |
|      7 |    114 |  3604996532 |   3437 |      3 |
+--------+--------+-------------+--------+--------+
9 rows in set, 16432 warnings (4 min 3,97 sec)
  • The log-column is the length-1 of the size of Content, in other words The size of the row.
  • The cnt-column is the count of rows of a specific size.
  • The sum-columns is calculated as the estimated average (+0.5) size times the number of rows of that size.

So if we set the cutoff at 10Kb (digits = 4) 37Gb will be moved out of the db and 1 will be left). This would be a huge benefit when handling and doing backups.

Where is the drawbacks of this?

2. Reverting externalized objects?

Is there a way to move objects back into the db ones they have been externalized?

3. More Cutoff-options?

To rt-shredder you can define several rules of what to shred so that only object older than X days are shredded. Is there some similar thing for rt-externalize-attachments? Our tickets have an active life of at most one month I would say, but we need to keep all for backtracking. So for us it would be convenient to externalize everything older than a ~month regardless of size since there is no speed-requirements on those old tickets. How could this be achieved?

1 Like

Regarding number 3: If it helps anyone I have added 3 more options (--age, --batchsize and --dry-run) to rt-externalize-attachments and made a pullrequest out of it: