Making "make initdb" idempotent?

Hi all

I’m in the process of packing up various small bits of RT code for our
RT install into an extension that can be properly versioned and tracked.

There’s just one frustrating barrier to this: RT’s extension system
provides a “make initdb”, but no way to make the application of the
initial data provided in etc/initialdata idempotent. If you run “make
initdb” twice, it duplicates everything, not just adding any new records.

Is this a problem anyone else has looked into? I’m considering teaching
“make initdb” to do a basic upsert: Lock the table in question in
EXCLUSIVE mode (PostgreSQL, allows SELECTs but nothing else in other
transactions), then do the usual portable two-command sequence:

UPDATE thetable
SET description = ‘thedescription’, …
WHERE name = ‘thename’;

INSERT INTO thetable(name, description, …)
SELECT ‘thename’, ‘thedescription’, …
WHERE NOT EXISTS (SELECT 1 FROM thetable WHERE thetable.name = ‘thename’);

For other DBs, just do a “LOCK tablename;” (unless MySQL, etc, users
want to suggest an appropriate lock mode).

The problem with this is that the name isn’t really meant to be a key,
for all sorts of reasons. It’s user-visible, it’s in natural language,
and it’s translatable.

It’d be strongly preferable to add a new field to the initdb parameter
hash that provides a non-translatable internal text identifier for the
command of interest. Here’s what I’m thinking of:

  • Add a new nullable unique text column to scrips, scripactions, and
    scripconditions named ‘extension_identifier’.

  • Have ‘make initdb’ check for a hash param ‘ExtensionIdentifier’ in
    each row. If none is present, behave exactly as now, leaving the
    extension_identifier column NULL.

  • If the ExtensionIdentifier key is found, Do an upsert with table lock
    as described above for each row when we ‘make initdb’, but instead of
    using the ‘name’ field, discriminate based on the ‘extension_identifier’
    column.

This would allow extensions to provide a unique identifier for their
scrips, scrip actions, scrip conditions, etc, that if used consistently
would let them make the initdb command idempotent, so it can be used
when an extension is updated, not just first installed.

Guidance for extension authors would suggest that the extension’s name
be used to prefix the extension identifier so as to avoid clashes.

No support for deleting obsolete entries would be provided; if an entry
vanishes from the initialdata but was installed from a previous version,
it would not be touched in the DB. That’s let extensions continue to
provide support for old actions, conditions, etc without exposing them
to new users.

By using the extension name and a text identifier the need to have a
central identifier registry is avoided; CPAN’s Perl namespace registry
already serves the required purpose.

Thoughts/opinions?

Would the RT team consider accepting a patch implementing the above as a
change to RTxInitDB ?

If so, how would you deal with the addition of the new column to each of
the tables? Do it as part of the upgrade script for the next RT release
that needs an upgrade script anyway, and if it isn’t present, ignore the
ExtensionIdentifier field in the initialdata for backward compatibility?
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Hi,

Some of what you mention may be partially implemented in a branch (upgrade
history). Note that “Don’t create if it exists” concept is not something
we’re targeting at. Our approach is to register upgrade files so they can
not be applied twice so easy.On Fri, Jun 21, 2013 at 8:52 AM, Craig Ringer craig@2ndquadrant.com wrote:

Hi all

I’m in the process of packing up various small bits of RT code for our
RT install into an extension that can be properly versioned and tracked.

There’s just one frustrating barrier to this: RT’s extension system
provides a “make initdb”, but no way to make the application of the
initial data provided in etc/initialdata idempotent. If you run “make
initdb” twice, it duplicates everything, not just adding any new records.

Is this a problem anyone else has looked into? I’m considering teaching
“make initdb” to do a basic upsert: Lock the table in question in
EXCLUSIVE mode (PostgreSQL, allows SELECTs but nothing else in other
transactions), then do the usual portable two-command sequence:

UPDATE thetable
SET description = ‘thedescription’, …
WHERE name = ‘thename’;

INSERT INTO thetable(name, description, …)
SELECT ‘thename’, ‘thedescription’, …
WHERE NOT EXISTS (SELECT 1 FROM thetable WHERE thetable.name = ‘thename’);

For other DBs, just do a “LOCK tablename;” (unless MySQL, etc, users
want to suggest an appropriate lock mode).

The problem with this is that the name isn’t really meant to be a key,
for all sorts of reasons. It’s user-visible, it’s in natural language,
and it’s translatable.

It’d be strongly preferable to add a new field to the initdb parameter
hash that provides a non-translatable internal text identifier for the
command of interest. Here’s what I’m thinking of:

  • Add a new nullable unique text column to scrips, scripactions, and
    scripconditions named ‘extension_identifier’.

  • Have ‘make initdb’ check for a hash param ‘ExtensionIdentifier’ in
    each row. If none is present, behave exactly as now, leaving the
    extension_identifier column NULL.

  • If the ExtensionIdentifier key is found, Do an upsert with table lock
    as described above for each row when we ‘make initdb’, but instead of
    using the ‘name’ field, discriminate based on the ‘extension_identifier’
    column.

This would allow extensions to provide a unique identifier for their
scrips, scrip actions, scrip conditions, etc, that if used consistently
would let them make the initdb command idempotent, so it can be used
when an extension is updated, not just first installed.

Guidance for extension authors would suggest that the extension’s name
be used to prefix the extension identifier so as to avoid clashes.

No support for deleting obsolete entries would be provided; if an entry
vanishes from the initialdata but was installed from a previous version,
it would not be touched in the DB. That’s let extensions continue to
provide support for old actions, conditions, etc without exposing them
to new users.

By using the extension name and a text identifier the need to have a
central identifier registry is avoided; CPAN’s Perl namespace registry
already serves the required purpose.

Thoughts/opinions?

Would the RT team consider accepting a patch implementing the above as a
change to RTxInitDB ?

If so, how would you deal with the addition of the new column to each of
the tables? Do it as part of the upgrade script for the next RT release
that needs an upgrade script anyway, and if it isn’t present, ignore the
ExtensionIdentifier field in the initialdata for backward compatibility?

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Best regards, Ruslan.

Hi,

Some of what you mention may be partially implemented in a branch
(upgrade history). Note that “Don’t create if it exists” concept is not
something we’re targeting at. Our approach is to register upgrade files
so they can not be applied twice so easy.

Hm. That’d work if an extension could provide multiple upgrade files, so
each new version could add new upgrade files.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Hi,

Some of what you mention may be partially implemented in a branch
(upgrade history). Note that “Don’t create if it exists” concept is not
something we’re targeting at. Our approach is to register upgrade files
so they can not be applied twice so easy.

Hm. That’d work if an extension could provide multiple upgrade files, so
each new version could add new upgrade files.

Our extensions do this. Each version that needs DB upgrade provides a
separate upgrade dir with schema, content… like RT itself.


Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Best regards, Ruslan.

> Hi,
>
> Some of what you mention may be partially implemented in a branch
> (upgrade history). Note that "Don't create if it exists" concept
is not
> something we're targeting at. Our approach is to register upgrade
files
> so they can not be applied twice so easy.
>

Hm. That'd work if an extension could provide multiple upgrade files, so
each new version could add new upgrade files.

Our extensions do this. Each version that needs DB upgrade provides a
separate upgrade dir with schema, content… like RT itself.

OK, that sounds interesting. Are there any docs on this? The RT
extensions I’ve looked into, like RT::Extension::SLA, don’t seem to use
this mechanism.

Are you talking about a new facility that’s added as part of the branch
you describe, or something supported in mainline?

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Our extensions do this. Each version that needs DB upgrade provides a
separate upgrade dir with schema, content… like RT itself.

OK, that sounds interesting. Are there any docs on this? The RT
extensions I’ve looked into, like RT::Extension::SLA, don’t seem to use
this mechanism.

Are you talking about a new facility that’s added as part of the branch
you describe, or something supported in mainline?

RTIR does it, as did RTFM before it was cored.

RT-Extension-SLA contains etc/upgrade/0.06/content, so works like core.

I have branches for RT and Module-Install-RTx which will allow ‘make
upgrade-database’ to work.

RT’s master branch also has new code to track each of these
inserts/updates. This will be new in 4.2.

-kevin