POSTGRESQL DATABASE MAINTENANCE

 Routine

backup of daily database maintenance work , regular “clean up” database, periodic log file management
check_postgres can be used to detect database health and report abnormal conditions

1. Daily cleaning

PostgreSQL database requires periodic cleaning and maintenance. For many installations, it is sufficient to have the automatic cleanup daemon perform cleanup

1.1. Basic knowledge of cleaning

The PostgreSQL VACUUM command must periodically process each table for several reasons:
1. Restore or reuse the disk space occupied by updated or deleted rows.
2. Update the data statistics used by the PostgreSQL query planner.
3. Update the visibility map, it can speed up the scan using only the index.
4. Protect old data from being lost due to transaction ID rewinding or multi-transaction ID rewinding.
There are two variants of VACUUM: standard VACUUM and VACUUM FULL.
VACUUM FULL can reclaim more disk space but runs slower. VACUUM FULL requires an exclusive lock on the table it works on, so it cannot be paralleled with other uses of this table.
Standard form of VACUUM can run in parallel and production database operations (SELECT, INSERT, UPDATE, and DELETE commands will continue to work normally, but during the clean up so you can not
define using the ALTER TABLE command to update the table, etc.).
VACUUM will generate a lot of I/O traffic, which will cause poor performance of other active sessions.
-bash-4.2$ vacuumdb — help
vacuumdb cleans and analyzes a PostgreSQL database.

Usage:
vacuumdb [OPTION]… [DBNAME]

Options:
-a, — all vacuum all databases
-d, — dbname=DBNAME database to vacuum
-e, — echo show the commands being sent to the server
-f, — full do full vacuuming
-F, — freeze freeze row transaction information
-j, — jobs=NUM use this many concurrent connections to vacuum
-q, — quiet don’t write any messages
-t, — table=’TABLE[(COLUMNS)]’ vacuum specific table (s) only
-v, — verbose write a lot of output
-V, — version output version information, then exit
-z, — analyze update optimizer statistics
-Z, — analyze-only only update optimizer statistics; no vacuum
— analyze-in-stages only update optimizer statistics, in multiple
stages for faster results; no vacuum
-?, — help show this help, then exit

Connection options:
-h, — host=HOSTNAME database server host or socket directory
-p, — port=PORT database server port
-U, — username=USERNAME user name to connect as
-w, — no-password never prompt for password
-W, — password force password prompt
— maintenance-db=DBNAME alternate maintenance database

Read the description of the SQL command VACUUM for details.

Report bugs to <pgsql-bugs@postgresql.org>.
-bash-4.2$ vacuumdb -d hq -f -z -v >> /tmp/vacuumdb.log
INFO: vacuuming “pg_catalog.pg_statistic”
INFO: “pg_statistic”: found 127 removable, 402 nonremovable row versions in 26 pages

1.2. Recover disk space

In PostgreSQL, UPDATE or DELETE of a row does not immediately remove the old version of the row. This method is necessary to benefit from multi-version concurrency control (MVCC):
when the old version may still be visible to other transactions, it cannot be deleted. But in the end, any transaction is no longer on an outdated or deleted row version invigoration
interest. The space it occupies must be reclaimed for new lines, so as to avoid unlimited growth in disk space requirements. This is done by running VACUUM.
The standard form of VACUUM removes the dead row version in tables and indexes and marks the space as reusable in the future. However, it will not return the space to the operating system unless in
exceptional circumstances one or more pages at the end of the table become completely free and an exclusive table lock can be easily obtained.
On the contrary, VACUUM FULL actively shrinks the table by writing the contents of the dead space into a complete new version of the table file. This will minimize the size of the table, but it will take longer.
It also requires additional disk space for a new copy of the table until the operation is complete.
The general goal of routine cleanup is to do more standard VACUUM to avoid the need for VACUUM FULL.
Tip: When a table contains a large number of dead row versions due to a large number of update or delete activities, pure VACUUM may not be satisfactory.
If you have such a table and you need to reclaim the excess disk space it takes up, you will need to use VACUUMFULL, or CLUSTER, or one of the table rewrite variants of ALTER TABLE.
Tip: If you have a table, its entire content will be deleted periodically, consider using TRUNCATE instead of DELETE and then VACUUM.

1.3. Update planner statistics

The PostgreSQL query planner relies on statistical information about table contents to generate good plans for queries.
These statistics are collected by the ANALYZE command, which can be called as an optional step of VACUUM in addition to being called directly.
It is important to have moderately accurate statistics, otherwise poor planning may degrade database performance.
If the automatic cleanup daemon is enabled, when the content of a table is changed enough, it will automatically issue the ANALYZE command.
Tip: The automatic cleanup daemon will not issue the ANALYZE command for external tables because it cannot determine an appropriate frequency.

1.4. Update visibility map

The cleanup mechanism maintains a visibility map for each table, which is used to track which pages only contain tuples that are visible to all active transactions.
This has two purposes.
First, the cleanup itself can skip such pages in the next run, because nothing in it needs to be cleaned up.
Second, this allows PostgreSQL to answer some queries that only use indexes without having to refer to the underlying table.

1.5. Prevent transaction ID rollback failure

PostgreSQL’s MVCC transaction semantics rely on being able to compare transaction ID (XID) numbers.
If the inserted XID of a row version is greater than the XID of the current transaction, it is “belonging to the future” and should not be visible to the current transaction.
It is necessary to clean up every table in every database at least every 2 billion transactions.
The reason that periodic cleanup can solve the problem is that VACUUM will mark the rows as frozen, which means that they were inserted by a transaction submitted in the past far enough,
so from the MVCC perspective, the effect is that the insert transaction It is of course visible to all current and future affairs.
PostgreSQL retains a special XID (FrozenTransactionId), this comparison does not follow the rules of ordinary XID and XID is always considered than any
where ordinary XID is older. Ordinary XID uses modulo-232 algorithm to compare.
VACUUM usually skips pages that do not contain any dead row versions, but does not skip pages that contain row versions with old XID values.
To ensure that all old row versions have been frozen, a scan of the entire table is required. vacuum_freeze_table_age control VACUUM when to do this: If the table was
too vacuum_freeze_table_age minus vacuum_freeze_min_age transaction has not yet been fully scanned, it will force a full table clean. Setting this parameter to 0
will force VACUUM to always scan all pages and actually ignore the visibility map.
The datfrozenxid column of a database’s pg_database row is the lower bound of the unfrozen XID that appears in that database — it’s just the minimum value of the relfrozenxid value for each table in the database.

SELECT c.oid::regclass as table_name,
greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN (‘ r’,’m’);
SELECT datname, age(datfrozenxid) FROM pg_database;

The age column measures the number of transactions from the truncated XID to the current transaction XID.
VACUUM usually only scans pages that have been modified since the last cleanup, but relfrozenxid can only be advanced when the full table is scanned.
When relfrozenxid is older than vacuum_freeze_table_age transactions, when the FREEZE option of VACUUM is used, or when all pages happen to require cleanup to remove the dead row version, the full table will be scanned.
If for some reason automatic cleanup fails to clear the old XID from a table, the system will start issuing a warning message like this when the database reaches 10 million transactions between the oldest XID and the rewind point:
WARNING: database “mydb” must be vacuumed within 177009986 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in “mydb”.
If these warnings are ignored, once there are only 1 million transactions left before the rollback point, the The system will shut down and refuse to start any new transactions:
ERROR: database is not accepting commands to avoid wraparound data loss in database mydb
HINT: Stop the postmaster and vacuum that database in single-user mode.
As soon as the system enters safe shutdown mode , It will not execute the command. The only way to do this is to stop the server and start the server as a single user to execute VACUUM.
1.5.1. Multiple transactions and rollbacks
Multixact ID is used to support rows locked by multiple transactions.
As long as more than one transaction locks a row concurrently, the lock information will be encoded using a “multiple transaction ID” (or simply multiple transaction ID).
As a safety device, a full table cleanup scan will occur for any table whose multi-transaction age exceeds autovacuum_multixact_freeze_max_age.
If the used member storage space exceeds 50% of the total, the full table cleanup scan will also be performed on all tables gradually, which will start with the tables with the oldest multi-transaction age.

1.6. Automatically clean up background processes

PostgreSQL has an optional but highly recommended feature autovacuum, its purpose is to automatically execute VACUUM and ANALYZE commands.
When it is enabled, automatic cleanup checks for tables that have been heavily inserted, updated, or deleted. These checks make use of statistical information collection functions,
so unless track_counts is set to true, automatic cleanup cannot be used. In the default configuration, automatic cleaning is enabled and related configuration parameters have been correctly configured.
#autovacuum = on # Enable autovacuum subprocess?’on’
# requires track_counts to also be on.
#track_counts = on
“Automatically clean up background processes” actually consists of multiple processes. There is a resident background process called the automatic cleanup initiator, which is responsible for starting the automatic cleanup worker process for all databases.

2. Daily rebuild index

In some cases, it is worth periodically using the REINDEX command or a series of independent reconstruction steps to rebuild the index.
B-tree index pages that have become completely empty are reclaimed and reused. However, there is still a possibility of inefficient space utilization: if all keys except a few index keys on a page are deleted, the page is still allocated.
Therefore, in this usage mode in which most but not all keys in each range are eventually deleted, it can be seen that the space usage is very poor. For
such usage patterns, periodic reindexing is recommended.
The possible expansion of non-B-tree indexes has not been well quantified. It is a good idea to regularly monitor the physical size of the index when using non-B-tree indexes.
For a B-tree index, a newly created index is accessed slightly faster than an index that has been updated multiple times, because in the newly created index, logically adjacent pages are usually also physically adjacent
. Just to improve access speed is also worth re-indexing regularly.
REINDEX can be used safely and easily in all situations. But since the command requires an exclusive table lock, a better approach is to use a sequence of creation and replacement steps to perform index reconstruction.
Index types that support CREATE INDEX with the CONCURRENTLY option can be rebuilt in this way.

3. Log file maintenance

It is a good idea to keep the log output of the database server in one place, rather than just discarding them via /dev/null. When diagnosing problems, log output is very valuable.
If you simply direct stgrer of postgres to a file, you will get log output, but the only way to truncate the log file is to stop and restart the server. This is
a better way is to send the server’s stderr output to some kind of log rotation program.

logging_collector is true in postgresql.conf


Comments

Popular posts from this blog

PostgreSQL - Architecture

PostgreSQL - Backup & Point In Time Recovery