Skip to main content

Huge archive generation with PostgreSQL cluster??

 Recently I found huge archive logs generation on one of the servers(for my client). Around 3500 files per day and it needs 3500 * 16 MB = 54GB space every day and space is low on the server. First thing came to my mind was checkpoint parameters, I had a look at them and they were at default. Hmm, so had changed them as below:


checkpoint_segments = 180
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9 

Monitored for couple of days and found that archive logs generation decreased to 50%(around 1700 files). 

With the above parameter settings, archive log generation should be minimal. However if not, Here are some general suggestions to look at:

-- Make sure that you don't UPDATE a row to the same values it already has. The row will be logged even if old and new values match. One way to handle this is to make sure that any UPDATE statements which might set a column to its existing value include in the WHERE clause a test that oldvalue IS DISTINCT FROM new value. Or you can consider creating trigger on problem tables using the Postgres built-in function suppress_redundant_updates_trigger (you might want to look here: http://www.postgresql.org/docs/9.3/static/functions-trigger.html) 

-- Use one UPDATE statement instead of several where possible. If updates to different columns or from multiple processing steps can be combined, this can reduce WAL logging.

-- Consider using UNLOGGED or TEMPORARY tables where possible. For example, if data is inserted into the database and updated right away, perform the initial steps in an UNLOGGED or TEMPORARY table, and INSERT into the main table once the data has been prepared.

-- Identify and eliminate unused indexes. Index changes are WAL-logged so that indexes can be recovered on a crash. The storage space needed for indexes can be more than the storage space needed for the data itself, with a corresponding increase in generated WAL.

Comments

Popular posts from this blog

PostgreSQL Vacuum and Vacuum full are not two different processes

  PostgreSQL’s   VACUUM   and   VACUUM FULL   are not separate processes but rather different operational modes of the same maintenance command. Here’s why: Core Implementation Both commands share the same underlying codebase and are executed through the  vacuum_rel()  function in PostgreSQL’s source code ( src/backend/commands/vacuum.c ). The key distinction lies in the  FULL  option, which triggers additional steps: Standard  VACUUM : Removes dead tuples (obsolete rows) and marks space reusable  within PostgreSQL Updates the visibility map to optimize future queries Runs concurrently with read/write operations VACUUM FULL : Rewrites the entire table into a new disk file, compressing it and reclaiming space for the  operating system Rebuilds all indexes and requires an  ACCESS EXCLUSIVE  lock, blocking other operations Key Differences in Behavior Aspect Standard VACUUM VACUUM FULL Space Reclamation Internal reuse onl...

PostgreSQL Health Check - Performance Audit and Recommendations

   PostgreSQL Server Capacity Planning and Optimization: Conduct a thorough assessment of current and projected workload demands on the database server's critical resources. This evaluation should cover key factors such as CPU utilization, RAM consumption, storage requirements, I/O performance, and network bandwidth usage. Consider both short-term fluctuations and long-term usage patterns to accurately forecast future needs and potential bottlenecks. Collect and analyze performance metrics over extended periods, including peak and off-peak hours, to gain a comprehensive understanding of server behavior. Use appropriate monitoring tools to gather detailed data on query execution times, resource utilization patterns, and system responsiveness under varying loads. Identify recurring performance issues, resource contention points, or capacity limitations that may affect the overall efficiency of the PostgreSQL environment. Based on this analysis, develop actionable recommendations...

PostgreSQL - Architecture

When you start PostgreSQL`s instance (via pg_ctl start) the main process of the databases is started. In the past versions it was known as the postmaster process and now days it is called postgres. When this processes is started it allocates memory for two main things : shared memory and other background processes. The shared memory is a memory that the database uses mainly for cashing. Its main goal is to reduce the I/O operations and as a result of that improve the performance of the database. The other processes that the postgres process starts are responsible for the ongoing work of the database.  PostgreSQL`s architecture is based on a client/server communication. In the server side there is a server process that manages the files, database operations, connection and so on.. On the client side there is a tool/program which can be a graphical tool (gui)  like pgAdmin or a command line tool(cli) like psql  that is repnsible for connection the database and all...