Saturday, 18 January 2020

Tablespaces in PostgreSQL


# Command to find the size of all databases:

postgres=# SELECT datname,pg_size_pretty(pg_database_size(datname)) FROM pg_database;
     datname     | pg_size_pretty
-----------------+----------------
 template1       | 6393 kB
 template0       | 6385 kB
 postgres        | 6501 kB
 test1           | 6517 kB
 testdb1         | 6393 kB
 mydb            | 6501 kB
 foo             | 6501 kB
 example_backups | 6517 kB
(8 rows)


# Command to find the size of all tablespaces:

postgres=# SELECT spcname, pg_size_pretty(pg_tablespace_size(spcname)) FROM pg_tablespace;
  spcname   | pg_size_pretty
------------+----------------
 pg_default | 44 MB
 pg_global  | 437 kB
 data_tbs   | 6509 kB
(3 rows)

# Command to find the size of a particular tablespace:

postgres=# SELECT pg_size_pretty(pg_tablespace_size('data_tbs'));
 pg_size_pretty
----------------
 6509 kB
(1 row)

About tablespaces in PostgreSQL:

Tablespaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored. Once created, a tablespace can be referred to by name when creating database objects.
By using tablespaces, an administrator can control the disk layout of a PostgreSQL installation. This is useful in at least two ways. First, if the partition or volume on which the cluster was initialized runs out of space and cannot be extended, a tablespace can be created on a different partition and used until the system can be reconfigured.
Second, tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system.

=>Tablespaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored.
=>An administrator can control the disk layout of a PostgreSQL installation.
=>Creation of the tablespace itself must be done as a database superuser.
=>Tables, indexes, and entire databases can be assigned to particular tablespaces.

To create a tablespace:
At the OS level, first create a directory and change its ownership to the postgres user.
1.mkdir /tbs1
2.chown -R postgres:postgres /tbs1
3.CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';
4.CREATE TABLESPACE tbs1 LOCATION '/tbs1';
5.ALTER DATABASE data11 SET TABLESPACE tbs1;


SET default_tablespace = fastspace;
CREATE TABLE foo(i int);
CREATE TABLE t1(id int) TABLESPACE tbs1;
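As a sketch (the table and index names here are hypothetical), objects can be placed in a tablespace at creation time, and existing ones can be moved later:

```sql
-- Place new objects in a tablespace at creation time:
CREATE TABLE orders (id int) TABLESPACE tbs1;
CREATE INDEX orders_id_idx ON orders (id) TABLESPACE tbs1;

-- Move an existing table (this rewrites its files and takes an exclusive lock):
ALTER TABLE orders SET TABLESPACE pg_default;

-- Check which tablespace a table lives in (reltablespace = 0 means the
-- database's default tablespace, hence the LEFT JOIN and possible NULL):
SELECT c.relname, t.spcname
FROM pg_class c
LEFT JOIN pg_tablespace t ON c.reltablespace = t.oid
WHERE c.relname = 'orders';
```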

Two default tablespaces are created automatically when the database cluster is initialized:
1.pg_global
2.pg_default

 =>The pg_global tablespace is used for shared system catalogs.
 =>The pg_default tablespace is the default tablespace of the template1 and template0 databases.


 TOAST (The Oversized-Attribute Storage Technique):-

Looking at the page size, it is easy to see that some values cannot be stored in such a small space. For these cases there is a mechanism called TOAST.

PostgreSQL uses a fixed page size (commonly 8 kB) and does not allow tuples to span multiple pages, so it is not possible to store very large field values directly.

When a tuple being stored is larger than about 2 kB (the compile-time constant TOAST_TUPLE_THRESHOLD), the fields that support it (not all data types are TOAST-able) are compressed and/or moved out of line into an associated TOAST table, until the main tuple shrinks below TOAST_TUPLE_TARGET (also about 2 kB by default).
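To see whether a table has out-of-line TOAST storage, pg_class can be queried; a sketch, using the foo table from above as an example:

```sql
-- The TOAST table (if any) backing a given table; a reltoastrelid of 0
-- (shown as "-" after the regclass cast) means no TOAST table exists:
SELECT relname, reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'foo' AND relkind = 'r';

-- Main-fork size vs. everything else (TOAST data and indexes):
SELECT pg_size_pretty(pg_relation_size('foo')) AS main_size,
       pg_size_pretty(pg_total_relation_size('foo')
                      - pg_relation_size('foo')) AS toast_and_indexes;
```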

Physical Layout :

All the data needed for a database cluster is stored within the cluster's data directory, commonly referred to as PGDATA.

PG_VERSION      - A file containing the major version number of PostgreSQL
base            - Subdirectory containing per-database subdirectories
global          - Subdirectory containing cluster-wide tables, such as pg_database
pg_clog         - Subdirectory containing transaction commit status data (renamed to pg_xact in PostgreSQL 10)
pg_subtrans     - Subdirectory containing subtransaction status data
pg_tblspc       - Subdirectory containing symbolic links to tablespaces
pg_xlog         - Subdirectory containing WAL (Write Ahead Log) files (renamed to pg_wal in PostgreSQL 10)
postmaster.opts - A file recording the command-line options the postmaster was last started with
postmaster.pid  - A lock file recording the current postmaster PID and shared memory segment ID (not present after postmaster shutdown)


Free Space Map:

Each heap and index relation, except for hash indexes, has a Free Space Map (FSM) to keep track of available space in the relation.
The Free Space Map is organized as a tree of FSM pages.
The bottom level FSM pages store the free space available on each heap (or index) page.
The contrib/pg_freespacemap module can be used to examine the information stored in free space maps:
SELECT * FROM pg_freespace('foo');

Visibility Map:

Each heap relation has a Visibility Map (VM) to keep track of which pages contain only tuples that are known to be visible to all active transactions.
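The visibility map can be inspected with the contrib module pg_visibility (available since PostgreSQL 9.6); 'foo' is just an example table name:

```sql
CREATE EXTENSION pg_visibility;

-- One row per page: whether every tuple on it is visible to all
-- transactions, and whether every tuple is frozen:
SELECT blkno, all_visible, all_frozen
FROM pg_visibility_map('foo');
```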



 To view tablespace information:-

SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace;
(the spclocation column was removed in PostgreSQL 9.2; use the pg_tablespace_location() function instead)

 \db+


Rename:-
 ------
ALTER TABLESPACE tbs1 RENAME TO tbsnew;

Change owner:
ALTER TABLESPACE tbs1 OWNER TO scott;

Set a tablespace parameter (ALTER TABLESPACE ... SET accepts only the planner settings seq_page_cost, random_page_cost and effective_io_concurrency; default_tablespace is set per database or per session instead):
ALTER TABLESPACE tbs1 SET (random_page_cost = 1.1);
ALTER DATABASE mydb SET default_tablespace = 'tbs1';

Reset:
ALTER TABLESPACE tbs1 RESET (random_page_cost);
ALTER DATABASE mydb RESET default_tablespace;

Drop (the tablespace must be empty first):

DROP TABLESPACE tbs1;


Disk space occupied by a tablespace:
SELECT pg_size_pretty(pg_tablespace_size('tbs1'));

At the OS level:
du -c -h /u02/tbs1


SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace;


SELECT
  tablename
, pg_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename))
, pg_size_pretty(pg_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename))) AS relsize
, pg_size_pretty(pg_total_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename))) AS disksize
, pg_total_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename))
FROM pg_tables WHERE schemaname <> 'information_schema'
ORDER BY 2 DESC;

(qualifying the table name with its schema makes the size functions work for tables outside the current search_path)



SELECT
    *
FROM pg_class
    JOIN pg_tablespace ON reltablespace = pg_tablespace.oid
WHERE
    spcname = 'tbs1';   -- replace tbs1 with the tablespace you want to inspect

Tuesday, 14 January 2020

Oracle Vs PostgreSQL

I.Oracle Tools equivalence to postgresql :-
1.SQL*Plus : psql (but psql does much more)
2.TOAD / Oracle SQL Developer : TOra or pgAdmin
3.EXPLAIN PLAN : EXPLAIN ANALYZE
4.ANALYZE TABLE : ANALYZE
5.Cold backup : both are file system backup
6.Hot backup : REDOLOGS = ARCHIVELOGS
7.Logical Export : exp = pg_dump
8.Logical Import : imp = pg_restore or psql
9.SQL Loader : pgLoader
10.RMAN : Barman or Pitrery
11.AUDIT TRAIL : pgAudit
12.Pooling / Dispatcher :
– PgBouncer
– PgPool
13.Active Data Guard :
– PostgreSQL master / slave replication
– Slony
14.Replication master / master :
– PostgreSQL-XC
– Bucardo
15.Logical replication :
– PostgreSQL 9.5 / 10
– Slony
16.RAC Horizontal scaling : PostgreSQL-XC – PostgreSQL-XL - plProxy, pg_shard
17.Oracle => Postgres Plus Advanced Server
– Same as PostgreSQL but with proprietary code and database feature compatibility for Oracle.
– Compatible with applications written for Oracle.
– No need to rewrite PL/SQL into PLPGSQL
– Applications written for Oracle run on Postgres Plus Advanced Server without modification.

II.Postgres Monitoring / Audit tools:-
1.PgBadger: A fast PostgreSQL log analyzer

2.PgCluu: PostgreSQL and system performances monitoring and auditing tool

3.Powa: PostgreSQL Workload Analyzer. Gathers performance stats and provides real-time charts and graphs to help monitor and tune your PostgreSQL servers. Similar to Oracle AWR.

4.PgObserver: monitor performance metrics of different PostgreSQL clusters.

5.OPM: Open PostgreSQL Monitoring. Gather stats, display dashboards and send warnings when something goes wrong. Tend to be similar to Oracle Grid Control.

6.check_postgres: script for monitoring various attributes of your database. It is designed to work with Nagios, MRTG, or in standalone scripts.

7.Pgwatch: monitor PostgreSQL databases and provides a fast and efficient overview of what is really going on.

8.pgAgent:pgAgent is a job scheduler for PostgreSQL which may be managed using pgAdmin. Prior to pgAdmin v1.9, pgAgent shipped as part of pgAdmin. From pgAdmin v1.9 onwards, pgAgent is shipped as a separate application.

9.pgbouncer: PgBouncer is a lightweight connection pooler for PostgreSQL, used to pool and reuse connections between the application and the database.

III.Oracle vs PostgreSQL data type comparison
1.VARCHAR2 : VARCHAR or TEXT
2.CLOB, LONG : VARCHAR or TEXT
3.NCHAR, NVARCHAR2, NCLOB : VARCHAR or TEXT
4.NUMBER : NUMERIC or BIGINT or INT or SMALLINT or DOUBLE PRECISION or REAL (bug potential)
5.BINARY_FLOAT/BINARY_DOUBLE : REAL/DOUBLE PRECISION
6.BLOB, RAW, LONG RAW : BYTEA (additional porting required)
7.DATE : DATE or TIMESTAMP
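As an illustration of the mapping above, a hypothetical Oracle table might be ported like this (table and column names are invented for the example):

```sql
-- Oracle original:
--   CREATE TABLE emp (id NUMBER(10), name VARCHAR2(100),
--                     photo BLOB, hired DATE);
-- PostgreSQL port:
CREATE TABLE emp (
    id    bigint,        -- NUMBER(10) holds integers, so bigint is enough
    name  varchar(100),  -- VARCHAR2(100)
    photo bytea,         -- BLOB; client code using LOB APIs needs porting
    hired timestamp      -- Oracle DATE also carries a time component
);
```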

Wednesday, 8 January 2020

PostgreSQL - Architecture

When you start a PostgreSQL instance (via pg_ctl start), the main process of the database is started. In past versions it was known as the postmaster process; nowadays it is simply called postgres. When this process starts, it allocates memory for two main things: shared memory and the other background processes. The shared memory is used mainly for caching; its goal is to reduce I/O operations and, as a result, improve the performance of the database. The other processes that the postgres process starts are responsible for the ongoing work of the database.


PostgreSQL's architecture is based on client/server communication. On the server side there is a server process that manages the files, database operations, connections and so on. On the client side there is a tool or program, which can be a graphical tool (GUI) like pgAdmin or a command-line tool (CLI) like psql, that is responsible for connecting to the database and allowing the user to perform database operations. PostgreSQL supports multiple connections to the database. Each client that tries to connect sends a request to the main postgres process, which allocates a dedicated server process to handle that client's requests. Each server process allocates an amount of RAM called "process local memory" that it uses for its own work.


Shared memory



The shared memory consists of two main parts: the shared buffer and the WAL buffer. The shared buffer contains data (pages) that is accessed very often, whether by one user or many. How do we benefit from keeping that data in memory? Take a minute and think about it before you read the answer. If you need to access specific data very often, you would otherwise read the data files in the file system and generate I/O. Instead, we can store part or all of it (depending on its size) in memory. Reading from memory is much faster than reading from disk (assuming it is not an SSD), so we get better performance. In addition, we reduce the amount of I/O on the disks, which helps other operations that need to do heavy I/O.

The next one is the WAL buffer. If you are familiar with Oracle, WAL files correspond to Oracle's redo logs. In general, the WAL files contain the changes that happen in the database. They are useful in case we need to recover the database: we can use the WAL to see what changes were made and replay them. Before changes are written to the WAL files, they are written to the WAL buffer. If we wrote every change immediately to the WAL files, we would generate a lot of I/O operations on disk. Therefore we write the changes to the WAL buffer, and at predetermined points in time it is flushed from the WAL buffer to disk.

work_mem & maintenance_work_mem

work_mem is memory allocated for sort and related join operations. We can set its value in one of PostgreSQL's config files (postgresql.conf), which I will describe in the next post. This memory is per operation: if multiple queries run at the same time and involve joins or sorts, each operation gets its own memory of size work_mem. maintenance_work_mem is memory allocated for maintenance operations such as building an index, ALTER TABLE and VACUUM. Like work_mem, this parameter is also set in the same config file (postgresql.conf).
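The effect of work_mem is easy to see with EXPLAIN ANALYZE; a sketch (the exact thresholds depend on your data and version):

```sql
-- With a small work_mem, a big sort spills to temporary files on disk
-- and the plan reports "Sort Method: external merge":
SET work_mem = '4MB';
EXPLAIN (ANALYZE) SELECT * FROM generate_series(1, 1000000) g ORDER BY g;

-- With enough work_mem the same sort runs entirely in memory
-- ("Sort Method: quicksort"):
SET work_mem = '256MB';
EXPLAIN (ANALYZE) SELECT * FROM generate_series(1, 1000000) g ORDER BY g;
```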


Processes

(All the processes related to PostgreSQL on my machine)

Now let's focus a bit on the most important processes in our instance. I am going to explain the basics of each process. Each process also has many config parameters in the config file of PostgreSQL (postgresql.conf), which I won't mention in this post and will cover in the next posts.

Let's start with the checkpointer. When a user changes data that was loaded into memory (the shared buffer, to be specific), that data is marked as "dirty". When I say data I mean blocks, because PostgreSQL always reads and writes blocks. The checkpointer is responsible for writing those dirty blocks to the data files in the file system (step 1 in the diagram). After it flushes all the data to disk, the checkpointer writes a special checkpoint record to the WAL (step 2 in the diagram). This record guarantees that all the changes that happened before it was written are already in the data files. In other words, in case of a crash, the database can look at those checkpoint records and start applying only the changes after the last checkpoint record. Let's use the diagram as an example: assume that after checkpoint #2 I did an insert and then a delete, and those changes were flushed from the WAL buffer to the WAL files (commit). Suddenly, after the delete, the database crashed. When I started the database, it knew that all the data before checkpoint #2 was already in the data files; however, the insert and the delete that came after the checkpoint were not. So to recover the database it just had to apply the two WAL records that come right after the "checkpoint #2" record.
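Checkpoint activity can also be observed from SQL; a sketch (in releases up to PostgreSQL 16 these counters live in the pg_stat_bgwriter view; newer releases moved them to pg_stat_checkpointer):

```sql
-- Force a checkpoint (requires superuser):
CHECKPOINT;

-- Scheduled vs. requested checkpoints, and buffers written by them:
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
FROM pg_stat_bgwriter;
```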

Let's move on to the next process: the bg writer. The background writer is also known as the writer process. It is responsible for writing specific dirty blocks from the shared buffer to disk based on its own algorithm. The purpose of this process is not just flushing dirty blocks to disk, but making sure there are enough free buffers in the shared buffer. In addition, because the bg writer also flushes dirty blocks to disk, it takes some of the work off the checkpointer.

In the section on the instance's memory I explained what the WAL buffer and the WAL files are. The WAL writer process is responsible for writing the changes from the WAL buffer to the WAL files on commit. To be honest, that is not entirely accurate, because its behaviour depends on several parameters, but I just want you to get the idea of its purpose. This process is crucial for our database because, as I mentioned earlier, the WAL files contain all the changes that happened in the database.

The stats collector, as you can guess, is responsible for gathering statistics about the postgres instance on our server. It collects statistics on tables (like the number of rows and vacuum and analyze operations), indexes, functions and so on.
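A few of the per-table statistics the collector maintains can be seen in pg_stat_user_tables:

```sql
-- Live/dead tuple estimates and the last (auto)vacuum/analyze times:
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```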

The last four processes are the WAL receiver, WAL sender, autovacuum and archiver. None of these four is mandatory; they are all optional. The WAL receiver and WAL sender are necessary when we use replication: they are responsible for sending the WAL (on the master side) and receiving it (on the standby side). The archiver is a process that generates archives from the WAL files. We can save those archives somewhere, and if we want to restore our database to a specific point in time we can use them.

In postgres, when you delete records from a table, those records are not deleted immediately from the data files. They are just flagged as "deleted" but still exist; in other words, even after a delete operation the disk space those records occupied is not freed. The vacuum process marks the locations of those "deleted" records as space that can be reused to store new data entered into the table, so a later insert can overwrite the "deleted" data and we save space. In addition, vacuum also gathers statistics on the table. Plain vacuum can run simultaneously with DML queries. There is another mode called vacuum full. Vacuum full has the same idea behind it, but the differences are that it frees the space occupied by the deleted records and gives that disk space back to the operating system instead of keeping it reserved for that table. Moreover, vacuum full requires a full lock on the table, which is why it is not recommended in production environments. The autovacuum process is responsible for scheduling the vacuum operation based on a threshold of dead rows and other interesting parameters available in postgresql.conf.
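The difference between the two vacuum modes, as a sketch using the t1 table from earlier:

```sql
-- Plain VACUUM: marks dead-row space reusable and can run
-- concurrently with normal DML:
VACUUM (VERBOSE) t1;

-- VACUUM FULL: rewrites the table and returns the freed space to the
-- operating system, but takes an ACCESS EXCLUSIVE lock while it runs:
VACUUM FULL t1;

-- Dead-tuple counts that feed the autovacuum thresholds:
SELECT relname, n_dead_tup FROM pg_stat_user_tables WHERE relname = 't1';
```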

                                                                   (An illustration of the vacuum process)


That's all for today. In the next post I will talk about the physical structure (directories and files) of the database, how to connect to the database with psql and pgAdmin, and some SQL basics. In addition, I will explain tablespaces, schemas, and the important catalog and information_schema tables. If you think I should add or edit something, feel free to write it in the comments.
