Wednesday, September 23, 2009

PostgreSQL point in time recovery PITR

PostgreSQL has a few ways of backing up databases. In this article I will discuss only the point in time recovery method(PITR). Before we begin here are two environment variables set for the postgreSQL installation on Feodra Core 10 Linux.

$ export PG=/opt/PostgreSQL/8.4
$ export PGDATA=$PG/data

the PITR method uses continuous archiving which logs all database changes(using WALs) in the $PGDATA/pg_xlog directory. So with a copy of the previous full database backup and the all the WAL files since that backup, we should able to recover the database up to the moment before a crash.

To enable archive_mode, you'll need to modify the postgresql config file
$PGDATA/postgresql.conf and set the following

archive_mode = on
archive_command = 'cp %p /var/backup/postgres/archivedir/%f

I have set up my archive directory to be in /var/backup/postgres/archivedir for this test purpose.

First let's create a full backup. A full backup is used as a reference point in time frp, where you want to start your recovery.

postgres=# select pg_start_backup('t1');

what this does is postgreSQL performs a checkpoint and writes current location of the WAL file into $PGDATA/backup_label. With this information postgreSQL knows when this full backup began. A checkpoint is performed when  postgreSQL flushes out all the commited transactions that are still in memory to disk.

Now let's tar up the $PGDATA directory into a storage location. In this case.

$ tar cvf /var/backup/postgres/data-t1.tar $PGDATA

So what happens when more transactions are performed while we are running tar in this case? Postgres keeps on archiving eventhough tar is happening at the same time. All the database transactions perform as normal as you we run the tar command.

once tar is done run

postgres=# select pg_stop_backup();

This finishes the backup , switches to the next WAL segment and creates a file TTTTTTTTSSSSSSSSSSSSSSS.OOOOOOOO.backup
in the $PGDATA/pg_xlog directory and switches to the next WAL log.
where the T's part is for the timeline and S's part is for the segment.
I will explain the T's and the S's later on.

for example a typical TTTTTTTTSSSSSSSSSSSSSSS.OOOOOOOO.backup file
would look like
*****************************************
$ cat 000000010000000000000007.00000020.backup
START WAL LOCATION: 0/7000020 (file 000000010000000000000007)
STOP WAL LOCATION: 0/E000000 (file 00000001000000000000000E)
CHECKPOINT LOCATION: 0/7000020
START TIME: 2009-09-21 15:10:45 EDT
LABEL: t3
STOP TIME: 2009-09-21 15:11:51 EDT

*****************************************

In this case it means that during the backup eight WAL files were switched in just more than a minute. the files 7, 8, 9, A, B, C, D and E It usually means it's an active database system due to so many log switches.

these TTTTTTTTSSSSSSSSSSSSSSS.OOOOOOOO.backup files are created each time you do a full database backup and they are also archived into the archive directory.

Now we have full database backup, we can now do a point in time recovery. A reminder here, Sometimes people make the mistake of doing PITR using the current database with a previous set of WALs. This is not possible because current database is already up to date and postgreSQL can't replay WALs on an up to date database. So weed need a database backup from which it was done before a set WAL files.

Let's create a table and insert it with values since the last full backup.

create table t (i int);
-- timeline 1
insert into t values (10),(11);
select now();
select * from t;
select pg_switch_xlog();

And here is the output.
*****************************************
postgres=# create table t(i int);
CREATE TABLE
postgres=# --- time line 1
postgres=# insert into t values (10),(11);
INSERT 0 2
postgres=# select now();
now
-------------------------------
2009-09-22 00:46:46.534329-04
(1 row)

postgres=# select * from t;
i
----
10
11
(2 rows)

postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/2010D90
(1 row)
*****************************************

and with the time stamp just right after the insert.
2009-09-22 00:46:46.534329-04

we'll use this time stamp to restore the database to the point just before we drop the table.

now let's drop the table t;

drop table t;

At this state , the table is gone from the current database. But the inserted values  still exist in the archived logs. Let's check out what is in the $PGDATA/pg_xlog at this moment in time.

*****************************************
$ ls -ls $PGDATA/pg_xlog
000000010000000000000001 000000010000000000000003
000000010000000000000001.00000020.backup archive_status
000000010000000000000002

*****************************************

The pg_xlog directory has the WAL files from the current timeline, 1, and a few segment files. some of these segment files have been archived and some have not.

We are going to roll back the database to the point in time before we had created the table. And then reapply all the archived logs just right up to point in time where before we dropped the table t. We'll need to stop the database and copy all the WALs to the archived directory. This is in case postgreSQL didn't archive them for us.

$ pg_ctl stop
$ cd $PGDATA/pg_xlog
$ cp -i * /var/backup/postgres/archivedir

Now remove $PGDATA ,  untar to restore the previous full database backup, and remove all the previous WAL contents in $PGDATA/pg_xlog. We don't need these WAL files from the previous database backup.

$ cd $PG
$ rm -rf $PGDATA
$ tar xf /var/backup/postgres/data-t1.tar
$ cd $PGDATA/pg_xlog
$ rm -rf *

Now our database is at the same state that was before we created table t and in  timeline 1. Copy a recovery.conf file from the share directory and edit the restore_command and the recovery_target_time variables with the time stamp right after the insert.

$ cp $PG/share/postgresql/recovery.conf $PGDATA
$ vi recovery.conf

set these two variables to

restore_command = 'cp /var/backup/postgres/archivedir/%f %p'
recovery_target_time = '2009-09-22 00:46:46.534329-04'


By default if the recovery.conf exists in the $PGDATA directory postgreSQL automatically goes into recovery. Once the recovery is done it renames the recovery.conf into recovery.done and postgreSQL is ready for connection.

$ pg_ctl start

Let's check out what's in the $PGDATA/pg_log file. It gets interesting from the log file
*****************************************
2009-09-22 00:43:59 EDTLOG: database system was shut down at 2009-09-22 00:43:14 EDT
2009-09-22 00:43:59 EDTLOG: autovacuum launcher started
2009-09-22 00:43:59 EDTLOG: database system is ready to accept connections
2009-09-22 00:44:22 EDTLOG: received smart shutdown request
2009-09-22 00:44:22 EDTLOG: autovacuum launcher shutting down
2009-09-22 00:44:22 EDTLOG: shutting down
2009-09-22 00:44:23 EDTLOG: database system is shut down
2009-09-22 00:49:35 EDTLOG: database system was shut down at 2009-09-22 00:44:23 EDT
2009-09-22 00:49:35 EDTLOG: creating missing WAL directory "pg_xlog/archive_status"
2009-09-22 00:49:35 EDTLOG: starting archive recovery
2009-09-22 00:49:35 EDTLOG: restore_command = 'cp /var/backup/postgres/archivedir/%f %p'
2009-09-22 00:49:35 EDTLOG: recovery_target_time = '2009-09-22 00:46:46.534329-04'
cp: cannot stat `/var/backup/postgres/archivedir/00000001.history': No such file or directory
2009-09-22 00:49:35 EDTLOG: restored log file "000000010000000000000002" from archive
2009-09-22 00:49:35 EDTLOG: automatic recovery in progress
2009-09-22 00:49:35 EDTLOG: redo starts at 0/2000068
2009-09-22 00:49:35 EDTLOG: consistent recovery state reached
2009-09-22 00:49:35 EDTLOG: restored log file "000000010000000000000003" from archive
2009-09-22 00:49:35 EDTLOG: recovery stopping before commit of transaction 658, time 2009-09-22 00:48:09.849851-04
2009-09-22 00:49:35 EDTLOG: redo done at 0/30002F8
2009-09-22 00:49:35 EDTLOG: last completed transaction was at log time 2009-09-22 00:46:46.533274-04
cp: cannot stat `/var/backup/postgres/archivedir/00000002.history': No such file or directory
2009-09-22 00:49:35 EDTLOG: selected new timeline ID: 2
cp: cannot stat `/var/backup/postgres/archivedir/00000001.history': No such file or directory
2009-09-22 00:49:35 EDTLOG: archive recovery complete
2009-09-22 00:49:35 EDTLOG: autovacuum launcher started
2009-09-22 00:49:35 EDTLOG: database system is ready to accept connections

*****************************************

Postgresql uses the restore_command to look for previous archived WAL files. Now the database is restored to the previous state just right after the insert statement. And from this point on, the current database starts up as timeline '2'. The previous full database backup was known as timeline '1'. For each recovery postgreSQL creates a TTTTTTTT.history file. The file tells postgreSQL the parent timeline of the current database. where exactly current database had branched off of the parent archive logs.

now let's make sure the table is restored and see how the WAL files have switched over to the new timeline.


*****************************************
$ psql
psql (8.4.0)
Type "help" for help.

postgres=# \dt
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | t | table | postgres
(1 row)
postgres=# select * from t;
i
----
10
11
(2 rows)
postgres=# \q
$ ls -ls pg_xlog/
total 16412
16404 -rw------- 1 postgres postgres 16777216 2009-09-22 00:49 000000020000000000000003
4 -rw------- 1 postgres postgres 83 2009-09-22 00:49 00000002.history
4 drwx------ 2 postgres postgres 4096 2009-09-22 00:49 archive_status
$
*****************************************
The database is on timeline '2' and we have recovered the dropped table. Now lets insert more data into the database and work on more test scenarios. the commands are as followed.

-- time line 2
insert into t values (20),(21);
select now();
select * from t;
select pg_sleep(2);
insert into t values (25),(26);
select now();
select * from t;
select pg_switch_xlog();

*****************************************
postgres=# insert into t values (20),(21);
INSERT 0 2
postgres=# select now();
now
-------------------------------
2009-09-22 00:50:38.234784-04
(1 row)

postgres=# select * from t;
i
----
10
11
20
21
(4 rows)

postgres=# select pg_sleep(2);
pg_sleep
----------

(1 row)

postgres=# insert into t values (25),(26);
INSERT 0 2
postgres=# select now();
now
-------------------------------
2009-09-22 00:50:40.242744-04
(1 row)

postgres=# select * from t;
i
----
10
11
20
21
25
26
(6 rows)

postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/3000558
(1 row)
*****************************************

Now let's drop the table t and recover the database to point in time where we had just inserted values 20 and 21.

*****************************************
postgres=# drop table t;
DROP TABLE
postgres=# \q
$ pg_ctl stop
waiting for server to shut down.... done
server stopped
$ cd pg_xlog/
$ ls
000000020000000000000003 000000020000000000000005 archive_status
000000020000000000000004 00000002.history
$ cp * /var/backup/postgres/archivedir # archive the rests of the WALs
$ cd $PG
$ rm -rf $PGDATA
$ tar xf /var/backup/postgres/data-t1.tar
$ cp $PG/share/postgresql/recovery.conf $PGDATA

*****************************************

edit the recovery.conf file and set the three recovery variables to the following

restore_command = 'cp /var/backup/postgres/archivedir/%f %p'
recovery_target_time = ' 2009-09-22 00:50:38.234784-04'
recovery_target_timeline = '2'

start up the server and let's check out the $PGDATA/pg_log/postgres.log
*****************************************
$ pg_ctl start
$ less $PGDATA/pg_log/postgresql.log
...
...
2009-09-22 00:55:16 EDTLOG: starting archive recovery
2009-09-22 00:55:16 EDTLOG: restore_command = 'cp /var/backup/postgres/archivedir/%f %p'
2009-09-22 00:55:16 EDTLOG: recovery_target_time = '2009-09-22 00:50:38.234784-04'
2009-09-22 00:55:16 EDTLOG: recovery_target_timeline = 2
2009-09-22 00:55:16 EDTLOG: restored log file "00000002.history" from archive
2009-09-22 00:55:16 EDTLOG: restored log file "00000002.history" from archive
cp: cannot stat `/var/backup/postgres/archivedir/000000020000000000000002': No such file or directory
2009-09-22 00:55:16 EDTLOG: restored log file "000000010000000000000002" from archive
2009-09-22 00:55:16 EDTLOG: automatic recovery in progress
2009-09-22 00:55:16 EDTLOG: redo starts at 0/2000068
2009-09-22 00:55:16 EDTLOG: consistent recovery state reached
2009-09-22 00:55:16 EDTLOG: restored log file "000000020000000000000003" from archive
2009-09-22 00:55:16 EDTLOG: recovery stopping before commit of transaction 660, time 2009-09-22 00:50:40.24176-04
2009-09-22 00:55:16 EDTLOG: redo done at 0/3000508
2009-09-22 00:55:16 EDTLOG: last completed transaction was at log time 2009-09-22 00:50:38.233622-04
cp: cannot stat `/var/backup/postgres/archivedir/00000003.history': No such file or directory
2009-09-22 00:55:16 EDTLOG: selected new timeline ID: 3
2009-09-22 00:55:16 EDTLOG: restored log file "00000002.history" from archive
...
...
*****
Here PostgreSQL recovers the database up to the timeline '2' where just after the inserts of values 20 and 21. And also postgreSQL starts up with  a new timeline '3' for this current database.

Let's check out the content of table t after this restore.
*****************************************
postgres=# select * from t;
i
----
10
11
20
21
(4 rows)
*****************************************

We can insert more values into table t with the following.

--- time line 3
insert into t values (30),(31);
select now();
select * from t;
select pg_switch_xlog();
**************
postgres=# insert into t values (30),(31);
INSERT 0 2
postgres=# select now();
now
-------------------------------
2009-09-22 00:56:00.924816-04
(1 row)

postgres=# select * from t;
i
----
10
11
20
21
30
31
(6 rows)

postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/3000748
(1 row)

**********************
And later we decided this not what we really had in mind. We want the values from timeline 2 that includes the values 25 and 26.

So let's shutdown this database and copy all the WAL files to the archive directory this is to make sure what ever decision we make we can always roll back to a point in time we.

$ ls $PGDATA/pg_xlog
000000030000000000000003 00000003.history
000000030000000000000004 archive_status/
$ cp * /var/backup/postgres/archivedir # archive the rests of the WALs
$ cd $PG
$ rm -rf $PGDATA
$ tar xf /var/backup/postgres/data-t1.tar
$ rm -rf $PGDATA/pg_xlog
$ cp $PG/share/postgresql/recovery.conf $PGDATA

and edit the three variables
-----
restore_command = 'cp /var/backup/postgres/archivedir/%f %p'
recovery_target_time = '2009-09-22 00:50:40.242744-04'
recovery_target_timeline = '2'
-----
Now we can start up the database and take a look at the $PGDATA/pg_log/postgresql.log

$ pg_ctl start
$ less $PGDATA/pg_log/postgresql.log


009-09-22 00:43:59 EDTLOG: database system was shut down at 2009-09-22 00:43:14 EDT
2009-09-22 00:43:59 EDTLOG: autovacuum launcher started
2009-09-22 00:43:59 EDTLOG: database system is ready to accept connections
2009-09-22 00:44:22 EDTLOG: received smart shutdown request
2009-09-22 00:44:22 EDTLOG: autovacuum launcher shutting down
2009-09-22 00:44:22 EDTLOG: shutting down
2009-09-22 00:44:23 EDTLOG: database system is shut down
2009-09-22 00:57:51 EDTLOG: database system was shut down at 2009-09-22 00:44:23 EDT
2009-09-22 00:57:51 EDTLOG: starting archive recovery
2009-09-22 00:57:51 EDTLOG: restore_command = 'cp /var/backup/postgres/archivedir/%f %p'
2009-09-22 00:57:51 EDTLOG: recovery_target_time = '2009-09-22 00:50:40.242744-04'
2009-09-22 00:57:51 EDTLOG: recovery_target_timeline = 2
2009-09-22 00:57:51 EDTLOG: restored log file "00000002.history" from archive
2009-09-22 00:57:51 EDTLOG: restored log file "00000002.history" from archive
cp: cannot stat `/var/backup/postgres/archivedir/000000020000000000000002': No such file or directory
2009-09-22 00:57:51 EDTLOG: restored log file "000000010000000000000002" from archive
2009-09-22 00:57:51 EDTLOG: automatic recovery in progress
2009-09-22 00:57:51 EDTLOG: redo starts at 0/2000068
2009-09-22 00:57:51 EDTLOG: consistent recovery state reached
2009-09-22 00:57:52 EDTLOG: restored log file "000000020000000000000003" from archive
2009-09-22 00:57:52 EDTLOG: restored log file "000000020000000000000004" from archive
2009-09-22 00:57:51 EDTLOG: automatic recovery in progress
2009-09-22 00:57:51 EDTLOG: redo starts at 0/2000068
2009-09-22 00:57:51 EDTLOG: consistent recovery state reached
2009-09-22 00:57:52 EDTLOG: restored log file "000000020000000000000003" from archive
2009-09-22 00:57:52 EDTLOG: restored log file "000000020000000000000004" from archive
2009-09-22 00:57:52 EDTLOG: recovery stopping before commit of transaction 661, time 2009-09-22 00:52:47.110302-04
2009-09-22 00:57:52 EDTLOG: redo done at 0/4003830
2009-09-22 00:57:52 EDTLOG: last completed transaction was at log time 2009-09-22 00:50:40.24176-04
2009-09-22 00:57:52 EDTLOG: restored log file "00000003.history" from archive
cp: cannot stat `/var/backup/postgres/archivedir/00000004.history': No such file or directory
2009-09-22 00:57:52 EDTLOG: selected new timeline ID: 4
2009-09-22 00:57:52 EDTLOG: restored log file "00000002.history" from archive
2009-09-22 00:57:52 EDTLOG: archive recovery complete
2009-09-22 00:57:52 EDTLOG: autovacuum launcher started
2009-09-22 00:57:52 EDTLOG: database system is ready to accept connections

The interesting part of this log file is that the timeline '3' already exists and postgreSQL does not overwrite any existing timelines. hence, postgreSQL creates a new timeline '4' for the current database. Now let's check out the database for our table t.

$ psql
psql (8.4.0)
Type "help" for help.

postgres=# \dt
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | t | table | postgres
(1 row)

postgres=# select * from t;
i
----
10
11
20
21
25
26
(6 rows)


Postgresql uses the TTTTTTTTT.history to keep track of its ancestral timeline. Having this file is vital to point in time recovery, If not Postgresql will not know which set archive logs to use. When postgreSQL does a recovery to timeline 't', it automatically starts a new timeline. This is to prevent postgreSQL from overwriting the the previous existing timeline 't'. As with the example where we restored the database up to values 20, 21 on
timeline '2', if postgreSQL had continue on as timeline '2', it would have overwritten the values 25 and 26. Which is why postgreSQL uses timeline '3' that branches off at the moment after inserts of values 20 and 21. And we we wanted to restore the values of 25 and 26 we just use the same timeline '2' and extended the target time stamp to the moment after the inserts of 25 and 26. PostgreSQL does not overwrite existing timelines, it merely branches
off from it.