Basic backup/recovery question

p4d upgrade recovery

14 replies to this topic

#1 briand

briand

    Advanced Member

  • Members
  • 76 posts

Posted 16 February 2017 - 07:35 PM

I'm about ready to upgrade our server from the ancient 2012.1 to the current 2016.2. I've read several KB articles on backup, recovery, and upgrade (as well as several manual pages). Even so, there is one item I'm not completely clear on.

When you create a checkpoint using p4d -jc, it also saves and truncates the journal. When I'm recovering from that checkpoint using p4d -jr, do I also need to recover from the journal file, or does the checkpoint contain a complete and up-to-date backup of the server?

Thanks.

EDIT:  Just in case it wasn't clear, my intent is to stop p4d before creating the checkpoint.
--
Brian

#2 Matt Janulewicz

Matt Janulewicz

    Advanced Member

  • Members
  • 99 posts
  • Location: San Francisco, CA

Posted 16 February 2017 - 08:25 PM

At the point you shut down p4d, the db will contain all the items from the journal. In my mind, I conceptualize it as the journal and the db being written to at the same time.

Plus, replaying the journal into the db after recovering the checkpoint wouldn't cause any harm as long as p4d hasn't been started up again in the meantime. At least I'm pretty sure that's true. I've done that in the past because I'm overly paranoid and it doesn't seem to have wrecked anything. :)
-Matt Janulewicz
Staff SCM Engineer, Perforce Administrator
Dolby Laboratories, Inc.
1275 Market St.
San Francisco, CA 94103, USA
majanu@dolby.com

#3 Mailman Sync

Mailman Sync

    Advanced Member

  • Maillist Aggregator
  • 2476 posts

Posted 16 February 2017 - 08:35 PM

Originally posted to the perforce-user mailing list by: Michael Mirman


The checkpoint will be complete - at the time you are creating that checkpoint.
The reason to rotate the journal is that whatever goes to the next journal is not in the checkpoint.
Therefore, when you restore, you may want to replay the next journal(s) after you restore the checkpoint.
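
The restore order described above (checkpoint first, then every journal written after it) can be sketched as a small helper that only builds the command lines. This is a minimal sketch, not a tested recovery script: the function name, the `/p4/root` path, and the file names are hypothetical, and the caller is assumed to already know which journals follow the checkpoint and in what order. It issues one `p4d -r <root> -jr <file>` call per file.

```python
from typing import List, Sequence


def restore_commands(p4root: str, checkpoint: str,
                     later_journals: Sequence[str]) -> List[List[str]]:
    """Build one `p4d -jr` command per file.

    The checkpoint comes first, followed by the journals written after it,
    in the order given (expected: oldest to newest).
    """
    files = [checkpoint, *later_journals]
    return [["p4d", "-r", p4root, "-jr", name] for name in files]


if __name__ == "__main__":
    # Hypothetical paths and file names, for illustration only.
    for cmd in restore_commands("/p4/root", "checkpoint.42",
                                ["journal.42", "journal"]):
        print(" ".join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try; a real script would run them against an empty server root with p4d stopped.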

--
Michael Mirman
MathWorks, Inc.
3 Apple Hill Drive, Natick, MA 01760
508-647-7555

_______________________________________________
perforce-user mailing list  -  perforce-user@perforce.com
http://maillist.perf...o/perforce-user



#4 briand

briand

    Advanced Member

  • Members
  • 76 posts

Posted 16 February 2017 - 10:59 PM

Thanks for the thoughts.

In my test environment, when I take an offline checkpoint, a few lines remain in the journal after the checkpoint is complete (for example, records setting the "journal" and "lastCheckpointAction" counters). The journal was also last written after the checkpoint file was last written. That makes me think the Perforce metadata gets updated after the checkpoint is complete, even when performing an offline checkpoint.
--
Brian

#5 Sambwise

Sambwise

    Advanced Member

  • Members
  • 344 posts

Posted 16 February 2017 - 11:48 PM

I don't have a journal file in front of me to check, but those lines might not represent database writes.  In particular @vv@ is a verification (assert that this counter equals this value and halt the restore if not -- this is to help keep you from accidentally restoring checkpoint/journal files in the wrong sequence since that'll usually lead to an inconsistent state) and @nx@ is a "note" (for debugging/analysis to get more context on what was happening at the time the journal was written, e.g. timestamps and markers to indicate an atomic transaction).  IIRC the journal counter gets incremented before the journal gets rotated (so that @rv@ will be the last thing in the rotated journal, and it will have a matching @vv@ at the start of the new journal).
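
To make the distinction between record types concrete, here is a minimal sketch that splits one journal line into fields and looks up the role of its leading marker. The roles are only the ones described in this thread. The tokenizer assumes fields are either bare words or `@`-delimited strings in which a literal `@` is doubled, and the description of `@ex@` is an assumption. The names `parse_record` and `describe` are hypothetical.

```python
import re
from typing import List

# Roles as described by posters in this thread; the "ex" entry is an assumption.
RECORD_ROLES = {
    "rv": "replace a row (a database write, such as a counter update)",
    "vv": "verify a value and halt the replay on mismatch (no write)",
    "nx": "note for debugging or analysis (no write)",
    "ex": "marker record carrying a timestamp (assumed: no row data)",
}

# Either an @-delimited string (with @@ as an escaped @) or a bare token.
_TOKEN = re.compile(r"@((?:[^@]|@@)*)@|(\S+)")


def parse_record(line: str) -> List[str]:
    """Split one journal line into its fields, removing the @ delimiters."""
    tokens = []
    for match in _TOKEN.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        tokens.append(quoted.replace("@@", "@") if quoted is not None else bare)
    return tokens


def describe(line: str) -> str:
    """Return the role of the record's leading marker, if the thread names it."""
    tokens = parse_record(line)
    if not tokens:
        return "empty line"
    return RECORD_ROLES.get(tokens[0], "not discussed in this thread")
```

Applied to a record such as `@rv@ 1 @db.counters@ @lastCheckpointAction@ @...@`, `parse_record` yields five fields, the first being `rv`, so `describe` reports a database write.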

#6 briand

briand

    Advanced Member

  • Members
  • 76 posts

Posted 17 February 2017 - 01:47 AM

The lines I see in my journal after the checkpoint are @nx@, @vv@, @rv@, and @ex@ lines.

I'm thinking the @rv@ line is a result of a counter update. Here is the one from my after-checkpoint journal:

@rv@ 1 @db.counters@ @lastCheckpointAction@ @1487275408 (2017/02/16 12:03:28 -0800 PST) checkpoint completed@

12:03:28 is the time the checkpoint completed.
--
Brian

#7 Mailman Sync

Mailman Sync

    Advanced Member

  • Maillist Aggregator
  • 2476 posts

Posted 17 February 2017 - 02:10 AM

Originally posted to the perforce-user mailing list by: Michael Mirman


First, let's make sure that we mean the same thing when we say "offline checkpoint".
For me, it means that the real server keeps going and the checkpoint is taken on a replica.
It does *not* mean that we take the server offline and take a checkpoint there.

The "standard" (and simplest) procedure is the latter. In that case, what is in the journal is only for the future - the db is not changed during creation of the checkpoint.
We have never used this approach because we like having our server available 24x7, so we don’t take it offline.
Rather, we take a checkpoint on a replica, and we do *not* truncate the journal at that time.

All journals are kept available for the restore procedure.
When the restore time comes (which for us is every night - when we test the restore on a test stack), first, we rebuild the db from the checkpoint, and then replay all journals starting with the "right" one.

Logically, we need to start replaying journals with the first journal that contains records that we did not get in the checkpoint.
Practically, it is a bit tricky because our checkpointing procedure is completely asynchronous to the journal rotation.

The way we do it is:
We grep the @db.counters@ record for the journal counter from the checkpoint - now we know the journal number from the checkpoint itself (as opposed to relying on having the journal counter embedded in the checkpoint name, although this is a fine approach, too).
Also, from the checkpoint, we get the time we started writing this checkpoint - the first @ex@ record.

Then, we go through our journals backwards (from the most recent to the oldest), looking for the journal whose @ex@ record indicates a time preceding the @ex@ time from the checkpoint.
After that, we know all the journals we have to replay - from the oldest to the most recent.

We have had several versions of the logic for maintaining the perfect correctness of what is replayed.
This logic may be a bit overcomplicated, but it has been working fine for years.
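
The backwards search described above can be sketched roughly as follows. This is not the actual MathWorks script. It assumes the @ex@ timestamps have already been extracted (the field layout of @ex@ records is not given in this thread), that each journal is represented by the time of its first @ex@ record, and that "preceding" means strictly earlier. The name `journals_to_replay` and the journal names are hypothetical.

```python
from typing import List, Sequence, Tuple


def journals_to_replay(checkpoint_ex_time: int,
                       journals: Sequence[Tuple[str, int]]) -> List[str]:
    """Pick the journals to replay after restoring a checkpoint.

    `journals` holds (name, first_ex_time) pairs ordered oldest to newest.
    Walk from the most recent journal towards the oldest and stop at the
    first one that starts before the checkpoint did; replay from there on.
    """
    for index in range(len(journals) - 1, -1, -1):
        _name, first_ex_time = journals[index]
        if first_ex_time < checkpoint_ex_time:
            return [name for name, _ in journals[index:]]
    # Every journal starts at or after the checkpoint: something older is missing.
    raise ValueError("no journal starts before the checkpoint")


if __name__ == "__main__":
    available = [("journal.10", 100), ("journal.11", 200), ("journal.12", 300)]
    print(journals_to_replay(250, available))
```

Raising an error when no journal predates the checkpoint is a design choice of this sketch: it turns a possible gap in the journal sequence into a visible failure rather than a silently incomplete restore.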

--
Michael Mirman
MathWorks, Inc.
508-647-7555


#8 Sambwise

Sambwise

    Advanced Member

  • Members
  • 344 posts

Posted 17 February 2017 - 07:07 AM

briand, on 17 February 2017 - 01:47 AM, said:

The lines I see in my journal after the checkpoint are @nx@, @vv@, @rv@, and @ex@ lines.

I'm thinking the @rv@ line is a result of a counter update. Here is the one from my after-checkpoint journal:

@rv@ 1 @db.counters@ @lastCheckpointAction@ @1487275408 (2017/02/16 12:03:28 -0800 PST) checkpoint completed@

12:03:28 is the time the checkpoint completed.

That's not a built-in counter name that I remember.  Does your backup script perhaps update this counter after taking a checkpoint?  :)

#9 Domenic

Domenic

    Advanced Member

  • Members
  • 88 posts

Posted 17 February 2017 - 07:26 AM

Based on https://www.perforce...4_counters.html it looks like lastCheckpointAction is a built-in one.

#10 Mailman Sync

Mailman Sync

    Advanced Member

  • Maillist Aggregator
  • 2476 posts

Posted 17 February 2017 - 01:40 PM

Originally posted to the perforce-user mailing list by: Michael Mirman


Peculiar.
I see this in the "Major new functionality in 2010.2" section of the release notes:
#257688 **
    To help administrators keep track of successful checkpoints, a
    new internally generated counter 'lastCheckpointAction' has
    been added which contains the operation timestamp.  Also, when
    the checkpoint completes, the MD5 digest of the checkpoint is
    written to the file 'checkpoint.N.md5'. Together these data points
    can help in verifying complete and undamaged checkpoints.  Note
    that if the -z flag was used to compress the checkpoint, it must
    be uncompressed to verify the checksum.  When restoring from a
    journal, the server will now produce a warning if the journal
    was written by a different version of the server, and will produce
    an error if the journal was written using different case-handling
    flags than are currently defined for the server.
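
As an illustration of the check the release note describes, the following minimal sketch compares a checkpoint's MD5 digest with the one recorded in its `.md5` file. The exact layout of the `.md5` file is not shown in this thread, so the sketch assumes only that it contains a 32-digit hexadecimal digest somewhere in its text. It also assumes a checkpoint written with -z is gzip-compressed, has a `.gz` suffix, and that the digest covers the uncompressed data, as the note implies. All function names are hypothetical.

```python
import gzip
import hashlib
import re


def checkpoint_md5(path: str) -> str:
    """MD5 of the checkpoint contents, decompressing first if it ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    digest = hashlib.md5()
    with opener(path, "rb") as handle:
        # Read in 1 MiB chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recorded_md5(md5_path: str) -> str:
    """Extract the first 32-hex-digit string found in the .md5 file."""
    with open(md5_path, "r", encoding="utf-8", errors="replace") as handle:
        match = re.search(r"\b[0-9a-fA-F]{32}\b", handle.read())
    if match is None:
        raise ValueError(f"no MD5 digest found in {md5_path}")
    return match.group(0).lower()


def checkpoint_is_intact(path: str, md5_path: str) -> bool:
    """True when the computed digest matches the recorded one."""
    return checkpoint_md5(path) == recorded_md5(md5_path)
```

A backup script could call `checkpoint_is_intact("checkpoint.N", "checkpoint.N.md5")` before copying the checkpoint off the server, with N standing for the real checkpoint number.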

We are running 2016.1.
I don’t see the counter:
-> p4 counter lastCheckpointAction
0

IOW, even if it's set, we cannot query it.
Not too useful for the user, IMHO.


--
Michael Mirman
MathWorks, Inc.
3 Apple Hill Drive, Natick, MA 01760
508-647-7555


#11 Domenic

Domenic

    Advanced Member

  • Members
  • 88 posts

Posted 17 February 2017 - 02:53 PM

Mailman Sync, on 17 February 2017 - 01:40 PM, said:

Originally posted to the perforce-user mailing list by: Michael Mirman

We are running 2016.1.
I don’t see the counter:
-> p4 counter lastCheckpointAction
0

IOW, even if it's set, we cannot query it.
Not too useful for the user, IMHO.


From our experience, it seems that the counter only gets updated when the checkpoint action is against the main server. For example, ours is:
lastCheckpointAction = 1476690993 (2016/10/17 00:56:33 -0700 Pacific Daylight Time) checkpoint restored

..even though we take nightly checkpoints off a replica.

Maybe you don't see it because you've always done your checkpoints off a replica?
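
For anyone who wants to monitor this counter from a script, here is a minimal sketch that splits a value like the one above into a UTC timestamp and the action text. It assumes the value always has the shape `<epoch seconds> (<human-readable time>) <action>`, as in the two examples in this thread. An unset counter (reported as `0`) is rejected. The names used are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class CheckpointAction(NamedTuple):
    when_utc: datetime
    action: str


# <epoch> (<anything in parentheses>) <action text>
_VALUE = re.compile(r"^(\d+)\s+\(.*?\)\s+(.+)$")


def parse_last_checkpoint_action(value: str) -> CheckpointAction:
    """Parse a lastCheckpointAction value into (UTC time, action)."""
    match = _VALUE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected counter value: {value!r}")
    epoch, action = match.groups()
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return CheckpointAction(when, action)


if __name__ == "__main__":
    sample = ("1476690993 (2016/10/17 00:56:33 -0700 Pacific Daylight Time) "
              "checkpoint restored")
    print(parse_last_checkpoint_action(sample))
```

Using the leading epoch value rather than the text in parentheses avoids parsing platform-specific time zone names such as "Pacific Daylight Time".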

#12 Mailman Sync

Mailman Sync

    Advanced Member

  • Maillist Aggregator
  • 2476 posts

Posted 17 February 2017 - 05:15 PM

Originally posted to the perforce-user mailing list by: Michael Mirman


Quote

Maybe you don't see it because you've always done your checkpoints off a
replica?

Ah! Of course! This makes perfect sense.
Now I understand why Sven asked me that question!
:-)

--
Michael Mirman
MathWorks, Inc.
3 Apple Hill Drive, Natick, MA 01760
508-647-7555


#13 briand

briand

    Advanced Member

  • Members
  • 76 posts

Posted 18 February 2017 - 12:10 AM

Our "standard" nightly backup procedure is to take a checkpoint from a replica and rotate the journals at the same time. Journals get rotated several times throughout the day and get saved along with the checkpoints.

For this specific instance, since I'm upgrading from a very old p4d version to the current p4d version, I need to completely rebuild the databases from a checkpoint (KB article 5469). I've gotten approval from upper management to shut down Perforce for the weekend, so that I can perform the upgrade without time pressures (we need to do some major VMware maintenance at the same time, so it works out well).

Once I shut down the master p4d server, I'll create a new checkpoint (p4d -jc). This checkpoint and the journal file left over after the checkpoint (containing the four lines discussed above) will then be used to rebuild the databases with the new version of p4d and reseed the replicas. It may not be the procedure that takes the least time, but I believe it will be the least error-prone. This procedure is consistent with the steps outlined in KB article 5469.
--
Brian

#14 Mailman Sync

Mailman Sync

    Advanced Member

  • Maillist Aggregator
  • 2476 posts

Posted 18 February 2017 - 10:35 PM

Originally posted to the perforce-user mailing list by: Michael Mirman


I don’t see any holes in this approach.
If I were doing it, I would probably consider shutting down the server, then restarting it in a way that nobody can access it except me, and rotating the journal. Then I could shut down the server and create a full checkpoint, so as not to worry about the journals.

My 0.02

--
Michael Mirman
MathWorks, Inc.
508-647-7555


#15 briand

briand

    Advanced Member

  • Members
  • 76 posts

Posted 22 February 2017 - 09:28 PM

Thanks all for your input. My upgrade went fine. The only issue was with reseeding the replica. I couldn't use the same checkpoint/journal I used for re-creating the master (one of the tables would end up with a bad checksum). I had to re-create the master first, take a second checkpoint from the updated master, then reseed the replica from that. That added several hours to the upgrade process, but in the end it worked.
--
Brian




