autoreload, especially vs replicas: odd behaviors

label autoreload replicas


#1 Miles O'Neal (Advanced Member, 128 posts)

Posted 16 July 2019 - 09:29 PM

I've been looking for a while at disk size discrepancies, and have narrowed a big part down to autoreload labels.

For historical reasons, part of our CI process[1] generates several autoreload labels and tags many files (potentially hundreds of thousands) with each label. Some of these labels are removed very soon after the tag (or labelsync, I forget which). I've run across multiple issues that look related to this. The level of the resulting discrepancies (outlined below) varies between the master, local replicas, and remote replicas (>120 ms pings).
  • After running this way for several years, the number of dirs/files living in the unload depot varies widely across the systems (currently ranging from 70,000 to > 90,000 unload files).
  • Some of the dirs (ex: /p4/1/depots/unloaded/label/42,d/label:thing1-fred.ckp,d/ ) contain the expected file named 1.0. Others are empty. Others contain a temp file (I believe it was tmp.nn.nnnn or something similar). A few contain both a 1.0 and a tmp file!
  • Some labels no longer had a directory, much less a file, in the unload depot.
  • Some dirs and files in the unload depot no longer had a label associated.
Some of the tmp files are transitory; I'm not speaking of those. I assume the aberrations with just a tmp file are from interrupted transfers[2]. I'm not sure why some dirs would have both. No file within the directory? No clue.

I also don't understand why the directory would be missing when the label is there.
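For anyone wanting to survey the same states, here's a rough sketch of the classification I'm describing. The unload root and the 1.0 / tmp.* names are from our layout; the function name is just illustrative, so adjust for your site:

```shell
# classify_unload_dirs ROOT -- walk the ,d directories under an unload
# depot root and report the anomalous states described above.
# Intermediate ,d directories that only hold subdirectories are skipped
# by the EMPTY check (it requires the directory to be truly empty).
classify_unload_dirs() {
    find "$1" -type d -name '*,d' | while read -r d; do
        rev=$(find "$d" -maxdepth 1 -type f -name '1.0')
        tmp=$(find "$d" -maxdepth 1 -type f -name 'tmp.*')
        if [ -n "$rev" ] && [ -n "$tmp" ]; then
            printf 'BOTH %s\n' "$d"      # both a 1.0 and a leftover tmp file
        elif [ -n "$tmp" ]; then
            printf 'TMP %s\n' "$d"       # tmp file only (interrupted transfer?)
        elif [ -z "$rev" ] && [ -z "$(ls -A "$d")" ]; then
            printf 'EMPTY %s\n' "$d"     # directory with nothing in it at all
        fi
    done
}
```

Run it as `classify_unload_dirs /p4/1/depots/unloaded` on each server and diff the outputs; the healthy case (a lone 1.0) prints nothing.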

Has anyone else noticed this?

For the record, we're still on 2017.1. We have a workaround for the 2018 showstopper we hit, but we hope to move to 2019 soon.


[1] We're reworking that process.
[2] Some of the temp files are deleted within a few seconds of creation. On busy servers, I can see the journal entry to delete a label hitting the replica while a large file was still being transferred. This could leave a directory or file behind without a label.

#2 Matt Janulewicz (Advanced Member, 176 posts, San Francisco, CA)

Posted 17 July 2019 - 08:57 AM

Howdy!

We don't have this exact workflow but we use our unload depot extensively and I might (or might not) have a little insight into some of this stuff.

The first thing I'd say, if I'm understanding things correctly (and maybe I'm not), is that the unload depot is not a global namespace. That is, writable servers (notably edge servers) have their own unload depots. If you unload a workspace (our most commonly unloaded thing), it is confined to that edge server (and needs to be backed up separately). If you're unloading on a server other than the master/commit, I would expect the unload depot there to be unique.

We force global labels (and unload very few of them) so at least our unloaded labels are confined to one server, but the size and contents of our unload depots on different servers vary wildly.

As far as I know, those temp files are what a pull thread initially writes to while a file is in transit between two servers; the thread renames the file to its proper name once the transfer finishes. The last set of numbers in the file name, I believe, is the PID of the thread that wrote it. As you already know, these are likely interrupted transfers. Pull threads retry transfers, and in my travels I come across tons of these tmp files in directories where the actual files still exist. I occasionally wipe those things out and it's never hurt me (YMMV.)
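Before wiping anything, you can list the candidates first. A sketch, assuming a one-day cutoff is comfortably longer than any live transfer (tune both the cutoff and the path for your site; the function name is mine):

```shell
# stale_unload_tmps ROOT -- list tmp.* files under ROOT that have not been
# modified in over a day. Live pull-thread transfers should be much younger
# than that, so anything this old is probably an abandoned partial transfer.
# This only lists; pipe to 'xargs rm' yourself once you trust the output.
stale_unload_tmps() {
    find "$1" -type f -name 'tmp.*' -mtime +1
}
```

For example: `stale_unload_tmps /p4/1/depots/unloaded > /tmp/stale.txt`, eyeball the list, then delete.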

I also expect, generally, a lot of empty directories to be lying around. I have a vague recollection that a few years ago we were relocating a lot of files (p4 relocate + p4 obliterate) and it left all the empty ,d/ directories lying around. Not sure if that's fixed yet but I'm never surprised to find empty directories in my depots.

If you're more concerned with aesthetics and recovering disk space, a dumb/fun/dangerous thing to do that I've certainly done from time to time might be:

1. On a replica of your master, rm -rvf /p4/1/depots/unloaded/*
2. p4 verify -qUt //unloaded/...
3. p4 pull -ls (until it's finished transferring)
4. On the upstream master, rm -rvf /p4/1/depots/unloaded/*
5. rsync the unloaded depot from the replica back to the master
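The replica side of that (steps 1-3) might look like this dry-run sketch. It only prints the commands unless you flip DO_IT=1; the path is our layout, the wrapper names are mine, and steps 4-5 on the master are left to taste:

```shell
# run CMD... -- print the command unless DO_IT=1 is set in the environment.
run() {
    if [ "${DO_IT:-0}" = "1" ]; then "$@"; else echo "WOULD RUN: $*"; fi
}

# rebuild_unload [ROOT] -- steps 1-3 above, as executed on the replica.
rebuild_unload() {
    root=${1:-/p4/1/depots/unloaded}
    run rm -rf "$root"                  # 1. drop the replica's unload archives
    run p4 verify -qUt //unloaded/...   # 2. schedule re-transfer of unloaded files
    run p4 pull -ls                     # 3. poll this until the queue drains
}
```

Keeping it dry-run by default is deliberate, given how destructive step 1 is on the wrong server.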

You've been around a long time, as have I, so I know you know all the precautions to take when doing something like that. But if a new p4 admin is reading this, don't even think of doing that unless you understand exactly what you're doing. :)

Sometimes I really go nuts and build an entire server/replica with 'p4 verify -qt' which gives me a nice, clean, pedantic server ... for about a day or two. Ha ha.
-Matt Janulewicz
Staff SCM Engineer, Perforce Administrator
Dolby Laboratories, Inc.
1275 Market St.
San Francisco, CA 94103, USA
majanu@dolby.com

#3 Miles O'Neal (Advanced Member, 128 posts)

Posted 17 July 2019 - 09:41 PM

Thanks, Matt.

We haven't gone to edge/commit yet, so these are all forwarding or r/o replicas and a master. So my unload namespace should be global (AFAIK).
I have a script that generates the list of extra files so I can delete them. I don't care too much about the extra dirs; I'm mainly concerned with wasted space. We have thousands of 100-200MB files and many more 10-100MB files. That adds up.

I run verify when I spin up a server. I'm looking at a script to verify all the depots on each server without bogging anything down. That won't, of course, clean up the orphaned files; I need to finish my script so it not only identifies them but cleans them up. I wish Helix did a better job of that. It's not crucial, but it is annoying.

I'm working with the methodology team to stop using autoreload for short-duration labels, and to establish rules for when to use them otherwise.

#4 Matt Janulewicz (Advanced Member, 176 posts, San Francisco, CA)

Posted 18 July 2019 - 07:23 PM

Yeah, you're right. In this situation your unload depots should be identical. This makes me wonder if mine are all identical where they should be; I've never rooted around in there. Hmmmm.

Not that you specifically asked, but we run a verify weekly on all our crucial servers. I started out with the verify script in the SDP and evolved it from there. At this point I'm generating a full list of revisions and running multiple p4 verify threads through GNU 'parallel' (we're all Linux, all the time) which allows me to tune the number of threads and not overload the Commit (master) server. But then I can go hog wild on read-only replicas if I want. It's essentially a one-liner:

p4 -F "%depotFile%#=%headRev%" fstat -Oc -Of -F lbrIsLazy=0 //depot/path/... | parallel -j10 -L30 p4 verify -q "{}"

Note that this presumes you are verifying the entire server. If you're doing a one-off depot or path, take off the '-F lbrIsLazy=0' part to be sure you hit all revisions.


The arguments to parallel are the number of threads (-j) and the number of revisions to pass to 'p4 verify' at once (-L). 10 and 30 are the sweet spot for our hardware, keeping the impact on CPU, disk, and memory low.

Our servers have too much data to do 'p4 fstat //...' all in one shot so I break it down to the second level of directories under all the depots and take it from there.
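Putting the chunking and the one-liner together, the whole thing might be sketched like this. The 'p4 dirs' pattern and the helper names are my illustration; the fstat/parallel pipeline and the -j10/-L30 numbers are from above:

```shell
# chunk_paths -- turn depot directories (one per line on stdin) into the
# '/...' wildcard paths that get fed to fstat one chunk at a time.
chunk_paths() {
    while read -r dir; do
        printf '%s/...\n' "$dir"
    done
}

# verify_all -- enumerate second-level directories under every depot and
# run the fstat | parallel pipeline once per chunk. Requires a logged-in
# p4 client and GNU parallel; tune -j/-L for your hardware.
verify_all() {
    p4 dirs '//*/*' | chunk_paths | while read -r path; do
        p4 -F '%depotFile%#=%headRev%' fstat -Oc -Of -F lbrIsLazy=0 "$path" \
          | parallel -j10 -L30 p4 verify -q '{}'
    done
}
```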

Last note, just as a reminder to anyone else reading this: unload depots and shelves require special handling, as do other depot types (that we don't use). Be sure to cover that stuff separately if you do something like this.
-Matt Janulewicz

#5 Miles O'Neal (Advanced Member, 128 posts)

Posted 18 July 2019 - 08:01 PM

I do have a script for quicker verification that uses xjobs (instead of parallel). The script feeds depots up to a certain size straight into xjobs to run verify against, but breaks the largest depots up into smaller chunks to verify separately.
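The size-based routing in that script boils down to something like this sketch. The input format and names are my illustration, since the actual script and its xjobs plumbing aren't shown here:

```shell
# split_by_size THRESHOLD -- read "<bytes> <depotpath>" lines on stdin.
# Depots at or under THRESHOLD go straight to a verify job; larger ones
# get flagged for subdivision (e.g. one directory level down) before
# being handed to xjobs/parallel as separate smaller jobs.
split_by_size() {
    while read -r size path; do
        if [ "$size" -le "$1" ]; then
            printf 'VERIFY %s/...\n' "$path"
        else
            printf 'SPLIT %s\n' "$path"
        fi
    done
}
```

The VERIFY lines can feed the job runner directly; the SPLIT lines loop back through the same routine at a deeper path level.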
I hadn't thought about checking all revisions; thanks for the tip.




