

Determine if depots directory contains unnecessary files

Tags: p4d, depots


#1 Br.Bill

    Member

  • Members
  • 21 posts

Posted 13 August 2016 - 08:28 PM

I inherited some p4d instances that were backed up regularly with rsync. This is awesome, because we had a disk crash on one of the instances and I was able to recover it from the backup. Cool.

Unfortunately, I found that the rsync command used for those backups was not doing exact sync. It did not --delete files in the target that were removed in the source. This went on for years and now there are gobs of files in the depots that are unnecessary. Orphaned.

Is there a way to determine which files stored in the depots are orphans and could be removed? I don't want to keep that stuff around. Some of them are likely blob archives and take up a lot of space. I just don't know which ones p4d knows about.

Thanks.

#2 Matt Janulewicz

    Advanced Member

  • Members
  • 144 posts
  • Location: San Francisco, CA

Posted 15 August 2016 - 05:27 PM

I can think of two ways to attack this, but there are probably more.

First, if you have a reasonably new p4d, run 'p4 fstat -Oc' on all your files, filtering on lbrFile:

$ p4 fstat -Oc -T lbrFile ywaves.err1
... lbrFile //depot/demo/majanu/ywaves.err1

This points at where your library file is, relative to your server's root. In my case, this archive would be at /p4/1/root/depot/demo/majanu/ywaves.err[undetermined]

The "[undetermined]" tag in the above line is because text files (historically) were stored in an RCS file with a ',v' extension, while binary files get their own subdirectory with a ',d' extension. Complicating matters, these days you can configure the server to store everything as individual gzipped archives, so even text files might end up in a ',d' directory. If you wanted to be safe, you'd parse out all those paths and rsync both the ',v' files and the ',d' directories.
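
In rough shell terms, that parsing step might look something like this (just a sketch: it assumes your depot directories sit directly under the server root, that your client is new enough for the -F output formatting, and it only looks at head revisions; 'p4 fstat -Oc' needs admin access):

P4ROOT=/p4/1/root

p4 -ztag -F %lbrFile% fstat -Oc //... | sort -u | while read -r lbr; do
    [ -n "$lbr" ] || continue                          # skip revisions with no archive (e.g. deletes)
    rel=${lbr#//}                                      # depot syntax -> path relative to P4ROOT
    [ -f "$P4ROOT/$rel,v" ] && echo "$P4ROOT/$rel,v"   # RCS archive for text-style storage
    [ -d "$P4ROOT/$rel,d" ] && echo "$P4ROOT/$rel,d"   # ,d directory for binary/gz storage
done > known_archives.txt

Anything under the server root that isn't in that list (or inside one of the listed ',d' directories) is a candidate orphan, but treat it as a starting point only, since it ignores shelves and revisions whose storage type changed over time.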

As it happens, I've written a script to do this. You'll also want the script for shelves:

https://swarm.worksh...ind_archives.sh
https://swarm.worksh...helved_files.sh

Personally, I think that way is a bit sloppy, but if you only have one master server it's probably the only way to go. Our environment consists of a commit server, a bunch of edges and numerous read-only instances. I found myself building out new servers a lot this year and ran into the same problem, to the tune of around half a terabyte of cruft. So I wrote another script that builds out a replica using 'p4 verify -qt', which transfers the needed archive files automatically. This is probably the best approach if you want a 100% clean server with no cruft in it whatsoever.

Depending on how big your set of library/archive files is, the build-out might take a few days. Even so, if you have the hardware for it, it might be worth making a temporary read-only replica of your server and using this script to populate the library/archive files:

https://swarm.worksh...seed_replica.sh

A fun little project is doing test runs of the transfer with smaller sets of files, different numbers of pull threads, etc. to find the limits of your system. The comments at the beginning of that script note where our hardware's sweet spot was; yours may vary.
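
Stripped down to its essence (and leaving all the tuning to the script), the replica approach is basically two commands run against the replica. 'replica:1666' here is just a placeholder address, and the replica needs its pull threads configured already:

$ p4 -p replica:1666 verify -qt //...
$ p4 -p replica:1666 pull -ls

The first schedules a transfer for every archive the metadata references but the replica doesn't have yet; the second shows the transfer queue so you can tell when it has drained.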

FINAL IMPORTANT NOTE:

Your library/archive files are not the only unique things about your server, as far as those types of files go. You also need to worry about shelves and the unload depot (and probably archive depots, but we don't have any of those). No matter which type of build-out you go with, when you're done you need to be sure you include shelves and unload. For scenario #1, the list_shelved_files.sh script will help. With scenario #2 you can transfer shelves programmatically:

$ p4 changes -s shelved | awk '{ print $2 }' | xargs -n1 -I {} p4 verify -qSt @={}

And unload:

$ p4 verify -qUt //unload/...

In either case, when you think you're done, don't decommission or otherwise destroy your original server until you've taken a complete backup and then run a complete verify that comes back with no errors:

$ p4 verify -qz //...
$ p4 changes -s shelved | awk '{ print $2 }' | xargs -n1 -I {} p4 verify -qS @={}
$ p4 verify -qU //...

-Matt Janulewicz
Staff SCM Engineer, Perforce Administrator
Dolby Laboratories, Inc.
1275 Market St.
San Francisco, CA 94103, USA
majanu@dolby.com

#3 Br.Bill

    Member

  • Members
  • 21 posts

Posted 15 August 2016 - 11:33 PM

This is great info, Matt. Thanks. And yes, I would never do anything like this without making full backups first!

#4 P4Sam

    Advanced Member

  • Members
  • 484 posts
  • Location: San Francisco, CA

Posted 16 August 2016 - 03:41 PM

Matt Janulewicz, on 15 August 2016 - 05:27 PM, said:

$ p4 fstat -Oc -T lbrFile ywaves.err1
... lbrFile //depot/demo/majanu/ywaves.err1

This points at where your library file is, relative to your server's root. In my case, this archive would be at /p4/1/root/depot/demo/majanu/ywaves.err[undetermined]

The "[undetermined]" tag in the above line is because text files (historically) were stored in an RCS file with a ',v' extension, while binary files get their own subdirectory with a ',d' extension. Complicating matters, these days you can configure the server to store everything as individual gzipped archives, so even text files might end up in a ',d' directory. If you wanted to be safe, you'd parse out all those paths and rsync both the ',v' files and the ',d' directories.

Given that you've already written a script around it this probably isn't useful, but if you want to figure this out from the metadata with no guessing, look at the lbrType field:

C:\test>p4 fstat -Oc foo#1 | grep lbr
... lbrFile //stream/rel1/foo
... lbrRev 1.10
... lbrType text
... lbrIsLazy 1

The lbrType is the storage type; if it says "text" (or anything+D) the revision is in a ,v file (named lbrFile,v) and if it says "binary" (or anything+C) the revision is a gzip in a directory (named lbrFile,d).  The lbrRev tells you the name the revision is stored under.
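
To make that concrete, here's a toy version of the mapping (the server root is made up, it reuses the example file above, and it ignores +F/uncompressed storage and other corner cases):

P4ROOT=/p4/1/root

p4 fstat -Oc //stream/rel1/foo#1 | awk -v root="$P4ROOT" '
    /^\.\.\. lbrFile / { lbr = substr($3, 3) }     # drop the leading "//"
    /^\.\.\. lbrRev /  { rev = $3 }
    /^\.\.\. lbrType / {
        if ($3 ~ /binary/ || $3 ~ /\+C/)
            print root "/" lbr ",d/" rev ".gz"     # compressed storage: one gz per revision in the ,d dir
        else
            print root "/" lbr ",v"                # text/+D: the revision lives inside the ,v RCS file
    }'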

#5 Matt Janulewicz

    Advanced Member

  • Members
  • 144 posts
  • Location: San Francisco, CA

Posted 16 August 2016 - 03:51 PM

Excellent! I will admit that I got a little lazy doing this stuff. :)

For completeness, would you have to do this for each file revision? I haven't checked (again, lazy), but say you had a server that's been running for a long time with a lot of RCS files in it, and then one day you set it to use full-file storage for some text files (or globally). I think you'd end up with a combination of ,v and ,d for the same file ...?

-Matt Janulewicz
Staff SCM Engineer, Perforce Administrator
Dolby Laboratories, Inc.
1275 Market St.
San Francisco, CA 94103, USA
majanu@dolby.com

#6 P4Sam

    Advanced Member

  • Members
  • 484 posts
  • Location: San Francisco, CA

Posted 16 August 2016 - 06:11 PM

Yup, each revision is its own story.  

If you have large binary files and you're trying to reclaim space, you definitely want to be looking at each revision anyway, since for those you'll want to delete individual orphaned revisions out of the ,d directory rather than treating each file as all or nothing.
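
A minimal sketch of that per-revision walk (reusing the example path from earlier; 'p4 fstat -Oc' needs admin access, and deleted revisions will simply print nothing):

file=//depot/demo/majanu/ywaves.err1
head=$(p4 -ztag -F %headRev% fstat "$file")

for r in $(seq 1 "$head"); do
    p4 fstat -Oc -T 'lbrFile,lbrRev,lbrType' "$file#$r"   # where each revision's archive lives
done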

#7 Br.Bill

    Member

  • Members
  • 21 posts

Posted 16 August 2016 - 07:05 PM

I know we have files that have had text to binary type changes and vice versa. So I do probably want to look at each revision.




