reverts hanging

#1 Miles O'Neal

Posted 03 July 2018 - 10:13 PM

A user asked for help with a hung revert. With over 32K accidentally deleted files, the revert reported reverting a hundred or so and then froze. Two hours later I got to look at it and killed the process on the server. I became that user via sudo and experimented; the revert would consistently hang a short way in. I eventually ran "p4 opened //foo/... | sed -e 's/#.*//' > /tmp/revert.dat" to get the paths of all the open files, then ran a loop to revert each file individually. It took about 1 minute per 1,000 files (32 minutes or so), but none of them hung. There were no streams or labels involved.
This occurred from both P4V (the user) and p4 (me).
Any ideas?
Thanks!

RHEL6 (required by third-party software we use)
Rev. P4V/LINUX26X86_64/2017.2/1532340
Rev. P4/LINUX26X86_64/2017.1/1534792 (2017/07/26).
Rev. P4D/LINUX26X86_64/2017.1/1534792 (2017/07/26).
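The one-file-at-a-time workaround described above can be sketched as two small POSIX shell functions. This assumes the p4 command-line client is on the PATH and that P4PORT, P4USER and P4CLIENT are already set for the user whose files are open; the function names list_opened and revert_individually are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Sketch of the workaround: list every opened file under a depot path,
# then revert them one at a time.
# Assumes p4 is on PATH and P4PORT/P4USER/P4CLIENT are set for the user
# whose files are open. Function names are hypothetical.

# list_opened DEPOT_PATH LIST_FILE
# p4 opened prints lines such as
#   //foo/bar.c#3 - delete default change (text)
# so everything from the first '#' on is stripped, leaving the depot path.
# (A literal '#' in a file name is stored as %23, so real paths survive.)
list_opened() {
    p4 opened "$1" | sed -e 's/#.*//' > "$2"
}

# revert_individually LIST_FILE
# Runs one p4 revert per line of LIST_FILE, reports failures on stderr,
# and returns non-zero if any revert failed.
revert_individually() {
    failed=0
    while IFS= read -r path; do
        [ -n "$path" ] || continue    # skip blank lines
        if ! p4 revert "$path"; then
            echo "revert failed: $path" >&2
            failed=$((failed + 1))
        fi
    done < "$1"
    echo "$failed revert(s) failed" >&2
    [ "$failed" -eq 0 ]
}
```

Typical use would be list_opened //foo/... /tmp/revert.dat followed by revert_individually /tmp/revert.dat. One p4 process per file is slow (about a minute per 1,000 files in the run described above), but reading the list line by line keeps paths containing spaces intact and lets the loop carry on past any single failure.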

#2 Sambwise

Posted 04 July 2018 - 12:46 AM

That's an interesting one.  Did the hundred or so files actually get reverted?  Would the revert always revert a hundred files and then hang, or after the first time did it hang without reverting anything?  My guess would be that if it hung after 100 files it was something specific to the 101st file (meaning that thereafter it would always hang immediately on that same file) rather than the number of files reverted (meaning that each attempt would hang on a different file since it'd process 100 and then get stuck in a new place).  That'd be a good thing to validate, though, and if you know a particular file that's triggering the bug then it's a lot easier to scrutinize it for possible root causes.

As to why a particular file would hang only as part of a larger revert, my guess is that the file was in an unusual state (something involving a move pair, or a shelf, or even a have record) that required some cleanup, and that cleanup needed a db probe using the reverted path as a filter. Such a probe would go a lot quicker when reverting that file by itself than as part of a larger batch.

The way I'd try to diagnose exactly what's going on would be to run the server with the -vdmc=5 flag (or some other level N) and see whether it dumps any helpful logging. I think -vdb=N might also give something helpful. At the end of the day, though, an exact diagnosis isn't that useful if you don't have access to the source to fix it. If you're trying to get a fix from the development team, sending them a checkpoint that reproduces the problem is probably the most expeditious route.

#3 p4rfong (Staff Moderator)

Posted 06 July 2018 - 05:52 PM

You can also try running:
p4 monitor show -ael
p4 lockstat
p4 lockstat -C
as seen in "Fixing a hung server" https://community.pe.../s/article/3785
Also make sure you have run:
p4 configure set db.monitor.interval=30
This won't help this time, but it will allow
p4 monitor terminate <pid>
to work in the future. The output of the lockstat commands above may provide a clue.
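To make sure the commands suggested above actually get captured the next time a revert hangs, they could be bundled into one helper that writes everything to a single file. This is only a sketch: the function name capture_hang_diagnostics and the default output path are hypothetical, and it assumes the caller is connected as a user with super access, which at least some of these commands need.

```shell
#!/bin/sh
# Sketch: capture the suggested diagnostics into one file while a revert
# is hung. Assumes p4 is on PATH and connected as a super user.
# Function name and default path are hypothetical.
capture_hang_diagnostics() {
    out=${1:-/tmp/p4-hang-$(date +%Y%m%d-%H%M%S).txt}
    {
        echo "== p4 monitor show -ael =="
        p4 monitor show -ael
        echo "== p4 lockstat =="
        p4 lockstat
        echo "== p4 lockstat -C =="
        p4 lockstat -C
    } > "$out" 2>&1
    echo "$out"    # print where the report was written
}
```

Running it while the revert is stuck gives one file that can be compared across incidents or attached to a support request; errors from any command land in the same file instead of being lost.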

#4 Miles O'Neal

Posted 16 July 2018 - 03:40 PM

Thanks, y'all.
Yes, everything reverted. When I killed the hung revert and started it again, it would pick up where it left off and then hang further on, at a different location, so there was no specific file that couldn't be reverted. When I dumped the list of opened files under the depot path and reverted them individually, they all reverted with no problems.
I'd tried some of these things; there were no obvious lock problems.
If it happens again, I'll check them all.
Thanks again.

#5 Sambwise

Posted 16 July 2018 - 08:38 PM

I dimly remember a bug at some point where TCP windows that were too big (or too small?) would cause a sync to hang after a certain number of files. It was some kind of problem with the duplex transfer: it would get wedged waiting for one of the buffers to fill and sit there forever rather than flushing it. The same thing would probably happen on revert. Is this a direct connection, or is there a proxy (or edge, or broker, or...) in between?
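One quick way to answer the direct-connection question is to look at p4 info as seen by the affected user, alongside the server's TCP buffer setting. This sketch rests on two assumptions to verify against the documentation for your server version: that p4 info reports intermediaries in lines beginning with "Proxy", "Broker" or "Server services", and that the buffer size in question is the net.tcpsize configurable. The function name connection_report is hypothetical.

```shell
#!/bin/sh
# Sketch: show whether p4 info mentions a proxy, broker or special server
# service, then show the server's net.tcpsize setting.
# Assumption: intermediaries appear in p4 info as lines beginning with
# "Proxy", "Broker" or "Server services"; verify for your version.
connection_report() {
    info=$(p4 info) || return 1
    if ! printf '%s\n' "$info" | grep -Ei '^(proxy|broker|server services)'; then
        echo "no proxy/broker/server-services lines in p4 info (direct connection?)"
    fi
    p4 configure show net.tcpsize
}
```

Running it once from the affected user's environment and once from a machine known to connect directly would show whether the two paths differ.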