

Processes hanging for days


13 replies to this topic

#1 adams_s

adams_s

    Member

  • Members
  • PipPip
  • 20 posts

Posted 05 October 2020 - 07:48 AM

Our Perforce server has started taking days to complete processes which should be taking minutes. Most recently, I did a +w type change on several gigs (say 30-ish) of files, and the typemap process has been running for 72 hours. Lockstat says there are no locked tables.

This isn't the first time it's happened; we've routinely had submits run for days without exiting. In all cases, there is no unusual CPU or disk activity on the server. It just idles and the processes never exit.

Is there some way I can figure out what is blocking these processes? Our server is running in a Proxmox guest VM, and we're not detecting anything untoward there on the hardware level.

#2 Miles O'Neal

Miles O'Neal

    Advanced Member

  • Members
  • PipPipPip
  • 214 posts
  • Location: Austin. Texas. Y'all.

Posted 05 October 2020 - 04:01 PM

Which version of p4d? We never had this exact problem, but we had similar issues with 2017.1 and before.

Is there a broker involved?

I assume the submits were not sitting in an editor...

#3 adams_s

adams_s

    Member

  • Members
  • PipPip
  • 20 posts

Posted 06 October 2020 - 10:21 AM

We're on 2019.2/1918134, Linux64. Submits were done from the command line, sometimes on the server itself. No broker involved. 92 hours is the current record for a "p4 type" process that's just idling.

#4 Sambwise

Sambwise

    Advanced Member

  • Members
  • PipPipPip
  • 1191 posts

Posted 06 October 2020 - 03:17 PM

adams_s, on 06 October 2020 - 10:21 AM, said:

Submits were done from the command line, sometimes on the server itself.

Like Miles, I'm curious what stage the submits were "stuck" on.  What was the last thing printed to the command line?  Submit goes through multiple phases and looking at what output you've gotten from the server so far is a good clue as to where specifically it might have gotten stuck.

If it seems like nothing at all got printed, like Miles said, it's likely just stuck at the "edit the spec" stage -- double-check P4EDITOR to make sure it's set to something reasonable.

Are there any triggers configured?  Other than waiting on a response from the client, that's the most likely way for a command to get "stuck" that doesn't involve lock contention.

#5 adams_s

adams_s

    Member

  • Members
  • PipPip
  • 20 posts

Posted 07 October 2020 - 12:58 PM

On the server we're using vi, on client machines dos cmd + notepad and linux bash + nano, or just plain bash when running p4convert. Most of our lockups have been when using p4convert, in which case the last console line is the last file submitted for the given SVN import/convert being run. That same last line is also the last thing we see in the server logs.

The P4 command for a hanging p4convert is "dm-submitChange". For a "+w type" executed from dos cmd it's "typemap". Pretty sure I recall the p4 type command exiting immediately on the client end without any errors, but the processes run on the server forever. In cases where the client end waits forever, abort from the client doesn't end the server process. We eventually have to do a "kill -9 ID" on the server, or just reboot.

We have no triggers on this server that could be blocking our submits, but these errors do feel like they're idling waiting for something like a trigger to return. There are no detectable errors, locks, data corruption, cpu activity, disk activity, nada.

The context for all of this is that we're trying to migrate a large SVN project into Perforce. We do a large initial import, then several smaller incremental updates with recent changes. The large initial import works, the incremental updates eventually start locking up. When we obliterate+delete+recreate the depot we import into, the initial huge import works again, then subsequent incremental imports lock up again. The problem eventually spread to other depots on our server. It seems that updating or changing large numbers of files makes the server sad, but it's hard to say with certainty.

#6 Miles O'Neal

Miles O'Neal

    Advanced Member

  • Members
  • PipPipPip
  • 214 posts
  • Location: Austin. Texas. Y'all.

Posted 07 October 2020 - 03:05 PM

I have no idea what p4convert is doing with the description field on the submit. Does "p4 monitor show -al" show anything useful? (Assuming you have the server's monitor configurable cranked up; you might set it to 25 for now to debug.)

I have never seen a submit hang except waiting for an edit session to finish, but we also jumped right over 2019.x.

Not that this solves the problem, but you said you had to kill the server PID. Did you try "p4 monitor terminate" for the child PID in question?
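When the monitor output runs to many lines, a small filter makes the stuck entries easier to spot. The following is only a sketch: it assumes each line of "p4 monitor show -al" has the shape pid, status, owner, hh:mm:ss, command, args, and the function name long_running and the one-hour threshold are made up for illustration. Adjust the pattern if your server prints extra columns.

```python
import re

# Assumed line shape for `p4 monitor show -al` (an assumption, verify locally):
#   <pid> <status> <owner> <hh:mm:ss> <command> [args...]
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<status>\S+)\s+(?P<owner>\S+)\s+"
    r"(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})\s+(?P<command>\S+)\s*(?P<args>.*)$"
)


def long_running(monitor_output, min_hours=1.0):
    """Return (pid, status, owner, hours, command) for entries at least min_hours old."""
    hits = []
    for line in monitor_output.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed shape
        hours = int(match["h"]) + int(match["m"]) / 60 + int(match["s"]) / 3600
        if hours >= min_hours:
            hits.append(
                (int(match["pid"]), match["status"], match["owner"],
                 round(hours, 2), match["command"])
            )
    return hits
```

Feed it the captured text of the monitor command (for example via subprocess.run with capture_output=True) and compare the PIDs it returns with what the OS reports for the same processes.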

#7 adams_s

adams_s

    Member

  • Members
  • PipPip
  • 20 posts

Posted 08 October 2020 - 07:59 AM

Miles O'Neal, on 07 October 2020 - 03:05 PM, said:

I have no idea what p4convert is doing with the description field on the submit. Does "p4 monitor show -al" show anything useful? (Assuming you have the server's monitor configurable cranked up; you might set it to 25 for now to debug.)

I have never seen a submit hang except waiting for an edit session to finish, but we also jumped right over 2019.x.

Not that this solves the problem, but you said you had to kill the server PID. Did you try "p4 monitor terminate" for the child PID in question?

No, we've tried several p4 monitor values, nothing useful there.

And yes, we tried p4 monitor terminate, that marked them for termination but they still refused to exit. kill -9 was the only way.

We've been through this for several days with our Perforce rep and can't find anything wrong with Perforce itself. Was just hoping someone else might recognize this behavior. Time to call in a hardware exorcist, because that's what this is starting to smell like.

#8 Sambwise

Sambwise

    Advanced Member

  • Members
  • PipPipPip
  • 1191 posts

Posted 08 October 2020 - 02:33 PM

If it happens only with p4convert, I'm suspicious that it might be a client-side problem -- IIRC you can get "zombie" processes like this if the client opens a connection and holds it open forever without closing it.  P4V used to do this once upon a time and clog p4d with zombies, although AFAIK that got fixed years ago.

#9 adams_s

adams_s

    Member

  • Members
  • PipPip
  • 20 posts

Posted 12 October 2020 - 08:59 AM

Hardware checks came back clean: we've managed to reproduce the error using a totally new server install on different hardware. So we're back to suspecting that the client is doing something weird, as these errors always occur during or after a p4convert run.

What could be the cause of a Perforce process that is marked for termination with "p4 monitor terminate" but still manages to live on days later? To me that suggests something at the OS level is blocking Perforce from terminating processes normally.
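One way to test the OS-level theory on Linux is to look at the state of the stuck p4d child in /proc. A process in uninterruptible sleep (state D) cannot act on any signal until the kernel call it is blocked in returns, while state S means an ordinary sleep, and the wait channel hints at what it is sleeping on. The sketch below assumes a Linux /proc filesystem; the helper names parse_state and describe are hypothetical.

```python
from pathlib import Path


def parse_state(status_text):
    """Extract the one-letter state from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split()[1]  # e.g. "S" from "State:\tS (sleeping)"
    return None


def describe(pid):
    """Return (state, wait channel) for a live PID. Linux only."""
    proc = Path("/proc") / str(pid)
    state = parse_state((proc / "status").read_text())
    try:
        # wchan may be empty or "0" when the kernel hides it from this user
        wchan = (proc / "wchan").read_text().strip() or "-"
    except OSError:
        wchan = "unavailable"
    return state, wchan
```

Calling describe() with the PID shown by "p4 monitor show" returns a pair such as a state letter plus a kernel function name. A sleeping process parked in a network read would fit the "server waiting on the client" idea raised earlier in the thread, though that is an interpretation, not proof.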

#10 MiteshPatel

MiteshPatel

    Newbie

  • Members
  • Pip
  • 2 posts
  • Location: India

Posted 13 October 2020 - 08:23 AM

Which version of p4d are you using?
Is any broker involved in this process?

#11 adams_s

adams_s

    Member

  • Members
  • PipPip
  • 20 posts

Posted 13 October 2020 - 03:10 PM

MiteshPatel, on 13 October 2020 - 08:23 AM, said:

Which version of p4d are you using?
Is any broker involved in this process?

2019.2/1918134, no broker.

Also just noticed that each time we've experienced this weirdness, p4convert has tried to push a single changeset containing 100+ gigs of data and 500K+ files. Not sure that's such a good thing.

#12 Miles O'Neal

Miles O'Neal

    Advanced Member

  • Members
  • PipPipPip
  • 214 posts
  • Location: Austin. Texas. Y'all.

Posted 13 October 2020 - 04:36 PM

Yikes. It would be interesting to run atop and see what's happening (if anything).

Does "p4 lockstat" or "p4 monitor show -aL" show any locks in use?

#13 Matt Janulewicz

Matt Janulewicz

    Advanced Member

  • Members
  • PipPipPip
  • 230 posts
  • Location: San Diego, CA

Posted 20 October 2020 - 05:38 AM

Just one more suggestion to try during a long submit is to tail/follow the live journal to see if it's actually writing anything to the database. If not, it might be more evidence that the client is waiting for the server and the server is waiting for the client.
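If watching "tail -f" by eye is impractical during a multi-day hang, the same check can be reduced to sampling the journal's size. This is a rough sketch only: the function name journal_growth is invented here, and the path you pass in is whatever your server's journal file actually is.

```python
import os
import time


def journal_growth(path, interval=5.0, samples=3):
    """Return the byte growth of `path` over each sampling interval."""
    growth = []
    last = os.path.getsize(path)
    for _ in range(samples):
        time.sleep(interval)
        now = os.path.getsize(path)
        growth.append(now - last)  # 0 means nothing was appended in this interval
        last = now
    return growth
```

A run of zeros while the submit is supposedly in progress suggests the server is not writing database changes at all. Note that a journal rotation during sampling would show up as a negative number.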

Even with a modest server spec, 100+ GB and 500K+ files doesn't seem like it should be taking days, unless your network is slow or being throttled/QOS'ed. Or ...

If this really is a VM, how much memory is allocated to it? Is it reserved? How big is your database vs memory? How many CPUs? Is parallel submit being invoked?

I've seen similar things happen on VMs (or even hardware) where a p4d process is wanting for memory or CPUs. Certain transactions will seem to be waiting for an event that never happens, but throwing more memory at it often clears it up. P4D relies on the system's own caching so (if this is Linux) you might run 'htop' to see how much memory is being used, how much is cached, and if you're swapping.
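For a record you can log over time rather than a live htop screen, the same numbers can be read from /proc/meminfo. The sketch below assumes Linux; the helper names parse_meminfo and memory_summary are hypothetical, and the result is a rough indicator, not a sizing rule.

```python
def parse_meminfo(text):
    """Parse the text of /proc/meminfo into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_summary(text):
    """Pick out the figures relevant to caching and swapping."""
    info = parse_meminfo(text)
    return {
        "total_kb": info.get("MemTotal", 0),
        "available_kb": info.get("MemAvailable", 0),
        "page_cache_kb": info.get("Cached", 0),
        "swap_used_kb": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }
```

Pass it the contents of /proc/meminfo, for example open("/proc/meminfo").read(). A small page cache relative to the size of the db.* files, or swap use that keeps climbing, would point in the direction described above.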

'strace' (also Linux) is one of my go-to tools to see what a process is touching when all the obvious ideas have been exhausted.
-Matt Janulewicz
Currently unemployed, looking for work in Boise, ID!

#14 Robert Cowham

Robert Cowham

    Advanced Member

  • PCP
  • 279 posts
  • Location: London, UK

Posted 23 October 2020 - 09:56 AM

Agree with Matt - 'strace' is your friend for seeing what is happening.
Also the tool 'lslocks -J -o +BLOCKER' is a great help for locks on Linux.
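Because the -J flag makes lslocks emit JSON, its output is easy to filter for locks that are actually waiting on another process. This is a sketch under assumptions: it expects a top-level "locks" list with lowercase keys such as "pid", "command", "path" and "blocker", which may differ between util-linux versions, and the function name blocked_locks is made up.

```python
import json


def blocked_locks(lslocks_json):
    """Return (pid, command, path, blocker_pid) for locks held up by another process."""
    data = json.loads(lslocks_json)
    waiting = []
    for lock in data.get("locks", []):
        blocker = lock.get("blocker")
        if blocker in (None, "", 0, "0"):
            continue  # nothing is blocking this lock
        waiting.append(
            (int(lock["pid"]), lock.get("command"), lock.get("path"), int(blocker))
        )
    return waiting
```

Run it on the captured output of the lslocks command above. An empty list while p4d is hung would agree with lockstat reporting no locked tables.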
Co-Author of "Learning Perforce SCM", PACKT Publishing, 25 September 2013, ISBN 9781849687645

"It's wonderful to see a new book about Perforce, especially one written by Robert Cowham and Neal Firth. No one can teach Perforce better than these seasoned subject matter experts"
  • Laura Wingerd, author of Practical Perforce, former VP of Product Technology at Perforce



