
p4 monitor terminate never seems to work


25 replies to this topic

#1 UnstoppableDrew

    Advanced Member

  • Members
  • 53 posts

Posted 17 September 2014 - 01:58 PM

I've been noticing that the monitor terminate command doesn't seem to work most of the time, at least when it comes to sync commands. One of my Jenkins jobs was hanging trying to sync, so I killed the job. However the sync process was still going, so I used p4 monitor terminate to try and kill it. 17 hours later, it's still going. Normally I would go onto the Perforce server itself to take out the rogue process, but at my current job while I'm a p4 super user, I don't have access to the server.

#2 P4Shimada

    Advanced Member

  • Members
  • 831 posts

Posted 18 September 2014 - 12:12 AM

Hi,

Sorry to hear that you have hung sync commands. From your 'p4 info' output, can you give us the full server version string? That way we can give you troubleshooting suggestions and solutions according to your OS environment and server version.

#3 UnstoppableDrew

    Advanced Member

  • Members
  • 53 posts

Posted 18 September 2014 - 01:28 PM

For additional data points, I'm going through a local proxy, and it looks like we're using a broker as well based on p4 info:

Server address: localhost:1667
Server root: /p4/1/root
Server date: 2014/09/18 06:18:53 -0700 PDT
Server uptime: 124:18:09
Server version: P4D/LINUX26X86_64/2013.1/685046 (2013/08/07)
Broker address: perforce-new1:1666
Broker version: P4BROKER/LINUX26X86_64/2013.1/659207
Proxy address: pforce.mycompany.com:1999
Proxy version: P4P/LINUX26X86_64/2013.1/610569 (2013/03/19)

It seems like our office is particularly prone to this problem; one of the developers here has at least half a dozen processes that have been running forever. At least a couple were caused by starting a sync and then killing it with ^C.

Looking at the proxy's log I see a bunch of these:

Perforce proxy error:
Date 2014/09/17 23:01:15:
Connection from 10.95.7.39 broken.
TCP receive failed.
read: socket: Connection timed out

There were 4 identical entries each for 23:01:14 & 23:01:15.


#4 P4Shimada

    Advanced Member

  • Members
  • 831 posts

Posted 19 September 2014 - 08:02 PM

Thank you for sending the proxy error message from your log along with your Perforce system info.

The error message means that the connection between the server and client was unexpectedly terminated, or that the client exited at an unexpected time. This can be caused by someone killing off the client, a user issuing CTRL-C, a network link dropping, a server reboot, etc.

Try using the "p4 lockstat" command to determine whether Perforce is running and whether the troublesome sync command is locking the database. See the following for an example of how to use this command on a hung server:

    http://answers.perfo...Perforce-server

In general, you can run:

    p4 lockstat -c <client>

to see whether the Perforce database is locked. To confirm whether the process is still running, run:

    ps -elf | grep p4d

In any case, you can run:

    p4 monitor terminate <pid>

Let us know whether this frees up your Perforce system again.
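If you want to script this check, the lockstat output can be scraped mechanically. A minimal Python sketch, assuming the `Read : clients/<name>` line format shown in the output pasted in this thread (this is plain text parsing, not a documented Perforce API, and the function name is made up):

```python
# Sketch: scrape `p4 lockstat -C` text output for held client locks.
# Assumes lines of the form "Read : clients/<name>" as pasted in this
# thread; this is text parsing, not a documented Perforce API.

def parse_lockstat(output: str) -> list[tuple[str, str]]:
    """Return (lock_type, client_name) pairs from lockstat output."""
    locks = []
    for line in output.splitlines():
        parts = [p.strip() for p in line.split(":", 1)]
        if len(parts) == 2 and parts[1].startswith("clients/"):
            locks.append((parts[0], parts[1].removeprefix("clients/")))
    return locks

sample = """\
Read : clients/jenkins_xms_master
Write : clients/aem_gem
"""
print(parse_lockstat(sample))  # → [('Read', 'jenkins_xms_master'), ('Write', 'aem_gem')]
```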

REFERENCES

http://answers.perfo...-Metadata-Locks

http://answers.perfo...n-Unix-Systems/

http://answers.perfo...hild-Processes/

http://answers.perfo...-a-hung-server/

#5 UnstoppableDrew

    Advanced Member

  • Members
  • 53 posts

Posted 19 September 2014 - 08:59 PM

Ok, so it looks like the clients database is the problem here:

p4 lockstat -C
Read : clients/aem_gem
Read : clients/jenkins_gem-master-96886
Read : clients/jenkins_gne_1000
Read : clients/jenkins_gne_1100
Read : clients/jenkins_xms_master
Read : clients/service.engweb_bccm-jenkins

All the stuck jobs are in one of those clients:

1233 T buildmaste 24:59:48 sync -f //jenkins_xms_master/...@636431
9316 T buildmaste 71:58:02 sync //jenkins_gem-master-96886/...@629941
15293 R buildmaste 01:59:12 sync //jenkins_gne_1000/...@636273
15525 T drew 48:54:17 client -d aem_gem
6058 R buildmaste 24:19:19 sync -f //jenkins_xms_master/...@636431
22212 T service.en 71:17:37 sync //service.engweb_bccm-jenkins/...@629965
23675 R buildmaste 01:51:15 sync -f //jenkins_gne_1100/...@636286
27779 T drew 50:07:47 change -i
29045 T drew 71:40:59 sync c:\p4\aem_gem\...#head

Unfortunately, monitor terminate doesn't help, which is why I started this thread in the first place. You can see most of these are already marked for terminate, and the ones that have been running for 71 hours now were marked as such within the first hour, some much less than that. In previous jobs, I have used p4d -c "kill <PID>" directly on the server to handle these things, but in the current job, I do not have direct access to the Perforce server.
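Output like the monitor table above can also be filtered automatically. A hedged sketch, assuming the five-field `p4 monitor show` layout pasted above (pid, status flag, user, elapsed HHH:MM:SS, command); the function names and the one-hour threshold are made up for illustration:

```python
# Sketch: from `p4 monitor show` text, pick out processes already marked
# T (terminate requested) that are still accruing runtime past a threshold.
# The field layout (pid, status, user, elapsed, command) is assumed from
# the output pasted in this thread.

def elapsed_hours(hms: str) -> float:
    h, m, s = (int(x) for x in hms.split(":"))
    return h + m / 60 + s / 3600

def stuck_terminates(monitor_output: str, min_hours: float = 1.0) -> list[int]:
    """PIDs flagged 'T' whose elapsed time exceeds min_hours."""
    stuck = []
    for line in monitor_output.splitlines():
        fields = line.split(None, 4)
        if len(fields) >= 4 and fields[1] == "T" and elapsed_hours(fields[3]) > min_hours:
            stuck.append(int(fields[0]))
    return stuck

sample = """\
1233 T buildmaste 24:59:48 sync -f //jenkins_xms_master/...@636431
15293 R buildmaste 01:59:12 sync //jenkins_gne_1000/...@636273
29045 T drew 71:40:59 sync c:/p4/aem_gem/...#head
"""
print(stuck_terminates(sample))  # → [1233, 29045]
```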

#6 ThatGuy

    Advanced Member

  • Members
  • 33 posts

Posted 19 September 2014 - 11:07 PM

I'm interested to find out the solution to this problem as well. I have seen p4 monitor terminate take hours to terminate processes. I have no details I can share, but I know this is a problem that occurs sometimes.
Certified P4.

#7 G Barthelemy

    Advanced Member

  • Members
  • 65 posts
  • Location: United Kingdom

Posted 23 September 2014 - 02:19 PM

UnstoppableDrew, on 19 September 2014 - 08:59 PM, said:

Unfortunately, monitor terminate doesn't help, which is why I started this thread in the first place. You can see most of these are already marked for terminate, and the ones that have been running for 71 hours now were marked as such within the first hour, some much less than that. In previous jobs, I have used p4d -c "kill <PID>" directly on the server to handle these things, but in the current job, I do not have direct access to the Perforce server.

You will find that killing the client process will let the process on the server terminate. I have always been wary of killing hung syncs on the server for fear of DB corruption, although I have to admit this is more superstition than rationality: by that point the sync is no longer streaming data to the client and, crucially, is no longer writing to the database. Using p4d -c "kill <PID>" is a great idea. I must say that without access to the server, even tracing the client PID (e.g. using netstat -p and matching ports) is not going to be straightforward if the client is on a busy shared host (and the chain is longer to follow if you use proxies and brokers).

I have found that there is always a valid explanation when a process squats in the monitor table even when p4-terminated, but in the case of syncs I have never gotten to the bottom of the underlying reason, for lack of time. With syncs, it is often caused by the client issuing an INT or TERM signal without actually exiting the client process (often in scripts or client applications): the sync stops, the peer p4d process no longer accesses the database, but the socket pair stays open and the two ends seem to keep each other alive (tcpdump / wireshark show that both send each other short packets at regular intervals and neither will time out). There is a TCP-ish flavour to this issue; my gut feeling is that it's not necessarily just at the application layer. We seem to have this issue exclusively with remote sites connecting to the Perforce server through TCP accelerators, but that could be just a coincidence.

Now sometimes processes hang simply because they depend on others. For example, just a few days ago a user caused a sync to hang. He then proceeded to delete his problematic client from P4V, not once but 4 times, probably because the client just would not go away; my guess is that there was a client lock on it due to the sync (lockstat -c or -C is not related to database table access, by the way; it just reports on client locks). He eventually exited P4V. The 4 "client -d" commands remained in the monitor table (with no TCP peer at the client end) and the client spec was still in the database. I made him kill his script related to the sync. As soon as he did, all the related p4d processes exited gracefully, including the 4 "client -d" commands (the first of which actually deleted the client spec from the DB).

Another scenario where processes can't be p4-terminated: someone is in the middle of editing a spec and their window manager dies, for example. Again, this is understandable (if the editor process did not exit) and is cleaner to resolve at the client end...

I used to be a little OCD about processes that were just hanging in the monitor table, but now I tend to let them go away by themselves. Maybe it's an age thing :-) Clients eventually reboot their machines, etc... As long as your server is not busy to the point where it threatens to run out of PIDs, of course...

#8 P4Shimada

    Advanced Member

  • Members
  • 831 posts

Posted 23 September 2014 - 06:22 PM

Hi Drew,

Thanks for sending your output of lockstat and showing the commands marked for terminate. Since your server is earlier than 2013.2, you may want to disable server.locks.dir until you move to a 2013.3+ version. (With 2013.2 or later servers, administrators may set server.locks.sync=0 to specify that the sync command not take the client workspace lock at all; at the default setting of 1, the client workspace lock is taken in shared mode as before.)

The following doc shows how to disable this:

  http://answers.perfo...-Metadata-Locks

#9 BrianH

    Member

  • Members
  • 15 posts
  • Location: Middleton, WI

Posted 16 October 2017 - 03:02 PM

We are on the newest server version available and are experiencing this same issue. Our culprit appears to be "reconcile". We have one that's been running all weekend. Terminate, of course, does not terminate. Why is that there again? To make administrators feel good?

p4 lockstat -C also shows a write lock on a client.

We will attempt to kill any p4-related processes on the affected computer and let you know how it goes.

#10 Miles O'Neal

    Advanced Member

  • Members
  • 143 posts

Posted 16 October 2017 - 04:10 PM

FWIW: We don't have to run terminate too often, but I find that at least half the time I end up having to kill the process on the server. I don't have any data on which commands cause this. I've seen this since I started working with Perforce, across 2013.x, 2015.x, 2016.x, and 2017.1.

#11 p4rfong

    Advanced Member

  • Staff Moderators
  • 343 posts

Posted 16 October 2017 - 04:54 PM

We now have a solution for this. Run:

    p4 configure set db.monitor.interval=30

Then the next time a process is hung, you can safely run:

    p4 monitor terminate pid

See http://answers.perfo...rticles/KB/3785

#12 Matt Janulewicz

    Advanced Member

  • Members
  • 187 posts
  • Location: San Francisco, CA

Posted 16 October 2017 - 06:51 PM

Miles O, on 16 October 2017 - 04:10 PM, said:

FWIW: We don't have to run terminate too often, but I find that at least half the time, I end up having to kill the process on the server. I don't have any data on which commands cause this. I've seen this since I started working with Perforce, including 2013.x, 2015.x or 2016.x, and 2017.1 .

I find that I have to do a proper 'kill' 100% of the time. My usual process is 'p4 monitor terminate pid', 'p4 monitor clear pid', 'kill pid'.
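That three-step escalation can be wrapped in a small helper. A sketch shown as a dry run that only builds the command strings (the function name is made up; wiring it to subprocess and a live server is deliberately left out):

```python
# Sketch: the terminate -> clear -> kill escalation as data, so it can be
# reviewed or logged before anything is actually executed. Dry run only.

def escalation_commands(pid: int) -> list[str]:
    return [
        f"p4 monitor terminate {pid}",  # ask p4d to stop the command
        f"p4 monitor clear {pid}",      # drop the row from the monitor table
        f"kill {pid}",                  # last resort: signal the p4d child
    ]

for cmd in escalation_commands(9316):
    print(cmd)
```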

I also find that on our Commit server we typically have around 100 PIDs active (ps ax | grep p4d), though 'p4 monitor show' usually only shows the 10-15 processes I'd expect. I think perhaps when people Ctrl+C a command it doesn't get killed on the server. I long ago accepted this as a fact of life.
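That ps-versus-monitor gap can be measured mechanically. A hedged sketch that diffs the two PID sets, with both inputs passed in as text so the logic is shown without a live server (in a real check they might come from `ps -e -o pid,comm=` and `p4 monitor show`):

```python
# Sketch: p4d PIDs the OS knows about but `p4 monitor show` does not --
# candidates for leaked processes. Inputs are plain text so the diff
# logic is testable without a live server.

def leaked_pids(ps_output: str, monitor_output: str) -> set[int]:
    os_pids = {
        int(f[0])
        for f in (line.split() for line in ps_output.splitlines())
        if len(f) >= 2 and f[-1] == "p4d"
    }
    monitored = {
        int(f[0])
        for f in (line.split() for line in monitor_output.splitlines())
        if f
    }
    return os_pids - monitored

ps_sample = "101 p4d\n102 p4d\n103 sshd\n"
mon_sample = "101 R buildmaste 00:01:02 sync //depot/...\n"
print(sorted(leaked_pids(ps_sample, mon_sample)))  # → [102]
```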
-Matt Janulewicz
Staff SCM Engineer, Perforce Administrator
Dolby Laboratories, Inc.
1275 Market St.
San Francisco, CA 94103, USA
majanu@dolby.com

#13 BrianH

    Member

  • Members
  • 15 posts
  • Location: Middleton, WI

Posted 16 October 2017 - 08:27 PM

p4rfong, on 16 October 2017 - 04:54 PM, said:

We now have a solution for this.  Run
p4 configure set db.monitor.interval=30
Then the next time a process is hung, you can safely run

p4 monitor terminate pid
See http://answers.perfo...rticles/KB/3785

The super cool thing is that we have that enabled on our server already and have recently rebooted so I know it should have picked it up. Terminate still does nothing.

#14 Mailman Sync

    Advanced Member

  • Maillist Aggregator
  • 2495 posts

Posted 16 October 2017 - 08:30 PM

Originally posted to the perforce-user mailing list by: Michael Mirman


Quote

I also find that on our Commit server we typically have around 100 pid's active (ps ax | grep p4d), though 'p4 monitor show' usually only shows about 10-15 processes, what I'd expect.

I can't explain this with Ctrl-C. Presumably, the p4d process *should* go away (at least eventually) if the client does Ctrl-C.
The p4d log often says "Partner exited unexpectedly", and at least in some cases it could be traced back to Ctrl-C.

However, there are often cases when p4d leaks processes. Sometimes they are shown in "p4 monitor", sometimes they aren't.
In our case, there is almost always an edge server and a broker in the chain, and that possibly plays an important role in leaking.

We have a script that runs as a cron job looking for leaked and/or hung processes; it either kills them or reports them for investigation.
Our latest bane is leaking "login" processes ("p4 login -s", "p4 login", "p4 login -a").
Users never complain - this is purely the problem on the server side. We get about a dozen of those processes on a good day, and several dozen processes on a bad day.

Quote

I long ago accepted this as a fact of life.

:-(

--
Michael Mirman
MathWorks, Inc.
3 Apple Hill Drive, Natick, MA 01760
508-647-7555





#15 Matt Janulewicz

    Advanced Member

  • Members
  • 187 posts
  • Location: San Francisco, CA

Posted 16 October 2017 - 08:34 PM

Mailman Sync, on 16 October 2017 - 08:30 PM, said:

Originally posted to the perforce-user mailing list by: Michael Mirman




I can't explain this with Ctl-C. Presumably, the p4d process *should* go away (at least eventually) if the client does Ctl-C.
The p4d log often says "Partner exited unexpectedly", and at least in some cases it could be traced back to Ctrl-C.

However, there are often cases when p4d leaks processes. Sometimes they are shown in "p4 monitor", sometimes they aren't.
In our case, there is almost always an edge server and a broker in the chain, and that possibly plays an important role in leaking.

We have a script, which runs as a cron job, looking for leaked and/or hung processes, and either kills them or it can report them for an investigation.
Our latest bane is leaking "login" processes ("p4 login -s", "p4 login", "p4 login -a").
Users never complain - this is purely the problem on the server side. We get about a dozen of those processes on a good day, and several dozen processes on a bad day.



Yeah, edge servers. We have them.

We do the 'optimized db swap-into-production' mambo once a week, which requires each service to be restarted, so that clears up the rogues quite nicely. I realize that's the 'Did you try turning it off and back on?' solution but it works well for us.
-Matt Janulewicz
Staff SCM Engineer, Perforce Administrator
Dolby Laboratories, Inc.
1275 Market St.
San Francisco, CA 94103, USA
majanu@dolby.com

#16 Mailman Sync

    Advanced Member

  • Maillist Aggregator
  • 2495 posts

Posted 16 October 2017 - 08:45 PM

Originally posted to the perforce-user mailing list by: Michael Mirman


Quote

... we have that enabled on our server already...

We turned it on in February, and removed it a month later because it was causing p4d segfaults in certain cases (like "p4 admin stop").
It was fixed later in 2016.2, and we set it again about a month ago.
I am pretty sure "p4 monitor terminate" does not always terminate the server process, but whether those processes would actually go away on their own after 30 minutes(?) or not, I can't say. I'm not that patient, and "kill -15" works every time.

--
Michael Mirman
MathWorks, Inc.
3 Apple Hill Drive, Natick, MA 01760
508-647-7555





#17 BrianH

    Member

  • Members
  • 15 posts
  • Location: Middleton, WI

Posted 18 October 2017 - 03:19 PM

So, should I open a support ticket on this?

BrianH, on 16 October 2017 - 08:27 PM, said:

p4rfong, on 16 October 2017 - 04:54 PM, said:

We now have a solution for this.  Run
p4 configure set db.monitor.interval=30
Then the next time a process is hung, you can safely run

p4 monitor terminate pid
See http://answers.perfo...rticles/KB/3785

The super cool thing is that we have that enabled on our server already and have recently rebooted so I know it should have picked it up. Terminate still does nothing.


#18 p4rfong

    Advanced Member

  • Staff Moderators
  • 343 posts

Posted 18 October 2017 - 06:07 PM

Yes, do open a support ticket. "p4 monitor terminate" used in conjunction with db.monitor.interval ought to be working.

#19 Sambwise

    Advanced Member

  • Members
  • 946 posts

Posted 18 October 2017 - 06:33 PM

I haven't played with db.monitor.interval, but from the description of it in the KB article:

Quote

p4 configure set db.monitor.interval=30
which, for new commands processed by the server going forward, configures the server to check for and terminate processes that have had p4 monitor terminate run on them and are blocked on client input.

it sounds like it's checking for commands that are in a very specific state, i.e. waiting on a client request to complete (and it might be even more specific than that, e.g. it's checking client-EditData requests but not, say, client-ReconcileAdd).  Expecting it to be a panacea for all hung commands might be overly optimistic.

If you're able to track down the client that's running this reconcile and do more investigation it might be possible to figure out exactly what it's doing (if anything -- it wouldn't shock me if the client has in fact gone away and the server somehow lost track of it).  If it's a case of the reconcile command being really performance-intensive (e.g. you've got a hundred potentially-renamed files that are all near matches for each other and it's in combinatoric hell) that should be pretty easy to diagnose/reproduce, and you can fix it going forward by tweaking configurables to either make reconcile less fastidious or lower whatever other threshold is currently set too high.

#20 BrianH

    Member

  • Members
  • 15 posts
  • Location: Middleton, WI

Posted 26 October 2017 - 04:21 PM

p4rfong, on 18 October 2017 - 06:07 PM, said:

Yes, do open a support ticket.  Use of "p4 monitor terminate" used in conjunction with db.monitor.interval ought to be working.

We've had that setting active for a long time and have restarted the server many times. We have not upgraded server versions; we only upgraded our P4Admin version, to 2017.3, just today. We had a sync process that had been stuck for over an hour due to a dropped VPN connection, and we tried terminate, thinking it wouldn't work.

It worked, inexplicably. I need to sit down. I have no explanation other than black undocumented magic.



