2014.2 p4p issues with 2012.1 p4d server

p4p proxy slow sync


#1 Bruce Mc

Posted 10 March 2015 - 11:53 PM

I wanted to share an issue we saw recently using a 2014.2 proxy server with a 2012.1 p4d server.

We are preparing to update our primary p4d server and related Perforce infrastructure to the current version, and were looking for ways to shorten the downtime for the update process. Reading the release notes for the 2014.2 version of p4p, we saw the requirement listed as "Perforce server: 2002.2 or higher". With that interoperability available, we upgraded some of our proxy servers to 2014.2 ahead of the server.

We started noticing that some of our automated builds were taking longer. Investigating, we found that fully syncing a workspace of ~8 GB was taking 7 to 8 times longer for some builds. The start of these long syncs coincided with the update of some of the proxy servers, and the builds whose sync times didn't change were using the un-updated proxy servers. We moved the updated proxy servers back to their original version and the sync times returned to normal.
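
(For reference, the timings above came from our automated builds; reproducing one measurement by hand through a proxy would look roughly like the sketch below, where proxy-host:1666 and //depot/project/... are placeholders for the real proxy address and depot path:)

	# Remove the local files first so the full ~8 GB is transferred again.
	p4 -p proxy-host:1666 sync -q //depot/project/...#none
	# Time a full sync through the proxy at a fixed changelist.
	time p4 -p proxy-host:1666 sync -q //depot/project/...@438710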

Is this a known issue?

I know the support people will want to know the versions involved and to see log entries for a slow sync and a normal sync, so I'll include those below.

Bruce M

The server and proxy details
p4d - 2012.1
Perforce - The Fast Software Configuration Management System.
Copyright 1995-2014 Perforce Software. All rights reserved.
Portions copyright 1991-2005 Compuware Corporation.
This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/)
See 'p4 help legal' for full OpenSSL license information
Version of OpenSSL Libraries: OpenSSL 1.0.1h 5 Jun 2014
Rev. P4D/LINUX26X86_64/2012.1/876219 (2014/06/10).
License: none

p4p - 2014.2
Perforce - The Fast Software Configuration Management System.
Copyright 1995-2014 Perforce Software. All rights reserved.
This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/)
See 'p4 help legal' for full OpenSSL license information
Version of OpenSSL Libraries: OpenSSL 1.0.1i 6 Aug 2014
This product includes software developed by the OpenLDAP Foundation
(http://www.openldap.org/)
This product includes software developed by Computing Services
at Carnegie Mellon University: Cyrus SASL (http://www.cmu.edu/computing/)
See 'p4 help legal' for full Cyrus SASL and OpenLDAP license information
Version of OpenLDAP Libraries: 2.4.39
Version of Cyrus SASL Libraries: 2.1.26
Rev. P4P/LINUX26X86_64/2014.2/962050 (2014/11/13).

p4p - 2012.1
Perforce - The Fast Software Configuration Management System.
Copyright 1995-2012 Perforce Software. All rights reserved.
This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/)
See 'p4 help legal' for full OpenSSL license information
Version of OpenSSL Libraries: OpenSSL 1.0.0i 19 Apr 2012
Rev. P4P/LINUX26X86_64/2012.1/549175 (2012/11/05).
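
(Each of the version strings above is just the output of the corresponding binary's -V flag:)

	p4d -V    # server version string
	p4p -V    # proxy version string
	p4 -V     # client version string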

Some Perforce log examples
Perforce server info:
	 2015/03/02 07:34:27 pid 302 completed 3364s 26686+1668us 407856+641280io 0+0net 815144k 232pf
Perforce server info:
	 2015/03/02 06:38:23 pid 302 p4user@Jenkins-jsvm-9-1.2.1a-7890r 10.19.13.134/10.19.13.131 [p4/2010.2/LINUX26X86/295040] 'user-sync -q ...@438710'
--- lapse 3364s
--- usage 26686+1668us 407856+641280io 0+0net 815144k 232pf
--- rpc msgs/size in+out 710827+1200813/192mb+665mb himarks 64835/64835 snd/rcv .017s/3323s
--- db.have
--- pages in+out+cached 49531+89612+96
--- pages reordered internal+leaf 7+400
--- locks read/write 1/4755 rows get+pos+scan put+del 0+1+1 488126+0
--- total lock wait+held read/write 0ms+0ms/74ms+14028ms
--- max lock wait+held read/write 0ms+0ms/1ms+1554ms
--- db.rev
--- pages in+out+cached 20568+0+96
--- locks read/write 1/0 rows get+pos+scan put+del 0+1+517905 0+0
--- total lock wait+held read/write 0ms+3139ms/0ms+0ms

Perforce server info:
	 2015/03/07 07:34:22 pid 8605 completed 417s 25657+1818us 71216+884064io 0+0net 819164k 0pf
Perforce server info:
	 2015/03/07 07:27:25 pid 8605 p4user@Jenkins-jsvm-9-1.2.1a-7890r 10.19.13.134/10.19.13.131 [p4/2010.2/LINUX26X86/295040] 'user-sync -q ...@439788'
--- lapse 417s
--- usage 25657+1818us 71216+884064io 0+0net 819164k 0pf
--- rpc msgs/size in+out 710750+1200576/192mb+664mb himarks 64835/64835 snd/rcv .024s/369s
--- db.have
--- pages in+out+cached 168805+152967+96
--- pages reordered internal+leaf 499+24500
--- locks read/write 1/4244 rows get+pos+scan put+del 0+1+1 488130+0
--- total lock wait+held read/write 0ms+0ms/196ms+21796ms
--- max lock wait+held read/write 0ms+0ms/152ms+1516ms
--- db.rev
--- pages in+out+cached 20580+0+96
--- locks read/write 1/0 rows get+pos+scan put+del 0+1+517979 0+0
--- total lock wait+held read/write 0ms+1899ms/0ms+0ms

Perforce server info:
	 2015/03/01 05:40:23 pid 10413 completed 5345s 30114+2301us 131648+962032io 0+0net 780416k 640pf
Perforce server info:
	 2015/03/01 04:11:18 pid 10413 p4user@Jenkins-jsvm-2-1.2.2b-0987r 10.19.13.134/10.19.13.124 [p4/2010.2/LINUX26X86/295040] 'user-sync -q ...@438713'
--- lapse 5345s
--- usage 30115+2301us 131648+962032io 0+0net 780416k 640pf
--- rpc msgs/size in+out 642888+1088582/172mb+602mb himarks 64835/64835 snd/rcv .864s/5300s
--- db.have
--- pages in+out+cached 148178+134343+96
--- pages reordered internal+leaf 440+21846
--- locks read/write 1/3153 rows get+pos+scan put+del 0+1+1 442940+0
--- total lock wait+held read/write 0ms+0ms/58ms+16252ms
--- max lock wait+held read/write 0ms+0ms/3ms+536ms
--- db.rev
--- pages in+out+cached 25189+0+96
--- locks read/write 1/0 rows get+pos+scan put+del 0+1+579311 0+0
--- total lock wait+held read/write 0ms+3817ms/0ms+0ms

Perforce server info:
	 2015/03/08 04:23:49 pid 28797 completed 759s 23750+1774us 626472+759936io 0+0net 782300k 624pf
Perforce server info:
	 2015/03/08 04:11:10 pid 28797 p4user@Jenkins-jsvm-2-1.2.2b-0987r 10.19.13.134/10.19.13.124 [p4/2010.2/LINUX26X86/295040] 'user-sync -q ...@439786'
--- lapse 759s
--- usage 23750+1774us 626472+759936io 0+0net 782300k 624pf
--- rpc msgs/size in+out 642939+1087920/172mb+599mb himarks 64835/64835 snd/rcv .212s/689s
--- db.have
--- pages in+out+cached 111978+117153+96
--- pages reordered internal+leaf 313+15841
--- locks read/write 1/3052 rows get+pos+scan put+del 0+1+1 442952+0
--- total lock wait+held read/write 0ms+1ms/33ms+41341ms
--- max lock wait+held read/write 0ms+1ms/1ms+1699ms
--- db.rev
--- pages in+out+cached 25207+0+96
--- locks read/write 1/0 rows get+pos+scan put+del 0+1+579516 0+0
--- total lock wait+held read/write 0ms+2112ms/0ms+0ms


#2 Bruce Mc


Posted 11 March 2015 - 06:32 PM

I should clarify how the two groups of log snippets are organized:
  • The log snippets are grouped by workspace.
  • The first log snippet (with the earlier date) in each group is from when the 2014.2 p4p was in use.
  • The second log snippet (with the later date) in each group is from when the 2012.1 p4p was in use.
Bruce M

#3 P4Shimada


Posted 11 March 2015 - 06:36 PM

Hi Bruce,

Thank you for providing the full version strings and nicely formatted server log snippets right away. I noticed from the sync command examples (p4 sync -q ...@change) that:
  • The 2012.1 p4p proxy transferred:
    • 665 MB to the 2010.2 p4 client with a lapse time of 3364 seconds
    • 664 MB to the 2010.2 p4 client with a lapse time of 417 seconds
  • The 2014.2 p4p proxy transferred:
    • 602 MB to the 2010.2 p4 client with a lapse time of 5345 seconds
    • 599 MB to the 2010.2 p4 client with a lapse time of 759 seconds
Thus it took roughly 1.5 times longer to sync/transfer slightly less data with the updated proxy (2014.2 p4p).

* NOTE: Also, I noticed the command-line 'p4' client is the 2010.2 version
  while the server and proxy are at the 2012.1 version, so we recommend
  upgrading your p4 client to at least the 2012.1 version:

  http://ftp.perforce....n.linux26x86/p4
  http://ftp.perforce....inux26x86_64/p4

  It is beneficial to keep out-of-date clients at least at the same
  version as your server; otherwise, commands can take longer to run
  when older clients are used against more recent server versions.

  The proxy release notes for 2014.2 also mention that any functionality
  new to 2013.2 requires you to upgrade the client, the proxy, and/or
  the server. If you have a test server, you may want to test
  performance after upgrading your server to the same version as the
  2014.2 proxy.

Let us know if swapping in a p4 client binary matching the 2014.2 proxy version (or the 2012.1 p4p version) resolves the issue.

I am going to do some more research, but there was a change to the proxy's file delivery (first noted under 'Bugs fixed in 2013.2') that may be affecting this behavior, where a proxy sync hangs if rpc.deliver.duplex on the server is not set to true (1). Notice the mentions in the proxy and server release notes below:

http://www.perforce....er/p4pnotes.txt
	Bugs fixed in 2013.2

	#671142 (Bug #53464) **
	 A change to the server's handling of proxied file delivery
	 requests may improve performance over long haul networks. See the
	 server release notes for details on this change.

http://www.perforce....er/relnotes.txt

	#671142 (Bug #53464) **
	 The network-related performance of a 'p4 sync -f', 'p4 sync -p',
	 or 'p4 print' command issued via a proxy over a long-haul network
	 is improved by performing fewer network duplex flush operations.
	 The new behavior, if problematic, may be disabled using the
	 undocumented tunable 'rpc.deliver.duplex=1'.

	 #771811 **
	 The undocumented tunable rpc.deliver.duplex now defaults to 1.
	 This means that the behavior described in change 671142 is now
	 opt-in, rather than opt-out; you must explicitly set
	 rpc.deliver.duplex=0 to enable the change 671142 behavior.

We cannot know for sure yet whether you are experiencing the same issue, but the resolution for it is to run this command:

	p4 configure set rpc.deliver.duplex=1

on the server. If this does not fix the issue, consider setting the configurable back to 0 to retain the performance improvement:

	p4 configure set rpc.deliver.duplex=0

However, I think this setting may only apply to 2013.2+ servers, so it may not be relevant to your 2012.1 server. I wanted to mention it just in case.
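
(To check how the tunable is currently set on a server that supports it, something like the following should work; on a 2012.1 server the name simply will not appear:)

	p4 configure show | grep rpc.deliver.duplex
	p4 configure show undoc | grep rpc.deliver.duplex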

#4 P4Shimada


Posted 11 March 2015 - 07:44 PM

Hi Bruce,

Thanks for the clarification on the log snippets. I think we are on the same page. My summary is:

BEFORE
======
P4D/LINUX26X86_64/2012.1/876219 (2014/06/10)
[p4/2010.2/LINUX26X86/295040]
P4P/LINUX26X86_64/2012.1/549175 (2012/11/05)

- 'user-sync -q ...@438710'
--- lapse 3364s
--- usage 26686+1668us 407856+641280io 0+0net 815144k 232pf
--- rpc msgs/size in+out 710827+1200813/192mb+665mb himarks 64835/64835 snd/rcv .017s/3323s


- 'user-sync -q ...@439788'
--- lapse 417s
--- usage 25657+1818us 71216+884064io 0+0net 819164k 0pf
--- rpc msgs/size in+out 710750+1200576/192mb+664mb himarks 64835/64835 snd/rcv .024s/369s


  • 26686+1668us = time spent executing user instructions + time spent in operating system code on behalf of processes (times in milliseconds)
  • 407856+641280io = the number of times the file system had to read from the disk on behalf of processes + the number of times the file system had to write on behalf of processes
  • 0+0net = IPC (Inter-Process Communication) messages received on behalf of processes + IPC messages sent on behalf of processes
  • 815144k = the maximum number of kilobytes of physical memory that processes used simultaneously
  • 232pf = the number of page faults that were serviced by doing I/O
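
(As an aside, a quick way to pull these summary lines for sync commands out of a plain-text server log is a grep pipeline like the one below, where /path/to/p4d.log stands in for wherever P4LOG points:)

	# Print each user-sync command line plus its lapse/usage/rpc lines.
	grep -A 3 "user-sync" /path/to/p4d.log | grep -E "user-sync|lapse|rpc msgs"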



AFTER
=====
P4D/LINUX26X86_64/2012.1/876219 (2014/06/10)
[p4/2010.2/LINUX26X86/295040]
P4P/LINUX26X86_64/2014.2/962050 (2014/11/13)

- 'user-sync -q ...@438713'
--- lapse 5345s
--- usage 30115+2301us 131648+962032io 0+0net 780416k 640pf
--- rpc msgs/size in+out 642888+1088582/172mb+602mb himarks 64835/64835 snd/rcv .864s/5300s

- 'user-sync -q ...@439786'
--- lapse 759s
--- usage 23750+1774us 626472+759936io 0+0net 782300k 624pf
--- rpc msgs/size in+out 642939+1087920/172mb+599mb himarks 64835/64835 snd/rcv .212s/689s

One other thing to check regarding your slower 2014.2 proxy performance is the proxy's 'pdb.lbr' file. Check whether processes are hung on pdb.lbr, as described under number 3 in the article 'Sync through Perforce Proxy hangs':

http://answers.perfo...rticles/KB/7685
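
(For reference, pdb.lbr lives in the proxy's cache directory; a minimal sketch of that check, with /p4cache standing in for your actual P4PCACHE path:)

	# List any processes that currently hold the proxy's pdb.lbr open.
	lsof /p4cache/pdb.lbr
	# Alternative if lsof is not installed:
	fuser -v /p4cache/pdb.lbr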

#5 Bruce Mc


Posted 11 March 2015 - 08:53 PM

We're close. Events happened in this order:
  • Update p4p to 2014.2

  • A couple of days go by

  • Investigate why builds are taking longer

  • See the longer sync times and associate them with the p4p update

  • Set p4p back to 2012.1

  • See sync times are back to expected values

  • We're busy and 3 - 4 days pass

  • I go to document this occurrence: for two different workspaces, capture log entries showing p4p-2014.2 in use, and capture log entries after p4p-2012.1 is back in use
With the log snippets I provided, the before case is when p4p-2014.2 was in use and the after case is when we had reverted to p4p-2012.1.

So something like this... (I've replaced the @ symbol with AT since it is being interpreted as an email address)
BEFORE
=====
P4D/LINUX26X86_64/2012.1/876219 (2014/06/10)
[p4/2010.2/LINUX26X86/295040]
P4P/LINUX26X86_64/2014.2/962050 (2014/11/13)

- 'user-sync -q ...AT438710' (workspace 1)
--- lapse 3364s
--- usage 26686+1668us 407856+641280io 0+0net 815144k 232pf
--- rpc msgs/size in+out 710827+1200813/192mb+665mb himarks 64835/64835 snd/rcv .017s/3323s

- 'user-sync -q ...AT438713' (workspace 2)
--- lapse 5345s
--- usage 30115+2301us 131648+962032io 0+0net 780416k 640pf
--- rpc msgs/size in+out 642888+1088582/172mb+602mb himarks 64835/64835 snd/rcv .864s/5300s


AFTER
======
P4D/LINUX26X86_64/2012.1/876219 (2014/06/10)
[p4/2010.2/LINUX26X86/295040]
P4P/LINUX26X86_64/2012.1/549175 (2012/11/05)

- 'user-sync -q ...AT439788' (workspace 1)
--- lapse 417s
--- usage 25657+1818us 71216+884064io 0+0net 819164k 0pf
--- rpc msgs/size in+out 710750+1200576/192mb+664mb himarks 64835/64835 snd/rcv .024s/369s

- 'user-sync -q ...AT439786' (workspace 2)
--- lapse 759s
--- usage 23750+1774us 626472+759936io 0+0net 782300k 624pf
--- rpc msgs/size in+out 642939+1087920/172mb+599mb himarks 64835/64835 snd/rcv .212s/689s



  • 26686+1668us = time spent executing user instructions + time spent in operating system code on behalf of processes (times in milliseconds)
  • 407856+641280io = the number of times the file system had to read from the disk on behalf of processes + the number of times the file system had to write on behalf of processes
  • 0+0net = IPC (Inter-Process Communication) messages received on behalf of processes + IPC messages sent on behalf of processes
  • 815144k = the maximum number of kilobytes of physical memory that processes used simultaneously
  • 232pf = the number of page faults that were serviced by doing I/O


On the undocumented configurable rpc.deliver.duplex, which is about improving performance over long-haul networks: is it relevant here?
  • The configurable is not present in p4d-2012.1 (not present in output from 'p4 configure show undoc')
  • The four systems in play are on the same subnet
    • 10.19.13.134  --  p4p server
    • 10.19.13.131  --  workspace 1
    • 10.19.13.124  --  workspace 2
    • 10.19.13.75  --  p4d server (not previously shared here)
On the issue of hanging on pdb.lbr: we are not seeing hangs. We are seeing slow sync times when using p4p-2014.2 with p4d-2012.1. The syncs do not appear to start and stop; they are just slower.

Bruce M

#6 P4Shimada


Posted 11 March 2015 - 11:28 PM

Hi Bruce,

Thank you for the additional feedback on your steps, and for the clarification that the syncs are slow but there are no hangs.

If your server is not 2013.2 or higher, then 'rpc.deliver.duplex' would not apply to you. So the next step would be to:
  • Swap in the 2014.2 (or 2012.1) binary of p4 on those workspaces, such as 'Jenkins-jsvm-9-1.2.1a-7890r' and 'Jenkins-jsvm-2-1.2.2b-0987r'

  • Capture log entries showing p4p-2014.2 in use with the updated p4.
This is a simpler and quicker test, especially on Linux; a sketch of the swap follows.
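
(A sketch of the swap on Linux, assuming the matching p4 binary has already been downloaded from the FTP links earlier in this thread, and that /usr/local/bin/p4 is the copy your builds invoke:)

	chmod +x p4                       # make the downloaded client executable
	./p4 -V                           # confirm it reports the expected rev
	sudo mv p4 /usr/local/bin/p4      # replace the client the builds use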

#7 Bruce Mc


Posted 13 March 2015 - 11:57 PM

Hi Shimada,

With the p4p-2012.1 server in place, last evening I put the current p4 client in place and let automation run builds overnight so I could capture a baseline.

Today, in the late morning, I put the p4p-2014.2 server in place and captured a comparable sync. The two syncs are on related branches with very small divergence in terms of workspace disk footprint and file count.

The results of using a current client, p4p-2014.2, and p4d-2012.1 appear to be worse than when the older client was used.

I'm adding the log snippets and some screen captures taken from a vSphere web console showing info on the VM hosting the proxy server. For context in the screen captures, I put the 2012.1 proxy server back in service at ~4:00.

You can see the drop-off in activity when the switch-out occurred, and then work happening as interrupted syncs were restarted by clients. I'd like to draw your attention to the difference in disk activity before and after the switch back to 2012.1. The amount of writes compared to reads when using p4p-2014.2 does not make sense to me, given that the vast majority of versioned files were already cached and the network rate graph does not show a comparable amount of received data.
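
(As an aside, the read/write balance on the proxy host can also be sampled outside of vSphere with a stock tool like iostat from the sysstat package:)

	# Sample extended disk statistics every 5 seconds while a sync runs,
	# to compare read volume against write volume on the proxy host.
	iostat -x 5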

The log snippets follow, p4p-2012.1 first; the @ symbol was replaced by -AT- to prevent the code box from interpreting email addresses.

Regards,

Bruce M

p4p-2012.1
Perforce server info:
2015/03/13 04:22:09 pid 6168 completed 657s 28136+1638us 27224+566440io 0+0net 783536k 597pf
Perforce server info:
2015/03/13 04:11:11 pid 6168 p4user-AT-jsvm-9-1.2.2a-7410r 10.192.130.134/10.192.130.131 [p4/2014.2/LINUX26X86/978861] 'user-sync -q ...@440591'
--- lapse 657s
--- usage 28136+1638us 27224+566440io 0+0net 783536k 597pf
--- rpc msgs/size in+out 642932+1087517/172mb+597mb himarks 64835/195906 snd/rcv .121s/613s
--- db.have
--- pages in+out+cached 39257+74128+96
--- pages reordered internal+leaf 6+409
--- locks read/write 1/2455 rows get+pos+scan put+del 0+1+1 442958+0
--- total lock wait+held read/write 0ms+0ms/6772ms+3333ms
--- max lock wait+held read/write 0ms+0ms/6737ms+69ms
--- db.rev
--- pages in+out+cached 28006+0+96
--- locks read/write 1/0 rows get+pos+scan put+del 0+1+630763 0+0
--- total lock wait+held read/write 0ms+4292ms/0ms+0ms

p4p-2014.2
Perforce server info:
2015/03/13 15:56:56 pid 3749 completed 8361s 24740+1732us 487736+646296io 0+0net 738776k 6628pf
Perforce server info:
2015/03/13 13:37:34 pid 3749 p4user-AT-jsvm-2-1.2.3a-7410r 10.192.130.134/10.192.130.124 [p4/2014.2/LINUX26X86/978861] 'user-sync -q ...@440731'
--- lapse 8361s
--- usage 24740+1732us 487736+646296io 0+0net 738780k 6628pf
--- rpc msgs/size in+out 655285+1108855/175mb+608mb himarks 64835/195906 snd/rcv .012s/8311s
--- db.have
--- pages in+out+cached 63663+86666+96
--- pages reordered internal+leaf 28+1876
--- locks read/write 1/3844 rows get+pos+scan put+del 0+1+1 451763+0
--- total lock wait+held read/write 0ms+1ms/47ms+12406ms
--- max lock wait+held read/write 0ms+1ms/1ms+7748ms
--- db.rev
--- pages in+out+cached 16768+0+96
--- locks read/write 1/0 rows get+pos+scan put+del 0+1+452289 0+0
--- total lock wait+held read/write 0ms+2071ms/0ms+0ms
[Attached vSphere screen captures: vmstats-01.jpg, vmstats-02.jpg, vmstats-03.jpg]

#8 Bruce Mc


Posted 19 March 2015 - 05:45 AM

Has there been a chance to look at this?

#9 P4Shimada


Posted 19 March 2015 - 05:51 PM

Hi Bruce,

Thank you for the log snippets using the updated 2014.2 p4 command-line binary, along with your screen-shots. I have not combed through all your data yet. If you ever need someone urgently, you can email support@perforce.com or call the Support number, and one of our Support staff will be happy to assist.

I forgot to ask: did you get a chance to check your proxy's 'pdb.lbr' to confirm whether any processes are hung there?




