Miles O'Neal

Member Since 28 Oct 2014
Offline Last Active Dec 05 2019 11:18 PM

Posts I've Made

In Topic: config: startup pull batch logic

05 December 2019 - 11:22 PM

Agreed. And it would be better if the pull threads ran independently, each just scarfing the next appropriate files from the queue.
Oh, well.

In Topic: config: startup pull batch logic

03 December 2019 - 11:13 PM

After discussions with support and lots of tweaking and monitoring, it appears that the pull threads are not independent in terms of scheduling. They each get set up with one or more files to run, then they all run, then they sit idle until they're all free, at which point the next cycle starts. This comes from watching them with and without batching. I currently have millions of files queued for transfer. All the threads run, then there's a seven-second pause while a single p4d scans the entire table and allocates a file to each thread, and then they all run. Lather, rinse, repeat.

Even without the delay of rescanning the table (instead of just pulling files off the front of a queue), this seems like it would be suboptimal for a busy server. I expected each thread to run, then see if there was work to do, and run again. It appears the master thread does all the rdb.lbr scanning, and possibly passes out files to the worker threads.
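To put rough numbers on why the lockstep cycle bothers me, here is a toy model in Python. It is only a sketch of the behavior as I observed it, not how p4d is actually implemented: the function names, the per-file durations, and the idea that each cycle costs one scan pause plus its slowest transfer are all my assumptions.

```python
import heapq


def lockstep_makespan(durations, threads, scan_pause):
    """Total time if each cycle scans, hands one file to every thread,
    and then waits for the slowest transfer before starting again."""
    total = 0.0
    for start in range(0, len(durations), threads):
        cycle = durations[start:start + threads]
        # The whole cycle lasts as long as its slowest file, plus the scan.
        total += scan_pause + max(cycle)
    return total


def independent_makespan(durations, threads):
    """Total time if each thread takes the next queued file as soon as
    it becomes free (what I expected to see)."""
    free_at = [0.0] * threads  # min-heap of times at which threads go idle
    heapq.heapify(free_at)
    for d in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + d)
    return max(free_at)


if __name__ == "__main__":
    # Hypothetical transfer times in seconds: mostly small files, one big one.
    files = [1, 1, 1, 10, 1, 1, 1, 1]
    print(lockstep_makespan(files, threads=4, scan_pause=7))
    print(independent_makespan(files, threads=4))
```

With these made-up numbers, the lockstep model spends most of its time in scan pauses and waiting on the one big file, while the independent model is bounded by the big file alone. Real transfers will differ, but the shape of the problem is the same.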

I didn't get much insight into how batching works. Instead I was told, "Batching can do good in some environments but can cost you more in the long run. We don't recommend it for that reason." That's disappointing, and the docs should reflect that, IMO. All they say is, "The default value of 1 is usually adequate. For high-latency configurations, a larger value might improve archive transfer speed for large numbers of small files. (Use of this option requires that both master and replica be at version 2015.2 or higher.)"
And apparently 130ms ping time is not high latency. While I know it could be worse, in my world, it is high.

So, the moral of the story is to run compression and as many single-file pull threads as you can get away with.
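For anyone wanting to script that setup, a small Python helper along these lines could print the `p4 configure set` commands. Treat it as a sketch under assumptions: the server name `replica1` is hypothetical, I'm assuming `startup.1` is already taken by the metadata pull, and the right `rpl.compress` level depends on your topology, so it is left for the caller to pick. Leaving `--batch` off keeps the default of one file per thread.

```python
def pull_thread_config(server_id, archive_threads, compress_level,
                       interval=1, first_slot=2):
    """Build 'p4 configure set' commands for N single-file archive pull threads.

    server_id      -- replica server name (hypothetical in the example below)
    compress_level -- value for rpl.compress; pick it per the Perforce docs
    first_slot     -- first startup.N slot to use (assumes lower slots are taken)
    """
    if archive_threads < 1:
        raise ValueError("need at least one archive pull thread")
    lines = [f'p4 configure set "{server_id}#rpl.compress={compress_level}"']
    for slot in range(first_slot, first_slot + archive_threads):
        # 'pull -u' transfers archive files; no --batch means one file at a time.
        lines.append(
            f'p4 configure set "{server_id}#startup.{slot}=pull -u -i {interval}"'
        )
    return lines


if __name__ == "__main__":
    for line in pull_thread_config("replica1", archive_threads=3, compress_level=2):
        print(line)
```

Review the printed commands before running them, and check the slot numbers against what `p4 configure show` reports for the replica.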

In Topic: config: startup pull batch logic

02 December 2019 - 11:55 PM

The behavior with large batch numbers was extremely bursty in terms of network I/O. It's much flatter without batch, and in the hours since I turned off batching, I'm seeing better overall throughput. Ugh.

In Topic: config: startup pull batch logic

02 December 2019 - 10:02 PM

Or does it skip the big file and try to keep filling up thread 1 before moving on to thread 2?
Either way, I'd hope each thread would get filled as quickly as possible, then transfer in parallel. I know that without batching the threads work in parallel just fine.
Running "top" with the batched config, it's rare to see more than one p4d running at a time, even now with over 5,000 files active, spanning 62MB, per "p4 pull -ls". Running "tail -f" on the log file seems to agree that this is the behavior.

In Topic: config: startup pull batch logic

02 December 2019 - 09:48 PM

Yes, I assume that if there are no more files to transfer in the queue, it goes ahead and transfers. I'm thinking more of large queues. So if I have 50,000 queued files, a mix skewed toward the small, then after loading 500 tiny files into thread 1, does a large file going into thread 3 cause thread 1 to start transferring, or does thread 1 resume filling up in hopes of getting to 1000 files?

And does it wait until it's full (or as full as it will get this time around) to start compressing, or is it compressing along the way?