

Problem retrieving MetaData for a large number of files



#1 DaveCh

Posted 06 November 2018 - 12:16 PM

I've written a C# tool where the user supplies a starting point in a depot, and the code then uses Repository.GetFileMetaData() to do 'stuff' with every file in the tree below that point. The trouble is that, with a large repository, GetFileMetaData() throws an out-of-memory exception before it returns.

Is there any way to iterate over the file metadata instead of retrieving it all at once?

If not, I guess my only option is to get a list of directories and process them one at a time.
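The directory-at-a-time fallback could be sketched roughly as follows, assuming you wrap your two Perforce queries in delegates: one that lists the immediate subdirectories of a depot path (the equivalent of p4 dirs //path/*) and one that returns metadata only for the files directly in that directory (an fstat on //path/* rather than //path/...). `DepotWalker`, `Walk` and the delegate parameters are hypothetical names for illustration, not part of P4API.NET.

```csharp
using System;
using System.Collections.Generic;

public static class DepotWalker
{
    // Hypothetical helper: visits every file under `root` one directory at a time.
    // getSubdirs: immediate subdirectories of a depot dir (think "p4 dirs //path/*").
    // getFiles:   metadata for files directly in that dir only
    //             (think "p4 fstat //path/*", not "//path/...").
    // Both delegates are placeholders for whatever P4API.NET calls you use.
    public static void Walk<TFile>(
        string root,
        Func<string, IEnumerable<string>> getSubdirs,
        Func<string, IEnumerable<TFile>> getFiles,
        Action<TFile> handleFile)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            // Only this directory's batch is referenced here; once the loop
            // moves on, the batch becomes garbage and can be collected.
            foreach (TFile file in getFiles(dir))
                handleFile(file);
            foreach (string sub in getSubdirs(dir))
                pending.Push(sub);
        }
    }
}
```

With this shape, peak memory is bounded by the largest single directory rather than by the whole tree; a single directory holding a huge number of files would still be a problem.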

#2 p4bill

Posted 06 November 2018 - 03:47 PM

Are you using Repository.GetFileMetaData() with a large number of individual files here? Is using a wildcard at the top level not an option? Something like:

Repository.GetFileMetaData(null, new FileSpec(new DepotPath(@"//...")));


#3 DaveCh

Posted 06 November 2018 - 04:05 PM

Yes, using a wildcard. An example where I'm looking for specific attributes:

GetFileMetaDataCmdOptions Opts = new GetFileMetaDataCmdOptions(GetFileMetadataCmdFlags.None, null, "clientFile,headType,isMapped,headAction", -1, null, null, null);
IList<FileMetaData> files = myP4.Repository.GetFileMetaData(Opts, fileSpec);

where fileSpec contains a DepotPath of "//MyStream/MyRoot/..."

Memory usage hits about 1.5GB before throwing System.OutOfMemoryException at Perforce.P4.StrDictListIterator.NextEntry() in c:\tmp\79089881\P4.NET\r17.1\p4api.net\p4api.net\BridgeInterfaceClasses.cs:line 77

So it's running out of the address space available to a 32-bit C# process.

My other option will be to switch to 64-bit (which I really should do anyway). I was just wondering if there was some other way to iterate through all the files without grabbing the entire list in one go.
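A quick way to confirm which mode the tool actually runs in is to check `Environment.Is64BitProcess` (or `IntPtr.Size`) at startup. This is a minimal sketch; `ProcessInfo.Describe` is a hypothetical helper name. Note that an AnyCPU build with "Prefer 32-bit" enabled still runs as a 32-bit process on a 64-bit OS, and since P4API.NET loads a native bridge library, the 64-bit process also needs the 64-bit build of that bridge (a 64-bit process cannot load a 32-bit native DLL).

```csharp
using System;

public static class ProcessInfo
{
    // Hypothetical helper: reports whether the OS and the current process are 64-bit.
    // An AnyCPU build with "Prefer 32-bit" enabled reports a 32-bit process
    // even on a 64-bit OS.
    public static string Describe()
    {
        return string.Format(
            "64-bit OS: {0}, 64-bit process: {1}, pointer size: {2} bytes",
            Environment.Is64BitOperatingSystem,
            Environment.Is64BitProcess,
            IntPtr.Size); // 4 in a 32-bit process, 8 in a 64-bit one
    }
}
```

Logging `ProcessInfo.Describe()` once at startup makes it obvious whether the switch took effect. In the project file, the relevant MSBuild settings are `<PlatformTarget>x64</PlatformTarget>`, or `<PlatformTarget>AnyCPU</PlatformTarget>` together with `<Prefer32Bit>false</Prefer32Bit>`.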

#4 p4bill

Posted 06 November 2018 - 04:59 PM

No, there is no other way to iterate through the files. Switching to 64-bit, or doing what you also suggested earlier (getting directories) are likely your only options here.

P4API.NET uses P4API (the C++ API) in its bridge, and the P4 CLI is built on the same library. I'd be curious whether you can get all of the data back from the command-line equivalent (p4 -ztag fstat //MyStream/MyRoot/...). If you can, there might be something that could be changed in P4API.NET to handle larger repositories better.
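Because the command-line client streams records rather than buffering them, one possible workaround (a sketch, not something P4API.NET provides) is to run p4 -ztag fstat yourself and parse its tagged output one record at a time. It assumes `p4` is on the PATH and already configured for the right server and client, and assumes the usual -ztag layout: one "... name value" line per field and a blank line between records. `ZtagReader`, `ReadRecords` and `ForEachFstatRecord` are hypothetical names. The -T option limits fstat to the listed fields, which appears to correspond to the taggedFields argument in the GetFileMetaDataCmdOptions call in post #3.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class ZtagReader
{
    // Lazily splits "p4 -ztag" output into one dictionary per record.
    // Assumes "... name value" lines and blank lines between records.
    // Lines without the "... " prefix (e.g. messages) are skipped;
    // multi-line values are not handled.
    public static IEnumerable<Dictionary<string, string>> ReadRecords(TextReader reader)
    {
        var record = new Dictionary<string, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                // A blank line ends the current record: hand it to the caller now.
                if (record.Count > 0)
                {
                    yield return record;
                    record = new Dictionary<string, string>();
                }
                continue;
            }
            if (!line.StartsWith("... ", StringComparison.Ordinal))
                continue;
            // Nested fields carry extra "... " prefixes; strip them all.
            while (line.StartsWith("... ", StringComparison.Ordinal))
                line = line.Substring(4);
            int space = line.IndexOf(' ');
            // Flag-style fields such as isMapped have no value.
            string name = space < 0 ? line : line.Substring(0, space);
            string value = space < 0 ? "" : line.Substring(space + 1);
            record[name] = value;
        }
        if (record.Count > 0)
            yield return record;
    }

    // Runs "p4 -ztag fstat -T <fields> <depotPath>" and passes each record to
    // handleRecord as it arrives, so only one record is held at a time
    // (as long as handleRecord does not keep references to them).
    public static void ForEachFstatRecord(string fields, string depotPath,
        Action<Dictionary<string, string>> handleRecord)
    {
        var psi = new ProcessStartInfo("p4",
            "-ztag fstat -T \"" + fields + "\" \"" + depotPath + "\"")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using (Process p = Process.Start(psi))
        {
            foreach (var record in ReadRecords(p.StandardOutput))
                handleRecord(record);
            p.WaitForExit();
        }
    }
}
```

For example, `ZtagReader.ForEachFstatRecord("clientFile,headType,isMapped,headAction", "//MyStream/MyRoot/...", r => { /* do 'stuff' */ });`. This only removes the client-side buffering; the server still has to produce the full result set, so server-side limits such as MaxResults still apply.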

#5 Sambwise

Posted 06 November 2018 - 05:17 PM

If you're using the CLI (or the C++ API), the client library isn't buffering all the output; the server sends each individual fstat dict as a client call to OutputStat() (which is gonna be on the order of 1kB of data), and the client outputs it to the screen.  So it might take the command a while to run (the *server* generally has to buffer all those results since streaming directly from the db tables to the network would be bad -- this is why MaxResults limits exist) but there's no reason for it to consume a lot of memory on the client.

My guess is that P4API.NET's implementation of OutputStat() sticks all the output for a single command into a buffer to package it into a C# data structure, rather than handing off each individual dict and then freeing it...?  I'm not very familiar with C#'s garbage collection but can easily imagine that even if P4API.NET isn't explicitly trying to buffer everything, in a garbage-collected language it might be easy to inadvertently chew up a lot of memory you'd meant to free (or maybe the runtime just doesn't sweep often enough to keep you from running out of memory in this particular use case).  

In a C++ app you'd have to explicitly copy off the contents of the dict to keep them from being freed after the OutputStat() call returns (the CLI doesn't do any such thing which is why its memory footprint will be negligible), but in C# I imagine the copying would happen implicitly if you weren't careful about clearing your references.

#6 DaveCh

Posted 06 November 2018 - 06:45 PM

Yes, both "p4 fstat" and "p4 files" are able to show the info for all 1.5 million files (don't ask).
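A rough back-of-the-envelope check ties the numbers in this thread together. It takes Sambwise's order-of-magnitude figure of about 1 kB per fstat record as an assumption; the .NET objects built from each record (UTF-16 strings plus per-object overhead) may well be larger, and requesting only four fields may make them smaller:

$$1.5 \times 10^{6}\ \text{records} \times 1\ \text{kB/record} \approx 1.5 \times 10^{9}\ \text{bytes} \approx 1.5\ \text{GB}$$

That lands right around the 1.5 GB at which the 32-bit tool failed. A 32-bit Windows process has only 2 GB of user address space by default, so buffering the full result set leaves almost no headroom, whereas a 64-bit process is limited mainly by available memory.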

I think I'll take the 64-bit route.



