
Memory Usage for S3 and large files (> 1GB)


KT Walrus


I'm looking into allowing large files to be uploaded into S3 using the Amazon Storage Method. But, it looks to me like this Storage Method reads the entire file into memory first and "puts" its entire contents to the Amazon S3 endpoint.

Is this correct?

If so, am I going to have to implement my own Amazon Storage Method that overrides setFile() to store the filename (rather than the file's contents) in the Storage Method class, and then later use one stream to read the file from disk and another stream to write it to S3?

I think I need to use the AWS PHP SDK to write the stream to S3, right? Something like this (using the S3 stream wrapper):

$in = fopen($this->temporaryFilePath, 'r'); // setFile() method sets $this->temporaryFilePath
$s3 = fopen('s3://bucket/key', 'w');        // also needs a stream context for headers like 'Content-Type', etc.
stream_copy_to_stream($in, $s3);            // copies in chunks instead of loading the whole file into memory
fclose($s3);
fclose($in);
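
For reference, the s3:// paths only work after the SDK's stream wrapper has been registered. Something like this is what I have in mind (I'm assuming SDK v3 here; the region, bucket, and key are placeholders, and the 's3' context key is my reading of the SDK's stream wrapper docs):

require 'vendor/autoload.php';

use Aws\S3\S3Client;

$client = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',   // placeholder region
]);

// Makes s3://bucket/key usable with fopen(), file_get_contents(), unlink(), etc.
$client->registerStreamWrapper();

// Extra PutObject parameters (Content-Type, ACL, ...) can be passed via a stream context.
$context = stream_context_create([
    's3' => ['ContentType' => 'application/octet-stream'],
]);
$s3 = fopen('s3://bucket/key', 'w', false, $context);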

Shouldn't IPS re-implement the Amazon Storage Method using the AWS PHP SDK to handle large files using streaming? I don't really see how I can use the current Amazon Storage Method in production if I allow my users to upload large files.

Or, should I go ahead and implement my own Amazon Storage Method that uses the AWS SDK? It looks like a simple task since the SDK abstracts away all the details when you use the S3 stream wrapper.

See the Amazon S3 Stream Wrapper documentation for more on uploading larger files to S3...
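
For what it's worth, here is roughly the override I have in mind. This is only a sketch: the parent class name and the container/filename properties are my assumptions about the Storage Method API, not the actual IPS code.

// Sketch only: parent class and property names ($container, $filename) are assumptions.
class StreamingAmazon extends \IPS\File\Amazon
{
    protected $temporaryFilePath;

    public function setFile($filePath)
    {
        // Remember the path instead of reading the whole file into $this->contents.
        $this->temporaryFilePath = $filePath;
    }

    public function save()
    {
        // Assumes the s3:// stream wrapper has already been registered (see above).
        $in  = fopen($this->temporaryFilePath, 'r');
        $out = fopen('s3://' . $this->container . '/' . $this->filename, 'w');
        stream_copy_to_stream($in, $out);   // chunked copy, never the whole file in memory
        fclose($out);
        fclose($in);
    }
}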


4 hours ago, bfarber said:

Your understanding is correct - the entire file is read into memory in order to transfer it at present.

This is an area on our radar to improve in a future version, but nothing to report at this time I'm afraid.

I've almost finished my own Storage Method that streams to S3 (if the caller uses setFile() before save()). This was easy since I'm using the AWS PHP SDK (specifically the S3 Stream Wrapper).

I did change contents() to read the file from S3 via an s3:// path passed to file_get_contents(). But this doesn't help much, I think, since the entire file ends up in memory anyway (i.e., in $this->contents). Maybe this doesn't matter much, since I'm using presigned URLs for most, if not all, of the large file serving rather than contents(). It would be nice if Storage Methods presented a streaming interface to their callers so the Storage Extensions could keep contents in a stream (e.g. a php://temp stream buffer).
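
In case it's useful, this is roughly how I'm generating the presigned URLs with SDK v3 (assuming $client is the Aws\S3\S3Client instance; the bucket, key, and expiry are placeholders):

$command = $client->getCommand('GetObject', [
    'Bucket' => 'bucket',
    'Key'    => 'key',
]);

$request = $client->createPresignedRequest($command, '+20 minutes');
$url = (string) $request->getUri();

// Redirect the browser straight to S3 so PHP never buffers the file.
header('Location: ' . $url);
exit;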

You could study the AWS S3 PHP SDK for a better streaming interface for Storage Methods. Specifically, IPS could use the Aws\S3\StreamWrapper class (and the related GuzzleHttp classes) to build its own ips:// stream wrapper, and then use that wrapper (along with the SDK's caching classes) in all Storage Method classes, altered to present an interface to the Storage Extensions that supports streaming for all storage access.

Edit: Actually, now that I think about it, you should have all Storage Methods simply register their own Stream Wrapper (e.g. aws://, db://, minio://, fs://, etc.) and then use only the standard built-in PHP streaming functions (like fopen, file_get_contents, fwrite, unlink, etc.) in the Storage Extensions and other users of the Storage Methods.
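
To illustrate what I mean, a bare-bones wrapper for a filesystem-backed Storage Method might look like this (the class name and storage root are made up; a real one would also need stream_stat(), unlink(), and so on):

class FileSystemStorageWrapper
{
    public $context;
    private $handle;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        // Map fs://relative/path onto a local storage root (placeholder path).
        $local = '/var/storage/' . substr($path, strlen('fs://'));
        $this->handle = fopen($local, $mode);
        return $this->handle !== false;
    }

    public function stream_read($count)  { return fread($this->handle, $count); }
    public function stream_write($data)  { return fwrite($this->handle, $data); }
    public function stream_eof()         { return feof($this->handle); }
    public function stream_flush()       { return fflush($this->handle); }
    public function stream_close()       { fclose($this->handle); }
}

stream_wrapper_register('fs', FileSystemStorageWrapper::class);

// Callers don't care which backend sits behind the scheme:
file_put_contents('fs://uploads/avatar.png', fopen('/tmp/upload.png', 'r'));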

 

 


Archived

This topic is now archived and is closed to further replies.
