Introducing Amazon S3 Transfer Manager in the AWS SDK for Java 2.x


The global data ecosystem has grown rapidly over the past decade, and choosing the right data technology has become difficult. With over 32% of the global public cloud market, Amazon Web Services (AWS) is the leader in this space. It serves nearly 190 countries and offers scalability, durability and security. Since its launch, Amazon S3 has become an integral part of data storage and management at thousands of enterprises.

Amazon Simple Storage Service (S3) is a cloud storage service provided by Amazon Web Services (AWS). With its key-based object storage architecture, Amazon S3 is well suited to storing massive amounts of structured and unstructured data. Unlike the file systems of the operating systems we are all familiar with, Amazon S3 does not keep files in a directory hierarchy; it stores each file as an object identified by a key. As with other popular cloud storage products such as Dropbox and Google Drive, users upload files to the service and retrieve them later.


What is Amazon S3 Transfer Manager?

Transfer Manager is a high-level API in the AWS SDK (Amazon Web Services Software Development Kit). It makes it easy to manage uploads and downloads between your application and Amazon S3 by hiding the complex file transfer process behind a simple API. Transfer Manager supports two operations, upload and download. Each operation returns an object that you can use to interact with the transfer, for example to wait for it to complete.

Whenever possible, Transfer Manager uses multiple threads to transfer several parts of a single object at once. When dealing with large objects, this can significantly increase throughput. The 2.x Transfer Manager is built on top of the Java bindings of the AWS Common Runtime (CRT) S3 client.

Parallel upload via multipart upload

Multipart upload lets you upload a single object as a set of smaller parts. You can upload the parts independently and in any order, and once all parts are uploaded, Amazon S3 presents the data as a single object. For example, when your object size reaches 100 MB, you should consider using multipart upload instead of a single operation, as this allows the parts to be uploaded in parallel.

Transfer Manager uses the Amazon S3 multipart upload API for the upload operation; it converts a single PutObjectRequest into multiple MultiPartUpload requests and sends these requests concurrently to achieve high performance.
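To make the splitting step more concrete, here is a minimal sketch of how an object could be divided into parts before they are uploaded concurrently. It is not the Transfer Manager's actual implementation: the MultipartPlan class, the Part type and the plan method are hypothetical names invented for this illustration. The only S3 facts it relies on are the documented multipart limits (every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the AWS SDK): plans how an object of a
// given size could be cut into parts for a multipart upload.
class MultipartPlan {

    // One planned part: 1-based part number, byte offset in the source, length.
    static final class Part {
        final int partNumber;
        final long offset;
        final long length;

        Part(int partNumber, long offset, long length) {
            this.partNumber = partNumber;
            this.offset = offset;
            this.length = length;
        }
    }

    static final long MIN_PART_SIZE = 5L * 1024 * 1024; // S3 minimum, except for the last part
    static final int MAX_PARTS = 10_000;                // S3 maximum number of parts per upload

    static List<Part> plan(long objectSize, long partSize) {
        if (objectSize <= 0) {
            throw new IllegalArgumentException("objectSize must be positive");
        }
        if (partSize < MIN_PART_SIZE) {
            throw new IllegalArgumentException("partSize must be at least 5 MiB");
        }
        long count = (objectSize + partSize - 1) / partSize; // ceiling division
        if (count > MAX_PARTS) {
            throw new IllegalArgumentException("too many parts; choose a larger partSize");
        }
        List<Part> parts = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            long offset = i * partSize;
            // The last part carries whatever is left and may be smaller than partSize.
            long length = Math.min(partSize, objectSize - offset);
            parts.add(new Part(i + 1, offset, length));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Example: a 100 MiB object cut into 8 MiB parts.
        for (Part p : plan(100L * 1024 * 1024, 8L * 1024 * 1024)) {
            System.out.println("part " + p.partNumber + ": offset=" + p.offset + ", length=" + p.length);
        }
    }
}
```

Because each part is described only by an offset and a length, the parts can be read from the source and sent independently of one another, which is what makes concurrent upload possible.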

Parallel download via byte range fetches

Transfer Manager uses byte range fetches for download operations. By using the HTTP Range header in a GET object request, you can fetch a range of bytes from an object and transfer only the part you need. Transfer Manager splits a GetObjectRequest into several smaller requests, each retrieving a specific part of the object. This gives higher aggregate throughput than a single whole-object request. Fetching smaller portions of a large object also shortens retries when a request is interrupted, because only the affected range has to be fetched again.

With the 1.x transfer manager, an object that was uploaded as a single object cannot be downloaded any faster. To increase download speed in Transfer Manager 1.x, the object must have been uploaded using multipart upload. This is no longer a limitation in the 2.x transfer manager. With the 2.x transfer manager, downloading an object does not depend on how the object was originally uploaded.
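The idea of byte range fetches can be sketched with plain Java, without any AWS calls. In the example below, the RangeFetchSketch class and its methods are hypothetical names, and fetchRange is only a stand-in for a ranged GET: it copies bytes out of an in-memory array instead of contacting Amazon S3. The Range header format ("bytes=start-end", with both ends inclusive) is standard HTTP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration (not the Transfer Manager's code): split an object
// into byte ranges, fetch the ranges concurrently, and reassemble them in order.
class RangeFetchSketch {

    // Builds HTTP Range header values of the form "bytes=start-end" (inclusive ends).
    static List<String> rangeHeaders(long objectSize, long chunkSize) {
        if (objectSize <= 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<String> headers = new ArrayList<>();
        for (long start = 0; start < objectSize; start += chunkSize) {
            long end = Math.min(start + chunkSize, objectSize) - 1;
            headers.add("bytes=" + start + "-" + end);
        }
        return headers;
    }

    // Stand-in for a ranged GET request: returns the requested bytes of an in-memory "object".
    static byte[] fetchRange(byte[] object, String rangeHeader) {
        String[] bounds = rangeHeader.substring("bytes=".length()).split("-");
        int start = Integer.parseInt(bounds[0]);
        int end = Integer.parseInt(bounds[1]);
        return Arrays.copyOfRange(object, start, end + 1);
    }

    // Starts one asynchronous fetch per range, then joins them in range order.
    static byte[] download(byte[] object, int chunkSize) {
        List<CompletableFuture<byte[]>> fetches = new ArrayList<>();
        for (String header : rangeHeaders(object.length, chunkSize)) {
            fetches.add(CompletableFuture.supplyAsync(() -> fetchRange(object, header)));
        }
        byte[] result = new byte[object.length];
        int position = 0;
        for (CompletableFuture<byte[]> fetch : fetches) {
            byte[] chunk = fetch.join();
            System.arraycopy(chunk, 0, result, position, chunk.length);
            position += chunk.length;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rangeHeaders(10, 4));
        byte[] object = "hello, byte range fetches".getBytes();
        System.out.println(new String(download(object, 7)));
    }
}
```

If one ranged request fails in a design like this, only that range needs to be requested again, which is the retry benefit described above.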

Getting started

  • Add a dependency for the transfer manager

First, add the Transfer Manager dependency to your project.

XML

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3-transfer-manager</artifactId>
    <version>2.17.123-PREVIEW</version>
</dependency>

  • Instantiate the transfer manager

You can easily instantiate the transfer manager using the default settings.

Java:

S3TransferManager transferManager = S3TransferManager.create();

  • Upload a file to Amazon S3

To upload a file to Amazon S3, you must provide the source file path along with the PutObjectRequest that should be used for the upload.

Java:

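// Start the upload; the call returns without waiting for the transfer.
FileUpload upload = transferManager.uploadFile(b -> b.source(Paths.get("myFile.txt"))
        .putObjectRequest(req -> req.bucket("bucket")
                .key("key")));

// Block until the upload has finished.
upload.completionFuture().join();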

  • Download an Amazon S3 object to a file

To download an object from Amazon S3, you must provide the destination file path along with the GetObjectRequest that should be used for the download.

Java:

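// Start the download; the call returns without waiting for the transfer.
FileDownload download =
        transferManager.downloadFile(b -> b.destination(Paths.get("myFile.txt"))
                .getObjectRequest(req -> req.bucket("bucket")
                        .key("key")));

// Block until the download has finished.
download.completionFuture().join();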
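Calling join() blocks the current thread until the transfer has finished. Since completionFuture() returns a CompletableFuture, an application can also react to completion without blocking. The sketch below shows that pattern with standard library classes only: the TransferCompletionSketch class and its describe method are hypothetical, and the futures passed in are stand-ins for the one a real transfer would return.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical illustration: turning the outcome of a transfer-like future
// into a status message without blocking the calling thread.
class TransferCompletionSketch {

    static CompletableFuture<String> describe(CompletableFuture<String> transfer) {
        return transfer.handle((result, error) -> {
            if (error != null) {
                // Failures from dependent stages arrive wrapped in CompletionException.
                Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                        ? error.getCause()
                        : error;
                return "failed: " + cause.getMessage();
            }
            return "completed: " + result;
        });
    }

    public static void main(String[] args) {
        // Stand-in for a transfer that has already succeeded.
        CompletableFuture<String> finished = CompletableFuture.completedFuture("myFile.txt");
        describe(finished).thenAccept(System.out::println);

        // Stand-in for a transfer that failed.
        CompletableFuture<String> broken = new CompletableFuture<>();
        broken.completeExceptionally(new java.io.IOException("connection reset"));
        describe(broken).thenAccept(System.out::println);
    }
}
```

With a real transfer, the same handle callback could be attached to upload.completionFuture() or download.completionFuture(), with the result type adjusted to whatever that future carries.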

Conclusion:

Customers of all sizes and in all industries can use Amazon S3 to store and protect any amount of data for a range of use cases, such as data analytics, data lakes, backup, archiving and much more. The 2.x transfer manager improves on the 1.x transfer manager in several ways; in particular, download speed no longer depends on how an object was uploaded. For complete documentation, consult the developer guide and the Transfer Manager source code on GitHub for the AWS SDK for Java 2.x.

