11 months ago

In this video I use the native S3 bucket created in the earlier video at
https://rumble.com/v32fmic-brief-walk-through-over-ontap-native-and-multi-protocol-s3-services.html

The idea is to show how one could organize multi-location ingress and processing:
- Site A generates or ingests data via S3 or NFS (NFS in the case of multi-protocol ONTAP S3 buckets)
- We use SnapMirror S3 to replicate to another ONTAP system, such as AWS FSxN in the cloud. For multi-protocol buckets we could instead use SnapMirror (not SnapMirror S3!), CloudSync, rclone or some other utility
- Site B can read data using NFS (multi-protocol) or S3 - whichever works better. It is recommended to use the same config (either multi-protocol, or "pure" S3) on both sides to avoid incompatibility issues.
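
One way to sanity-check the workflow above from a client is to compare object keys on both sides over S3. This is only a minimal sketch, not what the video does: it assumes boto3-style S3 clients and that both buckets are reachable over S3. The function names are made up for illustration.

```python
def list_keys(s3_client, bucket):
    """Return the set of object keys in a bucket, following pagination."""
    keys = set()
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket (or empty page) has no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])
    return keys


def missing_on_destination(src_client, src_bucket, dst_client, dst_bucket):
    """Return the sorted keys present at Site A but not (yet) at Site B."""
    src_keys = list_keys(src_client, src_bucket)
    dst_keys = list_keys(dst_client, dst_bucket)
    return sorted(src_keys - dst_keys)
```

The clients would be created with something like boto3.client("s3", endpoint_url=...), one per site, each with its own endpoint and keys. An empty result only means the key names match; it does not compare object contents.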

The video shows an example with native S3 buckets on both sides.
The on-the-fly conversion from a Parquet file to a pandas DataFrame is meant to show that data sometimes doesn't need to be copied from S3 to local disk before it is converted, which is convenient because the clients don't need to mount NFS at all.
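
The idea can be sketched roughly as follows. This is not necessarily the exact code from the video: it assumes a boto3-style S3 client and pandas with a Parquet engine such as pyarrow installed. The endpoint, keys, bucket and object names in demo() are placeholders, not real values.

```python
import io


def fetch_object_bytes(s3_client, bucket, key):
    """Download one object from S3 straight into memory."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def parquet_object_to_dataframe(s3_client, bucket, key):
    """Turn a Parquet object in S3 into a DataFrame without touching local disk."""
    import pandas as pd  # needs a Parquet engine such as pyarrow

    raw = fetch_object_bytes(s3_client, bucket, key)
    # read_parquet accepts a file-like object, so an in-memory buffer is enough.
    return pd.read_parquet(io.BytesIO(raw))


def demo():
    """Hypothetical usage; every value below is a placeholder."""
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.site-b.example.com",  # assumed Site B S3 endpoint
        aws_access_key_id="ACCESS_KEY_PLACEHOLDER",
        aws_secret_access_key="SECRET_KEY_PLACEHOLDER",
    )
    df = parquet_object_to_dataframe(s3, "example-bucket", "data/example.parquet")
    print(df.head())
```

The whole object is held in memory here, so for very large Parquet files a streaming or column-selective read would be a better fit.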

The video is a bit short for so many steps, but you can check the ONTAP S3 and analytics solutions documentation for more comprehensive descriptions of such workflows.
