Setting up a VPC endpoint for Amazon S3

By default, an instance or Kubernetes pod in a VPC reaches S3 through a NAT gateway or an internet gateway (IGW). Not only is this less efficient, it can also incur charges for the traffic (for example, NAT gateway data processing fees). These charges can be significant when traffic volume is high.

Setup

To keep the traffic within the VPC, follow the general instructions to create two resources: an S3 access point for the specific S3 resources, and a VPC endpoint for S3.
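
The two resources can also be created programmatically. Below is a minimal sketch using boto3. It assumes a Gateway-type endpoint, which routes S3 traffic through route-table entries, and an access point restricted to the VPC. The VPC restriction is an assumption and is not required by the setup above. The region, IDs, names and helper functions are hypothetical placeholders.

```python
"""Sketch: create a Gateway VPC endpoint for S3 and a VPC-restricted access point.

All IDs, names and regions passed to these hypothetical helpers are placeholders.
"""
from __future__ import annotations


def gateway_endpoint_params(region: str, vpc_id: str, route_table_ids: list[str]) -> dict:
    # Keyword arguments for boto3's EC2 create_vpc_endpoint().
    # A Gateway endpoint adds a route to the S3 prefix list in each route table.
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def access_point_params(account_id: str, name: str, bucket: str, vpc_id: str) -> dict:
    # Keyword arguments for boto3's S3 Control create_access_point().
    # Assumption: VpcConfiguration limits the access point to requests from this VPC.
    return {
        "AccountId": account_id,
        "Name": name,
        "Bucket": bucket,
        "VpcConfiguration": {"VpcId": vpc_id},
    }


def create_endpoint_and_access_point(region, account_id, vpc_id, route_table_ids,
                                     bucket, access_point_name):
    import boto3  # imported here so the pure helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    s3control = boto3.client("s3control", region_name=region)
    ec2.create_vpc_endpoint(**gateway_endpoint_params(region, vpc_id, route_table_ids))
    # The response should include "AccessPointArn" and "Alias" (both used under Usage).
    return s3control.create_access_point(
        **access_point_params(account_id, access_point_name, bucket, vpc_id)
    )
```

Keeping the parameters in plain functions makes them easy to review or diff before anything is created in the account.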

After that, double-check the policies in two places.

1. Policy for IAM

Go to AWS Console > IAM > Users. The user should be a service account with the following policy attached:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ReplicateObject",
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject"
            ],
            "Resource": "*"
        }
    ]
}
As a security best practice, DO NOT allow all actions, i.e. "Action": ["*"].

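
This rule is easy to check automatically before a policy is attached. Below is a minimal sketch that flags Allow statements granting "*". The function name is hypothetical. The check only catches the exact "*" wildcard, not service-wide patterns such as "s3:*".

```python
import json


def wildcard_allow_statements(policy: dict) -> list:
    """Return the Sid (or list index) of every Allow statement whose Action is "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be written as an object
        statements = [statements]
    offenders = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        if "*" in actions:
            offenders.append(stmt.get("Sid", i))
    return offenders


def check_policy_text(text: str) -> list:
    # Convenience wrapper for a policy pasted or loaded as JSON text.
    return wildcard_allow_statements(json.loads(text))
```

An empty result means no statement allows every action. It does not mean the policy is otherwise well scoped.
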
2. Policy for the endpoint

Go to AWS Console > VPC > Endpoints, select the endpoint, and check its Policy tab:

{
    "Version": "2012-10-17",
    "Id": "Policy1637977229005",
    "Statement": [
        {
            "Sid": "Stmt1637977226759",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}
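This endpoint policy is deliberately less restrictive than the IAM user policy. A request through the endpoint must be allowed by both policies. A permissive endpoint policy therefore leaves the specific, fine-grained permissions to the user policy.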
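
The "both must allow" reasoning can be illustrated with a deliberately simplified model. This sketch only matches action names against IAM-style * wildcards. It ignores Resource, Principal, Condition and Deny statements, so it does not replace real IAM policy evaluation. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _allows(policy: dict, action: str) -> bool:
    # True if any Allow statement has an action pattern matching `action`.
    # Simplification: Resource, Principal, Condition and Deny are ignored.
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        patterns = stmt.get("Action", [])
        if isinstance(patterns, str):
            patterns = [patterns]
        # IAM action names are case-insensitive, so compare in lowercase.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            return True
    return False


def allowed_via_endpoint(user_policy: dict, endpoint_policy: dict, action: str) -> bool:
    # A request through the endpoint needs an Allow from BOTH policies.
    return _allows(user_policy, action) and _allows(endpoint_policy, action)
```

With the "s3:*" endpoint policy above, the result is decided entirely by the user policy, which is the intent described in the text.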

Finally, verify the setup by running traceroute s3.ap-southeast-1.amazonaws.com from any instance within the VPC.

  • The following output indicates success:
$ traceroute s3.ap-southeast-1.amazonaws.com
traceroute to s3.ap-southeast-1.amazonaws.com (52.219.129.42), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
    ...
28  * * *
29  * * *
30  * * *
  • The following output shows that traffic is still going through the NAT/IGW:
$ traceroute s3.ap-southeast-1.amazonaws.com
traceroute to s3.ap-southeast-1.amazonaws.com (52.219.40.74), 30 hops max, 60 byte packets
 1  ec2-175-41-128-191.ap-southeast-1.compute.amazonaws.com (175.41.128.191)  42.290 ms ec2-175-41-128-195.ap-southeast-1.compute.amazonaws.com (175.41.128.195)  3.103 ms ec2-18-141-171-23.ap-southeast-1.compute.amazonaws.com (18.141.171.23)  36.088 ms
 2  100.65.32.176 (100.65.32.176)  1.149 ms 100.65.32.160 (100.65.32.160)  1.132 ms 100.65.33.144 (100.65.33.144)  4.671 ms
 3  100.66.16.248 (100.66.16.248)  3.400 ms 100.66.16.26 (100.66.16.26)  8.787 ms 100.66.16.244 (100.66.16.244)  4.695 ms
 4  100.66.19.122 (100.66.19.122)  3.511 ms 100.66.19.204 (100.66.19.204)  17.491 ms 100.66.18.104 (100.66.18.104)  18.193 ms
 5  100.66.3.241 (100.66.3.241)  15.377 ms 100.66.3.137 (100.66.3.137)  154.101 ms 100.66.3.61 (100.66.3.61)  19.897 ms
 6  100.66.0.135 (100.66.0.135)  10.907 ms 100.66.0.165 (100.66.0.165)  9.064 ms 100.66.0.201 (100.66.0.201)  11.826 ms
 7  100.65.2.41 (100.65.2.41)  3.090 ms 100.65.3.41 (100.65.3.41)  2.821 ms 100.65.2.41 (100.65.2.41)  2.648 ms
 8  s3-ap-southeast-1.amazonaws.com (52.219.40.74)  0.518 ms  0.569 ms  0.578 ms
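
Telling the two outputs apart can be scripted. The rule below is a rough heuristic inferred from the two outputs above, not an AWS-documented rule. If no hop before the last one responds, traffic is probably going through the endpoint. If intermediate hops respond, it is probably still going through the NAT/IGW. The function name is hypothetical.

```python
import re

# A hop line starts with the hop number, e.g. " 1  * * *" or "30  * * *".
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")


def classify_traceroute(output: str) -> str:
    """Return 'endpoint', 'nat_or_igw', or 'unknown' (heuristic only)."""
    hops = [m.group(2).strip() for m in map(HOP_RE.match, output.splitlines()) if m]
    if not hops:
        return "unknown"
    # Ignore the final hop: when the trace completes, it is S3 itself answering.
    intermediate = hops[:-1] if len(hops) > 1 else hops
    if all(h.replace("*", "").strip() == "" for h in intermediate):
        return "endpoint"
    return "nat_or_igw"
```

Lines that do not start with a hop number, such as the "traceroute to ..." header or an elided "...", are skipped.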

Follow the troubleshooting steps if anything does not work.
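
One additional check applies if the endpoint is Gateway-type. Such an endpoint adds a route to each associated route table. The route's destination is the S3 prefix list (pl-...) and its target is the endpoint (vpce-...). The route table used by the instance's subnet should contain that route. Below is a minimal sketch that inspects the response of boto3's EC2 describe_route_tables(). The function names and IDs are hypothetical.

```python
def route_tables_with_endpoint(response: dict, endpoint_id: str) -> dict:
    """Map each route table ID to True if it routes a prefix list to endpoint_id."""
    result = {}
    for table in response.get("RouteTables", []):
        result[table["RouteTableId"]] = any(
            route.get("GatewayId") == endpoint_id and "DestinationPrefixListId" in route
            for route in table.get("Routes", [])
        )
    return result


def check_vpc(region: str, vpc_id: str, endpoint_id: str) -> dict:
    import boto3  # imported here so route_tables_with_endpoint() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_route_tables(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])
    return route_tables_with_endpoint(response, endpoint_id)
```

A False entry for the route table serving the instance's subnet would explain traffic still leaving through the NAT/IGW.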

Usage

Now both the S3 access point and the VPC endpoint for S3 are set up.

To access the S3 resource from within the VPC, use the access point ARN instead of s3://<bucket name>:

s3://arn:aws:s3:ap-southeast-1:<account number>:accesspoint/<access point name>
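
Building this URI by hand is error-prone. Below is a small helper that assumes the standard aws partition (not aws-cn or aws-us-gov). The function name is hypothetical.

```python
import re


def access_point_uri(region: str, account_id: str, access_point_name: str, key: str = "") -> str:
    """Build s3://arn:aws:s3:<region>:<account>:accesspoint/<name>[/<key>]."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    # Assumption: standard "aws" partition.
    arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{access_point_name}"
    return f"s3://{arn}/{key}" if key else f"s3://{arn}"
```

The boto3 client can also take the bare ARN (without the s3:// prefix) wherever a bucket name is expected, for example as the Bucket argument of get_object.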

Some third-party libraries built for S3 may not recognize the access point ARN URI and throw exceptions such as:

Caused by: java.lang.NullPointerException: null uri host.
        at java.util.Objects.requireNonNull(Objects.java:228)
        at org.apache.hadoop.fs.s3native.S3xLoginHelper.buildFSURI(S3xLoginHelper.java:71)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.setUri(S3AFileSystem.java:486)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:246)
        at org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:123)
        ... 24 more
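
A likely cause is that the colons in the ARN end up in the URI's authority part. java.net.URI cannot parse that authority as host[:port], so getHost() returns null and Hadoop's non-null check fails. The sketch below shows the same property from Python. It is a hypothetical pre-flight check with a hypothetical function name, not Hadoop's own logic.

```python
import re
from urllib.parse import urlsplit

# Bucket-style host: lowercase letters, digits, dots and hyphens, 3 to 63 characters.
_BUCKET_LIKE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def has_plain_host(uri: str) -> bool:
    """True if the URI's authority is a bare bucket-style name with no ':' in it."""
    return bool(_BUCKET_LIKE.match(urlsplit(uri).netloc))
```

For the ARN form, the authority is "arn:aws:s3:ap-southeast-1:<account number>:accesspoint", which fails the check. A plain bucket name or an alias passes.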

In that case, use the S3 access point alias instead. It can be found under S3 > Bucket > Access Points and looks something like:

s3://<access point name>-1ks47nsk5hyxi845kebsen1nyf11caps1b-s3alias
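
An alias is a valid bucket-style name, so it can go wherever a bucket name is expected. Below is a small helper with a hypothetical function name. Its only validation is the -s3alias suffix seen above.

```python
def alias_uri(alias: str, key: str = "") -> str:
    """Build an s3:// URI that uses an access point alias in place of a bucket name."""
    if not alias.endswith("-s3alias"):
        raise ValueError(f"not an access point alias: {alias!r}")
    return f"s3://{alias}/{key}" if key else f"s3://{alias}"
```

The boto3 client likewise accepts the alias as the Bucket argument, so no ARN parsing is involved anywhere.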