By default, accessing S3 resources from any instance or Kubernetes pod within a VPC sends outbound traffic through a NAT gateway or internet gateway (IGW). Not only is this less efficient, it also incurs data transfer charges, which can be significant when the traffic volume is large.
Setup
To keep the traffic within the VPC, create an S3 access point for the specific S3 resources and a VPC gateway endpoint for S3, following the general AWS instructions.
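For reference, roughly equivalent AWS CLI commands are sketched below; the account ID, bucket, access point name, VPC ID, and route table ID are placeholders, not values from this setup.
# Create an S3 access point for the bucket (placeholder names and IDs).
$ aws s3control create-access-point \
    --account-id 123456789012 \
    --name my-data-ap \
    --bucket my-data-bucket \
    --vpc-configuration VpcId=vpc-0abc1234
# Create a gateway VPC endpoint for S3, associated with the route tables
# used by the instances/pods that need S3 access.
$ aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0abc1234 \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.ap-southeast-1.s3 \
    --route-table-ids rtb-0def5678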
After that, double-check the policies in two places.
1. Policy for the IAM user
Go to AWS Console → IAM → Users. The user should be a service account with the following policy attached:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ReplicateObject",
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject"
            ],
            "Resource": "*"
        }
    ]
}
As a security best practice, do NOT grant Allow on all actions, i.e. "Action": ["*"].
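The same check can be done from the command line; a rough sketch, with the user name and policy ARN as placeholders:
# List managed policies attached to the service account (placeholder user name).
$ aws iam list-attached-user-policies --user-name s3-service-account
# Inspect the policy document of one attached policy (placeholder ARN and version).
$ aws iam get-policy-version \
    --policy-arn arn:aws:iam::123456789012:policy/s3-rw-policy \
    --version-id v1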
2. Policy for the endpoint
Go to AWS Console → VPC → Endpoints → the endpoint → Policy:
{
    "Version": "2012-10-17",
    "Id": "Policy1637977229005",
    "Statement": [
        {
            "Sid": "Stmt1637977226759",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}
This endpoint policy is less restrictive than the IAM user policy, because we want the fine-grained restrictions to live in the user policy.
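For a CLI-based check or update of the endpoint policy, something like the following should work; the endpoint ID and policy file name are placeholders:
# Show the policy currently attached to the S3 gateway endpoint (placeholder ID).
$ aws ec2 describe-vpc-endpoints \
    --vpc-endpoint-ids vpce-0123456789abcdef0 \
    --query 'VpcEndpoints[0].PolicyDocument'
# Replace the endpoint policy with a local JSON document (placeholder file name).
$ aws ec2 modify-vpc-endpoint \
    --vpc-endpoint-id vpce-0123456789abcdef0 \
    --policy-document file://endpoint-policy.json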
Finally, run traceroute s3.ap-southeast-1.amazonaws.com from any instance within the VPC to verify.
- The following output shows it is successful (no NAT/IGW hops appear):
$ traceroute s3.ap-southeast-1.amazonaws.com
traceroute to s3.ap-southeast-1.amazonaws.com (52.219.129.42), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
...
28  * * *
29  * * *
30  * * *
- The following output shows the traffic is still going through NAT/IGW:
$ traceroute s3.ap-southeast-1.amazonaws.com
traceroute to s3.ap-southeast-1.amazonaws.com (52.219.40.74), 30 hops max, 60 byte packets
 1  ec2-175-41-128-191.ap-southeast-1.compute.amazonaws.com (175.41.128.191)  42.290 ms  ec2-175-41-128-195.ap-southeast-1.compute.amazonaws.com (175.41.128.195)  3.103 ms  ec2-18-141-171-23.ap-southeast-1.compute.amazonaws.com (18.141.171.23)  36.088 ms
 2  100.65.32.176 (100.65.32.176)  1.149 ms  100.65.32.160 (100.65.32.160)  1.132 ms  100.65.33.144 (100.65.33.144)  4.671 ms
 3  100.66.16.248 (100.66.16.248)  3.400 ms  100.66.16.26 (100.66.16.26)  8.787 ms  100.66.16.244 (100.66.16.244)  4.695 ms
 4  100.66.19.122 (100.66.19.122)  3.511 ms  100.66.19.204 (100.66.19.204)  17.491 ms  100.66.18.104 (100.66.18.104)  18.193 ms
 5  100.66.3.241 (100.66.3.241)  15.377 ms  100.66.3.137 (100.66.3.137)  154.101 ms  100.66.3.61 (100.66.3.61)  19.897 ms
 6  100.66.0.135 (100.66.0.135)  10.907 ms  100.66.0.165 (100.66.0.165)  9.064 ms  100.66.0.201 (100.66.0.201)  11.826 ms
 7  100.65.2.41 (100.65.2.41)  3.090 ms  100.65.3.41 (100.65.3.41)  2.821 ms  100.65.2.41 (100.65.2.41)  2.648 ms
 8  s3-ap-southeast-1.amazonaws.com (52.219.40.74)  0.518 ms  0.569 ms  0.578 ms
Follow the troubleshooting steps if anything does not work.
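Besides traceroute, the routing can also be checked directly: the route table associated with the subnet should contain a route to the S3 prefix list (pl-...) whose target is the gateway endpoint (vpce-...). A sketch with a placeholder route table ID:
# Look for a route with DestinationPrefixListId pl-... and GatewayId vpce-...
$ aws ec2 describe-route-tables \
    --route-table-ids rtb-0def5678 \
    --query 'RouteTables[0].Routes'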
Usage
Now both the S3 access point and the VPC endpoint for S3 are set up.
To access the S3 resources within the VPC, instead of using s3://<bucket name> directly, use the access point ARN:
s3://arn:aws:s3:ap-southeast-1:<account number>:accesspoint/<access point name>
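For example, with a reasonably recent AWS CLI, the access point ARN can be used in place of the bucket name; the account number, access point name, and keys below are placeholders:
# Copy an object through the access point (placeholder names).
$ aws s3 cp ./report.csv \
    "s3://arn:aws:s3:ap-southeast-1:123456789012:accesspoint/my-data-ap/reports/report.csv"
# List objects through the access point using the low-level API.
$ aws s3api list-objects-v2 \
    --bucket arn:aws:s3:ap-southeast-1:123456789012:accesspoint/my-data-ap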
Some third-party libraries built for AWS S3 might not recognize the S3 access point ARN and throw exceptions like:
Caused by: java.lang.NullPointerException: null uri host.
at java.util.Objects.requireNonNull(Objects.java:228)
at org.apache.hadoop.fs.s3native.S3xLoginHelper.buildFSURI(S3xLoginHelper.java:71)
at org.apache.hadoop.fs.s3a.S3AFileSystem.setUri(S3AFileSystem.java:486)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:246)
at org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:123)
... 24 more
In that case, the S3 access point alias can be used instead. It can be found in the console under the bucket → Access Points, and it looks something like:
s3://<bucket name>-1ks47nsk5hyxi845kebsen1nyf11caps1b-s3alias
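The alias can also be retrieved with the CLI; a sketch with placeholder account ID and access point name:
# Print the access point alias (placeholder account ID and name).
$ aws s3control get-access-point \
    --account-id 123456789012 \
    --name my-data-ap \
    --query 'Alias' --output text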