By default, accessing S3 resources from any instance or Kubernetes pod within a VPC sends outbound traffic through a NAT gateway or an internet gateway (IGW). Not only is this less efficient, it also incurs charges for that traffic, which can be significant when the volume is large.
Setup
To keep the traffic within the VPC, create an S3 access point for the specific S3 resources and a VPC endpoint for S3 by following the general instructions.
After that, double-check the policies in two places.
1. Policy for the IAM user
Go to AWS Console → IAM → Users. The user is expected to be a service account with the following policy attached:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": "*"
    }
  ]
}
As a security best practice, DO NOT grant "Allow" on all actions, i.e. "Action": ["*"].
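A quick way to enforce this rule before attaching a policy is to scan the JSON for Allow statements whose actions include the bare "*". The sketch below is a hypothetical helper (the name `wildcard_allow_statements` is illustrative, not part of any AWS tool); it only checks the exact "*" case described above and does not evaluate service-wide wildcards such as "s3:*" or the Resource element.

```python
import json


def wildcard_allow_statements(policy_json: str) -> list:
    """Return the Sid (or position) of every Allow statement that grants all actions ("*")."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be written as an object
        statements = [statements]
    flagged = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a single string or a list
            actions = [actions]
        if "*" in actions:
            flagged.append(statement.get("Sid", f"statement[{index}]"))
    return flagged


if __name__ == "__main__":
    too_broad = '{"Statement": [{"Sid": "TooBroad", "Effect": "Allow", "Action": ["*"], "Resource": "*"}]}'
    print(wildcard_allow_statements(too_broad))  # → ['TooBroad']
```

Run against the user policy above, it returns an empty list, since every action there is named explicitly.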
2. Policy for the endpoint
Go to AWS Console → VPC → Endpoints → the endpoint → Policy
{
  "Version": "2012-10-17",
  "Id": "Policy1637977229005",
  "Statement": [
    {
      "Sid": "Stmt1637977226759",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}
This endpoint policy is deliberately less restrictive than the user policy: the specific permissions are kept in the user's IAM policy, and the endpoint only needs to let them through.
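Because a request that goes through the endpoint has to be allowed by the endpoint policy as well as by the user's IAM policy, one simple sanity check is to confirm that every action granted to the user is matched by some action pattern in the endpoint policy. The sketch below is a hypothetical helper that compares action names only; it ignores Resource, Principal and Condition, so treat it as a rough check under that simplifying assumption rather than a policy evaluator.

```python
from fnmatch import fnmatchcase


def actions_blocked_by_endpoint(iam_actions, endpoint_action_patterns):
    """Return the IAM-granted actions that no endpoint-policy Action pattern matches.

    IAM action names are case-insensitive and use * and ? as wildcards, so both
    sides are lower-cased and compared with fnmatchcase. (fnmatch also treats
    [...] as a character class, which IAM does not; S3 action names contain no
    brackets, so that difference does not matter here.)
    """
    patterns = [pattern.lower() for pattern in endpoint_action_patterns]
    return [
        action
        for action in iam_actions
        if not any(fnmatchcase(action.lower(), pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    user_actions = ["s3:ReplicateObject", "s3:PutObject", "s3:GetObject",
                    "s3:ListBucket", "s3:DeleteObject"]
    print(actions_blocked_by_endpoint(user_actions, ["s3:*"]))  # → []
    # A narrower endpoint policy would silently block most of the user's actions:
    print(actions_blocked_by_endpoint(user_actions, ["s3:Get*"]))
    # → ['s3:ReplicateObject', 's3:PutObject', 's3:ListBucket', 's3:DeleteObject']
```

With the "s3:*" endpoint policy above, nothing granted to the user is blocked, which is why the endpoint policy can stay broad while the user policy stays specific.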
Finally, to verify, run traceroute s3.ap-southeast-1.amazonaws.com from any instance within the VPC.
- The following output indicates success:
$ traceroute s3.ap-southeast-1.amazonaws.com
traceroute to s3.ap-southeast-1.amazonaws.com (52.219.129.42), 30 hops max, 60 byte packets
1 * * *
2 * * *
3 * * *
...
28 * * *
29 * * *
30 * * *
- The following output shows the traffic is still going through the NAT/IGW:
$ traceroute s3.ap-southeast-1.amazonaws.com
traceroute to s3.ap-southeast-1.amazonaws.com (52.219.40.74), 30 hops max, 60 byte packets
1 ec2-175-41-128-191.ap-southeast-1.compute.amazonaws.com (175.41.128.191) 42.290 ms ec2-175-41-128-195.ap-southeast-1.compute.amazonaws.com (175.41.128.195) 3.103 ms ec2-18-141-171-23.ap-southeast-1.compute.amazonaws.com (18.141.171.23) 36.088 ms
2 100.65.32.176 (100.65.32.176) 1.149 ms 100.65.32.160 (100.65.32.160) 1.132 ms 100.65.33.144 (100.65.33.144) 4.671 ms
3 100.66.16.248 (100.66.16.248) 3.400 ms 100.66.16.26 (100.66.16.26) 8.787 ms 100.66.16.244 (100.66.16.244) 4.695 ms
4 100.66.19.122 (100.66.19.122) 3.511 ms 100.66.19.204 (100.66.19.204) 17.491 ms 100.66.18.104 (100.66.18.104) 18.193 ms
5 100.66.3.241 (100.66.3.241) 15.377 ms 100.66.3.137 (100.66.3.137) 154.101 ms 100.66.3.61 (100.66.3.61) 19.897 ms
6 100.66.0.135 (100.66.0.135) 10.907 ms 100.66.0.165 (100.66.0.165) 9.064 ms 100.66.0.201 (100.66.0.201) 11.826 ms
7 100.65.2.41 (100.65.2.41) 3.090 ms 100.65.3.41 (100.65.3.41) 2.821 ms 100.65.2.41 (100.65.2.41) 2.648 ms
8 s3-ap-southeast-1.amazonaws.com (52.219.40.74) 0.518 ms 0.569 ms 0.578 ms
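The check above amounts to a simple heuristic: in the successful case every hop prints only "* * *", while the NAT/IGW path shows hops that answer with host names or addresses. The sketch below automates that observation for scripted checks across many instances; the function name is hypothetical, and it encodes only the pattern seen in these two outputs, not a general guarantee about how traceroute behaves with VPC endpoints.

```python
import re

# A hop line starts with the hop number, e.g. "1 * * *" or "8 s3-... (52.219.40.74) 0.518 ms".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def all_hops_silent(traceroute_output: str) -> bool:
    """True if every hop line consists only of '*' (no hop answered).

    Lines that do not start with a hop number (the "traceroute to ..." header,
    an elided "..." line, blank lines) are ignored. Returns False when no hop
    lines are found at all.
    """
    hop_count = 0
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if match is None:
            continue
        hop_count += 1
        if any(token != "*" for token in match.group(2).split()):
            return False  # some hop replied, so the path through NAT/IGW is visible
    return hop_count > 0
```

The input can be the captured stdout of the traceroute command, for example from `subprocess.run(["traceroute", "s3.ap-southeast-1.amazonaws.com"], capture_output=True, text=True).stdout` on an instance where traceroute is installed.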
Follow the troubleshooting steps if anything does not work.
Usage
Now both the S3 access point and the VPC endpoint for S3 are set up.
To access the S3 resource from within the VPC, use the access point ARN instead of s3://<bucket name>:
s3://arn:aws:s3:ap-southeast-1:<account number>:accesspoint/<access point name>
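When the URI is assembled in code, a small helper keeps the region, account and access point name from being mixed up. The sketch below is hypothetical (the function name and the optional `key` parameter are illustrative); it assumes the standard "aws" partition, and the segment after accesspoint/ is the name of the access point, which may or may not equal the bucket name.

```python
import re


def access_point_uri(region: str, account_id: str, access_point_name: str, key: str = "") -> str:
    """Build an s3:// URI that addresses an S3 access point by its ARN.

    Assumes the standard "aws" partition; an optional object key is appended
    after the access point name.
    """
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("an AWS account ID is 12 digits")
    uri = f"s3://arn:aws:s3:{region}:{account_id}:accesspoint/{access_point_name}"
    return f"{uri}/{key.lstrip('/')}" if key else uri


if __name__ == "__main__":
    # Placeholder account ID and access point name, for illustration only.
    print(access_point_uri("ap-southeast-1", "123456789012", "my-access-point"))
    # → s3://arn:aws:s3:ap-southeast-1:123456789012:accesspoint/my-access-point
```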
Some third-party libraries built for AWS S3 may not recognize the access point ARN URI and throw exceptions such as the following (here from Hadoop's S3A file system, created by Flink):
Caused by: java.lang.NullPointerException: null uri host.
at java.util.Objects.requireNonNull(Objects.java:228)
at org.apache.hadoop.fs.s3native.S3xLoginHelper.buildFSURI(S3xLoginHelper.java:71)
at org.apache.hadoop.fs.s3a.S3AFileSystem.setUri(S3AFileSystem.java:486)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:246)
at org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:123)
... 24 more
In that case, use the S3 access point alias instead. It can be found under S3 Bucket → Access Points, and looks something like:
s3://<access point name>-1ks47nsk5hyxi845kebsen1nyf11caps1b-s3alias
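The stack trace points at the reason: the library calls requireNonNull on the URI's host ("null uri host"), and the ARN form has no plain host name in that position, because the text after "s3://" contains colons. The alias, by contrast, is a single host-like label. The sketch below illustrates the difference with Python's urllib.parse; it is only an analogy for what java.net.URI does inside Hadoop, not a reproduction of it, and the access point name used in the example is hypothetical.

```python
from urllib.parse import urlsplit


def host_component(uri: str):
    """Return the host of an s3:// URI, or None if its authority is not a plain host name."""
    parts = urlsplit(uri)
    try:
        # urlsplit treats the text after the first ':' in the authority as a port;
        # for the ARN form that text is not a number, so .port raises ValueError.
        parts.port
    except ValueError:
        return None
    return parts.hostname


if __name__ == "__main__":
    arn_uri = "s3://arn:aws:s3:ap-southeast-1:123456789012:accesspoint/my-access-point"
    alias_uri = "s3://my-access-point-1ks47nsk5hyxi845kebsen1nyf11caps1b-s3alias/data/file.csv"
    print(host_component(arn_uri))    # → None
    print(host_component(alias_uri))  # → my-access-point-1ks47nsk5hyxi845kebsen1nyf11caps1b-s3alias
```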

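It was already the season of warm spring days, blooming flowers and birds laying eggs, yet who knew it would snow a little once again. Snowflakes pattered against the front of the car, blocking my view on the way home. My plan to bike to work, which I had not started in ages, has been postponed again by this long cold spell. Is there no good news at all in this eventful season? Just now, on my long march from the bedroom to the living room, I kicked the pointy corner of the Futon, and while grimacing I couldn't help letting out a cool "sss" sound, like the one the Korean heartthrob Rain makes at his concerts. April Fools' Day is coming, so everyone be careful.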