I migrated a set of production web services from a configuration where DNS pointed directly at the origin (Docker + Traefik on a VPS) to one where CloudFront + AWS WAF sit in front of the origin. This article summarises the patterns I used and the pitfalls I did not expect, in a general form.
The goal is to help anyone migrating a similar setup avoid the same mistakes.
Before
```
Browser ──► DNS ──► Origin IP (reverse proxy: Traefik on VPS)
                      ├── service-a (equivalent to cultural.jp)
                      ├── service-b (equivalent to api.cultural.jp)
                      └── service-c (equivalent to webcatplus.jp)
```
- Each service is a Docker container.
- Traefik routes by the Host header and terminates TLS with Let’s Encrypt (HTTP-01).
- The CrowdSec bouncer plugin handles attack detection.
After
```
Browser ──► DNS ──► CloudFront ──► origin domain ──► Traefik
                        │          (origin.example.com)
                        └── WAF (OWASP / known bad inputs / IP reputation / rate limit)
```
Three key points:
- Set up a separate subdomain for the origin (origin.<service>.example.com).
- Inject a “shared secret” as a custom header between CloudFront and the origin.
- Reuse the existing wildcard ACM certificate on CloudFront.
1. Why set up an origin-only subdomain
When this pattern is needed
This pattern tends to be necessary under one of the following conditions:
- You are retrofitting CloudFront in front of a domain that currently receives direct traffic (migration).
- The origin is a VPS / non-AWS host and only has its own TLS certificate.
- The origin is an AWS ALB, but the ALB certificate is not an *.elb.amazonaws.com-style cert (e.g. a custom *.example.com cert).
For a greenfield build on S3, or on an ALB with a native DNS name whose cert is sufficient, you can point CloudFront directly at the origin without a separate subdomain.
Why a separate name is needed
If you set example.com as CloudFront’s Alternate Domain Name (CNAME), you cannot also specify example.com as the origin (DNS loop).
The origin side has to be reachable under a different name.
When using an ALB as its native DNS name (dualstack.*.elb.amazonaws.com) as the origin, the certificate the ALB’s HTTPS listener returns (in most cases *.example.com) does not match the hostname CloudFront validates against, and TLS validation fails. Since ACM cannot issue *.elb.amazonaws.com certificates, setting up an origin-only subdomain is the standard solution.
Recommended: add a new origin.<service>.<domain> name
```
example.com         ALIAS → CloudFront
origin.example.com  A     → existing origin IP
```
CloudFront hits origin.example.com as its origin, and Traefik accepts traffic with a router matching that hostname.
Why this approach works well:
- You can run both in parallel with the existing domain (you can verify behaviour before the DNS cutover).
- A wildcard cert *.example.com is usable on the origin side too (either Let’s Encrypt or ACM).
- If you later want to move the origin to a different server, you only swap the CNAME; CloudFront configuration stays untouched.
Approaches to avoid:
- Hardcoding the origin IP as CloudFront’s origin → CloudFront cannot send SNI and TLS validation fails.
- Using the ALB’s dualstack.*.elb.amazonaws.com as the origin name → the ELB cert and SNI do not line up and validation fails (the cert is usually issued for *.example.com).
Watch the subdomain depth
A wildcard cert *.example.com only covers one level.
- origin.example.com → ✅ covered
- origin.api.example.com → ❌ not covered (two levels)
So for an origin corresponding to api.example.com, keeping it to a single level as origin-api.example.com (instead of origin.api.example.com) lets the existing wildcard cert keep working.
In my case, I tried to cover the API domain with origin.api.example.com relying on the wildcard, hit a TLS error, and ended up switching to origin-api.example.com to be consistent (wasted time).
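The one-level rule can be sanity-checked in a few lines. This sketch mirrors the wildcard-matching behaviour described above (the function name is mine, not from any library):

```python
def wildcard_covers(wildcard: str, hostname: str) -> bool:
    """True if a one-level wildcard cert name covers hostname.

    Per RFC 6125, "*.example.com" matches exactly one DNS label,
    so "origin.example.com" matches but "origin.api.example.com" does not.
    """
    if not wildcard.startswith("*."):
        return wildcard == hostname
    suffix = wildcard[1:]              # ".example.com"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]   # the part the "*" must cover
    return bool(label) and "." not in label
```

Running this against the two candidate names above makes the trap obvious before you hit the TLS error in production.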
2. Origin protection: secret header
For origin protection in front of CloudFront, a custom header plus header validation at the reverse proxy is simple and effective.
On the CloudFront side
```hcl
custom_header {
  name  = "X-Origin-Secret"
  value = var.origin_secret # openssl rand -hex 32
}
```
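The secret itself only needs to be generated once and stored (e.g. in your secret manager or Terraform variables). Python’s secrets module produces the same shape of value as the openssl rand -hex 32 in the comment above:

```python
import secrets

# 32 random bytes rendered as 64 hex characters,
# equivalent to `openssl rand -hex 32`.
origin_secret = secrets.token_hex(32)
```

Any cryptographically random value of similar length works; the important part is that it never appears in responses or logs.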
On the Traefik side (just add a Headers match to the router rule)
```yaml
labels:
  traefik.http.routers.svc-origin.rule: >-
    Host(`origin.example.com`) &&
    Headers(`X-Origin-Secret`, `${ORIGIN_SECRET}`)
```
Requests without a matching header get a 404 (the router itself does not fire). So even if an attacker identifies the origin IP and hits it directly, they only see an unrelated 404.
Nginx version
```nginx
server {
    server_name origin.example.com;
    if ($http_x_origin_secret != "${ORIGIN_SECRET}") {
        return 404;
    }
    ...
}
```
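If the origin validates the header in application code rather than in Traefik or nginx, a constant-time comparison avoids leaking the secret byte-by-byte through response timing. A minimal sketch (the helper name is mine):

```python
import hmac

def origin_request_allowed(headers: dict, secret: str) -> bool:
    # hmac.compare_digest runs in constant time, so an attacker
    # cannot probe the secret via timing differences.
    supplied = headers.get("X-Origin-Secret", "")
    return hmac.compare_digest(supplied, secret)
```

Returning the same 404 as the proxy-level variants keeps the behaviour indistinguishable from a non-existent route.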
If you want to harden further
- On an ALB, you can restrict the ALB SG to allow only the com.amazonaws.global.cloudfront.origin-facing managed prefix list (see below).
- On a VPS you can also allow-list CloudFront’s IP ranges at the firewall, but the ranges are large and change frequently, so the secret header alone tends to be enough in practice.
3. Splitting the ALB ⇄ EC2 SG (pitfall)
This was the biggest incident in this migration.
The ALB and the EC2 instances behind it shared the same Security Group. When I tried to narrow the ALB inbound down to the CloudFront prefix list by revoking 0.0.0.0/0 for 80/443, the internal ALB → EC2 traffic was collateral damage and got cut off as well.
Recommended SG layout
```
[ALB SG]                        [EC2 SG]
inbound:                        inbound:
  443 from CF prefix list         80 from ALB SG (SG-to-SG reference)
                                  22 from admin IPs
```
Create the ALB SG and EC2 SG as separate groups. When they are shared, narrowing one narrows both, which is exactly what caused the incident.
Splitting an existing environment where the SG is shared (with zero downtime)
1. Create a new SG for EC2 (allow inbound 80 from the ALB SG).
2. Attach both SGs to the EC2 ENI.
3. Verify behaviour.
4. Detach the old SG from the ENI.
5. Reorganise the old (shared) SG into ALB-only rules.
As long as you go through “attach both → verify → detach one” in the middle, you can cut over without downtime.
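The attach-both → verify → detach sequence can be sketched as operations on the ENI’s security-group list; in practice each intermediate state is applied with ec2.modify_network_interface_attribute. The SG IDs below are hypothetical:

```python
def attach(groups, sg):
    # Step 2: add the new SG alongside the old one (idempotent).
    return groups if sg in groups else groups + [sg]

def detach(groups, sg):
    # Step 4: remove one SG, keeping the rest attached.
    return [g for g in groups if g != sg]

# Each state would be pushed with
# ec2.modify_network_interface_attribute(NetworkInterfaceId=eni, Groups=state).
states = [["sg-shared"]]
states.append(attach(states[-1], "sg-ec2-new"))   # both attached -> verify here
states.append(detach(states[-1], "sg-shared"))    # old SG removed
```

The invariant worth checking at every step is that the group list is never empty, which is exactly why the order matters.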
4. Cache key design
Baseline policy
| Item | Value | Reason |
|---|---|---|
| Path | ✅ | Obvious |
| Query string (all) | ✅ | Keep separate cache entries per API/SPARQL parameter |
| Cookie | ❌ | Hit rate drops dramatically |
| Authorization | ❌ | Same as above; split into a separate policy for authenticated traffic |
| Accept-Encoding (gzip/br) | ✅ (CF automatic) | Separate cache per compression format |
| Accept-Language | ❓ | Not needed if i18n is in the URL as /ja/, /en/ |
If i18n is URL-path-based (/ja/foo, /en/foo), you can leave Accept-Language out of the cache key. That makes a big difference — including it fragments the cache per bot/browser and hurts efficiency.
Choosing TTLs
- Default TTL = 24h
- Max TTL = 7 days
- Min TTL = 0 (respect the origin’s Cache-Control)
Even for an API where the origin often does not return Cache-Control, CloudFront will cache based on the Default TTL.
For low-update-frequency sites, 1–7 days causes little practical harm, and invalidation on deploy makes propagation instant.
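CloudFront’s documented rules for combining these settings with the origin’s Cache-Control can be sketched as a small function (defaults match the TTLs above):

```python
def effective_ttl(origin_max_age, min_ttl=0, default_ttl=86_400, max_ttl=604_800):
    """TTL CloudFront applies, per its documented Min/Default/Max TTL rules."""
    if origin_max_age is None:            # origin sent no Cache-Control
        return default_ttl                # -> Default TTL (24h here)
    # Otherwise the origin's max-age is clamped into [Min TTL, Max TTL].
    return min(max_ttl, max(min_ttl, origin_max_age))
```

So an API that sends nothing gets the 24h default, while an origin that sends max-age=60 is respected because Min TTL is 0.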
An extra policy for static assets
Hash-bearing paths such as /_next/static/* or /img/* use a separate policy:
- Min/Default/Max = 1 year
- Permanent cache (invalidation unnecessary because the filename includes a hash)
Effect on SPARQL and search APIs
GET requests with query strings hit for 24h as long as the URL is byte-identical.
- Heavy SPARQL DESCRIBE query: first 1.1 s → warm 50 ms (~20× speedup)
- Faceted search API: first 480 ms → warm 45 ms (~10×)
- The effect is largest for UIs that repeatedly issue the same query (infinite scroll, facet spamming).
Caveats:
- POST requests are not cached (CloudFront forwards every POST to the origin).
- SPARQL endpoints often switch between “short queries via GET, long queries via POST” on the client side, so the benefit grows if you can bias towards GET.
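Because hits require byte-identical URLs, clients that emit query parameters in a deterministic order get the most out of the cache. A small canonicalisation sketch (the helper name is mine):

```python
from urllib.parse import urlencode

def canonical_url(path: str, params: dict) -> str:
    # Sort parameters so semantically equal requests
    # collapse onto a single cache entry.
    return path + "?" + urlencode(sorted(params.items()))
```

Two requests that differ only in parameter order then produce the same URL and share one warm entry instead of two cold misses.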
5. Always start WAF in COUNT mode
One of the reasons I brought in WAF was that “CrowdSec was producing frequent false bans”. To avoid making the same mistake, I also ran WAF with all rules in COUNT mode for the first week.
What COUNT mode means
Matching rules do not block, they only log. CloudWatch Metrics and Sampled Requests let you see “how many requests would have been blocked if the rule were enforcing”.
Rules I enabled
| Rule | Purpose |
|---|---|
| AWSManagedRulesCommonRuleSet | XSS / SQLi / common attacks |
| AWSManagedRulesKnownBadInputsRuleSet | Known bad inputs (Log4Shell, etc.) |
| AWSManagedRulesAmazonIpReputationList | Malicious IPs (IP reputation) |
| Rate-based rule | 1000 req/IP per 5 minutes (DoS mitigation) |

Rules I skipped:
- AWSManagedRulesBotControlRuleSet: separately priced and fairly prone to false positives, so I left it off at the start.
Switching modes in Terraform
Writing override_action so it can be toggled via a variable makes it a one-command job to move from the initial count to production none (block enabled).
```hcl
variable "waf_rule_action" {
  type    = string
  default = "count"

  validation {
    condition     = contains(["count", "block"], var.waf_rule_action)
    error_message = "Must be 'count' or 'block'."
  }
}

locals {
  managed_override = var.waf_rule_action == "block" ? "none" : "count"
}

resource "aws_wafv2_web_acl" "cf" {
  rule {
    name = "AWSManagedRulesCommonRuleSet"

    override_action {
      dynamic "count" {
        for_each = local.managed_override == "count" ? [1] : []
        content {}
      }
      dynamic "none" {
        for_each = local.managed_override == "none" ? [1] : []
        content {}
      }
    }
    ...
  }
}
```
A week of COUNT operation is enough to identify rules that produce false positives, so you can flip only the safe rules to block.
Why I dropped CrowdSec in favour of WAF
CrowdSec is behaviour-analysis based, and scenarios like http-crawl-non_statics ban “many dynamic-resource requests in a short time”. That means:
- Static site / low traffic → ◎ works well
- Interactive web app + search API → △ normal users get banned
My stack is mostly the latter (faceted search, infinite scroll), which does not play well with CrowdSec. WAF Managed Rules judge on request patterns, so they mostly do not react to request volume itself, and the structure tends to produce fewer false positives.
6. A gradual cutover sequence
To cut over to production with no downtime, this order works well:
1. Add origin DNS (origin.example.com)
2. Add an "origin router" on the reverse proxy (running in parallel with the existing one)
3. Build CloudFront (do not touch DNS yet)
4. Verify via the CF dist-domain (xxxx.cloudfront.net) using curl --resolve or /etc/hosts
5. Verify in the browser using /etc/hosts (only your environment goes via CF)
6. DNS cutover (point the ALIAS at CloudFront)
7. Remove the old router (after the DNS TTL expires)
The important thing is that there is room to test between 5 and 6. The CloudFront distribution domain (xxxx.cloudfront.net) is issued as soon as the distribution is created, so you can verify production-equivalent behaviour with curl and /etc/hosts before switching real DNS.
```
# /etc/hosts
18.65.x.x example.com
18.64.y.y api.example.com
```
At this step, watch out for DNS resolver behaviour. In my environment, the local resolver answered REFUSED for cloudfront.net queries, so I had to run dig @8.8.8.8 ... explicitly to get CloudFront IPs.
7. Wire invalidation into CI
You need one extra step to clear cache after a deploy.
Minimal IAM policy
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "cloudfront:CreateInvalidation",
    "Resource": "arn:aws:cloudfront::<account>:distribution/*"
  }]
}
```
GitHub Actions step
```yaml
- name: Invalidate CloudFront
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    AWS_DEFAULT_REGION: us-east-1
  run: |
    aws cloudfront create-invalidation \
      --distribution-id ${{ secrets.CF_DIST_ID }} \
      --paths "/*"
```
For services where articles are updated via a CMS, you can do the equivalent from the CMS webhook → Lambda → invalidation.
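For the CMS webhook → Lambda route, a minimal handler sketch using boto3 might look like this (the "DIST_ID" placeholder and the helper name are mine):

```python
import time

def build_batch(paths):
    # Pure helper so the payload shape is testable;
    # CallerReference must be unique per invalidation request.
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": str(time.time()),
    }

def handler(event, context):
    import boto3  # imported lazily; needs AWS credentials at runtime
    # "DIST_ID" is a hypothetical placeholder for the distribution ID.
    boto3.client("cloudfront").create_invalidation(
        DistributionId="DIST_ID",
        InvalidationBatch=build_batch(["/*"]),
    )
```

The IAM policy above is sufficient for this Lambda as well, since it only ever calls cloudfront:CreateInvalidation.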
8. Measured speed improvement
Measured from a Tokyo client against a Tokyo origin (VPS):
| Path | TTFB |
|---|---|
| Direct to origin | ~70 ms |
| CloudFront cold (first-time cache miss) | ~125 ms (first request only) |
| CloudFront warm (cache hit) | ~45 ms |
Even Tokyo-to-Tokyo gets 35–60% faster on cache hit. Contributing factors:
- Edge-to-origin connections are kept alive, so TLS handshakes to the origin are reused.
- TCP/TLS round trips are shorter.
- Compression (brotli) happens at the edge.
The effect is even larger for overseas users (the trans-Pacific hop to the origin disappears).
Cold miss is genuinely slow, but with a 24h TTL, the distribution is “only the first request is slow; everything in the next 24h is fast”.
9. Monthly cost
| Item | Rough cost |
|---|---|
| WAF Web ACL | $5/month |
| WAF Managed Rules (3) | $3/month |
| WAF request-based charges | $0.60/million requests |
| CloudFront data transfer (first 1 TB/month is free) | ~$0–5/month |
| ACM certificate | Free |
| Route53 ALIAS | Free (Hosted Zone $0.50/month only) |
All in, around $10–15/month. One WAF ACL can be shared across distributions for multiple domains, so it is surprisingly cheap.
10. Side benefits I did not expect
- I was able to stop exposing Elasticsearch externally. A good chance to reduce the attack surface.
- Reorganising Traefik labels in docker-compose.yml clarified each router’s responsibilities.
- By managing things as IaC with Terraform, adding a new domain is a matter of adding one entry to the sites = { ... } map.
- I got out from under CrowdSec’s operational cost (no more false-positive handling or allow-list maintenance).
11. Things that went wrong, lessons learned
| Situation | Mistake | Lesson |
|---|---|---|
| ALB SG | Removing 0.0.0.0/0 from the shared SG also cut off internal traffic | Separate ALB SG and EC2 SG from the start |
| Origin name | Used origin.api.example.com (two levels), outside wildcard cert coverage | Keep it to one level, e.g. origin-api.example.com |
| Let’s Encrypt | HTTP-01 requires port 80; a later CF-only cutover can cause renewal to stall | Consider switching to DNS-01 alongside the migration |
| DNS resolver | Local DNS REFUSED cloudfront.net | Use dig @8.8.8.8 explicitly |
| OGP rendering | The first entry of the API’s description array happened to be “none” / “unknown” | Filter empty values and join multiple values |
| CrowdSec | Frequent false bans against API-heavy users | WAF Managed Rules are a safer choice than behaviour-based filters |
Summary
The pattern of retrofitting CloudFront + WAF in front of an existing origin comes down to:
- Introduce a separate origin-only domain (runs in parallel, and makes later migrations easy).
- Effectively hide the origin using a secret header.
- Run WAF in COUNT mode for a week, then switch to block.
- Separate ALB and EC2 SGs from the start.
- Model sites as a for_each’d map in Terraform (easy to extend).
- Automate invalidation from CI on deploy.
Following this, you can raise both security and speed at the same time, with no downtime.
In particular, “separating the ALB and EC2 SGs” is a point to get right at the initial design stage. I am sharing this article so that others can avoid the same incident.