Wednesday 16th August 2023

Handling surges in traffic to a WooCommerce site with NGINX

I’ve recently worked on a few WooCommerce projects where we’ve needed to handle large surges of traffic from social media/email campaigns on sites we host for our customers.

For one customer, the problem was amplified by Microsoft’s “SafeLinks” feature in Office 365, which scans every URL it sees in inbound email. Our client’s emails have an individual tracking code on every link sent to a recipient, so if our client sends an email with 4 links in it, Office 365 will send 4 hits to the site while processing the email, before the recipient even clicks a link. Our client works B2B with a lot of large corporate customers, most of their recipients use an Office 365 based solution, and they send tens of thousands of emails at a time. Even when the campaign is spread out over an hour or so, this still results in very large peaks of traffic to the site: several thousand requests per minute.

Removing these tracking links would solve the problem, but they are actively used by the client’s marketing team – so that wasn’t an option.

Caching pages with URL parameters in NGINX

Looking around, I found quite a few solutions documented online. Most of them simply redirect users away from the parameterised URL to a clean URL. This means the tracking information passed in the link is often lost, which is no good for my clients, who rely on that data being available when the page is viewed so it can be read by JavaScript code from their analytics vendor.

The others simply override the rules that tell Nginx to skip caching for URLs with a query string. This means each unique URL is cached separately, which is no good for us as every link in the emails is unique.

So, what we need to do is get Nginx to ignore the tracking parameters in the query string and serve a cached page to the majority of users. The solution is to set the fastcgi_cache_key directive to the clean URL without any query string. This means we serve the same cached page to all visitors regardless of the tracking parameters, but as we aren’t redirecting, the parameters are still in the URL when the JavaScript that processes them runs.

Some of the examples matched any parameter containing utm_. However, if you aren’t careful this can cause issues with less commonly used parameters such as utm_nooverride (often used with third-party payment processors that have offsite payment pages).

The solution I implemented is split across several Nginx configuration files. Firstly, within the http block (the map directive is only valid there):

# Get the URL without a query string.
# This is used to rewrite the cache key for requests where we want to strip parameters.
map $request_uri $request_path {
    ~(?<captured_path>[^?]*) $captured_path;
}

then within the server block:

# Set a default cache key, which is the full URL including parameters.
set $cache_key_value "$scheme$request_method$host$request_uri";

# Cache pages by default.
set $skip_cache 0;

# POST requests should always go to PHP.
if ($request_method = POST) {
    set $skip_cache 1;
}

# By default, if there's a query string we should send the request to PHP.
if ($query_string != "") {
    set $skip_cache 1;
}

# But if the request has GA parameters, or is from FB/Google ads, then we want to serve the page
# from cache, and modify the cache key value so it uses the page without parameters.
if ($query_string ~* "utm_source|utm_medium|utm_campaign|utm_term|utm_content|gclid|fbclid|gclsrc|dclid|dmsi") {
    set $cache_key_value "$scheme$request_method$host$request_path";
    set $skip_cache 0;
}

# Set the fastcgi_cache_key to the value of our variable.
fastcgi_cache_key "$cache_key_value";

# Don't cache URIs containing the following segments.
# This check runs after the tracking-parameter check, so it still wins for these URIs.
if ($request_uri ~* "/wp-admin/|/wp-json/|/xmlrpc.php|wp-.*.php|/feed/|index.php|sitemap(_index)?.xml|/cart/|/checkout/|/my-account/") {
    set $skip_cache 1;
}

# Don't use the cache for logged in users, on password protected pages or for recent commenters.
if ($http_cookie ~* "comment_author|wordpress_[a-f0-9]+|wp-postpass|wordpress_no_cache|wordpress_logged_in|woocommerce_items_in_cart") {
    set $skip_cache 1;
}
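
The snippets above only set $skip_cache and the cache key; they don’t show where the flag is consumed. A minimal sketch of how it might be wired up follows. The cache zone name (WOOCOMMERCE), the cache path, the sizes, the 10 minute lifetime and the PHP-FPM socket path are all assumptions for illustration, not values from the site described here.

```nginx
# In the http block: define where cached responses are stored.
# The path, zone name and sizes are assumptions for illustration.
fastcgi_cache_path /var/cache/nginx/fastcgi levels=1:2 keys_zone=WOOCOMMERCE:100m inactive=60m;

# In the server block: hand PHP requests to PHP-FPM, using the cache unless $skip_cache is set.
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php-fpm.sock;  # assumed socket path

    fastcgi_cache WOOCOMMERCE;
    fastcgi_cache_valid 200 10m;              # assumed lifetime for cached pages

    # A non-empty, non-zero value skips the cache lookup and stops the response being stored.
    fastcgi_cache_bypass $skip_cache;
    fastcgi_no_cache $skip_cache;

    # Optional: expose HIT/MISS/BYPASS so the behaviour can be checked from a browser or curl.
    add_header X-Cache-Status $upstream_cache_status;
}
```

With something like this in place, requesting the same page twice with different utm_ values should show a MISS followed by a HIT in the X-Cache-Status header.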

This method removes the entire query string from the cache key, so it’s no good if traffic is directed to pages that also need other query string parameters to render, because the cached page ignores them. It would be possible to modify the map directive to handle this, but it wasn’t required for this site.
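
One more conservative variation, sketched here as an assumption rather than something used on this site, is to strip the query string from the cache key only when every parameter in it is a tracking parameter. The $tracking_only variable name is hypothetical; the parameter list is the same one used in the server block. The map goes in the http block, and the if block would replace the tracking-parameter check in the server block.

```nginx
# http block: 1 when the query string consists solely of known tracking parameters.
map $args $tracking_only {
    default 0;
    "~*^((utm_source|utm_medium|utm_campaign|utm_term|utm_content|gclid|fbclid|gclsrc|dclid|dmsi)=[^&]*(&|$))+$" 1;
}

# server block: only share the cached page when nothing but tracking parameters are present.
if ($tracking_only) {
    set $cache_key_value "$scheme$request_method$host$request_path";
    set $skip_cache 0;
}
```

With this variation, a made-up request such as /shop/?orderby=price&utm_source=newsletter keeps $skip_cache at 1 and goes to PHP, while /shop/?utm_source=newsletter&utm_campaign=aug shares the cached copy of /shop/.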
