I posted this on the GridPane Community forums too.
I've got a site that is a massive WooCommerce store, and after about 12 hours, clearing the Redis cache results in an error. The cache is still flushed, and subsequent Redis cache flushes are fine. Unfortunately, the error causes panic.
I dug into it more and eventually posted it to the GitHub repository for the WordPress Redis Object Cache plugin. After speaking with the developer, Till Krüss, it seems the culprit is the deprecated feature that clears out only prefixed keys.
WP_REDIS_SELECTIVE_FLUSH is the deprecated setting. When it is enabled, a flush goes through the current Redis database, looks for keys that have the prefix set via WP_CACHE_KEY_SALT/WP_REDIS_PREFIX, and deletes them using a slow Lua script.
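To illustrate why a prefix-based flush gets slow, here is a toy in-memory model in Python. It is only a sketch under my own assumptions: the dict stands in for database 0, the site prefixes and key names are made up, and the function is not the plugin's actual Lua script. The point is that finding one site's keys means looking at every key in the shared database.

```python
# Toy model: a plain dict stands in for the shared Redis database 0.
# Assumption: this only illustrates the idea of a prefix-based flush;
# it is NOT the Lua script the plugin really runs.

def selective_flush(database: dict, prefix: str) -> tuple:
    """Delete keys starting with `prefix`; return (keys_examined, keys_deleted)."""
    examined = 0
    matched = []
    for key in database:  # every key in the database has to be inspected
        examined += 1
        if key.startswith(prefix):
            matched.append(key)
    for key in matched:
        del database[key]
    return examined, len(matched)


# Three hypothetical sites sharing database 0, separated only by prefix.
db0 = {}
for site_prefix, key_count in (("shop_", 5000), ("blog_", 50), ("docs_", 50)):
    for i in range(key_count):
        db0[f"{site_prefix}wp:posts:{i}"] = "cached value"

examined, deleted = selective_flush(db0, "blog_")
print(examined, deleted)  # 5100 50
```

Flushing the small blog still touches all 5,100 keys, so the cost of the flush grows with everything stored in the database, not just with the site being flushed.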
With the default configuration on most operating systems, Redis provides 16 databases. The Redis Object Cache plugin for WordPress uses database ID 0 by default. This means if you have 10-100 WordPress sites on your server, they all use database ID 0. You might be thinking, won't the Redis keys overlap? GridPane sets a WP_CACHE_KEY_SALT/WP_REDIS_PREFIX for each site, so this doesn't happen.
Overall this is fine, and there are no issues until you have a site with a massive amount of data in Redis and you go to flush its object cache. The flush might take 5-10 seconds to complete, and Redis is locked up while it runs, resulting in connection errors within WordPress. This is a rare edge-case scenario.
How do you fix this? Stop relying on WP_CACHE_KEY_SALT/WP_REDIS_PREFIX to separate sites and define a separate Redis database for each WordPress site. You can increase the Redis database count from 16 up to 100 or 500, or however many sites you have on your server. The Redis Object Cache plugin then only needs to send a single command to clear out the site's database, instead of going through database 0 and clearing out every key with a specific prefix.
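As a rough sketch of what that configuration looks like, the Python helpers below generate the two lines involved: the `databases` directive in redis.conf and the plugin's `WP_REDIS_DATABASE` constant in wp-config.php. The helper names and the example numbers are mine, and I'm not covering where GridPane keeps these files or how it would apply them.

```python
# Hypothetical helpers: they only build the config lines as strings.

def redis_conf_line(site_count: int, minimum: int = 16) -> str:
    """redis.conf directive giving at least one database per site."""
    return f"databases {max(site_count, minimum)}"


def wp_config_line(db_id: int, database_count: int) -> str:
    """wp-config.php constant pointing one site at its own database."""
    # Redis database IDs run from 0 to database_count - 1.
    if not 0 <= db_id < database_count:
        raise ValueError(f"database ID {db_id} is outside 0..{database_count - 1}")
    return f"define( 'WP_REDIS_DATABASE', {db_id} );"


print(redis_conf_line(120))     # databases 120
print(wp_config_line(37, 120))  # define( 'WP_REDIS_DATABASE', 37 );
```

With a dedicated database per site and selective flush turned off, clearing the cache can be a single FLUSHDB on that database, which never has to look through other sites' keys.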
Unfortunately, this means you must assign a unique database ID to each site on your GridPane server and keep track of which site uses which ID. This is possible, since a unique prefix is already generated for each site using WP_CACHE_KEY_SALT/WP_REDIS_PREFIX. The code at GridPane would need to be refactored and expanded, which isn't an easy task but is possible.
Another solution is to find the sites that are storing large amounts of data, provide them with their own Redis database, and keep track of the Redis database ID and site mapping yourself.
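If you go the manual route, the tracking can be very simple. Here is a minimal Python sketch of a site-to-database registry. The class name, the example domains, and the choice to reserve database 0 for the sites that keep sharing it with prefixes are all my own assumptions, not anything GridPane or the plugin provides.

```python
import json


class RedisDbRegistry:
    """Hypothetical tracker for which site owns which Redis database ID."""

    def __init__(self, database_count: int, reserved=(0,)):
        self.database_count = database_count  # the `databases` value in redis.conf
        self.reserved = set(reserved)         # e.g. database 0 stays shared
        self.mapping = {}                     # site -> database ID

    def assign(self, site: str) -> int:
        """Return the site's database ID, allocating the lowest free one if new."""
        if site in self.mapping:
            return self.mapping[site]
        used = self.reserved | set(self.mapping.values())
        for db_id in range(self.database_count):
            if db_id not in used:
                self.mapping[site] = db_id
                return db_id
        raise RuntimeError("no free Redis database; raise `databases` in redis.conf")

    def release(self, site: str) -> None:
        """Free a site's database ID, for example when the site is deleted."""
        self.mapping.pop(site, None)

    def to_json(self) -> str:
        """Serialize the mapping so it can be saved somewhere safe."""
        return json.dumps(self.mapping, indent=2, sort_keys=True)


registry = RedisDbRegistry(database_count=100)
shop_db = registry.assign("big-shop.example")
forum_db = registry.assign("busy-forum.example")
print(shop_db, forum_db)  # 1 2
print(registry.to_json())
```

Each ID returned here is what you would put in that site's WP_REDIS_DATABASE setting, and the JSON output is the record you keep so two sites never end up in the same database.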
Figured I'd post this here for those who might be getting errors when clearing Redis. Here's the GitHub discussion with Till.
Purging Cache Results in Error · rhubarbgroup · Discussion #18