CDN cache hit ratio analysis
Content cached at the CDN reduces the latency experienced by website users, who do not need to wait for the request to make its way back to the Apache/dispatcher or AEM publish. With that in mind, it is worthwhile to optimize the CDN cache hit ratio to maximize the amount of content cacheable at the CDN.
Learn how to analyze the AEM as a Cloud Service provided CDN logs and gain insights such as cache hit ratio, and top URLs of MISS and PASS cache types, for optimization purposes.
The CDN logs are available in JSON format and contain various fields, including url and cache. For more information, see the CDN Log Format. The cache field provides information about the state of the cache; its possible values are HIT, MISS, or PASS. Let’s review the details of the possible values.
Possible Value
For the purpose of this tutorial, the AEM WKND project is deployed to the AEM as a Cloud Service environment and a small performance test is triggered using Apache JMeter.
This tutorial is structured to take you through the following process:
- Downloading CDN logs via Cloud Manager
- Analyzing those CDN logs, which can be performed with two approaches: a locally installed dashboard or a remotely accessed Jupyter Notebook (for those who license Adobe Experience Platform)
- Optimizing CDN cache configuration
Download CDN logs
To download the CDN logs, follow these steps:
- Log in to Cloud Manager at my.cloudmanager.adobe.com and select your organization and program.
- For the desired AEMCS environment, select Download Logs from the ellipsis menu.
- In the Download Logs dialog, select the Publish Service from the drop-down menu, then click the download icon next to the cdn row.
If the downloaded log file is from today, the file extension is .log; for past log files, the extension is .log.gz.
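Because the file extension depends on the log's age, an analysis script has to handle both plain and gzip-compressed input. The following is a minimal sketch, assuming each log line holds one JSON object (the text above only states that the logs are in JSON format). The function name read_cdn_log is hypothetical.

```python
import gzip
import json
from pathlib import Path


def read_cdn_log(path):
    """Yield one dict per JSON line from a .log or .log.gz CDN log file.

    Assumption: the file contains one JSON object per line.
    """
    path = Path(path)
    # Past-day logs end in .gz and are gzip-compressed; today's log is plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
```

Since the function is a generator, large log files are read line by line instead of being loaded into memory at once.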
Analyze downloaded CDN logs
To gain insights such as cache hit ratio, and top URLs of MISS and PASS cache types, analyze the downloaded CDN log file. These insights help to optimize the CDN cache configuration and enhance the site performance.
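As a rough illustration of what these insights amount to, the sketch below computes a cache hit ratio and the most frequent MISS and PASS URLs from already-parsed log records. It assumes the ratio is the HIT count divided by all requests that carry a cache value. The function name summarize_cache and the sample records are made up for illustration and are not taken from real logs.

```python
from collections import Counter


def summarize_cache(records, top_n=5):
    """Return (hit_ratio, counts, top_urls) for an iterable of log records."""
    counts = Counter()
    uncached_urls = {"MISS": Counter(), "PASS": Counter()}
    for record in records:
        state = record.get("cache")
        if state is None:
            continue  # ignore records without a cache field
        counts[state] += 1
        if state in uncached_urls:
            uncached_urls[state][record.get("url")] += 1
    total = sum(counts.values())
    # Assumed definition: share of HIT responses among all counted requests.
    hit_ratio = counts["HIT"] / total if total else 0.0
    top_urls = {
        state: urls.most_common(top_n) for state, urls in uncached_urls.items()
    }
    return hit_ratio, counts, top_urls


# Hypothetical sample records, shaped like parsed CDN log lines.
sample = [
    {"url": "/a.html", "cache": "HIT"},
    {"url": "/b.css", "cache": "HIT"},
    {"url": "/search.html", "cache": "MISS"},
    {"url": "/api/data.json", "cache": "PASS"},
]

ratio, counts, top_urls = summarize_cache(sample)
print(f"hit ratio: {ratio:.1%}")
print("top MISS URLs:", top_urls["MISS"])
print("top PASS URLs:", top_urls["PASS"])
```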
To analyze the CDN logs, this article presents two options: the Elasticsearch, Logstash, and Kibana (ELK) dashboard tooling and Jupyter Notebook. The ELK dashboard tooling can be installed locally on your laptop, while the Jupyter Notebook tooling can be accessed remotely as part of Adobe Experience Platform without installing additional software, for those who have licensed Adobe Experience Platform.
Option 1: Using ELK dashboard tooling
The ELK stack is a set of tools that provide a scalable solution to search, analyze, and visualize the data. It consists of Elasticsearch, Logstash, and Kibana.
To identify the key details, let’s use the AEMCS-CDN-Log-Analysis-ELK-Tool dashboard tooling project. This project provides a Docker container of the ELK stack and a pre-configured Kibana dashboard to analyze the CDN logs.
- Follow the steps from How to setup the ELK Docker container and make sure to import the CDN Cache Hit Ratio Kibana dashboard.
- To identify the CDN cache hit ratio and top URLs, follow these steps:
  - Copy the downloaded CDN log file(s) into the environment-specific folder.
  - Open the CDN Cache Hit Ratio dashboard by clicking the top-left corner Navigation Menu > Analytics > Dashboard > CDN Cache Hit Ratio.
  - Select the desired time range from the top-right corner.
  - The CDN Cache Hit Ratio dashboard is self-explanatory.
  - The Total Request Analysis section displays the following details:
    - Cache ratios by cache type
    - Cache counts by cache type
  - The Analysis by Request or Mime Types section displays the following details:
    - Cache ratios by cache type
    - Cache counts by cache type
    - Top MISS and PASS URLs
Filtering by environment name or program ID
To filter the ingested logs by environment name, follow the steps below:
- In the CDN Cache Hit Ratio dashboard, click the Add Filter icon.
- In the Add filter modal, select the aem_env_name.keyword field from the drop-down menu, select the is operator, enter the desired environment name in the next field, and click Add filter.
Filtering by hostname
To filter the ingested logs by hostname, follow the steps below:
- In the CDN Cache Hit Ratio dashboard, click the Add Filter icon.
- In the Add filter modal, select the host.keyword field from the drop-down menu, select the is operator, enter the desired hostname in the next field, and click Add filter.
Likewise, add more filters to the dashboard based on your analysis requirements.
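The same "field is value" filtering can be approximated outside Kibana when working with parsed log records directly. The sketch below is an assumption-laden illustration: it presumes the raw records carry a host field matching the dashboard's host.keyword, and the function name filter_records and the hostnames are hypothetical.

```python
def filter_records(records, **criteria):
    """Keep records whose fields equal every given criterion.

    Mirrors a dashboard filter of the form "field is value"; several
    criteria are combined with AND.
    """
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


# Hypothetical records; the host field name is an assumption.
records = [
    {"host": "www.example.com", "url": "/a.html", "cache": "HIT"},
    {"host": "stage.example.com", "url": "/a.html", "cache": "MISS"},
    {"host": "www.example.com", "url": "/b.html", "cache": "PASS"},
]

production_only = filter_records(records, host="www.example.com")
print(len(production_only), "records for www.example.com")
```

Passing more than one keyword argument, such as host and cache together, narrows the result further, which corresponds to stacking several filters in the dashboard.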
Option 2: Using Jupyter Notebook
For those who would rather not install software locally (i.e., the ELK dashboard tooling from the previous section), there is another option, but it requires a license to Adobe Experience Platform.
The Jupyter Notebook is an open-source web application that lets you create documents that contain code, text, and visualization. It is used for data transformation, visualization, and statistical modeling. It can be accessed remotely as part of Adobe Experience Platform.
Downloading the Interactive Python Notebook file
First, download the AEM-as-a-CloudService - CDN Logs Analysis - Jupyter Notebook file, which helps with the CDN logs analysis. This “Interactive Python Notebook” file is self-explanatory; however, the key highlights of each section are:
- Install additional libraries: installs the termcolor and tabulate Python libraries.
- Load CDN logs: loads the CDN log file using the log_file variable value; make sure to update its value. It also transforms this CDN log into a Pandas DataFrame.
- Perform analysis: the first code block is Display Analysis Result for Total, HTML, JS/CSS and Image Requests; it provides the cache hit ratio percentage, bar, and pie charts. The second code block is Top 5 MISS and PASS Request URLs for HTML, JS/CSS, and Image; it displays URLs and their counts in table format.
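To give a feel for the per-type breakdown described above, here is a plain-Python sketch that groups requests into HTML, JS/CSS, and Image and computes a hit ratio per group. This is not the notebook's code, which uses Pandas. The field name res_ctype, the content-type prefixes, and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumed mapping from response content-type prefixes to report groups.
GROUPS = {
    "HTML": ("text/html",),
    "JS/CSS": ("application/javascript", "text/javascript", "text/css"),
    "Image": ("image/",),
}


def group_of(content_type):
    """Map a content type such as 'text/html; charset=utf-8' to a group name."""
    content_type = (content_type or "").lower()
    for group, prefixes in GROUPS.items():
        if content_type.startswith(prefixes):
            return group
    return "Other"


def hit_ratio_by_group(records, type_field="res_ctype"):
    """Return {group: HIT share} for an iterable of parsed log records."""
    counts = defaultdict(Counter)
    for record in records:
        counts[group_of(record.get(type_field))][record.get("cache")] += 1
    return {
        group: states["HIT"] / sum(states.values())
        for group, states in counts.items()
    }
```

Each group appears in the result only if at least one request falls into it, so the division never sees an empty group.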
Running the Jupyter Notebook
Next, run the Jupyter Notebook in Adobe Experience Platform by following these steps:
- Log in to Adobe Experience Cloud and, in the Quick access section of the Home page, click Experience Platform.
- On the Adobe Experience Platform Home page, in the Data Science section, click the Notebooks menu item. To start the Jupyter Notebooks environment, click the JupyterLab tab.
- In the JupyterLab menu, using the Upload Files icon, upload the downloaded CDN log file and the aemcs_cdn_logs_analysis.ipynb file.
- Open the aemcs_cdn_logs_analysis.ipynb file by double-clicking it.
- In the Load CDN Log File section of the notebook, update the log_file value.
- To run the selected cell and advance, click the Play icon.
- After running the Display Analysis Result for Total, HTML, JS/CSS, and Image Requests code cell, the output displays the cache hit ratio percentage, bar, and pie charts.
- After running the Top 5 MISS and PASS Request URLs for HTML, JS/CSS, and Image code cell, the output displays the Top 5 MISS and PASS Request URLs.
You can enhance the Jupyter Notebook to analyze the CDN logs based on your requirements.
Optimizing CDN cache configuration
After analyzing the CDN logs, you can optimize the CDN cache configuration to improve the site performance. The AEM best practice is to have a cache hit ratio of 90% or higher.
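One simple way to act on the 90% target is to flag the request groups that fall short of it, so optimization effort goes to the worst offenders first. The sketch below assumes the ratios have already been computed as fractions between 0 and 1. The function name below_target and the example numbers are hypothetical.

```python
def below_target(ratios, target=0.90):
    """Return (name, ratio) pairs under the target, lowest ratio first."""
    return sorted(
        ((name, ratio) for name, ratio in ratios.items() if ratio < target),
        key=lambda item: item[1],
    )


# Made-up ratios for illustration only.
example = {"HTML": 0.62, "JS/CSS": 0.97, "Image": 0.88}
for name, ratio in below_target(example):
    print(f"{name}: {ratio:.0%} is below the 90% target")
```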
For more information, see Optimize CDN Cache Configuration.
The AEM WKND project has a reference CDN configuration. For more information, see CDN Configuration from the wknd.vhost file.