
Commit d0d80d9

Revert "napkin math for sizing Docker SJM"
1 parent d2f5318 commit d0d80d9

1 file changed: +9 -93 lines


src/content/docs/synthetics/synthetic-monitoring/private-locations/job-manager-configuration.mdx

Lines changed: 9 additions & 93 deletions
@@ -1850,106 +1850,22 @@ To set permanent data storage on Kubernetes, the user has two options:
helm install ... --set synthetics.persistence.existingVolumeName=sjm-volume --set synthetics.persistence.storageClass=standard ...
```

-## Sizing considerations for Docker, Kubernetes, and OpenShift [#kubernetes-sizing]
-
-### Docker [#docker]
-
-To ensure your private location runs efficiently, you must provision enough CPU resources on your Docker host to handle your monitoring workload. Many factors impact sizing, but you can quickly estimate your needs.
-
-You'll need **1 CPU core for each simultaneous heavyweight monitor** (i.e., each scripted browser or scripted API test).
-
-Below are two formulas to help you calculate the number of cores you need, whether you're diagnosing a current setup or planning for a future one.
-
-#### Formula 1: For diagnosing an existing location
-
-If your current private location is struggling to keep up and you suspect jobs are queuing, use this formula to find out how many cores you actually need. It's based on the observable performance of your system.
-
-**The equation:**
-
-$$C_{req} = (J_{processed} + Q_{growth}) \times D_j$$
-
-* $C_{req}$ = **Required CPU Cores**
-* $J_{processed}$ = The rate of jobs being **processed** per minute.
-* $Q_{growth}$ = The rate your `jobManagerHeavyweightJobs` queue is **growing** per minute.
-* $D_j$ = The **average duration** of a job in minutes.
-
-**Here's how it works:** This formula calculates your true job arrival rate by adding the jobs your system *is processing* to the jobs that are *piling up* in the queue. Multiplying this total load by the average job duration tells you exactly how many cores you need to clear all the work without queuing.
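As a worked example of the diagnostic formula (the numbers here are hypothetical, chosen only to show the arithmetic): if the location completes 10 heavyweight jobs per minute, the `jobManagerHeavyweightJobs` queue grows by 2 jobs per minute, and the average job lasts 0.5 minutes, then

$$C_{req} = (10 + 2) \times 0.5 = 6$$

so roughly 6 CPU cores are needed to keep the queue from growing, before allowing any headroom for retries and timeouts.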
-
-#### Formula 2: For forecasting a new or future location
-
-If you're setting up a new private location or planning to add more monitors, use this formula to forecast your needs ahead of time.
-
-**The equation:**
-
-$$C_{req} = N_m \times F_j \times D_j$$
-
-* $C_{req}$ = **Required CPU Cores**
-* $N_m$ = The total **number** of heavyweight monitors you plan to run.
-* $F_j$ = The average **frequency** of the monitors in jobs per minute (e.g., a monitor running every 5 minutes has a frequency of 1/5, or 0.2).
-* $D_j$ = The **average duration** of a job in minutes.
-
-**Here's how it works:** This calculates your expected workload from first principles: how many monitors you have, how often they run, and how long they take.
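A worked example of the forecasting formula (again with hypothetical numbers): 50 heavyweight monitors, each running every 5 minutes ($F_j = 0.2$ jobs per minute), with an average job duration of 0.6 minutes, gives

$$C_{req} = 50 \times 0.2 \times 0.6 = 6$$

so plan for about 6 cores, plus headroom for the sizing factors below.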
-
-#### Important sizing factors
-
-When using these formulas, remember to account for these factors:
-
-* **Job duration ($D_j$):** Your average should include jobs that **time out** (often ~3 minutes), as these hold a core for their entire duration.
-* **Job failures and retries:** When a monitor fails, it's automatically retried. These retries are additional jobs that add to the total load. A monitor that consistently fails and retries **effectively multiplies its frequency**, significantly impacting throughput.
-* **Scaling out:** In addition to adding more cores to a host (scaling up), you can deploy additional synthetics job managers with the same private location key to load balance jobs across multiple environments (scaling out).
-
-#### NRQL queries for diagnosis
-
-You can run these queries in the [query builder](/query-your-data/explore-query-data/get-started/introduction-querying-new-relic-data/) to get the inputs for the diagnostic formula. Make sure to set the time range to a long enough period to get a stable average.
-
-**1. Find jobs processed per minute ($J_{processed}$):**
-This query counts the number of non-ping (heavyweight) jobs completed over the last day and shows the average rate per minute.
-
-```nrql
-FROM SyntheticCheck SELECT rate(uniqueCount(id), 1 minute) AS 'job rate per minute' WHERE location = 'YOUR_PRIVATE_LOCATION' AND type != 'SIMPLE' SINCE 1 day ago
-```
-
-**2. Find queue growth per minute ($Q_{growth}$):**
-This query calculates the average per-minute growth of the `jobManagerHeavyweightJobs` queue on a time series chart. A line above zero indicates the queue is growing, while a line below zero means it's shrinking.
-
-```nrql
-FROM SyntheticsPrivateLocationStatus SELECT derivative(jobManagerHeavyweightJobs, 1 minute) AS 'queue growth rate per minute' WHERE name = 'YOUR_PRIVATE_LOCATION' TIMESERIES SINCE 1 day ago
-```
-
-<Callout variant="tip">
-Make sure to select the account where the private location exists. It's best to view this query as a time series because the derivative function can vary wildly. The goal is to get an estimate of the rate of queue growth per minute. Play with different time ranges to see what works best.
-</Callout>
-
-**3. Find average job duration in minutes ($D_j$):**
-This query finds the average execution duration of completed non-ping jobs and converts the result from milliseconds to minutes. Why use `executionDuration`? It represents the time the job took to execute on the host, which is what we want to measure.
-
-```nrql
-FROM SyntheticCheck SELECT average(executionDuration)/60e3 AS 'avg job duration (m)' WHERE location = 'YOUR_PRIVATE_LOCATION' AND type != 'SIMPLE' SINCE 1 day ago
-```
-
-**4. Find total number of heavyweight monitors ($N_m$):**
-This query finds the unique count of heavyweight monitors.
-
-```nrql
-FROM SyntheticCheck SELECT uniqueCount(monitorId) AS 'monitor count' WHERE location = 'YOUR_PRIVATE_LOCATION' AND type != 'SIMPLE' SINCE 1 day ago
-```
-
-**5. Find average heavyweight monitor frequency ($F_j$):**
-If the private location's `jobManagerHeavyweightJobs` queue is growing, it isn't accurate to calculate the average monitor frequency from existing results. This will need to be estimated from the list of monitors on the [Synthetic Monitors](https://one.newrelic.com/synthetics) page. Make sure to select the correct New Relic account, and you may need to filter by `privateLocation`.
+## Sizing considerations for OpenShift, Kubernetes, and Docker [#kubernetes-sizing]

<Callout variant="tip">
-Synthetic monitors may exist in multiple sub accounts. If you have more sub accounts than can be selected in the query builder, choose the accounts with the most monitors.
+Docker-specific sizing considerations will be available soon.
</Callout>

-#### Note about ping monitors and the `pingJobs` queue
-
-**Ping monitors are different.** They are lightweight jobs that do not consume a full CPU core each. Instead, they use a separate queue (`pingJobs`) and run on a pool of worker threads.
+If you're working in larger environments, you may need to customize the job manager configuration to meet minimum requirements to execute synthetic monitors efficiently. Many factors can impact sizing requirements for a synthetics job manager deployment, including:

-While they are less resource-intensive, a high volume of ping jobs, especially failing ones, can still cause performance issues. Keep these points in mind:
+* Whether all runtimes are required, based on expected usage
+* The number of jobs per minute by monitor type (ping, simple or scripted browser, and scripted API)
+* Job duration, including jobs that time out at around 3 minutes
+* The number of job failures. When a monitor starts to fail, automatic retries are scheduled to provide built-in 3/3 retry logic; these additional jobs add to the throughput requirements of the synthetics job manager.

-* **Resource model:** Ping jobs utilize worker threads, not dedicated CPU cores. The core-per-job calculation does not apply to them.
-* **Timeout and retry:** A failing ping job can occupy a worker thread for up to **60 seconds**. It first attempts an HTTP HEAD request (30-second timeout). If that fails, it immediately retries with an HTTP GET request (another 30-second timeout). See the sketch after this list.
-* **Scaling:** Although the sizing formula is different, the same principles apply. To handle a large volume of ping jobs, you may need to scale up your host's resources or scale out by deploying more job managers to keep the `pingJobs` queue clear and prevent delays.
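The 60-second worst case described in the timeout and retry bullet can be sketched with a pair of `curl` calls (illustrative only: the URL is a placeholder, and this is not the job manager's actual implementation):

```bash
# Worst case for a failing ping check: a HEAD attempt capped at 30 s,
# then an immediate GET retry capped at 30 s, so a single unresponsive
# endpoint can tie up one worker thread for up to ~60 s.
URL="https://example.com/health"   # placeholder endpoint
curl --silent --show-error --head --max-time 30 "$URL" \
  || curl --silent --show-error --output /dev/null --max-time 30 "$URL"
```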
+In addition to the sizing configuration settings listed below, additional synthetics job managers can be deployed with the same private location key to load balance jobs across multiple environments.
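As a sketch of what scaling out can look like on Docker, the command below starts a second job manager that reports to the same private location. The container name and key are placeholders, and the command mirrors the standard Docker install command for the job manager, so confirm the image tag and environment variables against the install page you used:

```bash
# Start an additional job manager pointing at the same private location key.
# Heavyweight jobs for that location are then load balanced across both managers.
docker run \
  --name synthetics-job-manager-2 \
  -e "PRIVATE_LOCATION_KEY=YOUR_PRIVATE_LOCATION_KEY" \
  -v /var/run/docker.sock:/var/run/docker.sock:rw \
  -d \
  --restart unless-stopped \
  newrelic/synthetics-job-manager:latest
```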

-### Kubernetes and OpenShift [#k8s]
+## Kubernetes and OpenShift [#k8s]

Each runtime used by the Kubernetes and OpenShift synthetic job manager can be sized independently by setting values in the [helm chart](https://github.com/newrelic/helm-charts/tree/master/charts/synthetics-job-manager).
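For example, per-runtime sizing can be applied at install or upgrade time with `--set` overrides. The value paths below are illustrative of the chart's per-runtime sections rather than a verbatim copy of its values, so confirm the exact keys against the chart's `values.yaml`:

```bash
# Hypothetical per-runtime CPU overrides; verify the key names in the chart's values.yaml.
helm upgrade --install synthetics-job-manager newrelic/synthetics-job-manager \
  --namespace newrelic \
  --set synthetics.privateLocationKey=YOUR_PRIVATE_LOCATION_KEY \
  --set node-browser-runtime.resources.requests.cpu=1 \
  --set node-api-runtime.resources.requests.cpu=500m
```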
