Title: | Analyze Multi-Threading Performance for 'data.table' Functions |
---|---|
Description: | Assists in finding the most suitable thread count for the various 'data.table' routines that support parallel processing. |
Authors: | Anirban Chetia [aut, cre] |
Maintainer: | Anirban Chetia <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.1 |
Built: | 2024-11-18 09:29:48 UTC |
Source: | https://github.com/anirban166/data.table.threads |
This function adds to the timing results (or the benchmarked data). It computes the recommended efficiency speedup line and the point which denotes the recommended thread count, both being based on the specified efficiency value.
addRecommendedEfficiency(benchmarkData, recommendedEfficiency = 0.5)
benchmarkData | A data.table of benchmark results, as computed by findOptimalThreadCount(). |
recommendedEfficiency | A numeric value between 0 and 1 that defines the slope for the "Recommended" efficiency speedup line. (Default is 0.5) |
This function allows users to add a "Recommended" efficiency line to previously computed benchmark data, without needing to recompute the timings. The provided efficiency value sets the slope of the "Recommended" speedup line, and the recommended thread count is taken from the measured speedup point closest to that line.
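The relationship between the efficiency value, the "Recommended" line, and the closest measured point can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the runtimes are made up, the line is assumed to be `recommendedEfficiency * threadCount`, and "closest" is assumed to mean the smallest absolute vertical distance.

```r
# Made-up median runtimes (seconds) for thread counts 1 to 8.
threadCount <- 1:8
medianTime  <- c(1.00, 0.55, 0.40, 0.33, 0.30, 0.28, 0.27, 0.265)

# Measured speedup relative to the single-threaded run.
speedup <- medianTime[1] / medianTime

# Assumed form of the "Recommended" line: a straight line through the origin
# whose slope is the efficiency value.
recommendedEfficiency <- 0.5
recommendedSpeedup <- recommendedEfficiency * threadCount

# Assumed selection rule: the measured point with the smallest absolute
# vertical distance to the "Recommended" line.
recommendedThreads <- threadCount[which.min(abs(speedup - recommendedSpeedup))]

# For comparison, the thread count with the lowest median runtime.
fastestThreads <- threadCount[which.min(medianTime)]
```

With these numbers the recommended count is lower than the fastest one, which is the trade-off the efficiency value is meant to expose: a few more threads may still be faster, but each added thread contributes less.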
The input data.table with the recommended efficiency added to the plot data (attributes).
findOptimalThreadCount for computing the benchmark data with measured and ideal speedup data.
# Finding the best performing thread count for each benchmarked data.table function
# with a data size of 1000 rows and 10 columns:
benchmarks <- data.table.threads::findOptimalThreadCount(1e3, 10)
# Adding recommended efficiency to the plot data:
addRecommendedEfficiency(benchmarks, recommendedEfficiency = 0.6)
This function finds the optimal thread count for running data.table functions with maximum efficiency.
findOptimalThreadCount(
  rowCount,
  colCount = NULL,
  times = 10,
  verbose = FALSE,
  benchmarksList = NULL,
  customDT = NULL
)
rowCount | The number of rows in the data.table. |
colCount | The number of columns in the data.table. |
times | The number of times the benchmarks are to be run. |
verbose | Option (logical) to enable or disable detailed message printing. |
benchmarksList | A named list of custom benchmarking functions which, when specified, overrides the default benchmarks for each parallelizable data.table function. |
customDT | A user-specified data.table. |
Iteratively runs benchmarks with increasing thread counts and determines the optimal number of threads for each data.table function.
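The idea of running the same operation under increasing thread counts can be sketched directly with data.table's own `setDTthreads()` and `getDTthreads()`. The helper name `timeAcrossThreads`, the grouped-sum operation, and the small default limits are hypothetical choices for illustration; the package's internal loop may differ.

```r
library(data.table)

# Hypothetical helper (not part of data.table.threads): median elapsed time of
# one grouped aggregation at each thread count from 1 up to maxThreads.
timeAcrossThreads <- function(dt, maxThreads = 2L, times = 3L) {
  oldThreads <- getDTthreads()
  on.exit(setDTthreads(oldThreads), add = TRUE)  # restore the caller's setting
  medians <- vapply(seq_len(maxThreads), function(n) {
    setDTthreads(n)
    runs <- vapply(seq_len(times), function(i) {
      system.time(dt[, lapply(.SD, sum), by = g])[["elapsed"]]
    }, numeric(1))
    median(runs)
  }, numeric(1))
  data.table(threadCount = seq_len(maxThreads), medianTime = medians)
}

set.seed(1)
dt <- data.table(g = sample(100L, 1e5, TRUE), x = rnorm(1e5), y = rnorm(1e5))
timings <- timeAcrossThreads(dt)

# Thread count with the lowest median runtime in this small experiment.
timings[which.min(medianTime)]
```

On small inputs like this one the timings are noisy and threading overhead can dominate, so larger data and more repetitions give a more meaningful ranking.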
A data.table of class data_table_threads_benchmark containing the optimal thread count for each data.table function.
# Finding the best performing thread count for each benchmarked data.table function
# with a data size of 1000 rows and 10 columns:
(optimalThreads <- data.table.threads::findOptimalThreadCount(1e3, 10))
Function to make speedup plots for the benchmarked data.table functions.
## S3 method for class 'data_table_threads_benchmark'
plot(x, ...)
x | A data.table of class data_table_threads_benchmark. |
... | Additional arguments (not used in this function but included for consistency with the S3 generic plot()). |
Creates a comprehensive ggplot showing the ideal, sub-optimal, and measured speedup trends for the data.table functions benchmarked with varying thread counts.
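To give a rough picture of what such a speedup plot contains, here is a minimal ggplot2 sketch for a single function. The measurements are made up, and drawing the sub-optimal trend as a line with half the ideal slope is an assumption made for illustration; the plot produced by the package's method may be laid out differently.

```r
library(ggplot2)

# Made-up speedup values for thread counts 1 to 4.
threads <- 1:4
speedups <- data.frame(
  threadCount = rep(threads, times = 3),
  speedup = c(
    threads,              # ideal: speedup equals the thread count
    0.5 * threads,        # sub-optimal: assumed here to be half the ideal slope
    c(1, 1.8, 2.4, 2.7)   # measured: made-up values
  ),
  type = rep(c("Ideal", "Sub-optimal", "Measured"), each = length(threads))
)

# One line per trend, so the measured curve can be compared against both references.
speedupPlot <- ggplot(speedups, aes(x = threadCount, y = speedup, colour = type)) +
  geom_line() +
  geom_point() +
  labs(x = "Threads", y = "Speedup", colour = "Trend")
```

Printing `speedupPlot` renders the figure; the gap between the measured and ideal lines shows how quickly the returns from extra threads diminish.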
A ggplot object containing a speedup plot for each benchmarked data.table function.
# Finding the best performing thread count for each benchmarked data.table function
# with a data size of 1000 rows and 10 columns:
benchmarkData <- data.table.threads::findOptimalThreadCount(1e3, 10)
# Generating speedup plots based on the data collected above:
plot(benchmarkData)
Function to concisely display the results returned by findOptimalThreadCount() in an organized table.
## S3 method for class 'data_table_threads_benchmark'
print(x, ...)
x | A data.table of class data_table_threads_benchmark. |
... | Additional arguments (not used in this function but included for consistency with the S3 generic print()). |
Prints a table listing the best performing thread count along with the runtime (median value) for each benchmarked function.
NULL.
# Finding the best performing thread count for each benchmarked data.table function
# with a data size of 1000 rows and 10 columns:
(benchmarkData <- data.table.threads::findOptimalThreadCount(1e3, 10))
Function to run a set of predefined benchmarks for different data.table functions with varying thread counts.
runBenchmarks(
  rowCount,
  colCount,
  threadCount,
  times = 10,
  verbose = TRUE,
  benchmarksList = NULL,
  customDT = NULL
)
rowCount | The number of rows in the data.table. |
colCount | The number of columns in the data.table. |
threadCount | The total number of threads to use. |
times | The number of times the benchmarks are to be run. |
verbose | Option (logical) to enable or disable detailed message printing. |
benchmarksList | A named list of custom benchmarking functions which, when specified, overrides the default benchmarks for each parallelizable data.table function. |
customDT | A user-specified data.table. |
Benchmarks various data.table functions that are parallelizable (setorder, GForce_sum, subsetting, frollmean, fcoalesce, between, fifelse, nafill, and CJ) with varying thread counts.
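As an illustration of what timing several of these functions at one fixed thread count might look like, the sketch below calls them directly through data.table. The `elapsed()` helper, the input sizes, and the single-run timing are hypothetical simplifications; the package's default benchmarks may be defined differently.

```r
library(data.table)

oldThreads <- getDTthreads()
setDTthreads(2L)  # fixed thread count for this run

set.seed(42)
n  <- 1e5
dt <- data.table(a = rnorm(n), b = sample(c(NA_real_, 1, 2), n, TRUE))

# Hypothetical helper: elapsed seconds for a single evaluation of an expression.
elapsed <- function(expr) system.time(expr)[["elapsed"]]

timings <- c(
  setorder  = elapsed(setorder(copy(dt), a)),
  frollmean = elapsed(frollmean(dt$a, 10L)),
  fcoalesce = elapsed(fcoalesce(dt$b, 0)),
  between   = elapsed(between(dt$a, -1, 1)),
  fifelse   = elapsed(fifelse(dt$a > 0, 1, 0)),
  nafill    = elapsed(nafill(dt$b, fill = 0)),
  CJ        = elapsed(CJ(seq_len(300L), seq_len(300L)))
)

setDTthreads(oldThreads)  # restore the previous thread setting
```

Repeating the block with different values passed to `setDTthreads()` gives one set of timings per thread count, which is the raw material for a speedup comparison.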
A data.table containing benchmarked timings for each data.table function with different thread counts.
Function to set the thread count for a specific data.table function.
setThreadCount(
  benchmarkData,
  functionName,
  efficiencyFactor = 0.5,
  verbose = FALSE
)
benchmarkData | A data.table of benchmark results, as computed by findOptimalThreadCount(). |
functionName | The name of the data.table function. |
efficiencyFactor | A numeric value between 0 and 1 indicating the desired efficiency level for thread count selection. 0 represents use of the optimal thread count (lowest median runtime) and 0.5 represents the recommended thread count. |
verbose | Option (logical) to enable or disable detailed message printing. |
Sets the thread count for the specified data.table function to either the optimal (fastest median runtime) or the recommended value (default), as selected by efficiencyFactor, using the results obtained from findOptimalThreadCount().
NULL.
# Finding the best performing thread count for each benchmarked data.table function
# with a data size of 1000 rows and 10 columns:
benchmarkData <- data.table.threads::findOptimalThreadCount(1e3, 10)
# Setting the optimal thread count for the 'forder' function:
setThreadCount(benchmarkData, "forder", efficiencyFactor = 1)
# Can verify by checking benchmarkData and getDTthreads():
data.table::getDTthreads()