The Normality Transformation via Optimized Skewness and Kurtosis (OKST) is a normality method that simultaneously evaluates deviations in skewness and kurtosis of non-normal data.
The recent version of the package osktnorm from CRAN is
installed with the following command:
If the package osktnorm has already been installed, load
it into R working environment by using the following command:
In the following code snippet, a right-skewed distribution with 300
observations is generated using rlnorm of R and then
normalized using OSKT. For this purpose, the okstfast
function is applied by simply passing the original observation vector to
be transformed, and the results are stored in the object
res_okst.
In this result object res_okst, the transformed values
are contained in the transformed vector. The element
g contains the skewness parameter used in the
transformation, which is obtained via optimization L-BFGS, while
h contains the kurtosis parameter. In the output,
value indicates the minimum A2 value reached
during optimization, i.e., the Anderson–Darling statistic at the
optimum..
set.seed(12)
x_orig <- rlnorm(300, mean=0, sd=0.5) # Generate right-skewed data
# Apply OSKT normality
res_oskt <- osktfast(x_orig)
x_transformed <- res_oskt$transformed
head(x_transformed, 5)
[1] -1.802405 1.399280 -1.250515 -1.210836 -2.307431
g_star <- res_oskt$g
h_star <- res_oskt$h
A2 <- res_oskt$value
cat("Optimized skewness: ", g_star, "\n")
Optimized skewness: -0.5909243
cat("Optimized kurtosis: ", h_star, "\n")
Optimized kurtosis: 0.07987881
cat("Anderson-Darling statistic at the optimum: ", A2, "\n")
Anderson-Darling statistic at the optimum: 0.1056021The code snippet below visualizes the original and normalized observations using histograms and density plots for comparison purposes. To ensure a fair comparison, 25 common breaks are set for both histograms. Upon examining the plots, it is evident that the data transformed via OSKT (red) approximates the normal distribution (black dashed line).
breaks <- pretty(range(c(x_orig, x_transformed)), n = 25)
h_orig <- hist(x_orig, breaks = breaks, plot = FALSE)
h_trans <- hist(x_transformed, breaks = breaks, plot = FALSE)
d_orig <- density(x_orig); d_trans <- density(x_transformed)
ymax <- max(c(h_orig$density, h_trans$density, d_orig$y,d_trans$y, dnorm(0)))
hist(x_orig, breaks = breaks, freq = FALSE, ylim = c(0, ymax * 1.05),
col = rgb(0.2, 0.4, 0.8, 0.4), border = "white",
main = "Before and After OSKT Transformation", xlab = "Value")
lines(d_orig, col = "blue", lwd = 2)
hist(x_transformed, breaks = breaks, freq = FALSE,
col = rgb(0.8, 0.3, 0.3, 0.4), border = "white", add = TRUE)
lines(d_trans, col = "red", lwd = 2)
curve(dnorm(x), add = TRUE, lwd = 2, lty = 2, col = "black") # Standard normal reference
legend("topleft",
legend = c("Original", "Transformed", "Original Density", "OSKT Density", "Standard Normal"),
col = c(rgb(0.2,0.4,0.8,0.6), rgb(0.8,0.3,0.3,0.6), "blue", "red", "black"),
lwd = c(10, 10, 2, 2, 2), lty = c(1, 1, 1, 1, 2), bty = "n")Back-transformation (transforming the transformed values back to the
original scale) or reverse transformation can be performed in the
package using the backost and backostfast
functions. The former is slower because it recovers values using R’s
uniroot function, whereas backostfast uses the
Brent–Dekker algorithm. Since this function is implemented in C++, it
runs faster; however, there may be negligible differences in precision
(on the order of > 1e-4).
In the example below, assigning “brent” to the method argument specifies the use of the Brent–Dekker algorithm. Another option is “nr” for the Newton–Raphson method. This method is faster than the former but may suffer from precision issues. For this reason, using “auto” as the method in backostfast is recommended; this is also the default behavior when method is not explicitly specified. In the automatic method selection, Newton–Raphson is attempted first, and if it fails, the Brent option is tried. In this way, speed is prioritized first, and accuracy is prioritized if the faster method fails.
In the example below, make sure that the g argument of
backostfast is assigned the g_star returned in
the example above, and the h argument is assigned
h_star.
X_mean <- mean(x_orig)
X_sd <- sd(x_orig)
res_back <- backosktfast(
Z = x_transformed,
X_mean = X_mean, X_sd = X_sd,
g = g_star, h = h_star,
method = "brent")
x_recovered <- res_back$X_orig
head(x_recovered, 5)
[1] 0.4759235 2.2021046 0.6189750 0.6304848 0.3670768Overlaying histograms can be useful for diagnosing recovered values and assessing accuracy. In the plot, it can be seen that the histogram of the recovered values (green) almost perfectly overlaps with the histogram of the original values (blue). Below, the results of a comparison with the original values are also provided, using a tolerance of 1e-6 (which can be adjusted as desired).
breaks <- pretty(range(c(x_orig, x_transformed, x_recovered)), n = 30)
hist(x_orig, breaks = breaks, freq = FALSE, col = rgb(0.2, 0.4, 0.9, 0.4),
border = "white", main="OSKT Transformation & Back Transformation", xlab="Value")
hist(x_transformed, breaks = breaks, freq = FALSE, col = rgb(0.8, 0.3, 0.3, 0.4),
border = "white", add=TRUE)
hist(x_recovered, breaks = breaks, freq = FALSE, col = rgb(0.2,0.8,0.2,0.4),
border = "white", add=TRUE)
legend("topleft", legend = c("Original","Transformed","Back-transformed"),
fill = c(rgb(0.2,0.4,0.8,0.4), rgb(0.8,0.3,0.3,0.4), rgb(0.2,0.8,0.2,0.4)))In the code snippet below, the original values and the recovered values obtained via back-transformation are compared using various criteria. The difference between the original and recovered values is treated as the back-transformation error, and error-based performance metrics such as Mean Absolute Error (MAE), Maximum Absolute Error (MAXE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) can be computed. In addition, for diagnostic analysis, the correlation coefficient between the original and recovered values, as well as the regression coefficient and goodness of fit measured by R-squared, can also be examined.
ok <- is.finite(x_orig) & is.finite(x_recovered) # Remove any non-finite values
xo <- x_orig[ok]
xr <- x_recovered[ok]
err <- xr - xo #Error
# Performance metrics
MAE <- mean(abs(err))
MAXE <- max(abs(err))
MSE <- mean(err^2)
RMSE <- sqrt(MSE)
# Correlation and R^2
COR <- cor(xo, xr)
R2 <- COR^2
# Linear fit (recovered ~ original)
fit <- lm(xr ~ xo)
slope <- coef(fit)[2]
intercept <- coef(fit)[1]
# Summary table
back_stats <- data.frame(
MSE = MSE, RMSE = RMSE, MAE = MAE, MaxError = MAXE,
Correlation= COR, R2 = R2, Slope = slope, Intercept = intercept)
round(t(back_stats), 8)
xo
MSE 0.00000080
RMSE 0.00089459
MAE 0.00069680
MaxError 0.00300090
Correlation 1.00000000
R2 1.00000000
Slope 1.00167084
Intercept -0.00185199
# Regression line for diagnostic
plot(xo, xr,
pch = 16, cex = 0.6, col = rgb(0.2, 0.2, 0.7, 0.4),
xlab = "Original values", ylab = "Back-transformed values",
main = "OSKT Back-transformation Accuracy")
abline(0, 1, col = "red", lwd = 2, lty = 3) # 1:1 reference line
abline(fit, col = "darkgreen", lwd = 3) # Fitted regression line
legend("topleft",
legend = c("y = x (ideal)", "Fitted line"),
col = c("red", "darkgreen"),
lwd = 2, lty = c(2,1), bty = "n")Below, a left-skewed variable with 300 observations is generated
using the ghdist function from the
groupcompare package. This data is then transformed using
the Box-Cox (BC), Yeo-Johnson (YJ), and OSKT methods. The transformed
values are subsequently tested for normality using the Shapiro–Wilk (SW)
normality test, as well as the Cramer-von Misses (CVM), and Wang-Zu ZA
tests available in the osktnorm package. In addition, the
Pearson P statistic (PPM) is computed and used as a metric that
evaluates the transformed values across the entire distribution, from
the left tail to the right tail.
# Generate left-skewed data
set.seed(12)
x_orig <- groupcompare::ghdist(n=300, A=0, B=1, g=-0.49, h=0)
x_transformed <- osktfast(x_orig)$transformed
breaks <- pretty(range(c(x_orig, x_transformed)), n = 25)
h_orig <- hist(x_orig, breaks = breaks, plot = FALSE)
h_trans <- hist(x_transformed, breaks = breaks, plot = FALSE)
d_orig <- density(x_orig); d_trans <- density(x_transformed)
ymax <- max(c(h_orig$density, h_trans$density, d_orig$y,d_trans$y, dnorm(0)))
hist(x_orig, breaks = breaks, freq = FALSE, ylim = c(0, ymax * 1.05),
col = rgb(0.2, 0.4, 0.8, 0.4), border = "white",
main = "Before and After OSKT Transformation", xlab = "Value")
lines(d_orig, col = "blue", lwd = 2)
hist(x_transformed, breaks = breaks, freq = FALSE,
col = rgb(0.8, 0.3, 0.3, 0.4), border = "white", add = TRUE)
lines(d_trans, col = "red", lwd = 2)
curve(dnorm(x), add = TRUE, lwd = 2, lty = 2, col = "black") # Standard normal reference
legend("topleft",
legend = c("Original", "Transformed", "Original Density", "OSKT Density", "Standard Normal"),
col = c(rgb(0.2,0.4,0.8,0.6), rgb(0.8,0.3,0.3,0.6), "blue", "red", "black"),
lwd = c(10, 10, 2, 2, 2), lty = c(1, 1, 1, 1, 2), bty = "n")
# Normality tests with BC, YJ and OSKT
x_bc <- boxcox(x_orig, makepositive=TRUE)$transformed # Box-Cox transformation
x_yj <- yeojohnson(x_orig)$transformed # Yeo-Johnson transformation
x_oskt <- osktfast(x_orig)$transformed # OSKT
# Normality tests and moments
get_stats <- function(x) {
x <- x[is.finite(x)]
c(
Skew = mean((x - mean(x))^3) / sd(x)^3,
Kurt = mean((x - mean(x))^4) / sd(x)^4 - 3,
SW = shapiro.test(x)$p.value,
ZA = zatest(x, nsim=100)$p.value,
CVM = cvmtest(x)$p.value,
PPM = unname(pearsonp(x)$statistic)
)
}
# Normality results table
pval_table <- rbind(
ORG = get_stats(x_orig),
BC = get_stats(x_bc),
YJ = get_stats(x_yj),
OSKT = get_stats(x_oskt)
)
pval_table <- as.data.frame(round(pval_table, 4))
pval_table
Skew Kurt SW ZA CVM PPM
ORG -1.8642 5.9759 0.0000 0.0099 0.0001 9.24
BC -0.2397 -0.5635 0.0047 0.0099 0.1550 0.82
YJ -0.0454 -0.3109 0.1257 0.0396 0.3866 0.71
OSKT -0.2130 0.0118 0.0631 0.0099 0.9085 0.56Upon examining the table above, it is observed that OSKT reduces kurtosis close to 0, whereas YJ is more effective in reducing skewness. However, according to the PPM statistic, the best performance is obtained by OSKT with a value of 0.56, indicating that the method attempts to correct the entire distribution, not just the tails. Based on the SW and CVM normality tests, both YJ and OSKT are successful in normalizing the data, but according to the ZA test, neither method performs adequately. In conclusion, considering the SW, CVM, and PPM results as well as the reduction in kurtosis, OSKT demonstrates better performance compared to the other methods evaluated.