Correlating Variables in Your Model

To get started, we are going to recap what we did in your first model and build a correlated version of the same model and compare.

Building a Simple Uncorrelated Model

using Distributions, StatsBase, DataFrames, MCHammer
n_trials = 10000
Revenue = rand(TriangularDist(2500000,4000000,3000000), n_trials)
Expenses = rand(TriangularDist(1400000,3000000,2000000), n_trials)

# The Model
Profit = Revenue - Expenses

#Trial Results : the Profit vector (OUTPUT)
Profit

# Trials or Results Table (OUTPUT)
Trials = hcat(Profit, Revenue, Expenses)
Trials = DataFrame(Trials)
names!(Trials, [:Profit, :Revenue, :Expenses])

cormat(Trials)
3×3 Array{Float64,2}:
  1.0       0.669217     -0.716073
  0.669217  1.0           0.000726586
 -0.716073  0.000726586   1.0

Applying correlation to your simulation

Using the corvar() function, we are going to correlate the Revenue and Expenses at -0.8 and generate the results tables for both the correlated and uncorrelated versions.

#Apply correlation to random samples
Rev_Exp_Cor = 0.8
cor_matrix = [1 Rev_Exp_Cor; Rev_Exp_Cor 1]

#Validate input correlation. You can also use cormat() to define the correlation
#matrix from historical data.
cor_matrix
2×2 Array{Float64,2}:
 1.0  0.8
 0.8  1.0

It is very important to join Trial into an array before applying correlation. Furthermore, this step is necessary in order to produce a sensitivity_chrt()

c_table = [Revenue Expenses]
C_Trials = corvar(c_table, n_trials, cor_matrix)

#Correlated Model(2) - Create Correlated Results Array
C_Profit = C_Trials[1] - C_Trials[2]
C_Trials = [C_Profit, C_Trials[1], C_Trials[2]]
C_Trials = DataFrame(C_Trials)
names!(C_Trials, [:C_Profit, :C_Revenue, :C_Expenses])

cormat(C_Trials)
3×3 Array{Float64,2}:
  1.0       0.225132  -0.400685
  0.225132  1.0        0.778227
 -0.400685  0.778227   1.0

Analyze the impact of correlation on your output

Input Correlation between Revenue and Expenses

Input Correlation:

cor(Revenue,Expenses)
-0.0010405497680156886

Input Correlation for the Correlated Model:

cor(C_Trials[2],C_Trials[3])
0.7889819301131576

Make sure to put a line in your project that lists all the outputs you can query with the charting and stats functions.

println("Model Outputs: Trials, C_Trials, Profit, C_Profit")

Correlated vs. Uncorrelated results in Julia

Let us compare the percentiles of an uncorrelated model vs. a correlated one.

Uncorrelated Results

density_chrt(Profit)
Sim. Values -6×10⁶ -5×10⁶ -4×10⁶ -3×10⁶ -2×10⁶ -1×10⁶ 0 1×10⁶ 2×10⁶ 3×10⁶ 4×10⁶ 5×10⁶ 6×10⁶ 7×10⁶ 8×10⁶ -5.0×10⁶ -4.8×10⁶ -4.6×10⁶ -4.4×10⁶ -4.2×10⁶ -4.0×10⁶ -3.8×10⁶ -3.6×10⁶ -3.4×10⁶ -3.2×10⁶ -3.0×10⁶ -2.8×10⁶ -2.6×10⁶ -2.4×10⁶ -2.2×10⁶ -2.0×10⁶ -1.8×10⁶ -1.6×10⁶ -1.4×10⁶ -1.2×10⁶ -1.0×10⁶ -8.0×10⁵ -6.0×10⁵ -4.0×10⁵ -2.0×10⁵ 0 2.0×10⁵ 4.0×10⁵ 6.0×10⁵ 8.0×10⁵ 1.0×10⁶ 1.2×10⁶ 1.4×10⁶ 1.6×10⁶ 1.8×10⁶ 2.0×10⁶ 2.2×10⁶ 2.4×10⁶ 2.6×10⁶ 2.8×10⁶ 3.0×10⁶ 3.2×10⁶ 3.4×10⁶ 3.6×10⁶ 3.8×10⁶ 4.0×10⁶ 4.2×10⁶ 4.4×10⁶ 4.6×10⁶ 4.8×10⁶ 5.0×10⁶ 5.2×10⁶ 5.4×10⁶ 5.6×10⁶ 5.8×10⁶ 6.0×10⁶ 6.2×10⁶ 6.4×10⁶ 6.6×10⁶ 6.8×10⁶ 7.0×10⁶ -5×10⁶ 0 5×10⁶ 1×10⁷ -5.0×10⁶ -4.5×10⁶ -4.0×10⁶ -3.5×10⁶ -3.0×10⁶ -2.5×10⁶ -2.0×10⁶ -1.5×10⁶ -1.0×10⁶ -5.0×10⁵ 0 5.0×10⁵ 1.0×10⁶ 1.5×10⁶ 2.0×10⁶ 2.5×10⁶ 3.0×10⁶ 3.5×10⁶ 4.0×10⁶ 4.5×10⁶ 5.0×10⁶ 5.5×10⁶ 6.0×10⁶ 6.5×10⁶ 7.0×10⁶ h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -1.2×10⁻⁶ -1.0×10⁻⁶ -8.0×10⁻⁷ -6.0×10⁻⁷ -4.0×10⁻⁷ -2.0×10⁻⁷ 0 2.0×10⁻⁷ 4.0×10⁻⁷ 6.0×10⁻⁷ 8.0×10⁻⁷ 1.0×10⁻⁶ 1.2×10⁻⁶ 1.4×10⁻⁶ 1.6×10⁻⁶ 1.8×10⁻⁶ 2.0×10⁻⁶ 2.2×10⁻⁶ -1.00×10⁻⁶ -9.50×10⁻⁷ -9.00×10⁻⁷ -8.50×10⁻⁷ -8.00×10⁻⁷ -7.50×10⁻⁷ -7.00×10⁻⁷ -6.50×10⁻⁷ -6.00×10⁻⁷ -5.50×10⁻⁷ -5.00×10⁻⁷ -4.50×10⁻⁷ -4.00×10⁻⁷ -3.50×10⁻⁷ -3.00×10⁻⁷ -2.50×10⁻⁷ -2.00×10⁻⁷ -1.50×10⁻⁷ -1.00×10⁻⁷ -5.00×10⁻⁸ 0 5.00×10⁻⁸ 1.00×10⁻⁷ 1.50×10⁻⁷ 2.00×10⁻⁷ 2.50×10⁻⁷ 3.00×10⁻⁷ 3.50×10⁻⁷ 4.00×10⁻⁷ 4.50×10⁻⁷ 5.00×10⁻⁷ 5.50×10⁻⁷ 6.00×10⁻⁷ 6.50×10⁻⁷ 7.00×10⁻⁷ 7.50×10⁻⁷ 8.00×10⁻⁷ 8.50×10⁻⁷ 9.00×10⁻⁷ 9.50×10⁻⁷ 1.00×10⁻⁶ 1.05×10⁻⁶ 1.10×10⁻⁶ 1.15×10⁻⁶ 1.20×10⁻⁶ 1.25×10⁻⁶ 1.30×10⁻⁶ 1.35×10⁻⁶ 1.40×10⁻⁶ 1.45×10⁻⁶ 1.50×10⁻⁶ 1.55×10⁻⁶ 1.60×10⁻⁶ 1.65×10⁻⁶ 1.70×10⁻⁶ 1.75×10⁻⁶ 1.80×10⁻⁶ 1.85×10⁻⁶ 1.90×10⁻⁶ 1.95×10⁻⁶ 2.00×10⁻⁶ -1×10⁻⁶ 0 1×10⁻⁶ 2×10⁻⁶ -1.0×10⁻⁶ -9.0×10⁻⁷ -8.0×10⁻⁷ -7.0×10⁻⁷ -6.0×10⁻⁷ -5.0×10⁻⁷ -4.0×10⁻⁷ -3.0×10⁻⁷ -2.0×10⁻⁷ -1.0×10⁻⁷ 0 1.0×10⁻⁷ 2.0×10⁻⁷ 3.0×10⁻⁷ 4.0×10⁻⁷ 5.0×10⁻⁷ 6.0×10⁻⁷ 7.0×10⁻⁷ 8.0×10⁻⁷ 9.0×10⁻⁷ 1.0×10⁻⁶ 1.1×10⁻⁶ 1.2×10⁻⁶ 1.3×10⁻⁶ 1.4×10⁻⁶ 1.5×10⁻⁶ 1.6×10⁻⁶ 1.7×10⁻⁶ 1.8×10⁻⁶ 1.9×10⁻⁶ 2.0×10⁻⁶ Frequency

Probability of Making 1m or less (uncorrelated) :

GetCertainty(Profit, 1000000, 0)
0.4747

fractiles() allows you to get the percentiles at various increments.

fractiles(Profit)
11×2 Array{Any,2}:
 "P0.0"    -337427.0
 "P10.0"    433308.0
 "P20.0"    637083.0
 "P30.0"    790785.0
 "P40.0"    916016.0
 "P50.0"         1.03172e6
 "P60.0"         1.14253e6
 "P70.0"         1.26634e6
 "P80.0"         1.41062e6
 "P90.0"         1.6273e6
 "P100.0"        2.43465e6
sensitivity_chrt(Trials,1)
Rank Correlation -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 -3.0 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1 -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 -4 -2 0 2 4 -3.0 -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 Positive Negative impact h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? Revenue Expenses Input Variables with Biggest Impact

Correlated Results

density_chrt(C_Profit)
Sim. Values -2.5×10⁶ -2.0×10⁶ -1.5×10⁶ -1.0×10⁶ -5.0×10⁵ 0 5.0×10⁵ 1.0×10⁶ 1.5×10⁶ 2.0×10⁶ 2.5×10⁶ 3.0×10⁶ 3.5×10⁶ 4.0×10⁶ 4.5×10⁶ -2.0×10⁶ -1.9×10⁶ -1.8×10⁶ -1.7×10⁶ -1.6×10⁶ -1.5×10⁶ -1.4×10⁶ -1.3×10⁶ -1.2×10⁶ -1.1×10⁶ -1.0×10⁶ -9.0×10⁵ -8.0×10⁵ -7.0×10⁵ -6.0×10⁵ -5.0×10⁵ -4.0×10⁵ -3.0×10⁵ -2.0×10⁵ -1.0×10⁵ 0 1.0×10⁵ 2.0×10⁵ 3.0×10⁵ 4.0×10⁵ 5.0×10⁵ 6.0×10⁵ 7.0×10⁵ 8.0×10⁵ 9.0×10⁵ 1.0×10⁶ 1.1×10⁶ 1.2×10⁶ 1.3×10⁶ 1.4×10⁶ 1.5×10⁶ 1.6×10⁶ 1.7×10⁶ 1.8×10⁶ 1.9×10⁶ 2.0×10⁶ 2.1×10⁶ 2.2×10⁶ 2.3×10⁶ 2.4×10⁶ 2.5×10⁶ 2.6×10⁶ 2.7×10⁶ 2.8×10⁶ 2.9×10⁶ 3.0×10⁶ 3.1×10⁶ 3.2×10⁶ 3.3×10⁶ 3.4×10⁶ 3.5×10⁶ 3.6×10⁶ 3.7×10⁶ 3.8×10⁶ 3.9×10⁶ 4.0×10⁶ -2×10⁶ 0 2×10⁶ 4×10⁶ -2.0×10⁶ -1.8×10⁶ -1.6×10⁶ -1.4×10⁶ -1.2×10⁶ -1.0×10⁶ -8.0×10⁵ -6.0×10⁵ -4.0×10⁵ -2.0×10⁵ 0 2.0×10⁵ 4.0×10⁵ 6.0×10⁵ 8.0×10⁵ 1.0×10⁶ 1.2×10⁶ 1.4×10⁶ 1.6×10⁶ 1.8×10⁶ 2.0×10⁶ 2.2×10⁶ 2.4×10⁶ 2.6×10⁶ 2.8×10⁶ 3.0×10⁶ 3.2×10⁶ 3.4×10⁶ 3.6×10⁶ 3.8×10⁶ 4.0×10⁶ h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -2.5×10⁻⁶ -2.0×10⁻⁶ -1.5×10⁻⁶ -1.0×10⁻⁶ -5.0×10⁻⁷ 0 5.0×10⁻⁷ 1.0×10⁻⁶ 1.5×10⁻⁶ 2.0×10⁻⁶ 2.5×10⁻⁶ 3.0×10⁻⁶ 3.5×10⁻⁶ 4.0×10⁻⁶ 4.5×10⁻⁶ -2.0×10⁻⁶ -1.9×10⁻⁶ -1.8×10⁻⁶ -1.7×10⁻⁶ -1.6×10⁻⁶ -1.5×10⁻⁶ -1.4×10⁻⁶ -1.3×10⁻⁶ -1.2×10⁻⁶ -1.1×10⁻⁶ -1.0×10⁻⁶ -9.0×10⁻⁷ -8.0×10⁻⁷ -7.0×10⁻⁷ -6.0×10⁻⁷ -5.0×10⁻⁷ -4.0×10⁻⁷ -3.0×10⁻⁷ -2.0×10⁻⁷ -1.0×10⁻⁷ 0 1.0×10⁻⁷ 2.0×10⁻⁷ 3.0×10⁻⁷ 4.0×10⁻⁷ 5.0×10⁻⁷ 6.0×10⁻⁷ 7.0×10⁻⁷ 8.0×10⁻⁷ 9.0×10⁻⁷ 1.0×10⁻⁶ 1.1×10⁻⁶ 1.2×10⁻⁶ 1.3×10⁻⁶ 1.4×10⁻⁶ 1.5×10⁻⁶ 1.6×10⁻⁶ 1.7×10⁻⁶ 1.8×10⁻⁶ 1.9×10⁻⁶ 2.0×10⁻⁶ 2.1×10⁻⁶ 2.2×10⁻⁶ 2.3×10⁻⁶ 2.4×10⁻⁶ 2.5×10⁻⁶ 2.6×10⁻⁶ 2.7×10⁻⁶ 2.8×10⁻⁶ 2.9×10⁻⁶ 3.0×10⁻⁶ 3.1×10⁻⁶ 3.2×10⁻⁶ 3.3×10⁻⁶ 3.4×10⁻⁶ 3.5×10⁻⁶ 3.6×10⁻⁶ 3.7×10⁻⁶ 3.8×10⁻⁶ 3.9×10⁻⁶ 4.0×10⁻⁶ -2×10⁻⁶ 0 2×10⁻⁶ 4×10⁻⁶ -2.0×10⁻⁶ -1.8×10⁻⁶ -1.6×10⁻⁶ -1.4×10⁻⁶ -1.2×10⁻⁶ -1.0×10⁻⁶ -8.0×10⁻⁷ -6.0×10⁻⁷ -4.0×10⁻⁷ -2.0×10⁻⁷ 0 2.0×10⁻⁷ 4.0×10⁻⁷ 6.0×10⁻⁷ 8.0×10⁻⁷ 1.0×10⁻⁶ 1.2×10⁻⁶ 1.4×10⁻⁶ 1.6×10⁻⁶ 1.8×10⁻⁶ 2.0×10⁻⁶ 2.2×10⁻⁶ 2.4×10⁻⁶ 2.6×10⁻⁶ 2.8×10⁻⁶ 3.0×10⁻⁶ 3.2×10⁻⁶ 3.4×10⁻⁶ 3.6×10⁻⁶ 3.8×10⁻⁶ 4.0×10⁻⁶ Frequency

Probability of Making 1m or less (correlated) :

GetCertainty(C_Profit, 1000000, 0)
0.4324
fractiles(C_Profit)
11×2 Array{Any,2}:
 "P0.0"    217536.0
 "P10.0"   761763.0
 "P20.0"   857252.0
 "P30.0"   926631.0
 "P40.0"   983526.0
 "P50.0"        1.03499e6
 "P60.0"        1.08516e6
 "P70.0"        1.13697e6
 "P80.0"        1.20116e6
 "P90.0"        1.2912e6
 "P100.0"       1.867e6
sensitivity_chrt(C_Trials,1)
Rank Correlation -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 -3.0 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1 -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 -4 -2 0 2 4 -3.0 -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 Positive Negative impact h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? C_Revenue C_Expenses Input Variables with Biggest Impact

A quick analysis of the results

  1. Accounting for correlation meant a 5% (42.5% vs. 47.7%) reduction in probability of not making our goals.
  1. The Worse Case goes from -290k to 230k, a 225% difference
  1. The critical driver in both cases is expenses.