The `ggcorr` function plots correlation matrices as `ggplot2` objects. It was inspired by a Stack Overflow question.

Correlation matrices show the correlation coefficients between a relatively large number of continuous variables. While R offers a simple way to create such matrices through the `cor` function, it does not offer a plotting method for the matrices created by that function.
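For reference, here is what `cor` returns on its own: a plain numeric matrix. The sketch uses R's built-in `mtcars` dataset rather than the NBA data used later in this vignette. This is an assumption made so that the example runs offline.

```r
# Three continuous variables from R's built-in mtcars dataset
vars <- mtcars[, c("mpg", "hp", "wt")]

# cor() returns a square, symmetric numeric matrix (Pearson by default)
cm <- cor(vars)
print(round(cm, 2))

# The result is an ordinary matrix rather than a dedicated "correlation" object
print(class(cm))
```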

The `ggcorr` function offers such a plotting method, using the “grammar of graphics” implemented in the `ggplot2` package to render the plot. In practice, its results are graphically close to those of the `corrplot` function, which is part of the excellent `arm` package.

`ggcorr` is available through the `GGally` package:

`install.packages("GGally")`

It can also be used as a standalone function:

`source("https://raw.githubusercontent.com/briatte/ggcorr/master/ggcorr.R")`

The main package dependency of `ggcorr` is the `ggplot2` package, which it uses for plot construction.

`library(ggplot2)`

The `ggplot2` package can be installed from CRAN through `install.packages`. Doing so will also install the `reshape2` package, which is used internally by `ggcorr` for data manipulation.
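`ggplot2` draws from data in long format, with one row per plotted cell. A correlation matrix therefore has to be reshaped before it can be drawn as tiles. The base-R sketch below shows that step using `as.table()` instead of `reshape2`. It illustrates the general idea only and is an assumption, not the code that `ggcorr` actually runs.

```r
cm <- cor(mtcars[, c("mpg", "hp", "wt")])

# A correlation matrix is symmetric, so keep only the lower triangle (no diagonal)
cm[upper.tri(cm, diag = TRUE)] <- NA

# Wide matrix -> long data frame (columns Var1, Var2, Freq), then rename them
long <- as.data.frame(as.table(cm))
names(long) <- c("x", "y", "coefficient")

# Drop the blanked-out cells
long <- long[!is.na(long$coefficient), ]
print(long)
```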

The examples shown in this vignette use NBA statistics shared by Nathan Yau on his excellent blog “Flowing Data”.

`nba = read.csv("http://datasets.flowingdata.com/ppg2008.csv")`

Let’s pass the entire dataset to `ggcorr` without any further work:

`ggcorr(nba)`

```
## Warning in ggcorr(nba): data in column(s) 'Name' are not numeric and were
## ignored
## Warning: Non Lab interpolation is deprecated
```

This example shows the default output of `ggcorr`. It also produces a warning indicating that one column of the dataset does not contain numeric data and was therefore dropped from the correlation matrix. The warning can be avoided by dropping that column from the dataset passed to `ggcorr`:

`ggcorr(nba[, -1])`
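Dropping the column by position (`-1`) works here only because `Name` happens to be the first column. A more general approach is to keep only the numeric columns before calling `ggcorr`. The sketch below uses the built-in `iris` dataset as a stand-in for the NBA data, since its `Species` column is a factor. The helper `numeric_only` is hypothetical and is not part of `GGally`.

```r
# Hypothetical helper: keep only the numeric columns of a data frame
numeric_only <- function(df) {
  df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
}

iris_num <- numeric_only(iris)
print(names(iris_num))
# Every remaining column is numeric, so the "not numeric" warning should not apply
```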

**Note:** when used with a continuous color scale, `ggcorr` also currently produces a warning related to color interpolation. This is an innocuous warning that should disappear in future updates of the `ggplot2` and `scales` packages. This warning is hidden in the rest of this vignette.

The first argument of `ggcorr` is called `data`. It accepts either a data frame, as shown above, or a matrix of observations, which will be converted to a data frame before plotting:

`ggcorr(matrix(runif(5), 2, 5))`
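The conversion mentioned above presumably behaves like R's standard `as.data.frame()`, which assigns an unnamed matrix the default column names `V1`, `V2`, and so on. The sketch below assumes that standard conversion. It also uses ten rows rather than two, because a correlation computed from only two observations is always \(\pm 1\) (or undefined).

```r
set.seed(1)  # make the random draws reproducible
m <- matrix(runif(50), nrow = 10, ncol = 5)  # 10 observations of 5 variables

df <- as.data.frame(m)
print(names(df))  # default column names for an unnamed matrix
```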

`ggcorr` can also accept a correlation matrix through the `cor_matrix` argument. In that case, its first argument must be set to `NULL` to indicate that `ggcorr` should use the correlation matrix instead:

`ggcorr(data = NULL, cor_matrix = cor(nba[, -1], use = "everything"))`
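When you supply `cor_matrix` directly, it can be worth checking that the object really is a correlation matrix. It should be square and symmetric, with ones on the diagonal and no entry beyond \(\pm 1\). Below is a minimal sketch of such a check. `is_cor_matrix` is a hypothetical helper and is not part of `ggcorr` or `GGally`.

```r
# Hypothetical helper: basic sanity checks for a correlation matrix
is_cor_matrix <- function(x, tol = 1e-8) {
  is.matrix(x) &&
    nrow(x) == ncol(x) &&
    isSymmetric(unname(x), tol = tol) &&
    all(abs(diag(x) - 1) < tol) &&
    all(abs(x) <= 1 + tol, na.rm = TRUE)
}

cm <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])
print(is_cor_matrix(cm))                             # output of cor() passes
print(is_cor_matrix(cov(mtcars[, c("mpg", "hp")])))  # a covariance matrix does not
```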

`ggcorr` supports all correlation methods offered by the `cor` function. The method is controlled by the `method` argument, which takes two character strings:

- The first setting is the selection of observations used to compute the correlation matrix. It can take any of the following values: `"everything"`, `"all.obs"`, `"complete.obs"`, `"na.or.complete"` or `"pairwise.complete.obs"` (the default used by `ggcorr`). These settings control how covariances are computed in the presence of missing values. The differences between them are explained in the documentation of the `cor` function.
- The second setting is the type of correlation coefficient to compute. It can take three values: `"pearson"` (the default used both by `ggcorr` and by `cor`), `"kendall"` or `"spearman"`. Again, the differences between these settings are explained in the documentation of the `cor` function. Generally speaking, unless the data are ordinal, the default choice should be `"pearson"`, which produces correlation coefficients based on Pearson’s method.
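The effect of the first setting is easiest to see on a small data frame with one missing value. The sketch below calls `cor` directly with made-up numbers. It assumes that the first string of `method` maps onto the `use` argument of `cor`, as the list above suggests.

```r
# Made-up data: column 'a' has one missing value (row 5)
x <- data.frame(
  a = c(1, 2, 3, 4, NA, 6),
  b = c(2, 4, 5, 4, 5, 7),
  c = c(9, 7, 6, 4, 3, 1)
)

# "everything": any coefficient involving 'a' is NA
r_all <- cor(x, use = "everything")

# "pairwise.complete.obs": each pair uses the rows that are complete for that pair
r_pair <- cor(x, use = "pairwise.complete.obs")

# "complete.obs": drop row 5 for every pair, then correlate
r_comp <- cor(x, use = "complete.obs")

# "all.obs": any missing value is an error
err <- tryCatch(cor(x, use = "all.obs"), error = function(e) conditionMessage(e))

print(r_all["a", "b"])  # NA
print(r_pair["a", "b"])
print(err)
```

Note that `b` and `c` get a different coefficient under `"pairwise.complete.obs"` (all six rows) and under `"complete.obs"` (five rows), even though neither column has missing values.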

Here are some examples showing how to pass different correlation methods to `ggcorr`:

```
# Pearson correlation coefficients, using pairwise-complete observations (default method)
ggcorr(nba[, -1], method = c("pairwise", "pearson"))
# Pearson correlation coefficients, using all observations (missing values propagate)
ggcorr(nba[, -1], method = c("everything", "pearson"))
# Kendall correlation coefficients, using complete observations (casewise deletion)
ggcorr(nba[, -1], method = c("complete", "kendall"))
# Spearman correlation coefficients, requiring that no observations be missing
ggcorr(nba[, -1], method = c("all.obs", "spearman"))
```

If no second setting is provided, `ggcorr` will default to `"pearson"`.
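To see why the choice of coefficient matters, the sketch below compares the three methods on made-up data. The relationship is perfectly monotonic but not linear. The rank-based Spearman and Kendall coefficients both equal 1, while Pearson’s coefficient, which measures linear association, is lower.

```r
x <- 1:10
y <- x^3  # monotonic, but not a straight line

# One coefficient per method, returned as a named vector
r <- sapply(c("pearson", "spearman", "kendall"),
            function(m) cor(x, y, method = m))
print(round(r, 3))
```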

The rest of this vignette focuses on how to tweak the appearance of the correlation matrix plotted by `ggcorr`.

By default, `ggcorr` uses a continuous color scale that extends from \(-1\) to \(+1\) to show the strength of each correlation represented in the matrix. To switch to categorical colors, all the user has to do is add the `nbreaks` argument, which specifies how many breaks should be contained in the color scale:

`ggcorr(nba[, 2:15], nbreaks = 5)`

When the `nbreaks` argument is used, the number of digits shown in the color scale is controlled through the `digits` argument. The `digits` argument defaults to two digits, but as shown in the example above, it will default to a single digit if the breaks do not require more precision.

Further control over the color scale includes the `name` argument, which sets its title, the `legend.size` argument, which sets the size of the legend text, and the `legend.position` argument, which controls where the legend is displayed. The latter two are shortcuts to the corresponding settings of `ggplot2`’s `theme`. Since the plot is a `ggplot2` object, all other relevant `theme` and `guides` functions also apply:

```
ggcorr(nba[, 2:15], name = expression(rho), legend.position = "bottom", legend.size = 12) +
  guides(fill = guide_colorbar(barwidth = 18, title.vjust = 0.75)) +
  theme(legend.title = element_text(size = 14))
```

`ggcorr` uses a default color gradient that goes from bright red to light grey to bright blue. This gradient can be modified through the `low`, `mid` and `high` arguments, which are similar to those of the `scale_fill_gradient2` scale in `ggplot2`:

`ggcorr(nba[, 2:15], low = "steelblue", mid = "white", high = "darkred")`
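Because the scale runs from \(-1\) to \(+1\), the three colors can be thought of as anchored at \(-1\), \(0\) and \(+1\). The sketch below is a rough base-R analogue and only an assumption for illustration, since `ggcorr` itself relies on `ggplot2` scales. It maps a coefficient to a color by linear interpolation between `low`, `mid` and `high`. `cor_to_col` is a hypothetical helper.

```r
# Hypothetical helper: map coefficients in [-1, 1] to hex colors
cor_to_col <- function(r, low = "steelblue", mid = "white", high = "darkred") {
  ramp <- grDevices::colorRamp(c(low, mid, high))  # linear RGB interpolation on [0, 1]
  v <- round(ramp((r + 1) / 2))                    # rescale [-1, 1] to [0, 1]; whole-number channels
  grDevices::rgb(v[, 1], v[, 2], v[, 3], maxColorValue = 255)
}

print(cor_to_col(c(-1, 0, 1)))  # "#4682B4" "#FFFFFF" "#8B0000"
```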
