This is Part 3 in my series comparing different heat map packages. You can find Part 1 here & Part 2 here.
In this part, I’m going to look at ComplexHeatmap. I think I first came across ComplexHeatmap on Twitter. It looks to have some really great features. The clustering & annotation features alone look amazing!!! This seemed like the perfect excuse to play with it a bit.
We’re going to use the same data as before. A lot of the data wrangling will look similar to Part 1.
ComplexHeatmap is not available on CRAN. You’ll need to download it from Github. The easiest way to do this is with the devtools package. You can find the code below.
devtools::install_github("jokergoo/ComplexHeatmap")
Now that installation is taken care of, let’s start by loading the libraries we need.
library(tidyverse)
library(RColorBrewer)
library(ComplexHeatmap)
We’re going to read in our data similar to before, using read_csv()
.
df = read_csv("https://raw.githubusercontent.com/sapo83/TidyTuesday/master/2018/TT.20180403/us.avg.tuition.noCR.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## State = col_character(),
## `2004-05` = col_character(),
## `2005-06` = col_character(),
## `2006-07` = col_character(),
## `2007-08` = col_character(),
## `2008-09` = col_character(),
## `2009-10` = col_character(),
## `2010-11` = col_character(),
## `2011-12` = col_character(),
## `2012-13` = col_character(),
## `2013-14` = col_character(),
## `2014-15` = col_character(),
## `2015-16` = col_character()
## )
head(df)
## # A tibble: 6 x 13
## State `2004-05` `2005-06` `2006-07` `2007-08` `2008-09` `2009-10` `2010-11`
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Alabama $5,683 $5,841 $5,753 $6,008 $6,475 $7,189 $8,071
## 2 Alaska $4,328 $4,633 $4,919 $5,070 $5,075 $5,455 $5,759
## 3 Arizona $5,138 $5,416 $5,481 $5,682 $6,058 $7,263 $8,840
## 4 Arkansas $5,772 $6,082 $6,232 $6,415 $6,417 $6,627 $6,901
## 5 Califor… $5,286 $5,528 $5,335 $5,672 $5,898 $7,259 $8,194
## 6 Colorado $4,704 $5,407 $5,596 $6,227 $6,284 $6,948 $7,748
## # … with 5 more variables: 2011-12 <chr>, 2012-13 <chr>, 2013-14 <chr>,
## # 2014-15 <chr>, 2015-16 <chr>
Next we need to remove the dollar signs & commas from the numeric values.
df = as.data.frame(lapply(df, function (x) {gsub("[$,]", "", x)}))
head(df)
## State X2004.05 X2005.06 X2006.07 X2007.08 X2008.09 X2009.10 X2010.11
## 1 Alabama 5683 5841 5753 6008 6475 7189 8071
## 2 Alaska 4328 4633 4919 5070 5075 5455 5759
## 3 Arizona 5138 5416 5481 5682 6058 7263 8840
## 4 Arkansas 5772 6082 6232 6415 6417 6627 6901
## 5 California 5286 5528 5335 5672 5898 7259 8194
## 6 Colorado 4704 5407 5596 6227 6284 6948 7748
## X2011.12 X2012.13 X2013.14 X2014.15 X2015.16
## 1 8452 9098 9359 9496 9751
## 2 5762 6026 6012 6149 6571
## 3 9967 10134 10296 10414 10646
## 4 7029 7287 7408 7606 7867
## 5 9436 9361 9274 9187 9270
## 6 8316 8793 9293 9299 9748
Similar to pheatmap, ComplexHeatmap requires a numeric matrix. I will use as.numeric()
inside sapply()
to make sure each column except the first is numeric. I’ll wrap this in as.matrix()
to return a new matrix object (df_num
). We will assign the state names as row names to our new object.
df_num = as.matrix(sapply(df[, 2:13], as.numeric))
rownames(df_num) = df$State
head(df)
## State X2004.05 X2005.06 X2006.07 X2007.08 X2008.09 X2009.10 X2010.11
## 1 Alabama 5683 5841 5753 6008 6475 7189 8071
## 2 Alaska 4328 4633 4919 5070 5075 5455 5759
## 3 Arizona 5138 5416 5481 5682 6058 7263 8840
## 4 Arkansas 5772 6082 6232 6415 6417 6627 6901
## 5 California 5286 5528 5335 5672 5898 7259 8194
## 6 Colorado 4704 5407 5596 6227 6284 6948 7748
## X2011.12 X2012.13 X2013.14 X2014.15 X2015.16
## 1 8452 9098 9359 9496 9751
## 2 5762 6026 6012 6149 6571
## 3 9967 10134 10296 10414 10646
## 4 7029 7287 7408 7606 7867
## 5 9436 9361 9274 9187 9270
## 6 8316 8793 9293 9299 9748
Finally before we plot our data, we’ll clean up our column names. Two nested gsub()
commands remove the prefix “X” and replace the dash ("-") with “-20” to make both values 4 digit years.
colnames(df_num) = gsub("[.]", "-20", gsub("X", "", colnames(df_num)))
Now let’s plot our heat map!
Heatmap(df_num)
Ok! Not bad! First thing I’d like to change is the order of the columns. I’m going to add column_order = sort(colnames(df_num))
to reverse the order of the columns. When you use column_order
column clustering is set to FALSE. This also removes the column dendrogram we see in the original version.
Heatmap(df_num,
column_order = sort(colnames(df_num)))
Next I want to apply the same color palette & change the title of the legend. col = rev(brewer.pal(11, "RdYlBu"))
uses the same palette as the previous 2 plots. name = "Tuition"
renames our legend to a more descriptive title.
Heatmap(df_num,
column_order = sort(colnames(df_num)),
col = rev(brewer.pal(11, "RdYlBu")),
name = "Tuition")
Last of all, I want to change the size of each tile. I’ll use width
& height
to control that.
Heatmap(df_num,
column_order = sort(colnames(df_num)),
col = rev(brewer.pal(11, "RdYlBu")),
name = "Tuition",
width = ncol(df_num)*unit(3.5, "mm"),
height = nrow(df_num)*unit(3.5, "mm"))
This looks very similar to the clustered heat map we made in Part 2 with pheatmap. It’s interesting to note the slight differences in clustering between the two packages. I would assume they are using slightly different clustering algorithms.
I have barely scratched the surface on ComplexHeatmap. At some point, I’d like to come back with a different data set & use this package a bit more!
I think the main takeaway from each of these 3 packages is that each one fits certain use cases. Hopefully this has helped you identify which one would suit your needs best.
Feel free to reach out with any questions/comments/concerns via Twitter or using one of the other methods on my contact page. Thanks for reading!