A Modern Day Comparison of 3 Heat Map Packages - Part 3


This is Part 3 in my series comparing different heat map packages. You can find Part 1 here & Part 2 here.

In this part, I’m going to look at ComplexHeatmap. I think I first came across ComplexHeatmap on Twitter. It looks to have some really great features. The clustering & annotation features alone look amazing!!! This seemed like the perfect excuse to play with it a bit.

We’re going to use the same data as before. A lot of the data wrangling will look similar to Part 1.

ComplexHeatmap is not available on CRAN. You’ll need to download it from Github. The easiest way to do this is with the devtools package. You can find the code below.

devtools::install_github("jokergoo/ComplexHeatmap")

Now that installation is taken care of, let’s start by loading the libraries we need.

library(tidyverse)
library(RColorBrewer)
library(ComplexHeatmap)

We’re going to read in our data similar to before, using read_csv().

df = read_csv("https://raw.githubusercontent.com/sapo83/TidyTuesday/master/2018/TT.20180403/us.avg.tuition.noCR.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   State = col_character(),
##   `2004-05` = col_character(),
##   `2005-06` = col_character(),
##   `2006-07` = col_character(),
##   `2007-08` = col_character(),
##   `2008-09` = col_character(),
##   `2009-10` = col_character(),
##   `2010-11` = col_character(),
##   `2011-12` = col_character(),
##   `2012-13` = col_character(),
##   `2013-14` = col_character(),
##   `2014-15` = col_character(),
##   `2015-16` = col_character()
## )
head(df)
## # A tibble: 6 x 13
##   State    `2004-05` `2005-06` `2006-07` `2007-08` `2008-09` `2009-10` `2010-11`
##   <chr>    <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    
## 1 Alabama  $5,683    $5,841    $5,753    $6,008    $6,475    $7,189    $8,071   
## 2 Alaska   $4,328    $4,633    $4,919    $5,070    $5,075    $5,455    $5,759   
## 3 Arizona  $5,138    $5,416    $5,481    $5,682    $6,058    $7,263    $8,840   
## 4 Arkansas $5,772    $6,082    $6,232    $6,415    $6,417    $6,627    $6,901   
## 5 Califor… $5,286    $5,528    $5,335    $5,672    $5,898    $7,259    $8,194   
## 6 Colorado $4,704    $5,407    $5,596    $6,227    $6,284    $6,948    $7,748   
## # … with 5 more variables: 2011-12 <chr>, 2012-13 <chr>, 2013-14 <chr>,
## #   2014-15 <chr>, 2015-16 <chr>

Next we need to remove the dollar signs & commas from the numeric values.

df = as.data.frame(lapply(df, function (x) {gsub("[$,]", "", x)}))

head(df)
##        State X2004.05 X2005.06 X2006.07 X2007.08 X2008.09 X2009.10 X2010.11
## 1    Alabama     5683     5841     5753     6008     6475     7189     8071
## 2     Alaska     4328     4633     4919     5070     5075     5455     5759
## 3    Arizona     5138     5416     5481     5682     6058     7263     8840
## 4   Arkansas     5772     6082     6232     6415     6417     6627     6901
## 5 California     5286     5528     5335     5672     5898     7259     8194
## 6   Colorado     4704     5407     5596     6227     6284     6948     7748
##   X2011.12 X2012.13 X2013.14 X2014.15 X2015.16
## 1     8452     9098     9359     9496     9751
## 2     5762     6026     6012     6149     6571
## 3     9967    10134    10296    10414    10646
## 4     7029     7287     7408     7606     7867
## 5     9436     9361     9274     9187     9270
## 6     8316     8793     9293     9299     9748

Similar to pheatmap, ComplexHeatmap requires a numeric matrix. I will use as.numeric() inside sapply() to make sure each column except the first is numeric. I’ll wrap this in as.matrix() to return a new matrix object (df_num). We will assign the state names as row names to our new object.

df_num = as.matrix(sapply(df[, 2:13], as.numeric))  

rownames(df_num) = df$State

head(df)
##        State X2004.05 X2005.06 X2006.07 X2007.08 X2008.09 X2009.10 X2010.11
## 1    Alabama     5683     5841     5753     6008     6475     7189     8071
## 2     Alaska     4328     4633     4919     5070     5075     5455     5759
## 3    Arizona     5138     5416     5481     5682     6058     7263     8840
## 4   Arkansas     5772     6082     6232     6415     6417     6627     6901
## 5 California     5286     5528     5335     5672     5898     7259     8194
## 6   Colorado     4704     5407     5596     6227     6284     6948     7748
##   X2011.12 X2012.13 X2013.14 X2014.15 X2015.16
## 1     8452     9098     9359     9496     9751
## 2     5762     6026     6012     6149     6571
## 3     9967    10134    10296    10414    10646
## 4     7029     7287     7408     7606     7867
## 5     9436     9361     9274     9187     9270
## 6     8316     8793     9293     9299     9748

Finally before we plot our data, we’ll clean up our column names. Two nested gsub() commands remove the prefix “X” and replace the dash ("-") with “-20” to make both values 4 digit years.

colnames(df_num) = gsub("[.]", "-20", gsub("X", "", colnames(df_num)))

Now let’s plot our heat map!

Heatmap(df_num)

Ok! Not bad! First thing I’d like to change is the order of the columns. I’m going to add column_order = sort(colnames(df_num)) to reverse the order of the columns. When you use column_order column clustering is set to FALSE. This also removes the column dendrogram we see in the original version.

Heatmap(df_num,
        column_order = sort(colnames(df_num)))

Next I want to apply the same color palette & change the title of the legend. col = rev(brewer.pal(11, "RdYlBu")) uses the same palette as the previous 2 plots. name = "Tuition" renames our legend to a more descriptive title.

Heatmap(df_num,
        column_order = sort(colnames(df_num)),
        col = rev(brewer.pal(11, "RdYlBu")),
        name = "Tuition")

Last of all, I want to change the size of each tile. I’ll use width & height to control that.

Heatmap(df_num,
        column_order = sort(colnames(df_num)),
        col = rev(brewer.pal(11, "RdYlBu")),
        name = "Tuition",
        width = ncol(df_num)*unit(3.5, "mm"), 
        height = nrow(df_num)*unit(3.5, "mm"))

This looks very similar to the clustered heat map we made in Part 2 with pheatmap. It’s interesting to note the slight differences in clustering between the two packages. I would assume they are using slightly different clustering algorithms.

I have barely scratched the surface on ComplexHeatmap. At some point, I’d like to come back with a different data set & use this package a bit more!

I think the main takeaway from each of these 3 packages is that each one fits certain use cases. Hopefully this has helped you identify which one would suit your needs best.

Feel free to reach out with any questions/comments/concerns via Twitter or using one of the other methods on my contact page. Thanks for reading!