Summarizing the size of cells

Joachim Goedhart

Loading the data from the Google sheet and cleaning it:
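A minimal sketch of such a cleaning step, assuming the tidy result has the columns Group, Sample and Size that the plots below rely on. The wide layout of the raw table, the decimal-comma repair and all values are made up for illustration and do not come from the actual sheet:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the raw sheet: one column per sample type,
# sizes entered as text (values are invented)
raw <- tibble(
  Group   = c("A", "A", "B", "B"),
  Cell    = c("60", "55,5", "70", NA),
  Nucleus = c("8", "7.5", "9", "10")
)

df_demo <- raw %>%
  # One row per measurement: Sample holds the former column name
  pivot_longer(cols = c(Cell, Nucleus), names_to = "Sample", values_to = "Size") %>%
  # Replace decimal commas, then convert the text to numbers
  mutate(Size = as.numeric(gsub(",", ".", Size))) %>%
  # Drop missing and non-positive sizes, which cannot be shown on a log scale
  filter(!is.na(Size), Size > 0)

df_demo
```

Each row of df_demo is a single measurement, which is the shape that aes(x=Size, fill=Sample) and facet_wrap(~Sample) expect.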

The tidy data frame df is used as input for a histogram that shows the size distribution of each sample:

In [1]:

ggplot(df, aes(x=Size, fill=Sample)) +
  
  geom_histogram(bins = 100, alpha=.8, color='grey40') +
  # scale_x_log10() +
  labs(y="Count", x="Size [µm]") +
  # coord_cartesian(xlim = c(0.5,120)) +
  theme_light(base_size = 16) + theme(axis.text.y = element_blank()) +
  facet_wrap(~Sample) +
  theme(legend.position = "none") +
  
  #Force the y-axis to start at zero
  scale_y_continuous(expand = c(0, 0), limits = c(0,150)) +
  #Apply a logarithmic scale to the x-axis and set the numbers for the scale
  scale_x_log10(breaks = c(1,10,100), limits = c(.5,200)) +
  #Remove minor gridlines
  theme(panel.grid.minor = element_blank()) +
  #Add ticks to the bottom, outside
  annotation_logticks(sides="b", outside = TRUE) +
  #Give a little more space to the log-ticks by adding margin to the top of the x-axis text
  theme(axis.text.x = element_text(margin = margin(t=8))) +
  #Needed to see the ticks outside the plot panel
  coord_cartesian(clip = "off")

Figure 1: Distribution of the measured size of human cheek cells and their nuclei on a log scale. Aggregated data from three consecutive years (2021-2023).

The rows with Sample equal to "Cell" are selected and their size distribution is shown as a violin plot per group. The median of each group is indicated by a black dot:

In [2]:

df_cell <- df %>% filter(Sample =="Cell")

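# One violin per group, filled by group
p <- ggplot(df_cell, aes(x=Group, y=Size, fill=Group))

# Show the distribution and mark the median of each group with a point
p <- p + geom_violin() + stat_summary(fun = median, geom = "point")

# Logarithmic y-axis, as in the histogram
p <- p + scale_y_log10()

p <- p + labs(x="Group", y="Size [µm]")
# Zoom to the range of interest without discarding data
p <- p + coord_cartesian(ylim = c(0.5,200))

p <- p + theme_light(base_size = 16)
p <- p + theme(legend.position = "none")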

p
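The medians drawn as dots can also be tabulated per group. A minimal sketch, assuming a data frame with the columns Group and Size as in df_cell; the toy data frame df_cell_demo and its values are invented so that the example runs on its own:

```r
library(dplyr)

# Made-up stand-in for df_cell (not the measured data)
df_cell_demo <- tibble(
  Group = c("A", "A", "A", "B", "B", "B"),
  Size  = c(50, 60, 70, 40, 80, 100)
)

# Number of measurements and median size per group
medians <- df_cell_demo %>%
  group_by(Group) %>%
  summarise(n = n(), median_size = median(Size), .groups = "drop")

medians
```

Replacing df_cell_demo with df_cell should give the values that stat_summary(fun = median) places in each violin.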