Taking Charge with Fitbit
*insert introductory section here –> What is this? What motivated me?*
This document is the product of a combination of events in my life and a recently rekindled interest that I have had for a long time.
Something about my interest in data:
- Gathering data on sports, physical state, games
- Bachelor Thesis about quantified self
I’m fascinated by all the cool things you can do with data, but I never really took the time to do something like that myself.
*make sure to come up with better titles for everything*
1 Fitbit Time-Series Data
*describe how the data were collected, stored, and what they look like*
All the data that I’m using can be retrieved by calling the Fitbit API through the getActivitiesResourceByDatePeriod, getHeartByDateIntraday, and getWeightByDate methods. The functions I wrote to perform these GET requests can be found in this script.
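As a rough illustration of what such a GET request could look like, the sketch below builds the URL for one day of minute-level heart rate and wraps the request with httr. The helper names `build_intraday_hr_url` and `get_intraday_hr`, the `token` argument, and the use of httr are assumptions made for this example and are not the functions from the script. The URL pattern follows the Fitbit Web API documentation as I understand it and should be checked against it.

```r
# Hypothetical helper: build the URL for one day of intraday heart rate.
# "-" stands for the currently authorised user.
build_intraday_hr_url <- function(date, detail_level = "1min") {
  date <- format(as.Date(date), "%Y-%m-%d")
  sprintf(
    "https://api.fitbit.com/1/user/-/activities/heart/date/%s/1d/%s.json",
    date, detail_level
  )
}

# Hypothetical wrapper: perform the GET request with an OAuth 2.0 access token.
# Requires the httr package; it is not called here because it needs a valid token.
get_intraday_hr <- function(date, token) {
  response <- httr::GET(
    build_intraday_hr_url(date),
    httr::add_headers(Authorization = paste("Bearer", token))
  )
  httr::stop_for_status(response)         # fail loudly on HTTP errors
  httr::content(response, as = "parsed")  # parsed JSON as a nested list
}

build_intraday_hr_url("2019-01-01")
```

Keeping the URL construction separate from the request makes it possible to check the former without network access or credentials.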
I decided to structure the data in a tidy format. The code to collect new data from the Fitbit servers and tidy it can be found in this script. First, I split the data into minute-level and daily-level time series. Then I created separate tibbles per type of value:
- Minute-level
  - Numerical: calories, distance, elevation, floors, heart rate, steps
  - Ordinal: activity intensity
- Daily-level
  - Numerical: weight, BMI, activity calories, basal metabolic rate (BMR) calories, resting heart rate [backlog: minutes and calories out per heart-rate zone, in a separate table]
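To make the tidy structure concrete, here is a minimal sketch that reshapes a small wide table into the `datetime`, `type`, `value`, `day`, `week` layout. The input table `raw_minutes`, its values, and the object name `ml_num_example` are made up for illustration, and deriving `week` from the ISO week number is an assumption about how the real column was built.

```r
library(dplyr)
library(tidyr)

# Toy wide table standing in for raw minute-level data (values are made up).
raw_minutes <- tibble(
  datetime  = as.POSIXct("2019-01-01 00:00:00", tz = "UTC") + 60 * 0:2,
  steps     = c(0, 12, 30),
  heartrate = c(58, 71, 84)
)

weekday_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                   "Friday", "Saturday", "Sunday")

ml_num_example <- raw_minutes %>%
  # One row per timestamp and measurement type.
  pivot_longer(cols = -datetime, names_to = "type", values_to = "value") %>%
  mutate(
    type = factor(type),
    # %u is the ISO weekday number (1 = Monday), which avoids locale issues.
    day  = factor(weekday_names[as.integer(format(datetime, "%u"))],
                  levels = weekday_names, ordered = TRUE),
    # %V is the ISO 8601 week number (assumed definition of "week").
    week = factor(as.integer(format(datetime, "%V")), ordered = TRUE)
  )

ml_num_example
```

With one row per observation, every measurement type can be filtered, grouped, and plotted with the same code.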
str(fitbit_data)
## List of 3
## $ dl_num:Classes 'tbl_df', 'tbl' and 'data.frame': 611 obs. of 5 variables:
## ..$ date : Date[1:611], format: "2019-01-01" ...
## ..$ type : Factor w/ 5 levels "bmi","weight",..: 3 4 5 3 4 5 3 4 5 3 ...
## ..$ value: num [1:611] 1885 708 59 1885 1389 ...
## ..$ day : Ord.factor w/ 7 levels "Monday"<"Tuesday"<..: 2 2 2 3 3 3 4 4 4 5 ...
## ..$ week : Ord.factor w/ 21 levels "1"<"2"<"3"<"4"<..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ml_ord:Classes 'tbl_df', 'tbl' and 'data.frame': 202980 obs. of 5 variables:
## ..$ datetime: POSIXct[1:202980], format: "2019-01-01 00:00:00" ...
## ..$ type : Factor w/ 1 level "intensity": 1 1 1 1 1 1 1 1 1 1 ...
## ..$ value : Ord.factor w/ 4 levels "sedentary"<"light"<..: 1 2 1 1 1 1 1 2 2 2 ...
## ..$ day : Ord.factor w/ 7 levels "Monday"<"Tuesday"<..: 2 2 2 2 2 2 2 2 2 2 ...
## ..$ week : Ord.factor w/ 21 levels "1"<"2"<"3"<"4"<..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ml_num:Classes 'tbl_df', 'tbl' and 'data.frame': 1193695 obs. of 5 variables:
## ..$ datetime: POSIXct[1:1193695], format: "2019-01-01 00:00:00" ...
## ..$ type : Factor w/ 6 levels "calories","distance",..: 1 2 3 4 5 6 1 2 3 4 ...
## ..$ value : num [1:1193695] 1.7 0 0 0 80 ...
## ..$ day : Ord.factor w/ 7 levels "Monday"<"Tuesday"<..: 2 2 2 2 2 2 2 2 2 2 ...
## ..$ week : Ord.factor w/ 21 levels "1"<"2"<"3"<"4"<..: 1 1 1 1 1 1 1 1 1 1 ...
1.1 Minute-Level Data
1.1.1 Data Validation
*Show summary statistics for all numerical variables and table for ordinal. Show histogram and density plot for heartrate. Other plots are so heavily skewed that they are not very informative other than that they show that the values are 0 most of the time.*
fitbit_data$ml_num %>%
  group_by(type) %>%
  summarise(
    minimum = min(value),
    pctl_25 = quantile(value, probs = c(0.25), names = FALSE),
    median  = median(value),
    pctl_75 = quantile(value, probs = c(0.75), names = FALSE),
    maximum = max(value),
    mean    = mean(value),
    st_dev  = sd(value)
  ) %>%
  knitr::kable(digits = 2, caption = "",
               col.names = c("",
                             "*P~0~*",
                             "*P~25~*",
                             "*P~50~*",
                             "*P~75~*",
                             "*P~100~*",
                             "*$\\mu$*",
                             "*$\\sigma$~X~*"))
| | P0 | P25 | P50 | P75 | P100 | \(\mu\) | \(\sigma_X\) |
|---|---|---|---|---|---|---|---|
| calories | 1.22 | 1.26 | 1.36 | 1.66 | 17.94 | 2.35 | 2.35 |
| distance | 0.00 | 0.00 | 0.00 | 0.00 | 249.47 | 9.22 | 26.72 |
| elevation | 0.00 | 0.00 | 0.00 | 0.00 | 21.34 | 0.07 | 0.61 |
| floors | 0.00 | 0.00 | 0.00 | 0.00 | 7.00 | 0.02 | 0.20 |
| heartrate | 35.00 | 50.00 | 58.00 | 69.00 | 182.00 | 62.69 | 19.94 |
| steps | 0.00 | 0.00 | 0.00 | 0.00 | 185.00 | 11.31 | 30.20 |
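For the ordinal intensity variable, a frequency table is the natural counterpart of the summary statistics. The following is only a sketch on made-up data: `ml_ord_example` stands in for `fitbit_data$ml_ord`, and because the `str()` output shows only the first two of the four levels, the level names "moderate" and "very" are assumptions.

```r
library(dplyr)

# Toy stand-in for fitbit_data$ml_ord (values are made up). Only "sedentary"
# and "light" are visible in the str() output; "moderate" and "very" are
# assumed names for the two remaining levels.
intensity_levels <- c("sedentary", "light", "moderate", "very")
ml_ord_example <- tibble(
  type  = factor("intensity"),
  value = factor(c("sedentary", "sedentary", "light",
                   "sedentary", "very", "light"),
                 levels = intensity_levels, ordered = TRUE)
)

intensity_table <- ml_ord_example %>%
  # .drop = FALSE keeps levels that never occur, with a count of 0.
  count(value, name = "minutes", .drop = FALSE) %>%
  # Share of all recorded minutes spent at each intensity level.
  mutate(share = minutes / sum(minutes))

intensity_table
```

The result can be piped into `knitr::kable()` in the same way as the numerical summary.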
*Some text here.*
# Lowest and highest recorded heart rate, used as outer axis breaks below.
min_hr <- fitbit_data$ml_num %>%
  filter(type == "heartrate") %>%
  pull(value) %>%
  min()
max_hr <- fitbit_data$ml_num %>%
  filter(type == "heartrate") %>%
  pull(value) %>%
  max()
# Histogram and density of minute-level heart rate on a shared density scale.
# after_stat() replaces the deprecated stat() and needs ggplot2 >= 3.3.0.
fitbit_data$ml_num %>%
  filter(type == "heartrate") %>%
  ggplot(aes(x = value, y = after_stat(density))) +
  geom_histogram(binwidth = 1, color = "black", fill = "#2A211C") +
  geom_density(alpha = 0.8, color = "black", fill = "#9F2042") +
  theme_classic(base_size = 12, base_line_size = 1) +
  theme(axis.title.x = element_text(color = "#000000", face = "italic", margin = margin(7.5, 0, 0, 0)),
        axis.title.y = element_text(color = "#000000", face = "italic", margin = margin(0, 7.5, 0, 0)),
        axis.text.x = element_text(color = "#000000", face = "bold", size = 9),
        axis.text.y = element_text(color = "#000000", face = "bold", size = 9),
        panel.background = element_rect(fill = "#FCFCFC"),
        plot.background = element_rect(color = "#000000", fill = "#FCFCFC")) +
  labs(x = "Heart rate", y = "Density") +
  scale_x_continuous(breaks = c(min_hr, 50, 75, 100, 125, 150, 175, max_hr))
[Figure: histogram and density plot of minute-level heart rate]
*Some text here.*
1.2 Daily-Level Data
# act_tidy %>%
# filter(!is.na(act_tidy$datetime)) %>%
# group_by(as.Date(datetime, tz = "Europe/Amsterdam"), type) %>%
# summarise(
# daily_total = sum(value)
# ) %>%
# group_by(type) %>%
# summarise(
# total = sum(daily_total),
# minimum = min(daily_total),
# mean = mean(daily_total),
# median = median(daily_total),
# maximum = max(daily_total)
# ) %>%
# knitr::kable(digits = 2, caption = "Summary of activity data")
# act_tidy %>%
# filter(!is.na(act_tidy$datetime)) %>%
# group_by(week, type) %>%
# summarise(
# average_day = sum(value) / n_distinct(day)
# ) %>%
# ggplot(aes(x = week, y = average_day)) +
# geom_point() +
# geom_line() +
# facet_wrap(~ type, nrow = 2, ncol = 3, scales = "free_y")
2 How is getting fitter reflected in the data?
*obviously do a solid attempt to answer this question here*
2.1 Rest Heart Rate Over Time
# hr_summary %>%
# ggplot(aes(x = date, y = rest_hr)) +
# geom_point() +
# geom_smooth(se = FALSE, span = 0.275) +
# labs(title = "Rest Heart Rate over Time",
# subtitle = "*this is how you make a subtitle, Luc*")
# hr_intraday %>%
# group_by(week) %>%
# summarise(
# minimum = min(hr),
# mean = mean(hr),
# median = median(hr),
# maximum = max(hr)
# ) %>%
# gather(key = "type", value = "value", -week) %>%
# ggplot(aes(x = week, y = value)) +
# geom_point() +
# geom_line() +
# facet_wrap(~ type, nrow = 2, ncol = 2, scales = "free_y")
2.2 Weekly heart-rate distribution
*show ggridges weekly heart rate distributions here*
# hr_intraday %>%
# mutate(week = fct_rev(as.factor(week))) %>%
# ggplot(aes(x= hr, y = week)) +
# geom_density_ridges(scale = 2.5, alpha = 0.7) +
# xlim(30, 194) +
# theme_ridges()
2.3 Link steps/minute to bpm
*The idea here is that, at a fixed number of steps per minute, heart rate is lower when I am fitter than when I am less fit.*
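One possible way to look at this, sketched on made-up data: put steps and heart rate of the same minute side by side, group minutes into cadence bins, and compare the mean heart rate per bin across weeks. The toy table `ml_num_example`, its values, the column name `cadence_bin`, and the bin width of 20 steps per minute are all assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for fitbit_data$ml_num: two minutes in week 1 and two in
# week 10 (values are made up).
minutes <- as.POSIXct(c("2019-01-01 12:00:00", "2019-01-01 12:01:00",
                        "2019-03-05 12:00:00", "2019-03-05 12:01:00"),
                      tz = "UTC")
ml_num_example <- tibble(
  datetime = rep(minutes, each = 2),
  type     = rep(c("steps", "heartrate"), times = 4),
  value    = c(100, 120, 105, 124, 100, 110, 108, 112),
  week     = rep(c(1, 1, 10, 10), each = 2)
)

hr_by_cadence <- ml_num_example %>%
  filter(type %in% c("steps", "heartrate")) %>%
  # Put steps and heart rate of the same minute side by side.
  pivot_wider(names_from = type, values_from = value) %>%
  # Minutes without steps say nothing about heart rate at a given cadence.
  filter(steps > 0) %>%
  # Group minutes into bins of 20 steps per minute (bin width is arbitrary).
  mutate(cadence_bin = cut(steps, breaks = seq(0, 260, by = 20))) %>%
  group_by(week, cadence_bin) %>%
  summarise(mean_hr = mean(heartrate), n_minutes = n(), .groups = "drop")

hr_by_cadence
```

If fitness improves, `mean_hr` within the same `cadence_bin` should drift downward over the weeks. A single minute is a noisy measurement, so bins with few minutes deserve little weight.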