Computing uncertainty estimates for tiger weights
Question
Suppose you're given a dataset of a tiger's weight recorded over time. Each row contains a date and a recorded weight (in kg).
Using this data, write code to compute and visualize uncertainty estimates for the weights recorded in each month. In our solution we'll compute uncertainty two ways: with a standard confidence interval formula and with bootstrapping. Note that this is an open-ended problem with many valid approaches.
To help get you started, below is Python code to generate a randomized (fake) dataset of tiger weights over time, along with an associated scatter plot. You can also view the code / make a copy in this Google Colab notebook.
# Import packages
import pandas as pd
from matplotlib import pyplot
import matplotlib
import numpy as np
import datetime
import random
pyplot.style.use('fivethirtyeight')
matplotlib.rcParams['figure.figsize'] = (15, 9)
def generate_time_series(k=100, m=230, sigma=25, n=100,
                         start_date=datetime.date(2020, 4, 1)):
    # k = slope of the upward trend added across the series
    # m = baseline (mean) weight
    # sigma = standard deviation of the weight noise
    # n = number of samples to create

    # n evenly spaced values from 0 to 1 (exclusive), one per sample
    xs = np.linspace(0, 1, n, endpoint=False)
    # Normally distributed noise around a modest upward trend (k*x + m)
    ys = [k*x + m + random.gauss(0, sigma) for x in xs]
    # Map each x to a date by treating it as a fraction of one year
    ts = [start_date + datetime.timedelta(days=365 * x) for x in xs]
    return xs, ys, ts
# Grab the outputs from our function above
xs, ys, ts = generate_time_series()
# Build a quick plot to view the output
# x (temporal data), y (weight data), size of dots
pyplot.scatter(ts, ys, s=50)
pyplot.xlabel('Date')
pyplot.ylabel('Weight of tiger (kg)')
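As one possible starting point for the two approaches named in the question, the sketch below groups readings by calendar month and computes a normal-approximation interval (mean ± z·s/√n) alongside a bootstrap percentile interval. The function name `monthly_uncertainty`, its parameters, and the example data are assumptions made for illustration and are not part of the original solution. With only a handful of readings per month, z = 1.96 is a rough choice, and a t critical value may be more appropriate.

```python
import numpy as np
import pandas as pd


def monthly_uncertainty(dates, weights, n_boot=2000, z=1.96, seed=0):
    """Per-month mean weight with two approximate 95% intervals (illustrative)."""
    df = pd.DataFrame({"date": pd.to_datetime(dates), "weight": weights})
    rng = np.random.default_rng(seed)
    rows = []
    # Group the readings by calendar month
    for month, grp in df.groupby(df["date"].dt.to_period("M")):
        w = grp["weight"].to_numpy()
        mean = w.mean()
        # Formula-based interval: mean +/- z * s / sqrt(n)
        sem = w.std(ddof=1) / np.sqrt(len(w)) if len(w) > 1 else np.nan
        # Bootstrap: resample with replacement and record each resample's mean
        boot_means = rng.choice(w, size=(n_boot, len(w)), replace=True).mean(axis=1)
        boot_low, boot_high = np.percentile(boot_means, [2.5, 97.5])
        rows.append({
            "month": str(month), "n": len(w), "mean": mean,
            "ci_low": mean - z * sem, "ci_high": mean + z * sem,
            "boot_low": boot_low, "boot_high": boot_high,
        })
    return pd.DataFrame(rows).set_index("month")


# Hypothetical example data: 100 readings spread over one year,
# with the same mean, trend, and noise level as the generator above
data_rng = np.random.default_rng(42)
fractions = np.linspace(0, 1, 100, endpoint=False)
dates = pd.Timestamp("2020-04-01") + pd.to_timedelta(fractions * 365, unit="D")
weights = 230 + 100 * fractions + data_rng.normal(0, 25, size=100)

summary = monthly_uncertainty(dates, weights)
print(summary.round(1))
```

The same function should also accept the `ts` and `ys` lists returned by `generate_time_series()`. To visualize the result, one option is `pyplot.errorbar`, passing the monthly means as the y values and the distances from the mean to each interval bound as `yerr`.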