Normalizing cholesterol data
Question
Given this dataset on heart disease, write code to normalize the "chol" (cholesterol) column so that the shape of the distribution stays the same but the values are rescaled to lie between 0 and 1.
There are built-in functions that will normalize data for you, but try to avoid them in this question: the goal here is to understand how the math underneath works.
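The question does not name a specific technique. One common reading of "rescale to between 0 and 1 without changing the distribution's shape" is min-max scaling, written out here as a sketch under that assumption. The symbols $x_i$ (one cholesterol value) and $x_i'$ (its rescaled value) are notation introduced for illustration:

```latex
x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)}
```

Because this is a linear transformation (a shift followed by a division by a positive constant), the ordering and relative spacing of the values are unchanged. The smallest value maps to 0 and the largest to 1. The formula is undefined when every value is identical, since the denominator is then 0.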
To get you started, the code below loads in the dataset. You can also make a copy of this Google Colab notebook.
# Importing packages
import pandas as pd
import numpy as np
# Reading in data
df = pd.read_csv('https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/heart_disease.csv', parse_dates=True)
df.head()
|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |
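
One possible approach, sketched here rather than given as the reference answer, is min-max scaling done by hand: subtract the column minimum, then divide by the column's range. The helper name `min_max_normalize` and the output column `chol_norm` are hypothetical names chosen for illustration. To stay self-contained, the sketch uses only the five "chol" values from the preview above instead of downloading the full dataset.

```python
import pandas as pd


def min_max_normalize(values: pd.Series) -> pd.Series:
    """Linearly rescale a numeric Series so its minimum is 0 and its maximum is 1."""
    lo = values.min()
    hi = values.max()
    span = hi - lo
    if span == 0:
        # Every value is identical, so there is no range to rescale.
        raise ValueError("column is constant; min-max scaling is undefined")
    return (values - lo) / span


# Stand-in for the full dataset: the five "chol" values from the preview rows.
sample = pd.DataFrame({"chol": [233, 250, 204, 236, 354]})
sample["chol_norm"] = min_max_normalize(sample["chol"])
print(sample)
```

With the full dataset loaded as `df`, the same helper could be applied as `df["chol_norm"] = min_max_normalize(df["chol"])`. Writing to a new column rather than overwriting "chol" keeps the original values available for checking the result.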