Normalizing cholesterol data

Question

Given this dataset on heart disease, write code to normalize the "chol" (cholesterol) column so that the shape of its distribution stays the same but the values are rescaled to lie between 0 and 1.

There are built-in functions that will normalize data for you, but try to avoid them in this question -- the goal here is to understand how the underlying math works.

To get you started, the code below loads the dataset. You can also make a copy of this Google Colab notebook.

# Importing packages
import pandas as pd
import numpy as np
# Reading in data
df = pd.read_csv('https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/heart_disease.csv', parse_dates=True)
df.head()
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   63    1   3       145   233    1        0      150      0      2.3      0   0     1       1
1   37    1   2       130   250    0        1      187      0      3.5      0   0     2       1
2   41    0   1       130   204    0        0      172      0      1.4      2   0     2       1
3   56    1   1       120   236    0        1      178      0      0.8      2   0     2       1
4   57    0   0       120   354    0        1      163      1      0.6      2   0     2       1

Solution
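
What follows is a minimal sketch of one possible approach, assuming the question intends min-max scaling: each value x is replaced by (x - min) / (max - min). This is a linear rescaling, so the smallest value maps to 0, the largest to 1, and the relative spacing between values (and therefore the shape of the distribution) is unchanged. The function name min_max_normalize, the variable names, and the handling of the all-equal case are illustrative choices, not part of the original question. To keep the sketch self-contained, it runs on the five "chol" values shown in the df.head() output rather than on the full column.

```python
def min_max_normalize(values):
    """Linearly rescale values so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal, so the formula would divide by zero.
        # Returning zeros is a choice made for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# The five "chol" values visible in the df.head() output.
chol_sample = [233, 250, 204, 236, 354]
chol_norm = min_max_normalize(chol_sample)
print(chol_norm)

# With the DataFrame loaded earlier, the same formula can be applied to the
# whole column at once (chol_norm is a hypothetical new column name):
# df["chol_norm"] = (df["chol"] - df["chol"].min()) / (df["chol"].max() - df["chol"].min())
```

In this sample, 204 is the minimum and maps to 0, while 354 is the maximum and maps to 1. On the full dataset the minimum and maximum will likely differ, so the normalized values for these five rows would change accordingly.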
