Count distinct in Pandas aggregation


import pandas as pd
import numpy as np

Create a dataframe

# Create a small dataframe of viewing sessions: one row per (date, user) session
df = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                   'user_id': ['0001', '0001', '0002', '0002', '0002'],
                   'duration': [30, 15, 20, 15, 30]})
df
         date user_id  duration
0  2013-04-01     0001        30
1  2013-04-01     0001        15
2  2013-04-01     0002        20
3  2013-04-02     0002        15
4  2013-04-02     0002        30
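As a quick sanity check before grouping, `nunique` can also be called without a groupby to count distinct values over the whole frame. The sketch below rebuilds the same example data so it runs on its own; note that `nunique` ignores missing values (NaN) by default.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                   'user_id': ['0001', '0001', '0002', '0002', '0002'],
                   'duration': [30, 15, 20, 15, 30]})

# Number of distinct values in every column (NaN excluded by default)
print(df.nunique())

# Number of distinct users across all days
print(df['user_id'].nunique())
```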


Count distinct in Pandas aggregation

# Per day: total viewing duration and the number of distinct users who viewed
daily = df.groupby("date").agg({"duration": "sum", "user_id": "nunique"})
daily
            duration  user_id
date
2013-04-01        65        2
2013-04-02        45        1
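The dictionary form above keeps the original column names, so the distinct count still appears under `user_id`. Below is a sketch of two alternatives, assuming pandas 0.25 or later for named aggregation. The output names `total_duration` and `unique_users` are chosen here for illustration only, and the second form returns a plain Series when only the distinct count is needed.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                   'user_id': ['0001', '0001', '0002', '0002', '0002'],
                   'duration': [30, 15, 20, 15, 30]})

# Named aggregation: output_name=(input_column, aggregation)
daily = df.groupby("date").agg(
    total_duration=("duration", "sum"),
    unique_users=("user_id", "nunique"),
)
print(daily)

# Only the distinct count: call nunique on the grouped column directly (returns a Series)
unique_users = df.groupby("date")["user_id"].nunique()
print(unique_users)
```

Named aggregation makes the meaning of each output column explicit, which helps once several aggregations are applied to the same input column.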

