Formatting dirty data

Question

Suppose you have the following dataset, which contains (1st tab) a list of items purchased by each user and (2nd tab) a mapping from item_id to the item name and price.

Can you format the data into a matrix with users as rows and items as columns, where each cell holds the number of times that user purchased that item?

For example, if we have a user with the following row:

user_id ids
12345 1, 4, 4, 3, 5, 5, 5

We would want the output to look like the following:

user_id 1 2 3 4 5
12345 1 0 1 2 3

Solution
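
One possible approach, sketched here with only the Python standard library: split each user's ids string on commas, count the occurrences of each item, and emit one count per item column. The function name purchase_matrix, the (user_id, ids) tuple layout, and the item_ids list (standing in for the item_id column of the 2nd tab) are assumptions made for illustration, since the actual dataset layout is not shown. The sketch also assumes every non-empty token is an integer id; other kinds of dirty values would need extra handling.

```python
from collections import Counter


def purchase_matrix(purchases, item_ids):
    """Build a user-by-item frequency matrix.

    purchases: iterable of (user_id, ids) pairs, where ids is a
               comma-separated string such as "1, 4, 4, 3, 5, 5, 5".
    item_ids:  every item id that should appear as a column
               (hypothetical stand-in for the mapping tab).
    """
    columns = sorted(item_ids)
    header = ["user_id"] + columns
    rows = []
    for user_id, ids in purchases:
        # Skip empty tokens left by stray or trailing commas;
        # int() tolerates surrounding whitespace.
        counts = Counter(int(tok) for tok in ids.split(",") if tok.strip())
        # Items the user never bought get a 0.
        rows.append([user_id] + [counts.get(col, 0) for col in columns])
    return header, rows


purchases = [(12345, "1, 4, 4, 3, 5, 5, 5")]
item_ids = [1, 2, 3, 4, 5]

header, rows = purchase_matrix(purchases, item_ids)
print(header)  # ['user_id', 1, 2, 3, 4, 5]
print(rows)    # [[12345, 1, 0, 1, 2, 3]]
```

Taking the columns from the mapping rather than from the purchases themselves keeps items that nobody bought (item 2 in the example) in the output as zero columns.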
