Cracking Amazon's Data Science Interview

Introduction

Interviewing for any role can be intimidating, and first impressions play a major part in how an employer perceives you as a candidate. What you say during the opening phase of the interview can set the tone for the rest of it, and some hiring managers reject a candidate who makes a poor first impression. For instance, punctuality and attentiveness are important elements of professional behaviour; showing up late or repeatedly checking your phone during the interview can lead the recruiter to conclude that you struggle to focus, commit, follow through, and meet deadlines. You have to demonstrate professionalism and strong communication skills in addition to the skills of your domain. The best way to succeed is to prepare well and know what the company is looking for.

Most candidates find it tough to get through the recruitment process. In today's highly competitive job market, the stakes of an interview are high. Every interview is a new learning experience, even if you have been through many before, because each time we meet new people and have to brand and sell ourselves. We display our skills and stay enthusiastic and upbeat no matter how long it takes. It can be challenging because we have to answer difficult questions clearly and convincingly.

Candidates apply for a wide variety of roles at different companies, so they must understand the responsibilities of the role they are applying for. For example, a candidate applying for a Data Scientist position should expect questions with a strong coding and algorithmic component. These are fundamental, practical questions for which the candidate must be well prepared.

Big tech companies like Google, Amazon, Facebook, and Apple have standard criteria for selecting candidates for any position. Amazon, for instance, is widely regarded as one of the world's largest online retailers and internet companies. With its many services and products, it is constantly looking for innovative and motivated data scientists to meet its ever-growing data needs, and it hires them against a high bar of excellence. Data Science roles at Amazon are difficult and competitive to land, so it takes a great deal of hard work and focus to get a job at one of the world's best-known customer-centric companies.

Amazon's Data Scientist Role

Let's look specifically at the data scientist role at Amazon, how to get through its interview, and the questions that are likely to be asked.

Amazon is a large conglomerate with many teams working on various products and services, and the job of a data scientist depends on the specific team they work in. Teams include the Middle Mile Planning Research and Optimization Science (mmPROS) team, the AWS (Amazon Web Services) team, the NASCO (North America Supply Chain Organization) team, the forecasting team in Supply Chain Optimization Technologies (SCOT), and many more.

General Requirements

The following are some general requirements for a data scientist role at Amazon:

  • Candidates must develop, evaluate, deploy, and update data-driven models and analytical solutions for machine learning (ML) and natural language (NL) applications.
  • They must develop cutting-edge data pipelines, build accurate predictive models, and deploy automated software solutions to provide forecasting insights.
  • They must be able to research, design, and improve models with a business-minded approach.

Categories

There are four broad categories of data scientists at Amazon. Each category allows them to specialize and apply their analytical skills in different service and product areas.

Data Analytics

Data scientists specializing in this domain typically focus on creating forecasts, providing informed business insights, and identifying strategic opportunities. In short, their main focus is business intelligence. They create dashboards, devise solutions to various business challenges, and present data-backed findings to company stakeholders in an accessible way. They therefore need skills in data visualization tools like Tableau, as well as data warehousing skills for creating forecasts. For a data-driven, customer-centric company like Amazon, wrangling enormous volumes of data to forecast how people will use its services and platforms is a central concern.

Generalized Data Science

This is the most popular role. Amazon hires many data science generalists who dive into big data sets for:

  • Building simulations
  • Writing optimization algorithms
  • Building experimentation systems
  • Running algorithms and models to find actionable insights
  • Making meaningful recommendations
  • Offering feedback to the company stakeholders based on their findings

Machine Learning

The role of a machine learning specialist at Amazon usually requires graduate or Ph.D. qualifications in Natural Language Processing (NLP), Deep Learning, or Computer Vision. The research and data scientists in this domain mainly focus on cutting-edge research in areas like Deep Learning, NLP, streaming data analysis, video recommendations, and social networks. Their work helps the company develop new algorithmic models that power Amazon's streaming services, Web Services, Alexa, and other parts of the business.

Data Engineering

The Data Engineering team focuses on building products and tools used inside and outside the company. It also builds out data pipelines, and its role overlaps significantly with that of the Machine Learning engineers.

Cracking Amazon's Data Science Interview

Now let's focus on how an aspiring data scientist can crack Amazon's interview. Listed below are some key points that can help the candidate get through it.

  • Master the Programming Language: For a data science interview, one must have fundamental skills in programming languages like Python, SQL, and R. The candidate must also have a sound knowledge of basic topics like data structures and distributed computing. Python is one of the most popular languages in the industry today, so the candidate needs a strong command of Python along with problem-solving skills.
  • Algorithmic Implementation of Programming Language: After learning a language like Python or R, the candidate must know how to implement algorithms in it. Doing so helps the candidate master the language more quickly and understand how to create and deploy complicated machine learning algorithms.
  • Familiarity with Production Deployment Environment: Today, most organizations use the cloud as their infrastructure model, and the cloud dominates the data science space. Popular cloud vendors like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Azure have made it easy for data scientists and data engineers to set up a machine learning environment quickly and work without worrying about the sheer volume of generated data.
  • Must get Hands-on Experience: The candidate applying for the data scientist role must have hands-on experience, which gives practical depth to their data science knowledge and builds the skills to understand and solve a given problem scenario. The candidate should therefore take up different data science projects and build models that provide in-depth knowledge of the domain.
  • Digital Participation: The data science community is growing fast on platforms such as Twitter, LinkedIn, and Facebook, so the candidate can take various initiatives to show active digital participation. For instance, they can write LinkedIn posts about data science topics, showcase technical skills to get noticed by the community, start a blog, participate in events like hackathons and bootcamps, and contribute to open-source projects on GitHub. These initiatives show that the candidate is actively participating in and contributing to the digital world.
  • Knowledge Showcasing: The candidate must be aware of the latest tools and technologies in use so that they can keep pace with emerging technologies. Identifying new trends and offering forward-looking opinions shows this awareness and can help the candidate get through the interview.
  • Possible Questions: Many questions can be asked of the applicant in a data science interview. For instance:
    • How does K-means work, and what distance metric would you choose?
    • What are discriminative and generative algorithms? What are their strengths and weaknesses? Which type is preferable, and why?
    • Implement the union and intersection of two arrays efficiently. Note that elements must not be repeated in the union and intersection arrays.

We will be discussing several other questions in a separate section.

  • Ask Questions: During the interview, the candidate should ask insightful questions to stand out from other applicants. The level of these questions helps the interviewer evaluate the candidate and see how well they would fit into the company.
  • Elementary Knowledge of Data Science: One must study the basic topics, such as statistics, mathematics, programming, machine learning algorithms, and core Business Intelligence concepts. A strong knowledge base in these subjects helps the candidate work through basic problem-solving questions.
  • Visualization Tools: One must know popular data visualization tools, such as Google Charts, Tableau, and Qlik, to get through the interview. The candidate must know how to use these tools efficiently to become a good data scientist.
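
The union-and-intersection question listed under Possible Questions can be approached in several ways. The following is a minimal sketch of one of them, assuming the array elements are hashable and that order of first appearance should be preserved; the function name is made up for illustration. It uses sets for membership checks, so it runs in roughly linear time in the combined length of the two arrays.

```python
def union_and_intersection(a, b):
    """Return (union, intersection) of two lists without repeated elements.

    Elements keep the order in which they first appear (list a, then list b).
    Assumes the elements are hashable.
    """
    in_b = set(b)          # fast membership test for elements of b
    seen = set()           # elements already placed in the union
    union, intersection = [], []

    for x in a:
        if x not in seen:
            seen.add(x)
            union.append(x)
            if x in in_b:  # present in both arrays
                intersection.append(x)

    for x in b:
        if x not in seen:  # only elements that a did not contribute
            seen.add(x)
            union.append(x)

    return union, intersection
```

For example, calling `union_and_intersection([1, 3, 3, 5, 7], [3, 4, 5, 5, 8])` returns `([1, 3, 5, 7, 4, 8], [3, 5])`. If both arrays are already sorted, a two-pointer merge is an alternative that avoids the extra sets.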

Possible Data Science Questions

Listed below are possible questions that a data science candidate may be asked during the interview:

  • What do you know about Model Overfitting?
  • What is the importance of gradient checking?
  • Can you list some of the assumptions of logistic and linear regression?
  • How can you implement a circular queue using an array?
  • The users perform several actions when they navigate through the Amazon website. What can be the best way to model if their next action is a purchase?
  • How does the logistic regression model know what the coefficients are?
  • How can you differentiate between a convex and a non-convex cost function? What does it imply when a cost function is non-convex?
  • How will the market be affected by a change in the Prime membership fee?
  • You built a classification model with 100% accuracy for an application that predicts customer behaviour. What could be the problem?
  • You are given a CSV file with ID and Quantity columns, 30 million records, and a size of 3 GB. Write a program in any language of your choice to aggregate the Quantity column.
  • How can you differentiate Lasso and Ridge regression?
  • The probability an item will be present at location X is 0.8, and 0.5 at location Y. What is the probability that the item will be found on the Amazon website?
  • Why is random weight initialization better than assigning the same weights to the hidden layer units?
  • Describe Support Vector Machine (SVM), Random Forest, and boosting. Also, discuss their pros and cons.
  • What is the criterion for a particular model selection? Also, discuss the importance of dimensionality reduction.
  • What do you know about boosting?
  • How can you modify a table with over a billion rows?
  • Discuss the difference between MAP and MLE inference.
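
As an illustration of the CSV question in the list above, here is a minimal sketch of a streaming approach. It rests on several assumptions that the question leaves open: "aggregate" is taken to mean summing, both overall and per ID; the file has a header row spelled exactly `ID` and `Quantity`; and quantities are integers. The function name is hypothetical. The point of the design is that rows are read one at a time, so the 3 GB file never has to fit in memory.

```python
import csv
from collections import defaultdict


def aggregate_quantity(lines):
    """Sum the Quantity column overall and per ID, reading row by row.

    `lines` is any iterable of CSV text lines, for example an open file.
    Only the running totals are kept in memory; the per-ID dictionary
    grows with the number of distinct IDs, not with the number of rows.
    """
    reader = csv.DictReader(lines)   # uses the header row for column names
    total = 0
    per_id = defaultdict(int)
    for row in reader:
        quantity = int(row["Quantity"])  # assumption: integer quantities
        total += quantity
        per_id[row["ID"]] += quantity
    return total, dict(per_id)
```

With a real file, this could be called as `with open("quantities.csv", newline="") as f: total, per_id = aggregate_quantity(f)`, where the file name is a placeholder. In an interview it is worth asking which aggregation is wanted before writing any code.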

Amazon's Interview Process

Amazon's interview process consists of three phases, and they are as follows:

  • An initial phone screen by a hiring manager or a recruiter
  • The technical phone screen
  • An onsite interview, usually done in five phases

Now, let's discuss each of the three stages of Amazon's interview process.

  • An Initial Phone Screen: A hiring manager or a recruiter conducts the initial phone interview at Amazon. As at Facebook and Google, this first phase judges the candidate's capability and suitability for the role. It is a resume-based phone interview that normally goes over the candidate's resume and the position on the team. This round also includes general HR questions about the candidate's background, experience, and reasons for wanting to work at Amazon. As Amazon is one of the world's largest organizations, the recruiter also explains the function and division of its different teams so that the candidate understands how they work.
  • The Technical Screen: If the applicant gets through the first round, the next round is the technical screen, which involves coding challenges, statistical questions, and machine learning problems. The candidate can expect at least two coding questions: one may be an algorithmic coding question, and the other may involve SQL. The coding part is done in a shared code editor. The candidate should take the time to talk through their thought process with the interviewer, because there is a separate section named "Approach" detailing how the candidate solved the problem and why they chose those steps. The interviewer may also ask about general data science and machine learning concepts in a conversational manner to check that the candidate has the foundational knowledge required for the job. For instance, the interviewer may ask "What is the bias-variance trade-off?" or "Explain p-value," alongside the SQL coding questions.
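
One way to make an "Explain p-value" answer concrete is a permutation test. The sketch below is only an illustration under stated assumptions: the function name, the default number of permutations, and the choice of a two-sided test on the difference in means are all made up for this example. It estimates the p-value as the share of random relabelings of the pooled data that produce a difference at least as large as the observed one, which is exactly the idea behind "the probability of a result this extreme if the null hypothesis were true".

```python
import random


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Null hypothesis: the group labels do not matter, so shuffling the
    pooled values should produce differences as large as the observed
    one reasonably often.
    """
    rng = random.Random(seed)            # seeded for reproducibility
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0

    for _ in range(n_permutations):
        rng.shuffle(pooled)              # random relabeling of the data
        new_a = pooled[:len(a)]
        new_b = pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed:
            at_least_as_extreme += 1

    return at_least_as_extreme / n_permutations
```

A small returned value means that a difference this large would rarely arise from relabeling alone. It does not give the probability that the null hypothesis is true, which is a common point of confusion in interviews.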

Example of a Technical Screen Question

A wide variety of SQL questions are asked during interviews across tech companies. In general, each SQL interview question falls into one of these categories:

  • Basic SQL questions
  • Definition-based SQL questions
  • Analytics SQL questions
  • ETL SQL questions
  • Database design questions
  • Logic-based SQL questions
  • Reporting and metrics SQL questions

Let's consider an example of a SQL question that a candidate may be asked during a data science interview.

Suppose you are given two tables: a users table with demographic information and a neighborhoods table that lists the neighborhoods users live in. Write a query that returns all neighborhoods that have zero users.

Consider the following users table:

columns            type
id                 int
name               varchar
neighborhood_id    int
created_at         datetime

Consider the following neighborhoods table:

columns            type
id                 int
name               varchar
city_id            int

Strategy to Solve the Question: Whenever a question asks for values with zero employees, users, posts, etc., think of a left join.

An inner join returns only the rows that have a match in both tables. A left join, however, keeps every row from the left table and fills the right table's columns with NULL where there is no match. To find all neighborhoods without users, we therefore do a left join from the neighborhoods table to the users table and add a WHERE condition that keeps only the rows where the user columns are NULL, as shown in the code below.

SELECT n.name
FROM neighborhoods AS n
LEFT JOIN users AS u
    ON n.id = u.neighborhood_id
WHERE u.id IS NULL
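
To check the behavior of this kind of query, the sketch below runs it against an in-memory SQLite database from Python. The sample rows, neighborhood names, and helper function names are made up for illustration, and the join column is spelled `neighborhood_id` as in the users table definition above. SQLite is used only because it ships with Python; the same join logic applies to other SQL databases.

```python
import sqlite3

# Left join from neighborhoods to users, keeping only unmatched rows.
QUERY = """
SELECT n.name
FROM neighborhoods AS n
LEFT JOIN users AS u
    ON n.id = u.neighborhood_id
WHERE u.id IS NULL
"""


def build_sample_db():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE neighborhoods (
            id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER
        );
        CREATE TABLE users (
            id INTEGER PRIMARY KEY, name TEXT,
            neighborhood_id INTEGER, created_at TEXT
        );
        INSERT INTO neighborhoods VALUES
            (1, 'North', 10), (2, 'South', 10), (3, 'East', 20);
        INSERT INTO users VALUES
            (1, 'Ana', 1, '2021-01-05 09:00:00'),
            (2, 'Raj', 1, '2021-02-11 14:30:00'),
            (3, 'Li', 3, '2021-03-20 18:45:00');
        """
    )
    return conn


def neighborhoods_without_users(conn):
    """Return the names of neighborhoods that no user row points to."""
    return [name for (name,) in conn.execute(QUERY)]
```

In this sample data only the 'South' neighborhood has no users, so `neighborhoods_without_users(build_sample_db())` returns `['South']`. The unmatched left-table row survives the join with NULL in every users column, which is what the `WHERE u.id IS NULL` filter picks up.
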
  • The Onsite Interview: After the candidate passes the technical screen, the second phase of Amazon's interview process, the recruiter arranges an onsite interview. In this phase, most of the questions relate to machine learning, A/B testing, exploratory data analysis, data science concepts, and further coding problems.

This phase consists of five or six back-to-back interviews, each either one-on-one or with two interviewers: a junior data scientist and a hiring manager. The onsite interview typically lasts around six hours. A typical sequence of five back-to-back interviews looks like this:

  • A behavioral interview for assessing culture fit
  • A technical interview involving A/B testing and data analysis
  • SQL-based interview with a data scientist or manager
  • Algorithm-based problem-solving questions and optimizations
  • A modeling case study and machine learning interview

Each of these stages tests the candidate's critical thinking, knowledge of Amazon's leadership principles, and problem-solving ability. The hiring manager or the data scientist may ask the candidate to discuss former projects, describe times they failed, and explain the reasoning behind trade-offs they made.

Points to Remember

We have discussed the key aspects of cracking Amazon's data science interview. Now, we should keep the following points at our fingertips before applying for our desired role.

  • For the data science roles, Amazon cares a lot about technical ability. The candidate should brush up on optimizing queries, learning as many machine learning algorithms as possible, and solving algorithmic problems.
  • The candidate must remember fundamental machine learning concepts, modeling, and business case questions. Amazon will likely ask some vague questions in which the candidate will be expected to apply machine learning to a business scenario.
  • The candidate must know Amazon's leadership principles well (16 as of 2021, up from the original 14), as Amazon assesses every applicant against them. The candidate will also be expected to exhibit those principles in the behavioral interview.

Conclusion

We discussed how to crack Amazon's data science interview by showcasing leadership skills, professionalism, good communication, and technical skills. If the recruiter or the hiring manager points out a mistake during the interview, do not be shy or afraid to accept it. Everyone makes mistakes, and accepting yours will portray you as a mature person who is open to criticism and to learning. Being stubborn and arguing will not help, because your organizational behaviour and soft skills matter as much as your technical skills when it comes to getting hired.