The second data set contains data about the customers of a telecom company, the services they subscribed to, and whether they left the company’s services. Multiple aspects of customer behavior can be analyzed using this data set.
The data and the variable descriptions can be found at:
https://www.kaggle.com/blastchar/telco-customer-churn
Requirements:
Part I: Contextualization
In the introductory section for Part I of your 4-5-page report, provide a contextualization of the data set by doing the following:
Download the selected data set and review its description. Specify the selected dataset.
Summarize the story behind the data. Provide in-text and reference list citations of any outside sources that you consulted to contextualize the data collection efforts.
Explore the dataset. Explain the attributes of the entities.
Describe the data captured in the data set.
Summarize three observations from a cursory visual inspection of the data.
Part II: Research Questions
When data is collected, or even before collection starts, we have the purpose of the analysis in mind and plan accordingly. For an analysis to provide us with meaningful insight, we must plan it well and have a clear research question or questions in mind. The term “research” can sound scary, but it refers to a path to satisfying curiosity or resolving a problem. For Part II, which should be 2-3 pages, provide an analysis based on the background of the data set:
Write three questions that this dataset can help answer.
Explain the importance of these questions in a business or other relevant context.
Identify the variables from the data set that are relevant to the questions.
Write a sample statement of outcome that your analysis may reveal.
Underline the words in these statements that refer to the variables identified in task prompt H (“Identify the variables from the data set that are relevant to the questions”).
Part III: Data Preparation and Distribution Analysis
Using an R script, perform the following:
Identify the columns that hold the values of the variables you previously identified in the analysis of the research questions. Write an R script to generate a subset of the given data that includes only the values for the selected variables and translates categorical values into numeric codes. If a variable is categorical and its values are not numerical, transform the values into numerical values. For example, if the values of a column are ‘Y’ and ‘N’, convert them to “0” and “1” respectively in your script when generating the subset.
Write an R script for a query that provides insights into your research question.
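The two Part III tasks above can be sketched in base R as follows. This is only an illustrative outline, not a model answer: the small data frame stands in for the downloaded file, the selected columns (tenure, Contract, MonthlyCharges, Churn) and the example question (churn rate by contract type) are assumptions, and the coding direction (Yes to 1, No to 0) is one possible choice that should be stated in your report, since the example in the task maps the values the other way.

```r
# Toy stand-in for the Telco data. In practice the downloaded CSV would be
# loaded with read.csv(); the rows and values below are made up for illustration.
telco <- data.frame(
  customerID     = c("C1", "C2", "C3", "C4", "C5", "C6"),
  tenure         = c(1, 34, 2, 45, 8, 60),
  Contract       = c("Month-to-month", "One year", "Month-to-month",
                     "Two year", "Month-to-month", "One year"),
  MonthlyCharges = c(29.85, 56.95, 53.85, 42.30, 99.65, 70.70),
  Churn          = c("No", "No", "Yes", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the columns tied to the (assumed) research question.
selected  <- c("tenure", "Contract", "MonthlyCharges", "Churn")
subset_df <- telco[, selected]

# Step 2: recode the binary categorical column to numeric codes.
# Assumption: Yes -> 1, No -> 0. Reverse this if your coding scheme differs.
subset_df$Churn <- ifelse(subset_df$Churn == "Yes", 1L, 0L)

# Step 3: recode the multi-level categorical column with an explicit level
# order so the numeric codes are reproducible (1, 2, 3).
contract_levels <- c("Month-to-month", "One year", "Two year")
subset_df$Contract <- as.integer(factor(subset_df$Contract,
                                        levels = contract_levels))

# Step 4: an example query, the share of churned customers per contract code.
# Because Churn is 0/1, its mean within a group is that group's churn rate.
churn_by_contract <- aggregate(Churn ~ Contract, data = subset_df, FUN = mean)
names(churn_by_contract)[2] <- "ChurnRate"
print(churn_by_contract)
```

Keeping a written legend of the numeric codes (for example, 1 = Month-to-month) alongside the script makes the later reporting step easier to follow.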
Part IV: Reporting
Write a 250-word executive summary for members of the general public outlining the findings from your analysis.
Your well-written report should be 10-12 pages in length, not including the title or reference pages. Use Saudi Electronic University academic writing standards and APA style guidelines, citing at least two references in support of your work, in addition to your text and assigned readings.
You will upload a zipped file that includes your 10-12-page report and all supporting files, including Word, R code and Excel files, and screenshots that support the presentation of the findings in the report.
You are strongly encouraged to submit all assignments to the Turnitin Originality Check prior to submitting them to your instructor for grading. If you are unsure how to submit an assignment to the Originality Check tool, review the Turnitin Originality Check Student Guide.