DATA4200 Data Acquisition And Management Assignment
Assessment Title: Sampling And Data Mining Project
Word Count: 1600
Weighting: 30 %
Your Task: Read the Assessment Instructions and complete sections (a) – (e).
Consider the rubric at the end of the assignment for guidance on structure and content.
• LO3: Create analysis-ready data sets by applying and exploring basic validation, preprocessing, filtering and cleaning techniques
• LO4: Evaluate and apply data mining software
Assessment Description
Business Problem: Airbnb is a U.S. company that provides an online marketplace for short-term and/or holiday accommodation. Airbnb collects large volumes of data to gain insight into its clients and associated customers, such as review scores, host acceptance rates, ‘superhosts’, popular accommodation types and the density of listings in particular locations.
Data sets: We have obtained data on Airbnb listings in Melbourne with a variety of variables. Sampled datasets, the original data and data dictionary will be available from Week 4. See sections below.
Assessment Instructions
Analysis and Report (30 marks)
Use Microsoft Excel, Power BI or Tableau.
Recall the sampling methods below that you have learnt about in lectures.
A data dictionary file and the sampled datasets (as .csv files), which contain sample data generated using quota, systematic, simple random, and stratified sampling, will be available from Week 4 (see section c. below). You will also have to access the original population dataset cleansed_listings_dec_18.csv from the source (see sections a. and e. below).
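As a refresher on how the four sampling methods differ, the sketch below draws each kind of sample from a small made-up list of listings. The assignment itself is to be completed in Excel, Power BI or Tableau; the Python here only illustrates the logic. The field names (id, room_type, price), the toy data and the function names are assumptions for illustration, not taken from the Airbnb files or from the way the provided samples were generated.

```python
import random
from collections import defaultdict


def simple_random_sample(rows, n, seed=0):
    """Every row has the same chance of selection (assumes n <= len(rows))."""
    return random.Random(seed).sample(rows, n)


def systematic_sample(rows, n, seed=0):
    """Pick a random start, then take every k-th row (assumes n <= len(rows))."""
    k = len(rows) // n
    start = random.Random(seed).randrange(k)
    return rows[start::k][:n]


def stratified_sample(rows, key, fraction, seed=0):
    """Randomly sample the same fraction from each group (stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for group in strata.values():
        size = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, size))
    return sample


def quota_sample(rows, key, quotas):
    """Non-random: take rows in file order until each group's quota is full."""
    counts = defaultdict(int)
    sample = []
    for row in rows:
        group = row[key]
        if counts[group] < quotas.get(group, 0):
            sample.append(row)
            counts[group] += 1
    return sample


# Hypothetical toy population: 300 listings, one third of them private rooms.
listings = [
    {"id": i,
     "room_type": "Private room" if i % 3 == 0 else "Entire home/apt",
     "price": 50 + i}
    for i in range(300)
]

srs = simple_random_sample(listings, 30)
sys_sample = systematic_sample(listings, 30)
strat = stratified_sample(listings, "room_type", 0.1)
quota = quota_sample(listings, "room_type",
                     {"Private room": 15, "Entire home/apt": 15})
```

Note that the quota sample fills its groups from the top of the file, so it depends on row order, whereas the stratified sample keeps each group's share of the population and selects randomly within it.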
Create a report and include your response to the following questions:
a. Access the data file cleansed_listings_dec_18.csv by following the link provided on MyKBS under the Assessment 1 tab. You will initially download a zip folder from the Melbourne Airbnb Open Data project on Kaggle. Extract all the files within the folder and then choose the file cleansed_listings_dec_18.csv. Browse the columns and comment on which variables appear to be the most useful in terms of insights into current listings. Document your observations in your report.
b. List an advantage, a possible disadvantage and any limitations of each of the sampling methods.
c. Access the sampled data sets on MyKBS. Choose a number of different variables, as in part (a), then for each of the sampled datasets create summary statistics for each of those variables. That is, make sure that the selected variables are the same for each of the four datasets and document them in your report.
d. Interpret and compare the results of the summary stats across all four sample datasets. What conclusions can you draw from the comparison? Document your findings in your report.
e. Repeat the above for the original dataset cleansed_listings_dec_18.csv. Explain, with statistical examples, which sampling method's summary stats (across all chosen variables) were nearest in value to the original dataset's summary stats.
Explain the variations in your report and include the supporting data. Explain possible ethical issues that could occur from the use of sampled data.
Briefly evaluate the software that you have used to produce the summaries.
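The comparison asked for in sections (c) to (e) can be sketched as follows: compute the same summary statistics for each sample and for the population, then rank the samples by how far their statistics fall from the population values. This is only one possible way to measure "nearest in value" (a mean absolute relative difference, chosen here as an assumption), the numbers are made up rather than taken from the Airbnb data, and the function names are hypothetical. The same steps can be carried out in Excel, Power BI or Tableau.

```python
import statistics


def summarise(values):
    """Summary statistics for one numeric variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def relative_gap(sample_stats, population_stats):
    """Mean absolute relative difference across the statistics.

    Assumes no population statistic is exactly zero.
    """
    gaps = [
        abs(sample_stats[k] - population_stats[k]) / abs(population_stats[k])
        for k in population_stats
    ]
    return sum(gaps) / len(gaps)


def rank_samples(samples, population):
    """Return (sample name, gap) pairs, closest to the population first."""
    pop_stats = summarise(population)
    scores = {
        name: relative_gap(summarise(values), pop_stats)
        for name, values in samples.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up values standing in for one variable, such as a nightly price.
population = list(range(1, 101))
samples = {
    "systematic": population[4::10],  # every 10th value, starting at the 5th
    "quota": population[:10],         # first 10 values only
}

ranking = rank_samples(samples, population)
for name, gap in ranking:
    print(f"{name}: average relative gap {gap:.2%}")
```

In this toy case the sample that takes only the first ten values sits far from the population on every statistic, which is the kind of variation the report is asked to explain with supporting figures.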