Part 1.2 Real Estate Market Valuation Project
In TAB 2 of the Google Sheet, you will generate a simple random sample of 300 data values from the data set to be used for the remainder of this assignment.
Open your copy of Property Prices – Part 1 Data Template.
Copy the original data set provided for you (TAB 1) and paste that entire data set into TAB 2. To do this,
Select inside the blank box in the top left corner of screen. This will select all the cells.
Then, right-select to copy the data.
Paste the data by selecting “Paste” in cell A1 in TAB 2.
Once the random sample of 300 rows is generated, sort the data in TAB 2 by TYPE, since for some parts of this assignment you will be working with only homes or only townhomes or only units. To do this,
Select Column B
Then select “Data” from the menu followed by “Sort Range”
>It is imperative that you sort the sheet by the range as opposed to sorting the column by just the range, as you need to make sure that the entire row is sorted, not just the column.
Then select “Data” as a header row checkbox to protect your header row from being sorted. Note that, if done correctly, you will notice the first number of rows correspond to data on HOME prices (h), the next number of rows correspond to data on TOWNHOME prices (t), and the last number of rows corresponds to data on UNIT prices (u).
Now that the rows of data in TAB 2 are sorted by TYPE, copy the rows corresponding only to HOME prices in TAB 2 and paste the rows in TAB 3.
Copy the rows in TAB 2 corresponding only to TOWNHOME prices and paste the rows in TAB 4.
Copy the rows in TAB 2 corresponding only to UNIT prices and paste the rows in TAB 5.
Then post a link to your Google Sheet or upload an excel version for grading.
Part 1.3
Use appropriate graphical displays, measures of central tendency, and measures of dispersion to summarize the following variables, TYPE, PRICE, DISTANCE, from your generated random sample of the Property Prices dataset.
TYPE, which records the property type (home, townhome, or unit). Since TYPE is a categorical variable, a pie chart will be used to model the data. Copy the TYPE column from your random sample in TAB 2 and paste the data into Column A of TAB 6: Graphical Display (Type). Select the data. Then, select “Insert” and then “Chart”.
A pie graph will automatically be generated from this data. Use the pie graph to verify or disprove that your randomly generated sample seems to be representative of the original data set. Given that, of the 2,355 properties in the original data set, 1,460 are homes, 282 are townhomes, and 613 are units, compare the percentage of each property type in your sample to the percentage of each property type in the original data set. Describe your results.
PRICE, which records the property price (in USD) – This task is two parts.
First, you will consider all of the property prices from your random sample. Select the PRICE column from TAB 2 (random sample), and copy the prices into Column A in TAB 7. Note that measures of central tendency, spread, and dispersion will automatically be generated. Create a histogram of the property prices by selecting “Insert” from the top menu, and then “chart.” A histogram is automatically generated. If a histogram is not automatically generated, open the “chart-type” drop down menu and select “histogram”. Google Sheets will automatically set the class width of the histogram. To change the class width, select Customize → Histogram → Bucket Size. Change the Bucket Size to 100,000. Note that 100,000 is not an option in the drop-down menu for Bucket Size, but you can type 100,000 in place of “auto.” To edit chart titles, select Customize → Chart & Axis Titles.
You will also create a boxplot of all of the property prices. To create a box plot, use the link provided in TAB 7 that says “BOX PLOT”. Copy the numerical data from column A in TAB 7 (do not include the column heading). Select the link. Select “sample” instead of population. Then, paste the data in the DATA box. You do not need to separate numbers by commas. Then, select “SUBMIT”. A box plot of your data will be generated. Take a screenshot of the box plot for your Google Slides.
Identify both the measures of central tendency and the measures of dispersion (standard deviation, IQR). Describe the trend in data, and highlight any information from the histogram or box plots that you deem important.
Second, you will consider home prices, townhome prices, and unit prices separately. For home prices, copy the prices data from Column C in TAB 3 and paste in Column A of TAB 8 – Graphical Display (HOMES only). For townhome prices, copy the prices data from Column C in TAB 4 and paste in Column A of TAB 9 – Graphical Display (TOWNHOMES only). For unit prices, copy the prices data from Column C in TAB 5 and paste in Column A of TAB 10 – Graphical Display (UNITS only). As in the previous part, create both a frequency histogram and a box plot to model each property type separately. For the histogram, use the same class width as the histogram for all property types combined. Identify both the measures of central tendency and the measures of dispersion (standard deviation, IQR). Describe the trend in data, and highlight any information from the histogram or box plots that you deem important.
Where specified in your Google Slides file, comment on the strengths and weaknesses of using a frequency histogram vs a box plot to model the property prices data. For what situations might a histogram be appropriate? For what situations might a boxplot be appropriate? Compare the data from the three different property types, identifying similarities and differences. Why might it be important in this case to look separately at property prices based on type instead of merely looking at all of the property prices lumped together, as in Part 1? For what specific situations might descriptive statistics be helpful in representing property prices?
DISTANCE, which records the distance (in miles) of the suburb from the central business district (CBD).
For this task, you will consider the distance of all of the properties from the Central Business District (CBD). Select the “DISTANCE” column from TAB 2 (random sample), and copy the distances into Column A in TAB 11. Note that measures of central tendency, spread, and dispersion will automatically be generated. Create a histogram of the distances by selecting “Data” and then “Chart.” A histogram is automatically generated. If a histogram is not automatically generated, open the “Chart-type” drop-down menu and select “histogram”. Google Sheets will automatically set the class width of the histogram. To change the class width, select Customize → Histogram → Bucket Size. Change the Bucket Size to 1. To edit chart titles, select Customize → Chart & Axis Titles.
You will also create a boxplot of all of the distances. To create a box plot, use the link provided in TAB 11 that says “BOX PLOT”. Copy the numerical data from column A in TAB 11 (do not include the column heading). Select the link. Select “sample” instead of population. Then, paste the data in the DATA box. You do not need to separate numbers by commas. Then, select “SUBMIT”. A box plot of your data will be generated. Take a screenshot of the box plot for your Google Slides.
Include both your histogram and box plot in your Google Slides Presentation. Identify the measures of central tendency and the measures of dispersion (standard deviation, IQR).
Describe the trend in data, and highlight any information from the histogram or box plots that you deem important. For what situations might it be handy to use descriptive statistics to describe the distance of a property from the CBD (downtown)?
Submit a written summary for TYPE and PRICE, and the updated Presentation slides for grading.
Part 1.4
You will specifically work with the variables PRICE and DISTANCE, looking at homes, townhomes, and units separately.
You will thoroughly assess the assumptions of your particular analysis, and include relevant graphs and tables to support your answers.
Copy the DISTANCE column from TAB 3 (random sample of homes only) and paste in column A of TAB 12 (regression – home). Similarly, copy the PRICE column from TAB 3 and paste in column B of TAB 12. Note that a regression analysis will automatically be generated for you.
Create a scatter plot of the data to compare the distance from CBD to the home price, and graph the trend line of the data. Include the scatter plot, equation of your trend line, and your r value in your Google Slides. Then, analyze your findings based on what you learned about regression in class.
Copy the DISTANCE column from TAB 4 (random sample of townhomes only) and paste in column A of TAB 13 (regression – townhome). Similarly, copy the PRICE column from TAB 4 and paste in column B of TAB 13. Note that a regression analysis will automatically be generated for you.
Create a scatter plot of the data to compare distance from CBD to home price, and graph the trend line of the data. Include the scatter plot, equation of your trend line, and your r value in your Google Slides. Then, analyze your findings based on what you learned about regression in class.
Copy the DISTANCE column from TAB 5 (random sample of units only) and paste in column A of TAB 14 (regression – unit).
Similarly, copy the PRICE column from TAB 5 and paste in column B of TAB 14.Create a scatter plot of the data to compare distance from CBD to home price, and graph the trend line of the data. Include the scatter plot, equation of your trend line, and your r value in your Google Slides.
Then, analyze your findings based on what you learned about regression in class.
Is there a significant association between the property price and distance from the CBD based on your analyses? Are your results expected or surprising? What are some considerations in addressing this question?
Choose another numeric variable from your random sample to compare with the home PRICE to determine if there is an association between that variable and the price of a home. Copy the column of your chosen numeric variable from TAB 3 (random sample of homes only) and paste it in column A of TAB 15 (regression – $ vs student choice 1). Similarly, copy the PRICE column from TAB 3 and paste it in column B of TAB 15.
Be sure to thoroughly assess the assumptions of your particular analysis, and be sure to include relevant information to support your answers.
Choose one last numeric variable from your random sample to compare with the home PRICE to determine if there is an association between that variable and the price of a home. Copy the column of your chosen numeric variable from TAB 3 (random sample of homes only) and paste in column A of TAB 16 (regression – $ vs student choice 1).
Similarly, copy the PRICE column from TAB 3 and paste in column B of TAB 16. Be sure to thoroughly assess the assumptions of your particular analysis, and be sure to include relevant information to support your answers.