Assignment Instructions 1
Tasks:
1. Pitch for a dataset of your choice from the following data source for global Covid-19 vaccinations: https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations
(Choose vaccinations.csv)
Enrich and explain your initial dataset selection with information from one or two datasets in the following data source: https://www.covid19data.com.au
(Choose datasets related to global Covid vaccinations, vaccinations in Australia and Covid-19 in Australia)
2. Profile the selected data using descriptive and inferential statistics techniques (with Python).
3. Propose one main problem (regarding the statistical analyses of the datasets) to be solved later in ‘Data Science project 2’.
4. Present the key outcomes of Tasks 1 – 3 above in a report of 2000 words.
Task 2 involves Python programming to perform descriptive and inferential statistical analyses of the chosen datasets. This includes measuring the central tendency (e.g. mean, median, mode) and spread (e.g. range, interquartile range, variance, standard deviation) of the data, as well as computing summary statistics and plotting the data (e.g. as histograms) using Python. Provide the Python code in a separate file.
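As a rough illustration of the measures listed above, the following sketch computes them with Python's standard `statistics` module. The list `daily_vaccinations` and the helper `describe` are hypothetical names, and the numbers are made-up placeholder values, not figures from vaccinations.csv. In practice the values would come from a column of the chosen dataset.

```python
import statistics

# Hypothetical daily vaccination counts (illustrative values, NOT real data).
daily_vaccinations = [120, 150, 150, 180, 200, 240, 310, 900]


def describe(values):
    """Return basic measures of central tendency and spread for a list of numbers."""
    # Quartiles using the "inclusive" method, which treats the data as the
    # full population range (min and max are the 0th and 100th percentiles).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


summary = describe(daily_vaccinations)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that the single large value (900) pulls the mean well above the median, which is one reason to report both. The choice of sample variance rather than population variance (`statistics.pvariance`) is an assumption here and should match how the data is interpreted in the report.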
The statistical analyses should be done using Python. There is no need to use other statistical software (e.g. Excel) except to cross-check the results already obtained with Python.
Make sure to propose one main PROBLEM (regarding the statistical analyses of the data sets) TO BE SOLVED LATER in Data Science project 2.
Using Python, please calculate or apply the following descriptive and inferential statistics techniques to the chosen data sets. Interpret the results where applicable. Please provide the Python code (used to perform the statistical analyses) in a separate file.
Descriptive Statistics techniques:
1) Levels of Measurement • Nominal data • Ordinal data • Interval data • Ratio data
2) Continuous and discrete data
3) Measures of Central Tendency • Mean • Median • Mode
4) Measures of Dispersion • Range • The Interquartile Range (IQR) • Variance • Standard deviation
5) Distributions of data • Histogram
Common types of data distributions • Uniform distribution • Normal distribution • Skewed distributions • Binomial distribution (discrete)
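One possible way to examine the shape of a distribution is sketched below: it counts values in equal-width histogram bins and computes a moment-based skewness coefficient using only the standard library. The function names `histogram_counts` and `sample_skewness` and the data values are hypothetical and for illustration only. For the actual report, a plotting library such as matplotlib would normally be used to draw the histogram.

```python
import statistics

# Hypothetical daily vaccination counts (illustrative values, NOT real data).
daily_vaccinations = [120, 150, 150, 180, 200, 240, 310, 900]


def histogram_counts(values, bins):
    """Count values in `bins` equal-width bins spanning [min, max].

    Assumes the values are not all identical (bin width must be non-zero).
    """
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for x in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((x - low) / width), bins - 1)
        counts[index] += 1
    return counts


def sample_skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)


counts = histogram_counts(daily_vaccinations, bins=4)
skew = sample_skewness(daily_vaccinations)

# Print a simple text histogram, one row of '#' characters per bin.
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
print(f"skewness: {skew:.2f}")
```

A skewness near zero suggests a roughly symmetric distribution, a clearly positive value suggests a long right tail, and a clearly negative value a long left tail. The number of bins is a judgment call and should be tried at several settings before drawing conclusions.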