Running Your First Program — Assignment 2

Ganiyah Issa-Onilu
4 min read · Oct 20, 2020

This is the blog entry for the solution to the week 2 assignment of the Data Management and Visualization Course offered through Coursera.

The assignment asks for the Python or SAS program written, together with its output: frequency tables for three variables selected from the dataset chosen for the week 1 assignment. It also asks for an explanation of the frequency distributions: the values the variables take, how often they take them, the presence of missing data, and so on.

Python Program for Frequency Tables

I used Python for the assignment, and screenshots of code snippets from the program are below. The entire program can be found on GitHub through this link.

Program screenshot

The program above outputs the total number of rows and columns in the Gapminder dataset. There are 213 rows and 16 columns in the dataset. The three variables selected for the assignment are:

  1. oilperperson
  2. co2emissions
  3. relectricperperson
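For readers who cannot see the screenshot, here is a minimal sketch of how the row and column counts and the three selected variables could be inspected with pandas. The tiny inline CSV is a made-up stand-in for the Gapminder file (the real one has 213 rows and 16 columns), and the blank cells are an assumption about how missing data might appear in the raw file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Gapminder CSV; the values are invented.
csv_text = """country,oilperperson,co2emissions,relectricperperson
A,0.5,132000,100.0
B, ,5000000,
C,1.2, ,250.5
"""

data = pd.read_csv(io.StringIO(csv_text))

selected = ["oilperperson", "co2emissions", "relectricperperson"]

# Cells holding only a space are read as text, so coerce each column to
# numeric; anything that cannot be parsed becomes NaN (missing).
data[selected] = data[selected].apply(pd.to_numeric, errors="coerce")

n_rows, n_cols = data.shape
print("rows:", n_rows, "columns:", n_cols)

# Number of missing values in each selected variable
missing_per_column = data[selected].isna().sum()
print(missing_per_column)
```

With the real dataset, `pd.read_csv` would be pointed at the CSV file instead of the inline text.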
Program for frequency tables

All three variables are continuous, so each takes a wide range of distinct values. The values of each variable were therefore divided into 5 bins to show how they are distributed across the bins.
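The binning step can be sketched as follows. This is an illustration only: it assumes the bins are 5 equal-width intervals made with `pandas.cut`, and the values are invented rather than taken from the Gapminder file.

```python
import numpy as np
import pandas as pd

# Hypothetical values with two missing entries (not real Gapminder data)
values = pd.Series([0.2, 0.9, 1.5, 2.1, 3.0, 4.8, 12.0, np.nan, np.nan])

# Split the observed range into 5 equal-width bins
binned = pd.cut(values, bins=5)

# Count how many values land in each bin; missing values are not counted.
# sort_index() keeps the bins in ascending order.
counts = binned.value_counts().sort_index()

# Express each count as a percentage of all rows, missing ones included
percent = (counts / len(values) * 100).round(2)

table = pd.DataFrame({"count": counts, "percent": percent})
print(table)
```

Dividing by `len(values)` gives each bin's share of all rows. Dividing by `counts.sum()` instead would give its share of the non-missing values only.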

Outputs of Program and Explanation

Frequency table for ‘oilperperson’ variable

The frequency table for the ‘oilperperson’ variable above shows a total of 63 recorded values. Of these, 55 fall within the first bin (0.0191–2.472), 6 within the second bin (2.472–4.911), 1 within the third bin (4.911–7.35), none within the fourth bin (7.35–9.789), and 1 within the last bin (9.789–12.229). The percentages in the table are calculated against all 213 observations rather than the 63 recorded values: about 25.82% of observations fall within the first bin, 2.82% within the second bin, 0.47% within the third bin, 0% within the fourth bin, and 0.47% within the last bin. The ‘oilperperson’ column has 150 missing values, because the dataset has 213 observations and the column contains only 63 values.
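The figures reported for ‘oilperperson’ can be checked with a few lines of arithmetic. The counts and the 213-row total come from the table described above; the variable names are only illustrative. The percentages are reproduced when each count is divided by the 213 rows in the dataset, not by the 63 recorded values.

```python
total_rows = 213                 # observations in the dataset
bin_counts = [55, 6, 1, 0, 1]    # counts reported for the five bins

recorded = sum(bin_counts)       # values actually present in the column
missing = total_rows - recorded  # rows with no value

# Share of each bin relative to all rows (matches the reported percentages)
share_of_rows = [round(c / total_rows * 100, 2) for c in bin_counts]

# Share of each bin relative to the recorded values only, for comparison
share_of_recorded = [round(c / recorded * 100, 2) for c in bin_counts]

print(recorded, missing)
print(share_of_rows)
print(share_of_recorded)
```

Measured against the 63 recorded values alone, the first bin would hold about 87.3% of the values, which makes the skew of the distribution easier to see.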

Frequency table for ‘co2emissions’ variable

The frequency table for the ‘co2emissions’ variable above shows that 200 values were recorded and 13 are missing, because the dataset has 213 rows. Of the 200 values, 197 fall within the first bin (-3333867868.001–66800105600.0), 2 within the second bin (66800105600.0–133600079200.0), none within the third and fourth bins, and 1 within the last bin (267200026400.0–334000000000.0). As a share of all 213 rows, 92.49% fall within the first bin, 0.94% within the second bin, 0% within the third and fourth bins, and 0.47% within the last bin.
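The negative lower bound of the first bin may look odd for an emissions variable. One likely explanation, assuming the bins were produced with `pandas.cut` and an integer number of bins, is that pandas pushes the lowest edge slightly below the minimum so that the smallest value still falls inside the first interval. The sketch below shows this on invented numbers.

```python
import pandas as pd

# Hypothetical non-negative values (not real Gapminder data)
values = pd.Series([0.0, 10.0, 50.0, 100.0])

# retbins=True also returns the numeric bin edges that were used
binned, edges = pd.cut(values, bins=5, retbins=True)

# pandas lowers the first edge by 0.1% of the data range
padding = 0.001 * (values.max() - values.min())

print("minimum value:", values.min())
print("first bin edge:", edges[0])
print("padding applied:", padding)
```

So a first bin that starts below zero does not mean the data contains negative values.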

Frequency table for ‘relectricperperson’ variable

The frequency table for the ‘relectricperperson’ variable above shows that 136 values were recorded and 77 are missing, because the dataset has 213 rows. Of the recorded values, 117 fall within the first bin, 12 within the second bin, 3 within the third bin, 3 within the fourth bin, and 1 within the last bin. As a share of all 213 rows, 54.93% fall within the first bin, 5.63% within the second bin, 1.41% within the third bin, 1.41% within the fourth bin, and 0.47% within the last bin.

This is the end of the solution to the week 2 assignment for the Data Management and Visualization course. In the next post, I will share the solution to the week 3 assignment.

Ganiyah Issa-Onilu

A Data Scientist and Visual Storyteller with a strong interest in Data Analytics and Business Intelligence.