ECS 116 Databases for Non-Majors / Data Management for Data Science
Programming Assignment 1
A. Prelude
1. The assignment is worth 25 points.
2. The submission deadline is Sunday, April 28, at 11:59 pm.
3. Late submissions will be graded according to the late policy. Specifically, 10% of the grade is deducted if you are
up to 24 hours late, 20% is deducted if you are 24 to 48 hours late, and no credit is given if you turn it in after 48 hours.
4. This assignment will be solo.
5. Create a new SQL file for each step (namely Step 2, Step 3, Step 4) if you have to use SQL commands through
6. Your assignment will be graded based on correctness (passing all tests), ingenuity and originality.
7. All the required files (csv) can be found under Files in Canvas.
8. Plagiarism is strictly prohibited. You’re free to discuss high-level concepts with your peers. However,
cheating will result in no points on the assignment and a report to OSSJA.
B. Step 1: Uploading africa fs after cleaning db.csv into PostgreSQL
1. In DBeaver, create a new database faostat. Set that as the default database.
2. Create a schema food sec (or “food sec v01”) in your database faostat. Set that as default schema.
3. Do set search path to food sec;
4. Load the file africa fs after cleaning db.csv into the schema food sec to make table africa fs ac.
5. Modify the data types of some of the columns of africa fs ac as follows:
area code m49: varchar(3)
element code: varchar(4)
year code: varchar(8)
value: numeric
After making these changes, click on “Save” at bottom of pane.
6. Check whether the values in the value column have been imported correctly.
Run a selection query to get the distinct values that are ≤ 2.
Using Excel, find the values ≤ 2 in the csv file.
Do these match?
7. Run an SQL query to DELETE all tuples from africa fs ac (it will ask you to confirm that you want to do this).
8. Use DBeaver to import the file africa fs after cleaning db.csv (don’t use the SQL “COPY” command
because it complains about a data type encoding issue).
Do a sanity check that the number of tuples in your table is the same as in the csv file.
Again check on the values in column value.
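The two checks above (row count and small values) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, a tiny invented CSV in place of the real file, and the table name africa_fs_ac with an underscore-separated spelling that is assumed here. The point it shows is that the comparison must be numeric: if value were stored as text, a string such as '10' would sort before '2'.

```python
import csv
import io
import sqlite3

# Invented toy data standing in for the real CSV file (assumption).
csv_text = """area,value
Algeria,1.5
Angola,10
Benin,2
Botswana,0.3
Burundi,1.5
"""
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

conn = sqlite3.connect(":memory:")
# REAL plays the role of PostgreSQL's numeric type in this sketch.
conn.execute("CREATE TABLE africa_fs_ac (area TEXT, value REAL)")
conn.executemany(
    "INSERT INTO africa_fs_ac (area, value) VALUES (:area, :value)", csv_rows
)

# Sanity check 1: the table has as many tuples as the CSV has data rows.
table_count = conn.execute("SELECT COUNT(*) FROM africa_fs_ac").fetchone()[0]
counts_match = table_count == len(csv_rows)

# Sanity check 2: distinct values that are <= 2, compared as numbers.
small_values = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT value FROM africa_fs_ac WHERE value <= 2 ORDER BY value"
    )
]

print(counts_match, small_values)
```

The same two SELECT statements should carry over to DBeaver with little change, provided the value column really has a numeric type.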
C. Step 2: Build Table gdp stunting overweight anemia
1. Similar to the construction of gdp stunting overweight shown in the 2024-04-09 lecture and the SQL script
faostat-part 02-transforming africa fs.sql, use DBeaver and SQL commands to build a table
gdp stunting overweight anemia which has, for each country-year pair, the following associated values:
GDP per capita Purchasing Power Parity (22013): use column name gdp pc ppp.
Percentage of children under 5 years of age who are stunted (21025): use column name childhood stunting.
Percentage of children under 5 years of age who are overweight (21043): use column name childhood overweight.
Prevalence of anemia among women of reproductive age: use column name anemia.
2. Add this table into your schema food sec.
Figure 1: Almost correct example of the table gdp stunting overweight anemia. Your table should have 3 characters
in the area code m49 column, and may have some decimal values in the last 4 columns.
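One possible way to turn the long table (one row per country, year, and indicator) into a wide table (one row per country-year pair) is conditional aggregation, sketched below. This is not necessarily the method used in the lecture script, which may rely on joins instead. Assumptions: Python's sqlite3 stands in for PostgreSQL, the column name item_code is hypothetical, identifiers are spelled with underscores, the data values are invented, and only two of the four indicators are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE africa_fs_ac (
           area_code_m49 TEXT, area TEXT, year_code TEXT,
           item_code TEXT,      -- hypothetical column name
           value REAL)"""
)
# Invented toy rows: item 22013 is GDP per capita PPP, 21025 is stunting.
conn.executemany(
    "INSERT INTO africa_fs_ac VALUES (?, ?, ?, ?, ?)",
    [
        ("012", "Algeria", "2000", "22013", 8000.5),
        ("012", "Algeria", "2000", "21025", 23.6),
        ("012", "Algeria", "2001", "22013", 8200.0),
        ("024", "Angola", "2000", "22013", 3500.0),
        ("024", "Angola", "2000", "21025", 46.1),
    ],
)

# Each CASE picks out one indicator; MAX collapses the group to one row.
conn.execute(
    """CREATE TABLE gdp_stunting AS
       SELECT area_code_m49, area, year_code,
              MAX(CASE WHEN item_code = '22013' THEN value END) AS gdp_pc_ppp,
              MAX(CASE WHEN item_code = '21025' THEN value END) AS childhood_stunting
       FROM africa_fs_ac
       WHERE item_code IN ('22013', '21025')
       GROUP BY area_code_m49, area, year_code"""
)

wide_rows = conn.execute(
    "SELECT * FROM gdp_stunting ORDER BY area, year_code"
).fetchall()
for row in wide_rows:
    print(row)
```

Note that this approach keeps a country-year pair even when one indicator is missing (the missing value becomes NULL), whereas a chain of inner joins would drop such pairs. The two approaches can therefore give different row counts.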
D. Step 3: Build table energy undernourished
1. Note that many records in africa fs ac have year and year code values based on 3-year intervals rather than
single years. We will use some of this data to gain more insight about countries. In particular, we will interpret
a 3-year interval as applying to the year in the middle, e.g., we will interpret 2000-2002 as applying to the year 2001.
2. First, build a table energy undernourished which has, for each country and year code pair, the associated values for:
Average dietary energy supply adequacy (21010): use column name dietary energy.
Prevalence of undernourishment (210041): use column name undernourished.
Note: this table should have 1040 rows in it.
3. Now add a column derived year to the table energy undernourished, where for each tuple, the derived year
value is computed by using the year in the middle of the first and third years in the year code of the tuple.
4. The column you added probably has data type integer. Convert this to varchar(4).
Figure 2: Almost correct example of the table energy undernourished. As with Figure 1, the area code m49 column
should have 3 characters, and the last 3 columns may have decimal values.
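The derived year computation can be sketched as below. The sketch assumes that a 3-year year code is stored as eight characters, the first year followed by the third year (for example 20002002); please check this against your own data. It again uses Python's sqlite3 as a stand-in for PostgreSQL, with underscore-separated names that are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE energy_undernourished (area TEXT, year_code TEXT)")
conn.executemany(
    "INSERT INTO energy_undernourished VALUES (?, ?)",
    [("Algeria", "20002002"), ("Algeria", "20012003")],
)

# Add the new column, then fill it with the middle year of each interval.
conn.execute("ALTER TABLE energy_undernourished ADD COLUMN derived_year TEXT")
conn.execute(
    """UPDATE energy_undernourished
       SET derived_year = CAST(
           (CAST(substr(year_code, 1, 4) AS INTEGER)      -- first year
            + CAST(substr(year_code, 5, 4) AS INTEGER))   -- third year
           / 2 AS TEXT)"""
)

derived = conn.execute(
    "SELECT year_code, derived_year FROM energy_undernourished ORDER BY year_code"
).fetchall()
print(derived)
```

Here the result is written directly as text, so no separate type conversion is needed. If your column ends up as an integer in PostgreSQL, an ALTER TABLE ... ALTER COLUMN ... TYPE statement is one way to convert it to varchar(4).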
E. Step 4: Joining the gdp stunting overweight anemia and energy undernourished tables to create the new table gdp energy with fs indicators
1. Create a selection query that combines the tables gdp stunting overweight anemia and energy undernourished
to form a new table gdp energy with fs indicators.
The columns should include area code m49, area, year code, gdp pc ppp, dietary energy, childhood stunting,
childhood overweight, anemia and undernourished.
Tuples in this table should be formed by combining tuples from gdp stunting overweight anemia and
energy undernourished where year code from the first table equals derived year of the second table.
Note: your table should have 895 tuples in it.
2. Export the table gdp energy with fs indicators as a csv file gdp energy with fs indicators.csv.
3. Sort this csv file by area (country name) and then year code.
4. CONGRATULATIONS: you have created a table that we can use later to determine whether there are statistical correlations between GDP per capita and/or stunting, childhood overweight, anemia in women and/or
Figure 3: Example of the table gdp energy with fs indicators
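The join described in item 1 can be sketched as follows, on invented toy rows and with only one value column from each table. Assumptions: sqlite3 stands in for PostgreSQL, names are spelled with underscores, and the join also matches on area_code_m49 so that a tuple is only combined with tuples for the same country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gdp_stunting_overweight_anemia (
        area_code_m49 TEXT, area TEXT, year_code TEXT, gdp_pc_ppp REAL);
    CREATE TABLE energy_undernourished (
        area_code_m49 TEXT, year_code TEXT, derived_year TEXT,
        dietary_energy REAL);

    -- Invented toy rows.
    INSERT INTO gdp_stunting_overweight_anemia VALUES
        ('012', 'Algeria', '2001', 8200.0),
        ('012', 'Algeria', '2005', 9100.0);
    INSERT INTO energy_undernourished VALUES
        ('012', '20002002', '2001', 125.0),
        ('012', '20012003', '2002', 127.0);

    -- Single-year code on the left must equal the derived year on the right.
    CREATE TABLE gdp_energy_with_fs_indicators AS
    SELECT g.area_code_m49, g.area, g.year_code, g.gdp_pc_ppp, e.dietary_energy
    FROM gdp_stunting_overweight_anemia AS g
    JOIN energy_undernourished AS e
      ON g.area_code_m49 = e.area_code_m49
     AND g.year_code = e.derived_year;
    """
)

joined = conn.execute("SELECT * FROM gdp_energy_with_fs_indicators").fetchall()
print(joined)
```

Only the 2001 tuple survives in this toy example, because the other tuples have no partner with a matching year. Both join columns are text here, which is why converting derived year to varchar(4) in Step 3 matters.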
5. Create a new table gdp energy fs aggs with the following columns:
– area code m49
– area
– avg gdp pc ppp
– avg dietary energy
– avg childhood stunting
– avg childhood overweight
– avg anemia
– avg undernourished
The “avg” columns should hold the averages of the corresponding items for each country, over all of the
years of available data.
Use the round operator on the “avg” values, so that they have type numeric and are rounded to 2 decimal
places. Use the following kind of expression: round(< expression for average >::numeric, 2).
6. Export the table gdp energy fs aggs as a csv file gdp energy fs aggs.csv.
7. Sort the csv file by area (i.e., country name).
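If you prefer to sort an exported csv file with a script rather than in a spreadsheet, a minimal sketch is shown below. It sorts by area and then year code, as asked for the first csv file; for the aggregates file the key would be area alone. The csv content is invented, and the underscore-separated header names are assumptions.

```python
import csv
import io

# Invented toy export; a real script would open the exported file instead.
csv_text = """area_code_m49,area,year_code
024,Angola,2001
012,Algeria,2002
024,Angola,2000
012,Algeria,2001
"""

reader = csv.DictReader(io.StringIO(csv_text))
fieldnames = reader.fieldnames
# Four-digit years sort correctly as strings, so no conversion is needed.
sorted_rows = sorted(reader, key=lambda r: (r["area"], r["year_code"]))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(sorted_rows)
sorted_csv = out.getvalue()
print(sorted_csv)
```

An alternative is to export the result of a query that already has an ORDER BY clause, so that the file is written in sorted order.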
Figure 4: Almost correct example of the table gdp energy fs aggs. You will obtain slightly different values. This
table was computed with rounded values for various columns, rather than with values having decimals.
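The per-country averages for gdp energy fs aggs can be sketched as a GROUP BY query, shown here on invented toy rows with two of the six averaged columns. Assumptions: sqlite3 stands in for PostgreSQL, and names are spelled with underscores. SQLite's ROUND accepts a floating-point argument directly, whereas in PostgreSQL the handout's round(< expression for average >::numeric, 2) form, with its cast to numeric, is the one to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gdp_energy_with_fs_indicators (
        area_code_m49 TEXT, area TEXT, year_code TEXT,
        gdp_pc_ppp REAL, anemia REAL);

    -- Invented toy rows; one anemia value is missing on purpose.
    INSERT INTO gdp_energy_with_fs_indicators VALUES
        ('012', 'Algeria', '2001', 100.0, 30.0),
        ('012', 'Algeria', '2002', 200.0, NULL),
        ('012', 'Algeria', '2003', 200.0, 31.0),
        ('024', 'Angola',  '2001', 10.0, 50.0);

    CREATE TABLE gdp_energy_fs_aggs AS
    SELECT area_code_m49, area,
           ROUND(AVG(gdp_pc_ppp), 2) AS avg_gdp_pc_ppp,
           ROUND(AVG(anemia), 2)     AS avg_anemia
    FROM gdp_energy_with_fs_indicators
    GROUP BY area_code_m49, area;
    """
)

aggs = conn.execute("SELECT * FROM gdp_energy_fs_aggs ORDER BY area").fetchall()
for row in aggs:
    print(row)
```

AVG ignores NULL values, so each average is taken over the years for which that particular indicator is available.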
F. Submission
1. Please make a single zip file that includes
gdp energy with fs indicators.csv
gdp energy fs aggs.csv
The DBeaver SQL scripts that you used to create these 2 csv files (specifically, Step 2.sql, Step 3.sql,
Step 4.sql).
Name the zip file as FirstName LastName LastFourDigitsOfStudentID ECS116 A1.
2. Upload it to Canvas under Assignment 1 (this is a solo assignment, so don’t add your peers to your submission).
