Due: 2024-10-04 23:59:00EDT
In this assignment, we’ll explore linked lists and more complex custom-designed classes. The complexity of this assignment increases significantly from the previous two. Start EARLY!
In this assignment, we will use the LinkedList
you started
implementing in lab
In this assignment, your driver program should be called
DriverHW02.java
.
In addition, please read the program design principles and code formatting standards carefully. You are expected to adhere to all stated standards. Violations will result in deductions.
Requirement: You can ONLY include the following import statements:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
You are NOT ALLOWED to include any other import statements in your solutions. Check with the professor before you import anything else.
Imagine you are a data scientist working for a social research organization tasked with analyzing trends in baby names over the years in the United States. Your organization aims to provide insights into the evolution of popular names and naming patterns, which can be used for various purposes, such as understanding cultural shifts, predicting future naming trends, and even informing marketing strategies for baby-related products.
To accomplish this, you will build a “Name Trend Analysis Tool” using a LinkedList. The tool should read in a file containing data from the Social Security Administration (SSA), which includes information about baby names given in different years.
Create a class called NameLL
. This will be your LinkedList implementation.
The LL must support all the methods from Lab03 (except insertSortedPop
), including the DoubleLinkedList methods.
You may also use your ExpandableArray
class from Lab02, but it is not necessarily required.
Your LinkedList does not need to enable Generic types.
You should design a Name
class that stores all the relevant stats for a particular
name. Your Node
class should hold data of type Name
. You can design your Name
class any way you like, but be sure to include a Name(String)
constructor and a public String getName()
method for compatability with the autograder.
Write a method called index()
based on the specification below:
/** Returns an integer, indicating the position of the name in your
linked list. The position should be 1-indexed (meaning the first item should be 1).
* state
* @param name - the name to search for
* @return - the index of the name in the LinkedList. If the name is not in the LinkedList, return -1.
*/
public int index(String name)
This method will help verify that the list is sorted alphabetically.
Write a method called yearStats()
based on the specification below:
/** Returns an array of size three. The first value indicates the rank of the name
** that year, the second value indicates the number of babies given that name during that year (for the gender),
** the third value indicates the percentage of babies given that name that year (for that gender)
* state
* @param name - the name to search for
* @param year - the year for the stats
*
* @return array indicating the rank of the name that year, the number of babies given that name that year (for that gender), percentage - the percentage of babies given that name that year (for that gender). If the name is not in the LinkedList, return null.
*/
public double[] yearStats(String name, int year)
Write a method called totalStats()
based on the specification below:
/** Returns an array of size three: for a given gender, the rank of the name, the numebr of babies with that name,
** the percentage of babies with that name
* state
* @param name - the name to search for
*
* @return array of int, int double. If the name is not in the LinkedList, return null.
*/
public double[] totalStats(String name)
The methods described in sections 2.3 and 2.4 will become more clear as you read the rest of the assignment.
You will implement your “Name Trend Analysis Tool” in a java program called
DriverHW02.java
.
Your driver must read in file(s) of yearly baby name statistics
and store the statistics in two linked lists, – one for male names and
one for female names. You will use your implemented LinkedList data structure,
you are not allowed to use Java’s built-in
LinkedList
. The two linked lists should be kept in alphabetically sorted order by
name.
Specifically, the program needs to be able to look up a name and report the following statistics:
For example, for the name “Mary” (female), the following statistics should be printed for all 28 files:
1424
1990
Mary: 35, 8666, 0.005432
1991
Mary: 38, 8760, 0.005596
...
2017
Mary: 126, 2381, 0.001877
Total
Mary: 51, 142630, 0.003630
Note: not all names appear in all years. The popularity of names vary greatly from generation to generation. If a name doesn’t appear in a particular data file, you should skip printing that particular year all together.
The year for specific data can be discerned by the filename.
Your program should take in command-line arguments to input zero or more
files names to process.
The flags -m
name -f
name indicate a male name or a female name to
look up, respectively. For example:
java DriverHW02 -f Dianna names1990.csv names2000.csv
will print out the index followed by the rank, number and percentages of the
female name Dianna used in 1990, 2000 as well as the combined statistics of these
two years, as shown below. 398 is the alphabetical index order of the name Dianna
from these two years.
398
1990
Dianna: 594, 384, 0.000241
2000
Dianna: 847, 262, 0.000182
Total
Dianna: 675, 646, 0.000213
Your tool must allow a user to search for as many names as they would like. Print the statistics for each name in the order they were input. When the user adds a name to search for when running the program in the command line, they will add a -f or -m to indicate the gender they are searching for.
For example,
java DriverHW02 -f Dianna -m Adam names1990.csv names2000.csv
or
java DriverHW02 -f Dianna -f Aline names1990.csv names2000.csv
are both valid inputs. You may assume that the list of filenames is always last after the names, i.e. the first non-flag argument you encounter is assumed to be the beginning of the list of file names.
Make sure you error-check your arguments thoroughly, i.e. illegal/badly-formated options, non-existent options, etc. Remember the order of flags should not matter. Your program should behave rationally no matter how unreasonable the input or the value of flags. Simply reporting error and quitting are completely acceptable for bad input. Throwing uncaught exceptions or giving errorneous output are not. Try at the Linux commandline to see examples of flag and error handling -just experiment with say ls and flags.
If the user enters a name that is not present in the input files print “That name isn’t in our data.”
Remember that the wildcard * expansion will let you specify multiple file names that fit a certain pattern easily, for example:
java DriverHW02 -f Dianna names200*
will be automatically expanded to:
java DriverHW02 -f Dianna names2000.csv names2001.csv names2002.csv
names2003.csv names2004.csv names2005.csv names2006.csv names2007.csv
names2008.csv names2009.csv
by the shell for you. Using this feature to test will reduce tedious typing.
We’ll be taking input from files containing lines in the following format:
rank,male-name,male-number,female-name,female-number
where the comma-separated fields have the following meanings:
rank
: the ranking of the names in this filemale-name
: a male name of this rankmale-number
: number of males with this namefemale-name
: a female name of this rankfemale-number
: number females with this nameThis is the format of database files obtained from the U.S. Social Secutiry Administration of the top 1000 registered baby names. Each line begins with a rank, followed by the male name at that rank, followed by the number of males with that name, etc. Here is an example showing data from the year 2002:
1,Jacob,30568,Emily,24463
2,Michael,28246,Madison,21773
3,Joshua,25986,Hannah,18819
4,Matthew,25151,Emma,16538
5,Ethan,22108,Alexis,15636
6,Andrew,22017,Ashley,15342
7,Joseph,21891,Abigail,15297
8,Christopher,21681,Sarah,14758
9,Nicholas,21389,Samantha,14662
10,Daniel,21315,Olivia,14630
...
996,Ean,157,Johana,221
997,Jovanni,157,Juana,221
998,Alton,156,Juanita,221
999,Gerard,156,Katerina,221
1000,Keandre,156,Amiya,220
As you can see from the above, in 2002, there were 30,568 male babies named Jacob and 24,463 female babies named Emily, making them the most popular names used in that year. Similarly, going down the list, we see that there were 220 newborn females named Amiya, making it the 1000th most popular female baby name.
The entire data set contains a file for each year from 1990 to 2017, named
names1990.csv
, … , names2017.csv
respectively.
You can download the csv files with the following command:
wget https://github.com/BMC-CS-151/BMC-CS-151.github.io/raw/main/hws/hw02/hw2_data.zip
Note that these csv
files, unlike the previous hw, do not contain column headers.
You cannot assume that the files will be in the same directory as your Driver file.
java DriverHW02 -f Dianna ../../names/names1990.csv
should work equivalently to
java DriverHW02 -f Dianna names1990.csv
(so long as names/names1990.csv also exists two directories up from DriverHW02).
Computing the yearly percentages, as well as total number and total percentage require additional auxiliary data structures besides the linked lists. Consider what you need and decide where and how to store the information carefully.
Calculating total rank is non-trivial. Think through your data structure and al- gorithm needs before you start. You should write the program so that it makes
the best use of available storage. Resist the urge to store redundant information in many different places.
Suggested steps below. We suggest you follow the steps below when implementing your solutions. Note that for each step, you should test with one input file, then multiple input files (Just print partial contents of your linked lists), before moving onto the next one. You can hardcode the filenames in the early steps. This will get fixed once you have step 3 and beyond working.
Read the files into two lists of unique names in sorted order. (your Name
class
needs to have only a String
, for now). If you are having trouble debugging
the sorted order, we suggest creating a smaller input-file (by keeping only the
first 10 or 20 names) and using that instead.
Hint: you can use the
head
command in terminal to get the first 10 lines from a file.head -n 10 names2003.csv
will show the first 10 lines fromnames2003.csv
.
Name
class to provide storage for yearly number and rank.
Modify your file-reading code to create and insert (in sorted order) a new
Name
object if it’s not already in the list, or update it with the given yearly
stats if it is.Compute all the necessary totals to enable yearly percentage reporting and storing them in reasonable data structures.
Compute additional totals to enable total number and total percentage re- porting
Start by testing manually. Select a few names and input files combinations to spot check.
Compare your output with the two samples you see in the data we provided. The output for Mary isn’t shown for every year, but if the three numbers for the grand totals match, then you are probably in safe hands.
In a text file called README.txt
answer the following questions:
Before submitting to the autograder, make sure you conform to the following format:
throws
to your method signatures.System.exit
callsSubmit the following files to the assignment called HW02
on Gradescope:
DriverHW02.java
NameLL.java
Name.java
ExpandableArray.java
README.txt
Make sure to name these files exactly what we specify here. Otherwise,
our autograders might not work and we might have to take points off.
DO NOT submit a .docx or .pdf file.
DO NOT submit any .class
files or any of the csv
files.
You will likely want to submit the files from your own laptop/computer.
To copy the files from goldengate
to your own machine,
open up a terminal on your own computer. If you have a mac or linux, you can use the
Terminal application, if you have a windows you can use the Powershell app.
Run the following command:
$ scp <username>@goldengate.cs.brynmawr.edu/home/<username>/cs151/homeworks/hw00/*
Make sure to replace <username>
with your username. scp
is a command to secure-copy files
from one server to another. In this case, from goldengate
to your own machine.