Skip to main content

Homework 2: LinkedList - Baby Names

Due: 2024-02-18 23:59:00EDT

Overview

In this assignment, we’ll explore linked lists and more complex custom-designed classes. The complexity of this assignment increases significantly from the previous two. Start EARLY!

In this assignment, we will use the LinkedList you started implementing in lab

In this assignment, your driver program should be called DriverHW02.java.

In addition, please read the program design principles and code formatting standards carefully. You are expected to adhere to all stated standards. Violations will result in deductions.

Requirement: You can ONLY include the following import statements:

import java.io.File;

import java.io.FileNotFoundException;

import java.util.Scanner;

You are NOT ALLOWED to include any other import statements in your solutions. That means you cannot use any built-in java data structures.

1. Objectives

Imagine you are a data scientist working for a social research organization tasked with analyzing trends in baby names over the years in the United States. Your organization aims to provide insights into the evolution of popular names and naming patterns, which can be used for various purposes, such as understanding cultural shifts, predicting future naming trends, and even informing marketing strategies for baby-related products.

To accomplish this, you will build a “Name Trend Analysis Tool” using a LinkedList. The tool should read in a file containing data from the Social Security Administration (SSA), which includes information about baby names given in different years.

2. Functionality

Create a class called NameLL. This will be your LinkedList implementation. The LL must support all the methods from Lab03 (except insertSortedPop), including the DoubleLinkedList methods. You may also use your ExpandableArray class from Lab02, but it is not necessarily required.

2.1 NameNodes

Your LinkedList does not need to enable Generic types. You should design a Name class that stores all the relevant stats for a particular name. Although generic linked lists have many advantages, your code will be simplified with non-generic linked lists that are locked to the Name class. This is acceptable. If however, you choose to implement generic linked lists from scratch instead, you will be awarded extra credit. Make sure you state this clearly in your README so that the TAs are aware.

2.2

Write a method called index() based on the specification below:

/** Returns an integer, indicating the position of the name in your
linked list. The position should be 1-indexed (meaning the first item should be 1).
* state
* @param name - the name to search for
* @return - the index of the name in the LinkedList
*/
public int index(String name)

This method will help verify that the list is sorted alphabetically.

2.3

Write a method called yearStats() based on the specification below:

/** Returns an array of size three. The first value indicates the rank of the name
** that year, the second value indicates the number of babies gien that during during that year (for the gender),
** the third value indicates the percentage of babies given that name that year (for that gender)
* state
* @param name - the name to search for
* @param year - the year for the stats
* @param gender - the gender for the stats
*
* @return array indicating the rank of the name that year, the number of babies given that name that year (for that gender), percentage - the percentage of babies given that name that year (for that gender)
*/
public double[] yearStats(String name, int year, char gender)

2.4

Write a method called totalStats() based on the specification below:

/** Returns an array of size three: for a given gender, the rank of the name, the numebr of babies with that name,
**  the percentage of babies with that name
* state
* @param name - the name to search for
* @param gender - the gender for the stats
*
* @return array of int, int double
*/
public double[] totalStats(String name, char gender)

The methods described in sections 2.3 and 2.4 will become more clear as you read the rest of the assignment.

3. DriverHW02.java

You will implement your “Name Trend Analysis Tool” in a java program called DriverHW02.java.

Your driver must read in file(s) of yearly baby name statistics and store the statistics in two linked lists, – one for male names and one for female names. You will use your implemented LinkedList data structure, you are not allowed to use Java’s built-in LinkedList. The two linked lists should be kept in alphabetically sorted order by name.

Specifically, the program needs to be able to look up a name and report the following statistics:

  1. Linked list index - just an integer indicating the position of the name in your linked list so that we can verify your list is sorted alphabetically.
  2. For each year:
    • rank - the rank of the name that year (for that gender)
    • number - the number of babies given that name that year (for that gender)
    • percentage - the percentage of babies given that name that year (for that gender)
  3. Total:
    • rank - the rank of the name among all years (for that gender)
    • number - the number of babies given that name among all years (for that gender)
    • percentage - the percentage of babies given that name among all years (for that gender)

For example, for the name “Mary” (female), the following statistics should be printed for all 28 files:

1424

1990
Mary: 35, 8666, 0.005432

1991
Mary: 38, 8760, 0.005596

...

2017
Mary: 126, 2381, 0.001877

Total
Mary: 51, 142630, 0.003630

Note: not all names appear in all years. The popularity of names vary greatly from generation to generation. If a name doesn’t appear in a particular data file, you should skip printing that particular year all together.

The year for specific data can be discerned by the filename.

3.1 Command Line Arguments

Your program should take in command-line arguments to input zero or more files names to process. The flags -m name -f name indicate a male name or a female name to look up, respectively. For example:

java DriverHW02 -f Dianna names1990.csv names2000.csv

will print out the index followed by the rank, number and percentages of the female name Dianna used in 1990, 2000 as well as the combined statistics of these two years, as shown below. 398 is the alphabetical index order of the name Dianna from these two years.

398

1990
Dianna: 594, 384, 0.000241

2000
Dianna: 847, 262, 0.000182

Total
Dianna: 675, 646, 0.000213

3.2 Searching for more than one name

Your tool must allow a user to search for as many names as they would like. When the user adds a name to search for when running the program in the command line, they will add a -f or -m to indicate the gender they are searching for.

For example,

java DriverHW02 -f Dianna -m Adam names1990.csv names2000.csv

or

java DriverHW02 -f Dianna -f Aline names1990.csv names2000.csv 

are both valid inputs. You may assume that the list of filenames is always last after the names, i.e. the first non-flag argument you encounter is assumed to be the beginning of the list of file names.

Make sure you error-check your arguments thoroughly, i.e. illegal/badly-formated options, non-existent options, etc. Remember the order of flags should not matter. Your program should behave rationally no matter how unreasonable the input or the value of flags. Simply reporting error and quitting are completely acceptable for bad input. Throwing uncaught exceptions or giving errorneous output are not. Try at the Linux commandline to see examples of flag and error handling -just experiment with say ls and flags.

If the user enters a name that is not present in the input files print “That name isn’t in our data.”

3.3 Multiple files

Remember that the wildcard * expansion will let you specify multiple file names that fit a certain pattern easily, for example:

java DriverHW02 -f Dianna names200*

will be automatically expanded to:

java DriverHW02 -f Dianna names2000.csv names2001.csv names2002.csv
names2003.csv names2004.csv names2005.csv names2006.csv names2007.csv
names2008.csv names2009.csv

by the shell for you. Using this feature to test will reduce tedious typing.

4. Input File Format

We’ll be taking input from files containing lines in the following format:

rank,male-name,male-number,female-name,female-number

where the comma-separated fields have the following meanings:

This is the format of database files obtained from the U.S. Social Secutiry Administration of the top 1000 registered baby names. Each line begins with a rank, followed by the male name at that rank, followed by the number of males with that name, etc. Here is an example showing data from the year 2002:

1,Jacob,30568,Emily,24463
2,Michael,28246,Madison,21773
3,Joshua,25986,Hannah,18819
4,Matthew,25151,Emma,16538
5,Ethan,22108,Alexis,15636
6,Andrew,22017,Ashley,15342
7,Joseph,21891,Abigail,15297
8,Christopher,21681,Sarah,14758
9,Nicholas,21389,Samantha,14662
10,Daniel,21315,Olivia,14630
...
996,Ean,157,Johana,221
997,Jovanni,157,Juana,221
998,Alton,156,Juanita,221
999,Gerard,156,Katerina,221
1000,Keandre,156,Amiya,220

As you can see from the above, in 2002, there were 30,568 male babies named Jacob and 24,463 female babies named Emily, making them the most popular names used in that year. Similarly, going down the list, we see that there were 220 newborn females named Amiya, making it the 1000th most popular female baby name.

The entire data set contains a file for each year from 1990 to 2017, named names1990.csv, … , names2017.csv respectively.

You can download the csv files with the following command: wget https://github.com/BMC-CS-151/BMC-CS-151.github.io/raw/main/hws/hw02/hw2_data.zip Note that these csv files, unlike the previous hw, do not contain column headers.

5. Design Notes, Recommendations, & Testing

Computing the yearly percentages, as well as total number and total percentage require additional auxiliary data structures besides the linked lists. Consider what you need and decide where and how to store the information carefully.

Calculating total rank is non-trivial. Think through your data structure and al- gorithm needs before you start. You should write the program so that it makes

the best use of available storage. Resist the urge to store redundant information in many different places.

Suggested steps below. We suggest you follow the steps below when implementing your solutions. Note that for each step, you should test with one input file, then multiple input files (Just print partial contents of your linked lists), before moving onto the next one. You can hardcode the filenames in the early steps. This will get fixed once you have step 3 and beyond working.

  1. Read the files into two lists of unique names in sorted order. (your Name class needs to have only a String, for now). If you are having trouble debugging the sorted order, we suggest creating a smaller input-file (by keeping only the first 10 or 20 names) and using that instead.

    Hint: you can use the head command in terminal to get the first 10 lines from a file. head -n 10 names2003.csv will show the first 10 lines from names2003.csv.

  2. Expand your Name class to provide storage for yearly number and rank. Modify your file-reading code to create and insert (in sorted order) a new Name object if it’s not already in the list, or update it with the given yearly stats if it is.
  3. Enable single name lookup on a single file (via commandline arguments)
  4. Enable single name lookup on multiple (or all) files
  5. Compute all the necessary totals to enable yearly percentage reporting and storing them in reasonable data structures.

  6. Compute additional totals to enable total number and total percentage re- porting

  7. Design an algorithm to compute total rank
  8. Enable multiple name lookup on multiple files

5.1 Testing

README.txt

In a text file called README.txt answer the following questions:

  1. How do you keep the linked lists in alphabetically sorted order?
  2. If you implemented with generic linked lists, please mention it!
  3. How much time did you spend on the homework

Autograder Formatting

Before submitting to the autograder, make sure you conform to the following format:

  1. Do not use any try-catches. Instead add throws to your method signatures.
  2. Do not use System.exit calls

Submitting

Submit the following files to the assignment called HW02 on Gradescope:

  1. DriverHW02.java
  2. NameLL.java
  3. Name.java
  4. ExpandableArray.java
  5. README.txt
  6. Any additional classes you made

Make sure to name these files exactly what we specify here. Otherwise, our autograders might not work and we might have to take points off.
DO NOT submit a .docx or .pdf file.
DO NOT submit any .class files or any of the csv files.

Copying the files

You will likely want to submit the files from your own laptop/computer. To copy the files from goldengate to your own machine, open up a terminal on your own computer. If you have a mac or linux, you can use the Terminal application, if you have a windows you can use the Powershell app.

Run the following command:

$ scp <username>@goldengate.cs.brynmawr.edu/home/<username>/cs151/homeworks/hw00/*

Make sure to replace <username> with your username. scp is a command to secure-copy files from one server to another. In this case, from goldengate to your own machine.