Analyzing Titanic Data
1 Introduction
Some time ago, a Titanic dataset was released to the general public. It is provided to you as titanic_data.csv. The data is in text format and contains 12 fields of information for each passenger who boarded the Titanic.
The data is formatted as follows:
- Each line of the dataset corresponds to one person.
- For each person the following parameters are given:
- PassengerId: a counter that starts at 1 and goes up to 891
- Survived: 0 = did not survive; 1 = survived
- Pclass: passenger class: 1 = first class, 2 = second class, and 3 = third class
- Name: passenger name
- Sex: passenger gender
- Age: passenger age
- SibSp, Parch, Ticket, Fare, Cabin, Embarked: not important for this lab
- The 12 items are separated by commas (CSV = comma-separated values), as shown in the example below:
PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked
1, 0, 3, "Braund, Mr. Owen Harris", male, 22, 1, 0, A/5 21171, 7.25, C85, S
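The Name field in the example row contains a comma of its own, so a plain split(",") produces more than 12 pieces and shifts every column after Name (Sex, Age, and so on) by one position. The sketch below shows one possible way to split a line while ignoring commas inside double quotes. It assumes that names are enclosed in double quotes in titanic_data.csv, which is a common CSV convention but is not confirmed by this handout. The class name CsvLineSketch and the method splitCsvLine are hypothetical names used only for illustration.

```java
import java.util.Arrays;

class CsvLineSketch {
    // Splits on commas that are followed by an even number of double quotes,
    // that is, commas that lie outside a quoted field. The limit -1 keeps
    // trailing empty fields (for example a missing Cabin or Embarked value).
    static String[] splitCsvLine(String line) {
        String[] fields = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        for (int i = 0; i < fields.length; i++) {
            // Remove surrounding whitespace and the enclosing quotes, if any.
            String field = fields[i].trim();
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields[i] = field;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Assumption: the example row with its name enclosed in double quotes.
        String row = "1,0,3,\"Braund, Mr. Owen Harris\",male,22,1,0,A/5 21171,7.25,C85,S";
        String[] fields = splitCsvLine(row);
        System.out.println("Number of fields: " + fields.length);
        System.out.println(Arrays.toString(fields));
        System.out.println("Sex column: " + fields[4]);
    }
}
```

With this kind of splitting, the column positions match the header order, so Survived stays at index 1, Pclass at index 2, and Sex at index 4.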
2 Analyzing the data
Your manager has asked you to compute some simple statistics on this data:
- Compute Overall Survival Rate
- Compute Woman Survival Rate
- Compute Man Survival Rate
- Compute Woman Per Class Survival Rate
- Compute Man Per Class Survival Rate
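All five statistics above share the same shape: count the passengers in a group, count the survivors in that group, and divide. One possible way to avoid five nearly identical loops is a single helper that takes the group as a filter. The sketch below is only an illustration under stated assumptions: the Passenger class, the survivalRate method, and the toy data are hypothetical and are not taken from titanic_data.csv.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class SurvivalRateSketch {
    // Hypothetical holder for the three columns the statistics need.
    static final class Passenger {
        final boolean survived;
        final int pclass;
        final String sex;

        Passenger(boolean survived, int pclass, String sex) {
            this.survived = survived;
            this.pclass = pclass;
            this.sex = sex;
        }
    }

    // Percentage of survivors among the passengers accepted by the filter.
    // Returns NaN when no passenger matches, so an empty group is not reported as 0%.
    static double survivalRate(List<Passenger> passengers, Predicate<Passenger> filter) {
        int matched = 0;
        int survived = 0;
        for (Passenger p : passengers) {
            if (filter.test(p)) {
                matched++;
                if (p.survived) {
                    survived++;
                }
            }
        }
        return matched == 0 ? Double.NaN : 100.0 * survived / matched;
    }

    public static void main(String[] args) {
        // Made-up toy data for illustration only, NOT real passenger records.
        List<Passenger> toy = Arrays.asList(
            new Passenger(true, 1, "female"),
            new Passenger(true, 2, "female"),
            new Passenger(false, 3, "female"),
            new Passenger(false, 1, "male"),
            new Passenger(true, 2, "male"),
            new Passenger(false, 3, "male"));

        System.out.printf("Overall: %.2f%%%n", survivalRate(toy, p -> true));
        System.out.printf("Women: %.2f%%%n", survivalRate(toy, p -> p.sex.equals("female")));
        System.out.printf("Men: %.2f%%%n", survivalRate(toy, p -> p.sex.equals("male")));
        for (int c = 1; c <= 3; c++) {
            final int pclass = c; // lambdas may only capture effectively final variables
            System.out.printf("Women, class %d: %.2f%%%n", pclass,
                survivalRate(toy, p -> p.sex.equals("female") && p.pclass == pclass));
        }
    }
}
```

Note that each rate is divided by the size of its own group (for example, the number of women), not by the total number of passengers.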
3 Conclusions of your Analysis
Provide your conclusions based on your analysis, such as:
“A passenger was most likely to survive if she was a woman in first class…”
“A passenger was most likely not to survive if …”
4 Sample Code
If you like challenges, try to code the whole program yourself; you will learn more by doing so. If you need help or get stuck while developing it, ask your lab TA. If you are still not sure how to proceed, look at the sample code below. It implements the first bullet of Section 2 above.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;
public class TitanicAnalysis {
    // Capacity of the line buffer: the header line plus one line per passenger.
    private int totalNumberOfPassenger = 892;

    public static void main(String[] args) {
        new TitanicAnalysis();
    }

    public TitanicAnalysis() {
        String[] titanicData = readTitanicDatasetFromFile("titanic_data.csv");
        computeOverallSurvivalRate(titanicData);
        // computeWomanSurvivalRate(titanicData);
        // computeManSurvivalRate(titanicData);
        // computeWomanPerClassSurvivalRate(titanicData);
        // computeManPerClassSurvivalRate(titanicData);
    }
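    // Reads the file line by line into an array; index 0 holds the header line.
    // Returns null if the file cannot be found.
    private String[] readTitanicDatasetFromFile(String titanicDatasetFile) {
        File titanic = new File(titanicDatasetFile);
        String[] titanicRawData = new String[totalNumberOfPassenger];
        // try-with-resources closes the Scanner even if reading fails
        try (Scanner titanicData = new Scanner(titanic)) {
            int index = 0;
            // hasNextLine() pairs with nextLine(); the bounds check avoids an
            // ArrayIndexOutOfBoundsException if the file has more lines than expected
            while (titanicData.hasNextLine() && index < titanicRawData.length) {
                titanicRawData[index] = titanicData.nextLine();
                System.out.println(titanicRawData[index]);
                index++;
            }
            return titanicRawData;
        } catch (FileNotFoundException ex) {
            Logger.getLogger(TitanicAnalysis.class.getName()).log(Level.SEVERE, null, ex);
        }
        return null;
    }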
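    private void computeOverallSurvivalRate(String[] titanicData) {
        if (titanicData == null) {
            System.out.println("No data to analyze.");
            return;
        }
        int survivedCounter = 0;
        int notSurvivedCounter = 0;
        // Start at 1 to skip the header line; stop at the first unused (null) slot
        for (int counter = 1; counter < titanicData.length && titanicData[counter] != null; counter++) {
            String[] items = titanicData[counter].split(",");
            // items[1] is the Survived column
            int survived = Integer.parseInt(items[1].trim());
            if (survived == 0) {
                notSurvivedCounter++;
            } else if (survived == 1) {
                survivedCounter++;
            } else {
                System.out.println("Invalid Survived type: " + items[1]);
            }
        }
        // Divide by the passengers actually counted, not by the number of lines in the file
        int passengerCount = survivedCounter + notSurvivedCounter;
        if (passengerCount == 0) {
            System.out.println("No passengers found.");
            return;
        }
        double survivedRate = (double) survivedCounter / passengerCount * 100;
        double notSurvivedRate = (double) notSurvivedCounter / passengerCount * 100;
        System.out.printf("Survived Rate: %.2f%% Non-survived Rate: %.2f%%%n", survivedRate, notSurvivedRate);
    }
}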
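The sample code above reads the file into an array of a fixed, hard-coded size. If you would rather not depend on knowing the number of lines in advance, one alternative is to read all lines into a list. The sketch below is a possible variation, not part of the sample code: the class name ReadLinesSketch and the method readPassengerLines are hypothetical, and it assumes the file is UTF-8 text whose first line is the column header.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class ReadLinesSketch {
    // Returns the passenger lines of the file, without the header line.
    static List<String> readPassengerLines(Path csvFile) throws IOException {
        List<String> lines = Files.readAllLines(csvFile, StandardCharsets.UTF_8);
        // Assumption: the first line is the column header, so it is skipped.
        return lines.isEmpty() ? lines : lines.subList(1, lines.size());
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("titanic_data.csv");
        if (!Files.exists(file)) {
            System.out.println("File not found: " + file);
            return;
        }
        List<String> passengers = readPassengerLines(file);
        // The list size replaces the hard-coded passenger count.
        System.out.println("Passenger lines read: " + passengers.size());
    }
}
```

Because the list knows its own size, the survival rates can then be divided by passengers.size() (or by the size of the relevant group) instead of a constant.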