vietvudanh/really-large-file


Process really large file

Inspired by this.

Tasks

  • Write a program that will print out the total number of lines in the file.
  • Notice that the 8th column contains a person’s name. Write a program that loads in this data and creates an array with all name strings. Print out the 432nd and 43243rd names.
  • Notice that the 5th column contains a form of date. Count how many donations occurred in each month and print out the results.
  • Notice that the 8th column contains a person’s name. Create an array with each first name. Identify the most common first name in the data and how many times it occurs.
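The four tasks above can be handled in a single streaming pass, so the file is read once and only the name list and two counters stay in memory. The sketch below is a minimal Python illustration, not the code in this repository. It rests on assumptions that the task list does not state: fields are separated by `|`, the date column begins with `YYYYMM`, and names look like `LAST, FIRST MIDDLE`. The function name `summarize` and its result keys are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str], delimiter: str = "|") -> dict:
    """Collect line count, names, per-month counts and first-name counts in one pass."""
    total = 0
    names = []
    per_month = Counter()
    first_names = Counter()

    for line in lines:
        total += 1
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) < 8:
            continue  # malformed row: still counted as a line, skipped otherwise

        name = fields[7]  # 8th column
        names.append(name)

        date = fields[4]  # 5th column; ASSUMPTION: it starts with YYYYMM
        if len(date) >= 6:
            per_month[date[:6]] += 1

        # ASSUMPTION: names are "LAST, FIRST MIDDLE"; take the first word after the comma.
        _, comma, rest = name.partition(",")
        parts = rest.split()
        if comma and parts:
            first_names[parts[0]] += 1

    top = first_names.most_common(1)
    return {
        "total_lines": total,
        "names": names,
        "per_month": dict(per_month),
        "most_common_first_name": top[0] if top else None,
    }


if __name__ == "__main__":
    # Iterating over the file object reads one line at a time instead of loading the whole file.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        result = summarize(handle)
    print(result["total_lines"])
    print(result["per_month"])
    print(result["most_common_first_name"])
```

With this layout, the 432nd and 43243rd names are `result["names"][431]` and `result["names"][43242]`, provided the file has that many well-formed rows.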

Lessons learnt

- PyPy was surprisingly slow on this workload.
- The solution at https://itnext.io/using-java-to-read-really-really-large-files-a6f8a3f44649 performs badly: it runs into GC overhead all the time.