The customer is trying to build a secure database that can serve as a master data source for driving-behavior analysis in Weka (via an ODBC connection). Every day the customer receives 3-4 email blasts of zip files that together contain 200+ CSV files, corresponding to an equal or greater number of drivers. Each file is recorded with an ID as the primary key and about 113 columns of driving-dynamics parameters. The ID is actually a vehicle ID: it identifies a particular vehicle driven by a particular user, but it does not identify the driver by name or other personal details. Because of this, the ID can be used to track the same driver in the same vehicle over a period of time.
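Because the same vehicle ID reappears every day, the ID alone cannot stay a unique key once several days are stored together. The sketch below is one possible layout, not the customer's actual design: SQLite is used only because it ships with Python (the real store could be any ODBC-accessible engine), and the table name `driving_record`, the columns `data_date` and `speed`, and the surrogate `record_id` key are all hypothetical. An index on `(vehicle_id, data_date)` keeps per-vehicle history queries fast.

```python
import sqlite3

# Hypothetical schema. The source only says each record has a vehicle ID plus
# ~113 driving-dynamics columns; "data_date" (the day of the email batch) and
# "speed" (a stand-in for one parameter) are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS driving_record (
    record_id  INTEGER PRIMARY KEY,  -- surrogate key, one per CSV row
    vehicle_id TEXT NOT NULL,        -- vehicle ID from the CSV files
    data_date  TEXT NOT NULL,        -- ISO date of the daily batch
    speed      REAL                  -- placeholder for one of ~113 columns
);
CREATE INDEX IF NOT EXISTS idx_vehicle_date
    ON driving_record (vehicle_id, data_date);
"""


def history_for_vehicle(conn, vehicle_id):
    """Return (data_date, speed) rows for one vehicle, oldest day first."""
    cur = conn.execute(
        "SELECT data_date, speed FROM driving_record "
        "WHERE vehicle_id = ? ORDER BY data_date, record_id",
        (vehicle_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO driving_record (vehicle_id, data_date, speed) "
        "VALUES (?, ?, ?)",
        [("V001", "2017-01-11", 48.5),
         ("V001", "2017-01-10", 52.0),
         ("V002", "2017-01-10", 61.2)],
    )
    print(history_for_vehicle(conn, "V001"))
```

A surrogate key is used instead of a composite `(vehicle_id, data_date)` key because a single day's CSV may hold many rows per vehicle; if each file turns out to have exactly one row per vehicle per day, the composite key would be the simpler choice.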
The multiple email blasts are simply due to a limit on the size of the zip file that can be produced. Because of this limit, the data are split across several zip files sent in separate emails, which is difficult to manage. All CSVs from all zip files received on a given day can safely be combined and taken as the complete data for that day.
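Since every CSV received on one day belongs to that day's dataset, the daily merge that the .BAT program currently performs could be scripted directly from the zip attachments. This is a minimal sketch using only the Python standard library; it assumes that all CSVs share an identical header row (it raises an error otherwise) and are UTF-8 encoded, and the function name `combine_day` is hypothetical.

```python
import csv
import io
import zipfile


def combine_day(zip_paths, out_path):
    """Merge every CSV inside the given zip files into a single CSV.

    Assumes all CSVs share the same header; it is written once at the top.
    Returns the number of data rows written (header excluded).
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for zip_path in zip_paths:
            with zipfile.ZipFile(zip_path) as zf:
                for name in sorted(zf.namelist()):
                    if not name.lower().endswith(".csv"):
                        continue  # skip non-CSV members
                    with zf.open(name) as raw:
                        # "utf-8-sig" also strips a byte-order mark if present.
                        text = io.TextIOWrapper(raw, encoding="utf-8-sig",
                                                newline="")
                        reader = csv.reader(text)
                        file_header = next(reader, None)
                        if file_header is None:
                            continue  # empty file
                        if header is None:
                            header = file_header
                            writer.writerow(header)
                        elif file_header != header:
                            raise ValueError(
                                f"{zip_path}:{name} has a different header")
                        for row in reader:
                            writer.writerow(row)
                            rows_written += 1
    return rows_written
```

For example, `combine_day(["day1_part1.zip", "day1_part2.zip"], "2017-01-10.csv")` (hypothetical file names) would produce one CSV for the day, which could then be bulk-loaded into the database rather than kept as hundreds of separate files.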
The problem is that the customer has been using MS Access and running a special .BAT program to combine the data files. For instance, one day of data (1/10/17) consists of 439 CSV files totaling 925 MB. Because an MS Access database is limited to 2 GB, this volume of data is too large for it: the customer can store and manipulate only one day of data in Access, which prevents them from furthering their research.
See Project Slides Description.