This document provides the information necessary to run the MFS (Maximal Frequent Sequences) algorithm.

In your working directory, you should have the following files:

    Cleanse.java
    Sessionize.java
    bitmap.h
    bitmap.cpp
    mfs2.cpp

1. Cleanse.java: Removes irrelevant entries and unnecessary fields from the raw log file. It also replaces each page and each source IP address with a unique ID. To run the program, you must have access to the DBMS and use the following SQL file to create the source_ids and page_ids tables:

    create_id_tables.sql

The sample raw log file is:

    access_log.19990816

The program can be executed with the following command:

    java Cleanse

2. Sessionize.java: Groups the cleansed log file by Source ID and Timestamp.

Note: The output file from Cleanse.java cannot be processed by Sessionize.java directly. It must first be sorted by both Source ID and Timestamp, and rearranged in the order as: . (output from Cleanse.java is in the order as: )

The sample input file is:

    access_log.19990816.20011220.cleansed.sorted.txt.gz

The corresponding sample sessionized log files, using time windows of 5, 10, 15, and 30 minutes respectively, are:

    sessionized05.gz
    sessionized10.gz
    sessionized15.gz
    sessionized30.gz

The program can be executed with the following command:

    java Sessionize [ ]

3. bitmap.h, bitmap.cpp, mfs2.cpp: These three files are needed to run the MFS algorithm. The program first builds a suffix tree, then uses a depth-first traversal to output the MFSs. To run the program, you need to create two directories under your working directory: data and result. The data directory holds the sessionized files used as input to the MFS algorithm, and the result directory receives the result files.
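The grouping performed in step 2 (Sessionize.java) can be illustrated with a minimal sketch. It is not the code from Sessionize.java. The record layout (source ID, timestamp in seconds, page ID), the class and method names, and the rule that a session ends when the gap between two consecutive hits from the same source exceeds the time window are all assumptions made for illustration. The input is assumed to be already sorted by Source ID and Timestamp, as the note in step 2 requires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of time-window sessionizing; not taken from Sessionize.java.
class SessionizeSketch {

    /** One cleansed log record. This field layout is an assumption. */
    static final class Hit {
        final int sourceId;
        final long epochSeconds;
        final int pageId;

        Hit(int sourceId, long epochSeconds, int pageId) {
            this.sourceId = sourceId;
            this.epochSeconds = epochSeconds;
            this.pageId = pageId;
        }
    }

    /**
     * Splits hits, already sorted by (sourceId, epochSeconds), into sessions of page IDs.
     * A new session starts when the source changes, or when the gap since the previous
     * hit from the same source is larger than windowSeconds (an assumed rule).
     */
    static List<List<Integer>> sessionize(List<Hit> sortedHits, long windowSeconds) {
        List<List<Integer>> sessions = new ArrayList<>();
        List<Integer> current = null;
        Hit previous = null;
        for (Hit hit : sortedHits) {
            boolean startNew = previous == null
                    || hit.sourceId != previous.sourceId
                    || hit.epochSeconds - previous.epochSeconds > windowSeconds;
            if (startNew) {
                current = new ArrayList<>();
                sessions.add(current);
            }
            current.add(hit.pageId);
            previous = hit;
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(1, 0, 10),
                new Hit(1, 120, 11),
                new Hit(1, 1000, 12),   // more than 300 s after the previous hit
                new Hit(2, 1010, 10));  // different source
        // 5-minute window; prints [[10, 11], [12], [10]]
        System.out.println(sessionize(hits, 300));
    }
}
```

A single pass is enough here only because the input is sorted: all hits from one source are adjacent and in time order, which is why the sorting step before Sessionize.java matters.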
The following data files can be put into the data directory for testing purposes:

    S100K.15m
    access_log.19990816.20011220.sessionized.05minutes.txt
    access_log.19990816.20011220.sessionized.10minutes.txt
    access_log.19990816.20011220.sessionized.15minutes.txt
    access_log.19990816.20011220.sessionized.30minutes.txt

Their corresponding result files are named as follows:

    Data_file.support_threshold.partition_number.cumulative_pruning

The sample result files are listed below. (Note: if the partition is too small for the program to run in the amount of memory available, a memory error will occur.)

    S100K.15m.0.0010.10.0
    S100K.15m.0.0010.10.1
    access_log.19990816.20011220.sessionized.05minutes.txt.0.0010.1.1
    access_log.19990816.20011220.sessionized.05minutes.txt.0.0010.10.0
    access_log.19990816.20011220.sessionized.05minutes.txt.0.0010.10.1
    access_log.19990816.20011220.sessionized.05minutes.txt.0.0010.100.1
    access_log.19990816.20011220.sessionized.05minutes.txt.0.0010.20.1
    access_log.19990816.20011220.sessionized.05minutes.txt.0.0010.5.1
    access_log.19990816.20011220.sessionized.10minutes.txt.0.0010.1.1
    access_log.19990816.20011220.sessionized.10minutes.txt.0.0010.10.0
    access_log.19990816.20011220.sessionized.10minutes.txt.0.0010.10.1
    access_log.19990816.20011220.sessionized.10minutes.txt.0.0010.20.1
    access_log.19990816.20011220.sessionized.10minutes.txt.0.0010.5.1
    access_log.19990816.20011220.sessionized.15minutes.txt.0.0010.1.1
    access_log.19990816.20011220.sessionized.15minutes.txt.0.0010.10.0
    access_log.19990816.20011220.sessionized.15minutes.txt.0.0010.10.1
    access_log.19990816.20011220.sessionized.15minutes.txt.0.0010.20.1
    access_log.19990816.20011220.sessionized.15minutes.txt.0.0010.5.1
    access_log.19990816.20011220.sessionized.30minutes.txt.0.0010.1.1
    access_log.19990816.20011220.sessionized.30minutes.txt.0.0010.10.0
    access_log.19990816.20011220.sessionized.30minutes.txt.0.0010.10.1
    access_log.19990816.20011220.sessionized.30minutes.txt.0.0010.20.1
    access_log.19990816.20011220.sessionized.30minutes.txt.0.0010.5.1

These programs can be compiled using the following command:

    g++ -o mfs2.cpp bitmap.cpp

Then, run the program using the following command:

    <1 for cumulative_pruning> [1 for debug]
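The result file naming pattern, Data_file.support_threshold.partition_number.cumulative_pruning, can be taken apart programmatically, for example when collecting results across runs. The sketch below is a hypothetical helper, not part of the MFS distribution. It assumes that the support threshold is always written with exactly one decimal point (as in 0.0010 in the sample names) and that a trailing 1 means cumulative pruning was enabled. Because the data file name itself contains dots, the name is parsed from the right.

```java
import java.util.Arrays;

// Hypothetical helper for reading MFS result file names; class and field names are assumptions.
class ResultFileName {
    final String dataFile;
    final String supportThreshold;
    final int partitionNumber;
    final boolean cumulativePruning;

    ResultFileName(String dataFile, String supportThreshold,
                   int partitionNumber, boolean cumulativePruning) {
        this.dataFile = dataFile;
        this.supportThreshold = supportThreshold;
        this.partitionNumber = partitionNumber;
        this.cumulativePruning = cumulativePruning;
    }

    /**
     * Parses a name of the form Data_file.support_threshold.partition_number.cumulative_pruning.
     * Assumes the threshold occupies exactly two dot-separated tokens, e.g. "0" and "0010".
     */
    static ResultFileName parse(String name) {
        String[] parts = name.split("\\.");
        int n = parts.length;
        if (n < 5) {
            throw new IllegalArgumentException("Not a result file name: " + name);
        }
        boolean pruning = parts[n - 1].equals("1");          // last token: cumulative pruning flag
        int partitions = Integer.parseInt(parts[n - 2]);     // second to last: partition number
        String threshold = parts[n - 4] + "." + parts[n - 3]; // two tokens rejoined, e.g. 0.0010
        String dataFile = String.join(".", Arrays.copyOfRange(parts, 0, n - 4));
        return new ResultFileName(dataFile, threshold, partitions, pruning);
    }

    public static void main(String[] args) {
        ResultFileName r = parse("S100K.15m.0.0010.10.0");
        // prints: S100K.15m 0.0010 10 false
        System.out.println(r.dataFile + " " + r.supportThreshold + " "
                + r.partitionNumber + " " + r.cumulativePruning);
    }
}
```

The threshold is kept as a string so that the original formatting (such as the trailing zero in 0.0010) is preserved when the name is rebuilt.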