CSE 7350/5350

David W. Matula

Exam Scheduling Project

 Total Points: 400

Updated: May MMVI

Due Date:

Early: 20 point bonus for May 5 (12 noon) Prompt:10 point bonus for May 9(4pm)

Latest: May 12 (4 pm)

Note : After Friday, May 12 at 4pm , late projects will incur a penalty with possible delay of grade. Have your project TIME STAMPED by Debra at the CSE office to receive bonus.  


1   Introduction and Summary

In this project you are asked to implement an algorithm for exam scheduling that reduces to an algorithm for graph coloring. You are first asked in Part I to prepare several randomized student exam lists as a randomized benchmark set and to implement two algorithms for converting the exam lists into conflict graphs. In Part II you are asked to implement the smallest last coloring algorithm for vertex coloring of the conflict graphs. The coloring algorithm is to be applied to the randomized benchmark set from Part I and a second set of benchmark graphs provided for the project as described in Section 2. The graph coloring algorithm is also to be applied to several original graphs you are asked to create to exhibit the strengths and weaknesses of the coloring algorithm.

Regarding algorithm efficiency you are asked  to verify a linear time bound O(|V|+|E|) for smallest last graph coloring and suitable time and space bounds for each conversion algorithm you implement for preparing the conflict graph from the exam lists. You are asked to document your implementations in the form of a report to a technical manager with substantial use of graphic display of results. Section 3 provides more details on the grading issues and the format for the project report. Section 4 provides details on the exam list generation implementation requested as Part I of the implementation. Specifications for Part II on the graph coloring procedure implementation are given in Section 5. 

 

2   Benchmarks

A Randomized Case Benchmark of 10 particular random graphs output from Part I should be part of your graph set input to your Exam Scheduling Program in Part II. You will also be provided with ten additional benchmark graphs with which to test your Exam Scheduling Program in Part II. Five of these graphs were created using a graph generation program developed by Paul Bartholomew. Your program must find a smallest last vertex ordering and vertex coloring for each of these ten benchmark graphs as well as the graphs created in Part I and additional graphs your create as requested in Parts I and II.

These additional graphs are available in both adjacency list and adjacency matrix representations as ASCII files on the Engr UNIX network in the directory: /users/project/cse/CSE5350.

 3   Project Format

Your project submission should be equivalent to a 10 -15 page word-processor-composed report. This should be initiated by a 2 - 3 page Executive Summary containing an Introduction and Summary, a Programming Environment Description, and References as described in the following.

The core of your report should contain the Reduction to Practice details and Result Summary as outlined in the following. With reference to the On Output perspective on the class home page, the output of Part I is to the “Believer” audience since it is input to a second program, and your report should substantiate the correctness of the implementation. The output of Part II has a dual audience of “Manager” for the executive summary and “Rational Learner” for the core of the report, and these audiences should be kept in mind. In summary, the report should be oriented towards a technical manager with a wide perspective, say in Scientific American magazine style. Grading partition:

 

 

 

3.1      Executive Summary

Introduction and Summary (50 points)

Give a brief description, in layman's terms, of the project, including your major result. Consider this section to be a professionally worded marketing effort. That is, would the competent technical manager (your audience) be likely to want to read on and try to understand your claimed good results. Describe the strongest features of your implementation. Summarize your use of illustrations, figures, tables, or computer graphics employed that will reveal your results effortlessly to the reader. Give clear citations (e.g. [1]) to your sources listed in the References section including downloaded programs and previous student's projects that most influenced your work.

Programming Environment Description (20 points)

Give a description of the environment you used to develop the program including both hardware and software components, i.e., the specific type and brand of computer, the computer's processing speed, the amount of memory, the operating system, the language, any special libraries, etc. The description should include metrics on the program resource utilization allowing judgment of “was the total used respectable or an overkill?”. The purpose of this description is to allow comparisons and facilitate reproduction. Consider: would a knowledgeable colleague be able to judge the scope of your work from this description of the environment and resource utilization? The graphics package used to present the report and the package used to create the graph drawings should also both be described.

References (30 points)

Provide an enumerated list of publications, texts, and programs in typical reference style as a concluding part of the executive summary, for example:

[1] D.W. Matula, Exam Scheduling Project, www.engr.smu.edu/cse/7350/, 2003

IMPORTANT: If you read previous student projects from the archives in the department office, you should include a reference to each of them in your reference list.

 

 

3.2      Exam Scheduling Report

Reduction to Practice (120 points)

This is the core of your report and should cover the five following topics appropriately integrated.

·         Data structure Design: Describe and illustrate the representation and organization of data employed and discuss the relation to the algorithms employed for both Parts I and II.

·         Algorithm Descriptions: Provide a description of the smallest last vertex ordering and graph coloring algorithm sufficient for your report to be self-contained to someone familiar with other graph search algorithms.

·         Algorithm Engineering: Describe how the algorithm implementation is engineered to achieve efficiencies in time and space and provide arguments for the efficiencies. In particular, carefully substantiate the linear time and appropriate optimal space bounds for the radix sort-merge, smallest last vertex ordering, and vertex coloring, in each case where your implementation realizes this bound.

·         Verification Walkthrough: Present a walk through for the vertex ordering and graph coloring of Part II for an original graph having 10 to 14 vertices and 20 to 30 edges. Choose the graph to expose challenging aspects of the scheduling algorithm and your implementation.

·         Algorithm Effectiveness: Include here a summary table of results as an overview of the benchmarks and your additional test graphs. Give your conclusion on the strengths and weaknesses of the conversion and coloring algorithms on the one hand and your implementation on the other. Indicate applications (generalized scheduling, etc.) where your implementation would be strongest and what kind of input would be most difficult to handle. Your analysis and conclusions relevant to your separately created test graphs can be a focal point of this topic.

Benchmark Result Summary (80 points)

You should provide a standard output for each randomized benchmark graph created in Part I as specified in Section 4. You should provide output for each graph colored in Part II as specified in Section 5.

 Display (100 points)

These points are awarded on the basis of your overall use of illustrations, figures, tables, bar charts, and computer generated diagrams in your report. Hand drawn and/or computer generated figures may be utilized as appropriate.

·         40 points will be related to your displays associated with the algorithm and data structure explanations for both Part I and Part II, including the verification walkthroughs.

·         60 points will be related to your displays and graph drawings associated with output results, including both sets of benchmark cases and the supplementary verification case studies you create.

4            Part I: Simulated Exam List Conflict Graphs

This subproject utilizes pseudo random number generation to create simulated student exam lists. A variety of methods possessing different time and space requirements will be employed to accumulate conflict data on-the-fly and prepare the output conflict adjacency list data structure.

Input:               N = number of exams.

                         S = number of students.

                        K = number of exams per student.

Output:            N = number of exams (vertices).

                        M = number of distinct pair wise exam conflicts (edges).

                   DEN = density of non zero exam conflict entries (range 0-1).

Degree distribution: Plot the number of vertices having degree i as a function of i

 for i =  0,1,…, max degree, whenever max degree is at most 200. If max degree

is greater than 200, present the distribution in some appropriate graphic form.

Graph Adjacency Matrix: Plot the adjacency matrix of the conflict graph for all

                   cases with N £ 200, and also for cases with larger N if your display tools allow 

                   larger presentations. For conflicts occurring more than once, you may employ

                   shading or some other mechanism to suggest higher density for the plotted

      “point”.

General Procedure: For input data generation we shall pick K distinct exams between 1 and N according to some specified distribution using pseudo random number generation, and note the specific conflicts implied. Note that each student generates K(K-1)/2 pair wise conflicts, many of which are duplicate conflicts shared by other students.

For output we shall consider a variety of procedures for counting and accumulating the number of duplicate conflicts and preparing the required sequential adjacency list structure of conflict data. Various types of summary information should be presented as requested. The graph adjacency structure should be produced and saved for input to the scheduling procedure.

Develop the adjacency list conflict data structure for the 10 benchmark data sets created as specified, and also for two-to-four more such cases that illustrate your implementation features in size and “quality”. Implement method 1 for application when N £  200, and implement either method 2 or 3 and apply to cases where N > 200.

Universal Set Incidence.

Multiset: Provide storage of the upper triangular portion of the N´N conflict matrix. After the input is processed the element of the array corresponding to the exams I, J (for 1 £ I < J £ N) should contain the number of students taking both exams I and J. This matrix is scanned with the nonzero entries indicating the I, J pairs to be entered into the adjacency list conflict data structure for the scheduling procedure. Discuss and run tests for the largest value of N for which the computer system used for your implementation conveniently provides space, e.g. N ³ 500 should be feasible for moderate size systems. For Benchmark 1 only print the conflict frequency matrix.

Sequential Adjacency Structure Construction.

Describe and develop a (linked) adjacency structure to dynamically record conflicting exams and their frequency. For each on-the-fly received conflict of exam I with J for I < J, search the adjacency structure with I for J and: if J is found, increment the corresponding frequency counter by unity; else append J as an appropriate new entry of the adjacency structure with initial frequency count of unity.

Radix sort/merge. (Optional if developed in class: 30 bonus points)

Employ the radix sort/merge procedure developed in class to accumulate the frequency counts. For each merge report the size of the row conflicts group and the resulting size of the adjacency structure after accumulating that group.

 

Distributions

Skew:  The density should look as follows:

  

 

       

       

        1                                        N                  

 

 

Two Tier: 50% of the density should be in the first 10% of the range.

                                                                                             

                                 Ratio:   9:1

 

 

 

      0               10% N                     100% N

 

Benchmark Data Sets

  Benchmark                       N                          K                        Den              Distribution

            1

16

3

30%

Skew

2

100

5

20%

Uniform

3

100

5

50%

Uniform

4

100

5

80%

Uniform

5

200

5

50%

Skew

6

200

5

50%

Two Tier

7

300

5

10%

Skew

8

300

5

10%

Two Tier

9

600

5

2%

Two Tier

10

1000

5

1%

Two Tier

 

5            Part II:  Graph Coloring

This subproject utilizes the smallest-last coloring algorithm to provide a vertex coloring of the conflict graph to implement the scheduling.

Input:               A graph in adjacency list form.

Output:           A coloring of the vertices of the graph, suitably presented to an audience of type “rational learner” that wants to visualize and understand the results for possible application of the program as implemented.

5.1   Tables and Displays in Report

Your report should include the following for each graph, along with any additional information you feel would “sell” your implementation.

 

Tables

            Header:  A description of the graph and key parameters, such as N, S, K, M, DEN, and probability type for each randomized graph obtained form Part I. For the addition ten benchmark graphs the corresponding parameters N, M, DEN, and some additional descriptive graph date, such as the distribution of vertex degrees.

            Walkthrough Tables:  For graphs with at most 200 vertices tabulate for each vertex in the order colored (1) the vertex label , (2) the degree, (3) the degree when deleted, (4) the number of adjacent colors when color assigned, (5) the final number of adjacent colors, (6) the vertex color. NOTE:  If this data is not printed for vertices in the right order there will be a 40 point penalty.

            For graphs with over 200 vertices provide summary data with regards to the above, such as averages or distribution functions for parameters (2) – (5). Provide a coloring algorithm walkthrough for randomized benchmark 1 (16 vertices) illustrating the order deleted by 15 successive cuts. For all graphs provide a size distribution table for the independent sets of vertices colored with color j for j = 1, 2, …, maxcolor.

Display

Use a graph drawing program to draw the graphs where possible, illustrating vertex colors either by distinct colors (preferred) or integer labels.

5.2  Smallest Last Coloring Implementation Outline

 The coloring algorithm may be summarized with reference to the following vertex records. The variables stored in fields 3 and 4 are illustrated to change dynamically at different stages in the algorithm to emphasize the distinct features of each stage. Fields 1 and 2 remain constant preserving the graph.

 

 

Record for Vertex I, 1 £  I  £  N.

Vertex I

Field 1

Field 2

Field 3

Field 4

Degree

Pointer to edge list

(i)  Current Degree

(ii)  –1 when deleted

(iii)  Color Value

 

(i)  Pointers for doubly-linked list of vertices of same current degree

(ii)  Pointer for order deleted list

(iii)  Number of adjacent colors when color assigned

The smallest last vertex ordering algorithm may be implemented by recursively deleting a vertex of smallest current degree, updating the third and fourth fields of the record for each adjacent undeleted vertex, and adding the vertex deleted to the ordered list of vertices deleted. After all vertices have been deleted, the coloring algorithm can assign colors (exam periods) to each vertex in an order opposite to the order deleted, assigning for a “color” the smallest non-negative integer not previously assigned to any adjacent vertex.