PhD Thesis: Research Source Code
In the course of research for his PhD thesis on streaming OLAP, Stephen designed and developed the following implementation.
Stephen’s work is written in C and designed to run on Linux. It consists of an ingest database optimized for high input rates, as streaming OLAP requires. He created a new type of star schema called the Stream Star Schema and modeled it on two data streams: the network data stream and the file object data stream. The results are significant: the Stream Star Schema is 177 times faster for the network data stream and 39 times faster for the file object data stream.
In addition, Stephen has created an implementation of the OLAP hypercube that takes the Stream Star Schema as its input. Because this hypercube also links data values to data aggregates, it is called the Data Value Cube.
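The idea of a hypercube whose aggregates stay linked to the data values behind them can be illustrated with a minimal sketch. This is not Stephen's C implementation: the class name ValueLinkedCube, its methods, and the sample network records are all hypothetical, and Python is used only to keep the example short. Each aggregate cell stores a running sum, a count, and the ids of the facts that contributed to it.

```python
from collections import defaultdict
from itertools import combinations


class ValueLinkedCube:
    """Toy hypercube (hypothetical): every aggregate cell keeps links to its facts."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.facts = []  # append-only list of ingested (coords, measure) rows
        self.cells = defaultdict(lambda: {"sum": 0, "count": 0, "fact_ids": []})

    def ingest(self, coords, measure):
        """Store one fact and update every roll-up cell it belongs to."""
        fact_id = len(self.facts)
        self.facts.append((dict(coords), measure))
        n = len(self.dimensions)
        # Each dimension is either kept or rolled up to "*", giving 2**n cells.
        for r in range(n + 1):
            for kept in combinations(range(n), r):
                key = tuple(
                    coords[dim] if i in kept else "*"
                    for i, dim in enumerate(self.dimensions)
                )
                cell = self.cells[key]
                cell["sum"] += measure
                cell["count"] += 1
                cell["fact_ids"].append(fact_id)  # the value-to-aggregate link
        return fact_id

    def aggregate(self, **fixed):
        """Return the cell for the fixed dimensions, or None if it is empty."""
        key = tuple(fixed.get(dim, "*") for dim in self.dimensions)
        return self.cells.get(key)

    def values_behind(self, **fixed):
        """Follow the links from an aggregate back to its data values."""
        cell = self.aggregate(**fixed)
        if cell is None:
            return []
        return [self.facts[i] for i in cell["fact_ids"]]


# Made-up sample data: bytes seen per source address and destination port.
cube = ValueLinkedCube(["src", "port"])
cube.ingest({"src": "10.0.0.1", "port": 80}, 100)
cube.ingest({"src": "10.0.0.1", "port": 443}, 50)
cube.ingest({"src": "10.0.0.2", "port": 80}, 25)
```

Calling cube.aggregate(port=80) returns the roll-up over all sources for port 80, and cube.values_behind(port=80) returns the individual records that produced it. Precomputing all 2^n roll-ups per fact is the simplest choice for a sketch; a cube built for high ingest rates would likely be more selective.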
Master’s Thesis: Research Source Code
Stephen’s Master’s thesis was in machine learning, a branch of artificial intelligence. Such research is often applied to games, since a game provides a narrow, well-defined universe. The game must be complex enough to rule out rote learning, that is, brute-force learning in which the computer generates all possible moves and can thus always pick the best one. Checkers, chess, and Go are common choices. Stephen chose ScoreFour, which is three-dimensional tic-tac-toe with four in a row. His research was based on two of Findler’s learning techniques: polynomial learning (with aggression) and generalization learning. He implemented both techniques, conducted a playoff between them, and performed a statistical analysis of the results.
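To give a feel for the size of the game, the sketch below enumerates the winning lines of three-dimensional four-in-a-row and scores a position with a simple weighted sum. It assumes a 4x4x4 board and ignores any rule about where pieces may be placed. The evaluate function and its weights are hypothetical: they show the general shape of a weighted evaluation whose coefficients a learning program could tune, and are not Findler's techniques or the thesis code.

```python
from itertools import product

N = 4  # assumed board size: 4 x 4 x 4


def winning_lines(n=N):
    """Return every straight line of n cells in an n x n x n grid."""
    # Keep one direction from each +/- pair so no line is listed twice.
    directions = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]
    lines = []
    for start in product(range(n), repeat=3):
        for d in directions:
            cells = [
                tuple(s + k * step for s, step in zip(start, d))
                for k in range(n)
            ]
            if all(0 <= c < n for cell in cells for c in cell):
                lines.append(tuple(cells))
    return lines


LINES = winning_lines()  # 76 lines on a 4 x 4 x 4 board


def evaluate(board, player, weights):
    """Weighted sum over open lines; board maps (x, y, z) to a player mark."""
    score = 0
    for line in LINES:
        marks = [board[cell] for cell in line if cell in board]
        own = sum(1 for m in marks if m == player)
        opp = len(marks) - own
        if own and not opp:
            score += weights[own]  # line still winnable by the player
        elif opp and not own:
            score -= weights[opp]  # line still winnable by the opponent
    return score


# Illustrative weights only: more pieces on an open line count for more.
WEIGHTS = {1: 1, 2: 4, 3: 16, 4: 1000}
```

With only 76 lines to check, evaluating a single position is cheap; the difficulty lies in the number of positions, which is what makes learned evaluation weights attractive over exhaustive search.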
UC Santa Cruz Storage Systems Research Center
Storage Systems Research Center (SSRC UC Santa Cruz)
UC Santa Cruz has a wonderful Computer Engineering Department. Its graduate students are all involved in a group called the Storage Systems Research Center (SSRC), which facilitates collaboration between students and government laboratories. With the assistance of Dr. Ahmed Amer, I became involved with the SSRC to further my research into streaming OLAP. Specifically, I am participating in the exabyte project.
Currently I am working on moving streaming OLAP into the cloud, which will enable large-scale operation and massive storage. I define the data mining stack as, from bottom to top, the ingest database, the OLAP hypercube, and the data mining heuristics. Each of these layers should be in the cloud.
OpenStack
OpenStack was originally a cloud collaboration between NASA and Rackspace. NASA developed the compute side, called Nova. Rackspace developed the storage side, called Swift. They released their work as open source. OpenStack is written in Python.