This thesis proposes a methodology for revealing deep content interaction models from real-life weblogs. The methodology is applied to weblog data from Adresseavisen, a regional news publisher in Norway. The thesis gives a brief literature overview of process mining, covering its motivations, related work, and available tools. In addition, it compares process mining with data mining and with web usage mining. The Web is dynamic in nature, and information systems are becoming more and more intertwined with operational processes. This makes it possible to record a multitude of event data and provides an opportunity to apply process mining to these data to extract process-related information. Process mining has been an active and innovative research area in recent years; its goal is to extract process-related information from event logs by observing events recorded by an information system. Over the last few decades, process mining has established itself as a research field that focuses on the analysis of processes using event log data.
The migration of today’s news media business to personalized information delivery has created the need to analyze user behavior on news sites and to deliver a personalized set of news items according to each reader's preferences. Several recommendation algorithms exist for recommending news items. This thesis takes event logs from a news media site and applies process mining to these data for process discovery. The discovered model is then used to predict the next item an anonymous reader will click. In addition, the thesis discusses the value and implications of the extracted models and how this information can be consumed by business intelligence.
Keywords: Process Mining, Business Process Management, Process Discovery, Petri Nets, Workflow Mining, Workflow Management, Process Mining Methodology
This thesis is submitted to the Norwegian University of Science and Technology (NTNU) in partial fulfillment of the requirements for a Master's degree. The work was performed at the Department of Computer and Information Science (IDI), NTNU, Trondheim, in the spring of 2016. This thesis would not have been possible without the support of many people. First of all, I would like to thank my supervisor, Professor Dr. Jon Atle Gulla, and my co-supervisor, Jon Espen Ingvaldsen, at the Department of Computer and Information Science, Norwegian University of Science and Technology, for their constructive feedback and helpful guidance, which enabled me to complete my thesis on time.
I would also like to thank Arne Dag Fidjestøl for providing resources to gather data from http://www.adressa.no/. I would be remiss if I did not thank my family back in Nepal, whose help and motivational support over the last two years gave me the strength to work hard and finish my master's degree. Finally, I would like to express my gratitude to all my friends who helped me with their constructive feedback and honest criticism.
Suresh Kumar Mukhiya
June 2016