Many university programming courses follow a typical pattern: (1) students are given programming assignments, (2) students figure out what to write to solve each assignment, (3) students write their solution, and finally (4) students submit their code to be graded. The second and third steps are of interest to us because they focus on what students write while creating their solutions. Before we can study what students write, however, we must find a way to capture the details of the code as it is written in a way that does not disrupt the student.

In this paper, we describe Code Patternz, a tool created to record the programming process so that we can find patterns in the steps students take while writing code. We are interested not so much in the correctness of their solutions as in the coding path students follow when solving the problems. A student's final submission typically provides no insight into this valuable information because the process (e.g., top-down, bottom-up, tests-first) and intermediate attempts are not stored.

To capture this information, we built a web-based code editor and included three Python programming problems that students in an introductory computer science course at UC Berkeley had to solve. While the students solved the problems, the editor tracked the sequence of characters they typed and stored those characters in a database for later analysis. Collecting information at this granularity opens up the possibility of learning from the patterns we might discover across students during the code-writing process.
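The paper does not specify the event schema Code Patternz uses; as a minimal sketch, per-character edits could be stored as timestamped rows recording the cursor position, the characters inserted, and the number of characters deleted. The table and function names below are hypothetical, not taken from the actual system.

```python
import sqlite3
import time

def init_db(path=":memory:"):
    """Create a hypothetical event store for per-character edits.
    (Illustrative only; not the actual Code Patternz schema.)"""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               student_id TEXT,
               problem_id TEXT,
               timestamp  REAL,
               position   INTEGER,   -- cursor offset in the code buffer
               inserted   TEXT,      -- character(s) typed; '' for a pure deletion
               deleted    INTEGER    -- number of characters removed at position
           )"""
    )
    return conn

def record_event(conn, student_id, problem_id, position, inserted="", deleted=0):
    """Append one edit event; every keystroke becomes one row."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (student_id, problem_id, time.time(), position, inserted, deleted),
    )
```

Storing raw insert/delete events rather than full snapshots keeps the per-keystroke payload small, which matters when many students type concurrently.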

One of the challenges with such an experiment was building a system that could manage the data load of capturing every character typed when a large number of students were using the system at the same time. In this paper, we discuss the architecture of the Code Patternz system and some of the tradeoffs we had to make. The tool has two sections: the first presents questions and captures the code written; the second is an analysis section used to “replay” the code as if it were being written live, and to export the stored data in a form that makes later analysis easy. We provide summary statistics as well as a “deep dive” into a few of the outlier student traces from the data collected at UC Berkeley.
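The replay feature amounts to folding the stored edit events back into successive buffer states. A sketch of that reconstruction, assuming each event carries a cursor position, inserted text, and a deletion count (these field names are assumptions, not the tool's actual format):

```python
def replay(events):
    """Reconstruct successive buffer states from a chronologically
    ordered list of (position, inserted, deleted) edit events."""
    buffer = ""
    states = []
    for position, inserted, deleted in events:
        # Apply one edit: remove `deleted` chars at `position`, splice in `inserted`.
        buffer = buffer[:position] + inserted + buffer[position + deleted:]
        states.append(buffer)
    return states

# Example: type "d", then "ef", then delete the first character.
events = [(0, "d", 0), (1, "ef", 0), (0, "", 1)]
# replay(events) yields ["d", "def", "ef"]
```

Stepping through these states one timestamp at a time is what makes the code appear to be typed live.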

A great deal of information can be extracted by analyzing the sequence of characters students write when solving programming problems. The patterns we discover can help guide further instruction: watching where students struggle can inform future curriculum and intervention efforts. We hope our tool provides a platform for asking these questions easily.
