An Application of Natural Language Processing: Analyzing student essays as a big-data project

Friday, September 1, 2017 - 2:20pm to 3:10pm
Swearingen room 2A14

I would like to invite you to attend this week's CSCE 791 seminar. These seminars highlight research being performed in our department and across the world. All CSCE 791 seminars are open to anyone who wishes to attend, not just students registered for the course.

Speaker: Dr. Duncan Buell, University of South Carolina

Abstract: First-year students at most large universities take required courses intended to teach them to write prose essays and make arguments. We have acquired more than 7000 pairs of draft-and-final essays from USC and have been analyzing them. We are not trying to do “machine grading” of essays as an AI project. Rather, we are trying to identify features of writing that can be quantified and thus processed with programs as a big-data analysis. We are interested in the extent to which students revise their drafts into final versions, and we are interested in comparing our student writing against other genres of writing. For this comparison we use the Corpus of Contemporary American English (COCA) as source data. The COCA is a corpus of more than 500 million words of text separated into genres such as academic writing, magazine writing, and transcripts of spoken English and interviews. Our eventual goal is to situate student writing relative to other genres and thus to help improve the pedagogy of teaching writing; knowing what students are actually writing now is key to knowing how to get them to write formal prose effectively. Programming is done in Python. Part-of-speech tagging is done using the CLAWS package from Lancaster University in the UK. Sentence parsing is done using the parser from Dan Jurafsky’s lab at Stanford.
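The abstract does not say how the extent of revision between a draft and its final version is measured. As one hypothetical illustration only, and not the speaker's method, the sketch below scores revision as the fraction of word tokens that differ between the two versions, using Python's standard-library difflib.SequenceMatcher. The function names tokenize and revision_extent, and the simple lowercase regex tokenizer, are assumptions made for this example.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a simple regex stand-in for a real tokenizer (assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def revision_extent(draft: str, final: str) -> float:
    """Hypothetical revision score between a draft and its final version.

    Returns 1 - SequenceMatcher.ratio() over word tokens:
    0.0 means the final is word-for-word identical to the draft,
    1.0 means no words were matched between the two versions.
    """
    a, b = tokenize(draft), tokenize(final)
    if not a and not b:
        return 0.0  # two empty texts: nothing was revised
    # autojunk=False so frequent words like "the" are not silently ignored
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return 1.0 - matcher.ratio()


if __name__ == "__main__":
    draft = "The war was bad for many people."
    final = "The war was devastating for many people."
    # One of seven words was replaced, so the score is about 1/7.
    print(round(revision_extent(draft, final), 3))
```

A word-level diff is used here rather than a character-level one so that a single substituted word counts as one change; a real analysis over thousands of essay pairs would likely want a proper tokenizer and sentence-level alignment as well.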

Bio: Duncan A. Buell is a Professor in the Department of Computer Science and Engineering at the University of South Carolina. He received his Ph.D. in mathematics from the University of Illinois at Chicago in 1976. He was department chair at USC from 2000 to 2009 and served as interim dean in 2005-2006. He has done research in document retrieval, computational number theory, and parallel computing, and has more recently turned to digital humanities as one of the emerging “marketplace” applications for computing. He is working with First Year English at USC on the analysis of freshman English essays, seeking to understand actual student writing in order to improve first-year English instruction. He has team-taught four times with Dr. Heidi Rae Cooley on presenting unacknowledged history on mobile devices, and he and Dr. Cooley are actively exploring ways to go beyond text to fully enable the use of visual media in mobile applications that present humanities content, especially content that might normally remain unacknowledged by institutional authority.