CSCE 500 -- Fall 2013

Computer Programming and Applications

Duncan A. Buell
Professor
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29208
3A69 Swearingen
"buell" in any of several domains
803-777-7848 (voice)
803-777-3767 (fax)
grizzlefarb

Basics

This Webpage

This webpage will be changing throughout the summer as I add material and content. Please do not hesitate to ask questions. As of today (June 24, 2013) its purpose is primarily to substitute for the email attachment that has been distributed.

Textbook Information

I have been pinged (as of May 30) by the bookstore about the textbook for this course.

The goal in choosing a text for a course like this is to get enough material to keep people interested but also to make sure there is a solid place to which students can turn for answers. I can't say I have a decision at this point, and I won't have a decision for a couple of weeks. I have to read and code some more before I know what I think will work out best.

REQUIRED: At the moment I am leaning toward the idea of using as the main materials:

OTHER REFERENCES:

  • Learning Python (O'Reilly) ($55 print, $40 ebook)
  • Remember that for each language O'Reilly always has a "learning" book and a "reference" book. The companion reference book, for those who get serious and go further, is Programming Python ($65 print, $52 ebook, $72 both).
  • Tim Budd's Exploring Python can be found on the web (perhaps this is an earlier version).
  • Head First Python has been recommended, and is apparently less than $30.
  • Python: Programming in Context seems to read well, but it's very expensive by comparison, so it's hard to recommend unless one already has access to it.

Overview

This course is designed as a research methods course for graduate students outside Computer Science as a substitute for the foreign language requirement.

CSCE 500 will be taught using the Python programming language. Python is free and freely available on multiple platforms, as is the Natural Language Toolkit that is written in Python and provides functionality for semantic processing of text.

The course will focus on applications in text processing: constructing indices and concordances, extracting key terms and phrases from combinations of words and word co-occurrences in documents, solving the challenges of variant spelling, and learning ways to identify and compare patterns of word usages in documents. Upon completion, students should be ready to embark on a computational project of their choosing that would involve text processing.
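As a rough illustration of two of the tasks named above (an index and a concordance), here is a minimal sketch that uses only the Python standard library, not the Natural Language Toolkit. The function names, the sample sentences, and the simple rule for what counts as a word are assumptions made for this example; they are not course materials.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (runs of letters and apostrophes).

    This is a deliberately simple rule chosen for the sketch.
    """
    return re.findall(r"[a-z']+", text.lower())


def build_index(lines):
    """Map each word to the sorted list of 1-based line numbers where it occurs."""
    index = defaultdict(set)
    for number, line in enumerate(lines, start=1):
        for word in tokenize(line):
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def concordance(text, keyword, width=2):
    """Return each occurrence of keyword with up to `width` words of context per side."""
    words = tokenize(text)
    target = keyword.lower()
    hits = []
    for position, word in enumerate(words):
        if word == target:
            left = words[max(0, position - width):position]
            right = words[position + 1:position + 1 + width]
            hits.append(" ".join(left + [word.upper()] + right))
    return hits


# Hypothetical two-line "document" used only for illustration.
sample_lines = [
    "The whale surfaced near the ship.",
    "The ship turned toward the whale.",
]
index = build_index(sample_lines)
contexts = concordance(" ".join(sample_lines), "whale")

print(index["whale"])   # line numbers on which "whale" appears
print(contexts)         # each hit shown with its neighboring words
```

The index answers "where does this word occur?" and the concordance answers "in what company does it occur?"; real texts would need more careful handling of punctuation, hyphenation, and variant spellings.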

The course will resemble a combination of CSCE 145 and CSCE 146, the first two courses in the computing majors, but all exercises will be focused on text.

Bulletin Description: "Concepts and properties of algorithms; programming exercises with emphasis on good programming habits. Credit may not be received for both CSCE 500 and CSCE 145. Open to all majors. May not be used for major credit by computer science and engineering majors."

Computing Issues

THIS COURSE WILL NOT ASSUME PRIOR EXPERIENCE IN PROGRAMMING. The course will be taught in Python. Most of the Python tools are free and freely available. Gambrell 150 is a Windows computer lab. Students should also be able to bring their own laptops (Windows, Mac, or Linux) and use those in class as needed.

Grading

Although this is a course (primarily) for graduate students, it also has to function in a manner similar to what we do at the undergraduate level. The grading will thus be based on the following:

  • Homework programs (9 at 25 points each) = 225 total points
  • Final project proposal = 75 total points
  • Midterm exam = 100 points
  • Final exam = 100 points

The homework programs will be done outside of class and submitted as zip files to Blackboard.
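One possible way to produce such a zip file from Python itself is sketched below, using only the standard library. The function name zip_homework and the file names hw1.py and notes.txt are hypothetical and chosen for this example; nothing here describes a required submission layout.

```python
import os
import tempfile
import zipfile


def zip_homework(source_dir, zip_path):
    """Write every file under source_dir into zip_path, keeping relative paths.

    Returns the sorted list of relative paths that were added.
    """
    added = []
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, filenames in os.walk(source_dir):
            for filename in filenames:
                full_path = os.path.join(folder, filename)
                # Skip the archive itself if it was created inside source_dir.
                if os.path.abspath(full_path) == zip_abs:
                    continue
                relative = os.path.relpath(full_path, source_dir)
                archive.write(full_path, relative)
                added.append(relative)
    return sorted(added)


# Demonstration in a throwaway temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as workspace:
    homework_dir = os.path.join(workspace, "hw1")
    os.makedirs(homework_dir)
    with open(os.path.join(homework_dir, "hw1.py"), "w") as handle:
        handle.write("print('hello')\n")
    with open(os.path.join(homework_dir, "notes.txt"), "w") as handle:
        handle.write("Sample notes.\n")

    zip_name = os.path.join(workspace, "hw1.zip")
    packed = zip_homework(homework_dir, zip_name)

    # Read the archive back to confirm what it contains.
    with zipfile.ZipFile(zip_name) as archive:
        listed = sorted(archive.namelist())

print(packed)
```

Reading the archive back before uploading is a cheap check that the files you meant to submit are actually in it.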

The final project proposal will be an overview of a computational project that you would like to accomplish. Such a project could be as small as a term-paper-sized program for one semester or as large as a dissertation project that could take you a year or more. You will not be graded on the "nature" of the project as it might fit in your discipline or on the complexity of the project (within reason). You will be graded essentially on whether you can describe what it is that you would need to do computationally in order to have working code at the end. This would include the issue of source data, the programs needed to process the data, the result you would want, etc.

The final project is intended to be a help to you, not a busywork assignment from me to you. I am more than willing to talk with you during the semester about your goals, and we will probably have a class session or two in which you present your ideas. "Half cooked" (as opposed to "half-baked") ideas are perfectly ok; if you have not done extensive programming before, I can hardly expect you to know how to lay out a project plan for a programming project, but I would hope you might learn some of this in this course.

Python Resources

Version Control, Lab Computers, Etc.

Everything we will use in this course either comes with the textbook or is free and available on the web for download. However, things that are available for download tend to change versions on a regular basis.

We do not believe there will be functionally different versions of software available during this semester, but if you download tools for your home computer it is possible that you will be downloading a different version from that which is used in the computer labs.

For the most part, the things that will differ from one version of Eclipse to another or from one version of Java to another are not things you will notice in this class.

Java versions are labelled (and no, this makes no sense) 1.3, 1.4, 5.0, and then 6.0. The key to Java is that you have version 5.0 or higher. This is necessary in order for you to do everything in this course.

Similarly, we are currently running Eclipse version 3.? in the labs. We take a risk-averse approach to computer labs, in that we are unlikely to install the most recent version until it's been used and tested (by someone else). I personally use Mac computers, so my version is the most recent version available for the Mac. You will see different versions, but I cannot imagine that you'll notice any difference from one version to the next on the kind of things you'll be using in this class.

As they say in this business, your mileage may vary. You may not find exactly the same set of buttons on any of the versions of Java or Eclipse that you use, but Java 6 and a recent version of Eclipse ought to provide the functionality you need even if the buttons are slightly different. The keys to success include the ability to adapt.

HOWEVER, getting all the pieces to fit together and play nicely the first time can be a little difficult. If you are going to be using your own computer and not one of the computers in the labs to do your homework, you should leave yourself a little extra lead time on the first assignment to make sure that Eclipse can find the Java and the Javadoc executables. When it comes time to do the JUnit assignments, make sure early on that you can use JUnit. It's not smart to wait till midnight before the assignment is due to find out that one package can't find what it needs.

In all things, the computer lab installation is to be taken as the reference installation, which will be verified to work correctly. If you choose to do the programming on some other computer, you are responsible for ensuring that everything works properly. Because the lab computers are provided with the correct installation, it is never acceptable for you to ask to be excused from submissions or deadlines on the basis that your personal computer failed to operate properly or that you mistakenly deleted files.

Notwithstanding the admonition above, it is worth pointing out three issues that arise when running on your own computer.
  • First, you must get an appropriate version of Java installed (this is a free download), and you probably want to get an appropriate version of Eclipse installed (also a free download). You will need to make sure that Eclipse can find the Java compiler and runtime environment.
  • Second, you will need to make sure that Eclipse can find the Javadoc executable.
  • Third, you will need to know whether your source code is being separated from the executable .class files (in separate src and bin directories, respectively) or is combined. (I combine the two, but you may choose to do otherwise.) The issue that arises is that your executables will look for the data files in the parent directory of src and bin, not in either of those.
I can help you with some of these issues, but I am not going to configure your computer for you, and probably the best way to configure a laptop is to bring it to the lab and check its configuration against what you have in the lab. Working against a reference system that is correctly configured is often much easier than trying to diagnose problems in the abstract.
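For the Python side of the course, one simple way to compare a personal computer against a lab machine is to run the same short report on both and compare the output. The sketch below is an assumption about what might be useful to compare; the function name describe_environment is hypothetical, and the paragraphs above concern Java and Eclipse, which this script does not inspect.

```python
import platform
import sys


def describe_environment():
    """Collect a few facts about the Python installation running this script."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "executable": sys.executable,
    }


report = describe_environment()
for name in sorted(report):
    # str.format is used so the script also runs on older Python 3 releases.
    print("{}: {}".format(name, report[name]))
```

If the version lines differ between the two machines, that difference is the first thing to mention when asking for help.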

Academic Honesty

Assignments and examination work are expected to be the sole effort of the student submitting the work. Students are expected to follow the University of South Carolina Honor Code and should expect that every instance of a suspected violation will be reported. Students found responsible for violations of the Code will be subject to academic penalties under the Code in addition to whatever disciplinary sanctions are applied.

There seems to be a widespread misunderstanding of the concept of "your own work." In addition to the USC Code, some good sources of text for what is or is not acceptable behavior are the academic honesty policy statement from Harvey Mudd College, the policy statement from Professor Steven Huss-Lederman at Beloit College, and the text of part of the collaboration policy statement from MIT. You can expect your programming assignments to be checked against those turned in by other members of the class as well as code that I can find on the web. I expect the correlations between your work and that of others to be minimal.

I can also offer an operational definition of what you can do and of how you can distinguish "learning from a group discussion" and "turning in someone else's work." If, after having participated in a group activity, you can walk away, put the books down, have lunch, and then come back afterwards to re-create from your own head the material and techniques you discussed as a group, then you can legitimately say that you have learned from the group but the work you turn in is your own.

There has been a widespread misunderstanding of the purpose of the supplementary instruction, and students have repeatedly turned in work that is simply copied from what was explained in the SI session. If your work is identical or nearly identical to that turned in by some other student, then I will assume a priori that your work has been plagiarized either to or from that student, and under USC rules you are both equally responsible if you are aware of this duplication. You are not permitted to copy the text of code from someone else and use it as your own unless that text comes from the textbook or from the material on the Moodle site, and any such text copied in must be attributed to its source.

On the Proper Use of Computing Resources

Students are expected to be aware of the university policy on use of computing resources, including the Student Guidelines for Responsible Computing found here, as well as the college and departmental policies on proper use of computing resources. Every instance of a suspected violation will be reported. Students should be aware that neither the instructor nor the department is responsible for making alternative arrangements should improper use leading to revocation of access to departmental or college resources make it impossible for you to complete the programming assignments on time.
