
STEM Writing Project


Available Instructional Resources

  • STEM Writing Resource Guide
  • Scientific writing curriculum for first-year undergraduates in biology (modeled on WAC/WID principles of practice)
  • Protocols for:
    • Developing a bins-based report grading protocol using locally collected data
    • Assessing inter-grader reliability
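Inter-grader reliability for a bins-based grading protocol is often summarized with a chance-corrected agreement statistic. The sketch below assumes two graders who each place every report in exactly one bin, and it uses Cohen's kappa as one common choice. The source does not say which statistic the project's protocol uses, and the bin labels in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(grader_a: list[str], grader_b: list[str]) -> float:
    """Cohen's kappa for two graders who each assigned one bin per report.

    The statistic choice is an assumption for illustration, not
    necessarily the project's reliability protocol.
    """
    if len(grader_a) != len(grader_b) or not grader_a:
        raise ValueError("need two equal-length, non-empty lists of bins")
    n = len(grader_a)
    # Observed agreement: share of reports placed in the same bin by both graders.
    observed = sum(a == b for a, b in zip(grader_a, grader_b)) / n
    # Agreement expected by chance, from each grader's bin frequencies.
    count_a, count_b = Counter(grader_a), Counter(grader_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both graders used one identical bin for every report; kappa is
        # undefined here, so return 1.0 since observed agreement is also perfect.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, graders who assign `high, high, mid, low` and `high, mid, mid, low` agree on three of four reports (75%), but kappa is 7/11 (about 0.64), because some of that agreement would be expected by chance alone.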



We are a multidisciplinary team of college English writing instructors, biology teachers, data scientists, and computational linguists working to improve students’ technical writing. By combining principles from the national Writing Across the Curriculum (WAC) and Writing in the Disciplines (WID) programs with classroom research and technology support, we hope to find better ways to train and support student writers in STEM fields.


Scientific thinking and communication skills are vital for 21st-century STEM careers. Research shows that the most effective way for students to develop these skills is through repeated cycles of drafting, receiving actionable feedback and coaching, and revising until a final product emerges. Ideally, students would write and revise routinely, but this is too time- and labor-intensive to be practical in high-enrollment STEM courses. In addition, college instructors may not know how best to make high-impact writing instruction part of their classes.

This project tested a research-based curriculum for developing scientific writing skills that combined text analysis technology, teaching assistant training, and an integrated writing training program for undergraduates. We assessed whether students developed writing skills faster when reading and text annotation exercises were combined with automated feedback from text analysis technology and holistic feedback from their instructors. We also evaluated graduate teaching assistants’ mastery of these training methods.

Using computational linguistics and statistical modeling, we explored:

  • How students develop technical writing skills.
  • How automated infrastructure can provide early, frequent feedback to student writers.
  • How to make writing assessment more uniform and transparent.

The SAWHET technology

SAWHET was our platform for collecting and analyzing student lab reports. Students entered their written assignments into an online web form powered by Qualtrics®. Each section of the paper had its own entry field (Title, Abstract, Introduction, Materials and Methods, Results, Discussion, Figures and Tables, Legends, and Literature Cited). As students wrote, the form checked that word requirements were met and that a complete report was submitted. Guidance on how to write each section was provided on the form itself, so help was always at students’ fingertips.

Once a report was submitted, it was automatically analyzed with natural language processing to check for specific requirements, such as the presence of a hypothesis and whether statements were backed up by citations. SAWHET also ran an automatic readability test on each section of the report, estimated the relative rate of scientific terminology use, and flagged potential plagiarism based on citations similar to those in reports already stored in the database. The student’s work, together with the analysis report, was sent to the instructor assessing it. The analytics for each report were stored in the database along with a copy of the student’s original work.
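The submission and analysis checks described above can be sketched in Python. This is a minimal illustration under stated assumptions, not SAWHET's actual implementation: the minimum word counts, the keyword cue for a hypothesis, the author-year citation pattern, the term glossary, and the use of Jaccard similarity over citation sets are all hypothetical stand-ins. The readability score uses the standard Flesch Reading Ease formula with a rough vowel-group syllable counter; the source does not say which readability test SAWHET used.

```python
import re

# Hypothetical minimum word counts; SAWHET's real requirements are not given.
MIN_WORDS = {"Abstract": 100, "Introduction": 250, "Discussion": 250}

# Keyword cue standing in for SAWHET's NLP hypothesis check (an assumption).
HYPOTHESIS_CUE = re.compile(r"\b(hypothes\w*|predict\w*)\b", re.IGNORECASE)

# Assumed author-year style, e.g. "(Smith 2019)" or "(Jones et al., 2020)".
CITATION = re.compile(r"\(([A-Z][A-Za-z'-]+)(?: et al\.)?,? (\d{4})\)")

WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def words(text: str) -> list[str]:
    return WORD.findall(text)


def validate_submission(sections: dict[str, str]) -> list[str]:
    """List problems that would block submission: missing or too-short sections."""
    problems = []
    for name, minimum in MIN_WORDS.items():
        text = sections.get(name, "").strip()
        if not text:
            problems.append(f"{name}: missing")
        elif (n := len(words(text))) < minimum:
            problems.append(f"{name}: {n} words (minimum {minimum})")
    return problems


def has_hypothesis(text: str) -> bool:
    return HYPOTHESIS_CUE.search(text) is not None


def find_citations(text: str) -> list[tuple[str, str]]:
    """Extract (author, year) pairs from parenthetical citations."""
    return CITATION.findall(text)


def syllables(word: str) -> int:
    # Rough heuristic: one syllable per run of vowels, at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    ws = words(text)
    if not ws:
        return 0.0
    # Count sentence-ending punctuation followed by whitespace or end of text.
    sentences = max(1, len(re.findall(r"[.!?]+(?=\s|$)", text)))
    syl = sum(syllables(w) for w in ws)
    return 206.835 - 1.015 * (len(ws) / sentences) - 84.6 * (syl / len(ws))


def term_rate(text: str, glossary: set[str]) -> float:
    """Share of words that appear in a hypothetical scientific-term glossary."""
    ws = [w.lower() for w in words(text)]
    return sum(w in glossary for w in ws) / len(ws) if ws else 0.0


def citation_overlap(report_a: str, report_b: str) -> float:
    """Jaccard similarity of two reports' citation sets; high values may flag copying."""
    a, b = set(find_citations(report_a)), set(find_citations(report_b))
    return len(a & b) / len(a | b) if a | b else 0.0
```

For example, `citation_overlap` returns 1/3 for two reports that share one citation out of three distinct ones. Any threshold for flagging a pair of reports would have to be chosen from locally collected data.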

[Figure: SAWHET at a glance]

With funding from the National Science Foundation (NSF), we expanded SAWHET and used its pooled data to begin assessing students’ development as scientific writers. SAWHET was unique in that it provided feedback to students before an instructor graded their work. Students could then revise and resubmit without penalty, which increased the number of opportunities they had to improve their work.

The original SAWHET platform was useful in a research context but relied on proprietary software as a core component. Future versions of SAWHET would need to be built on an open-source framework to scale realistically.