autogradescope

autogradescope is a Python package for creating Gradescope autograders with pytest. Features include:

  • Convenience: write autograders in Python using pytest (instead of unittest).

  • Informative failure messages: provides students with detailed feedback on their code, including the source code of failing tests.

  • Test weights: assign more points to important tests.

  • Flexible visibility: show tests to students only after the deadline, or never.

  • Timeouts: prevent infinite loops or runaway code from hanging the autograder.

  • Leaderboards: make competitions that use Gradescope’s leaderboard feature.

  • Extra credit: award points for additional tests.

Getting Started

Installation

autogradescope can be easily installed with pip:

pip install git+https://github.com/eldridgejm/autogradescope

It is also packaged as a Nix flake. To try it temporarily, run

nix shell github:eldridgejm/autogradescope

You can see the source code on GitHub.

Creating a new autograder

To create a new autograder, run autogradescope and follow the instructions. This will create a new directory, autograder, with a template autograder. The layout of this directory is:

.
├── data/
├── setup/
├── solution/
├── tests/
│   ├── test_private.py
│   └── test_public.py
├── Makefile
└── requirements.txt

Place your public and private tests in tests/test_public.py and tests/test_private.py respectively; the next section describes the structure of these files. Put your solution code in solution/, any data files that the submission will need in data/, and any Python packages needed to run the autograder (e.g., pandas, numpy) in requirements.txt.

Writing test modules

Autograder tests are written as pytest unit tests. That is, each test should be a Python function whose name starts with test_, in a file whose name starts with test_, inside the tests/ directory. A test function should use assert to check that a particular outcome holds; if it does not, the test fails and its points are not awarded. A test either awards its full points or none; to award partial credit, split the check into multiple tests.
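For example, instead of one all-or-nothing test of a hypothetical sort_numbers function, the check can be split so each behavior earns points independently. (The function defined inline below is a stand-in for the student's submission; a real test file would import the submission module instead.)

```python
# Stand-in for the student's submission; a real test file would
# `import pp01` (or whatever the submission module is named) instead.
def sort_numbers(xs):
    return sorted(xs)


def test_sorts_small_list():
    """Sorts a short list of integers."""
    assert sort_numbers([3, 1, 2]) == [1, 2, 3]


def test_preserves_duplicates():
    """Keeps duplicate values in the output."""
    assert sort_numbers([2, 1, 2]) == [1, 2, 2]


def test_handles_empty_list():
    """Returns an empty list for empty input."""
    assert sort_numbers([]) == []
```

A submission that sorts small lists correctly but crashes on empty input would then earn two of the three points instead of zero.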

Note

Autograder tests are typically divided into two categories: public and private. Public tests are always visible to students, while private tests are visible only after the deadline (or are never visible). Reflecting this convention, the template autograder generated by autogradescope includes two test files, tests/test_public.py and tests/test_private.py. In principle, autogradescope is flexible, and you do not need to use this structure; you can have as many test files as you like, but it is recommended to keep public and private tests separate.
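Under this convention, the private test file typically hides its results until the due date while the public file leaves them visible. A minimal sketch of the top of tests/test_private.py (the visibility values are described under Configuration below):

```python
# tests/test_private.py
from autogradescope import Settings

SETTINGS = Settings()

# Results of these tests are shown only after the (late) due date.
SETTINGS.default_visibility = "after_due_date"
```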

The test file should import the student’s submission as a Python module and then define test functions that call functions in the submission. You will want to communicate to students what they must provide, for example: “write a function named doubler in a file named pp01.py that takes an integer and returns twice that integer.” If the module cannot be imported (because the student named it incorrectly, or because their code imports a module that is not available), the autograder will fail and print a helpful error message.

If a test function includes a docstring, the docstring will be displayed to students as the name of the test (provided that the test is visible to students). It is therefore recommended that each test have an informative docstring, such as “Checks efficiency of your code on a large input” or “Checks that output is sorted”. If a test function does not include a docstring, the function name will be displayed to students instead.

If a test that is visible to students fails, the test code will also be displayed to students to show them precisely what went wrong. For this reason, it is recommended to include comments in the test code to explain what the test is checking for.

An example of a simple autograder test is shown below:

def test_doubling_21():
    """Doubling 21 makes 42."""
    # When we give your code 21, it should return 42.
    assert pp01.doubler(21) == 42

Configuration

Autograder tests require some configuration: the visibility of the tests to students, the weight (point value) of each test, a timeout limiting how long a test may run before it fails, and so on.

Defaults for every test in the file are set by creating an instance of autogradescope.Settings and storing it in the global variable SETTINGS. For convenience, comments describing the available settings are included in the starter test files generated by autogradescope. To prevent errors, the test file must include a SETTINGS object; if it does not, the autograder will not run.

Settings for individual tests can be set using the autogradescope.decorators.weight(), autogradescope.decorators.timeout(), and autogradescope.decorators.visibility() decorators. These decorators can be used to override the default settings for individual tests, and can be combined to set multiple settings at once. For example, the following test is worth 2 points, has a timeout of 10 seconds, and is hidden from students:

@weight(2)
@timeout(10)
@visibility("hidden")
def test_easy_1():
    """Doubling 21 makes 42."""
    assert pp01.doubler(21) == 42

Example

The full example below shows a simple autograder test file with a few tests.

"""Public autograder tests."""

from autogradescope import Settings
from autogradescope.decorators import weight, timeout, visibility

# settings =============================================================================

SETTINGS = Settings()

# default_visibility -------------------------------------------------------------------
# This controls the default visibility of the tests. Valid options are:
#
# - "hidden": The test results are never visible to the students.
#
# - "visible": The test results are always visible to the students.
#
# - "after_published": The test results will be shown only when the assignment
#   is explicitly published from the "Review Grades" page.
#
# - "after_due_date": The test results will be shown after the assignment's
#   due date has passed. If late submission is allowed, results are shown
#   only after the late due date.
SETTINGS.default_visibility = "visible"

# default_weight -----------------------------------------------------------------------
# The number of points each test is worth by default.
SETTINGS.default_weight = 1

# default_timeout ----------------------------------------------------------------------
# The number of seconds before a test times out and no points are awarded.
SETTINGS.default_timeout = 60

# leaderboard --------------------------------------------------------------------------
# A dictionary mapping leaderboard categories to scores for this submission. If
# a leaderboard is used, it can be set to an empty dictionary here, and filled in with
# values in the test functions.
# SETTINGS.leaderboard = {}

# tests ================================================================================

# import the student's submission
import pp01

def test_for_smoke():
    """Checks that the submission runs without error."""
    pp01.doubler(1)

@weight(2)
def test_easy_1():
    """Doubling 21 makes 42."""
    assert pp01.doubler(21) == 42

@timeout(10)
def test_lots_of_doubling():
    """Doubling 1 million gives a big number."""
    assert pp01.doubler(1_000_000) == 2 * 1_000_000

For a full example of an autograder that can be compiled and uploaded to Gradescope, see the example directory in the autogradescope GitHub repository.

Building

Running make test within the autograder directory will simulate running the autograder against your solution code. This not only ensures that the autograder is properly configured, but also that the tests pass when run against the correct solution (as they should).

Running make autograder (or simply make) will build the autograder zip file that you will upload to Gradescope; if the build succeeds, the file will be located at _build/autograder.zip. Note that make autograder also runs make test; if the tests fail, the autograder will not be built.

Running make clean will remove the _build/ directory and its contents.

Features

Timeouts

Gradescope autograders can be configured with a maximum time limit. However, if a submission takes too long, the autograder is killed and the student loses all points. To prevent this, you can use the autogradescope.decorators.timeout() decorator to set a timeout for individual tests. If a test takes longer than the specified time, only that test fails, and the student does not lose points for other tests.
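The decorator itself is used exactly as in the earlier examples (@timeout(10) above a test function). For intuition, a per-test timeout can be approximated with a Unix alarm signal; the sketch below illustrates the general mechanism only and is not autogradescope’s actual implementation:

```python
import signal


class TestTimeout(Exception):
    """Raised when a callable exceeds its time budget."""


def run_with_timeout(func, seconds):
    """Run func() and raise TestTimeout if it takes longer than `seconds`.

    Uses SIGALRM, so this sketch works only on Unix and in the main thread.
    """
    def _handler(signum, frame):
        raise TestTimeout(f"exceeded {seconds} second(s)")

    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)  # schedule the alarm
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)


# A fast computation finishes well within its budget:
assert run_with_timeout(lambda: sum(range(1000)), seconds=2) == 499500
```

An infinite loop passed to run_with_timeout would instead raise TestTimeout after the deadline, which is exactly the behavior that lets one slow test fail without taking the rest of the autograder down with it.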

Leaderboards

Gradescope assignments can be configured to have leaderboards that rank submissions by performance on a particular metric. To use leaderboards with autogradescope, set the leaderboard attribute of the SETTINGS object to a dictionary mapping leaderboard categories to scores. There are no restrictions on the leaderboard categories you can use. For example, to create a leaderboard that ranks submissions by accuracy, you could write:

CORRECT_LABELS = np.array([0, 1, 0, 1, 1, 0])
YOUR_ACCURACY = (submission.predict() == CORRECT_LABELS).mean()

SETTINGS.leaderboard = {
    "accuracy": YOUR_ACCURACY
}

@weight(2)
def test_accuracy_above_75():
    """Accuracy is above 75%."""
    assert YOUR_ACCURACY > 0.75

Including doctests

Sometimes you may want to distribute starter code to students that includes doctests, since doctests can be a useful way to provide examples of how their code will be called and what it should return. In such cases, you may want to write an autograder test that checks that the doctests pass (this can be a useful public test). To do this, you can use autogradescope.doctests.run(). For example:

import autogradescope.doctests

# import the student's submission
import pp01

def test_doctests():
    """Check that the doctests pass."""
    autogradescope.doctests.run(pp01)

Warning

This runs the doctests as they appear in the student’s submission (which might be different from the doctests in the starter code!). For example, a student can trivially make this test pass by deleting the doctests from the starter code. Conversely, the test might fail because the student _added_ more stringent doctests than the starter code contained.

Extra credit

A test can be marked as extra credit by using the extra_credit argument to the autogradescope.decorators.weight() decorator. For example:

@weight(2, extra_credit=True)
def test_extra_credit():
    """This test is extra credit."""
    assert pp01.doubler(21) == 42

Strictly speaking, this decorator is not needed to implement extra credit: Gradescope allows an autograder’s maximum score to be set lower than the sum of the tests’ point values. However, the decorator changes how results are displayed to the student. With it, an extra-credit test still shows as “green” (i.e., passed) even if the student fails it; without it, the test shows as “red” (i.e., failed).

API

Module: autogradescope

class autogradescope.Settings(default_visibility='after_published', default_weight=1, default_timeout=None, leaderboard=None, failure_message=None)[source]

Bases: object

Stores autograder settings.

A test module should have a single instance of this object, named SETTINGS, that is used to store the settings for the autograder.

This object performs error checking on the settings to ensure that they are valid, preventing typos from causing catastrophes such as students seeing the private tests before the due date.

default_visibility

The default visibility of the tests. Valid options are:

  • hidden: The test results are never visible to the students.

  • visible: The test results are always visible to the students.

  • after_published: The test results will be shown only when the assignment is explicitly published from the “Review Grades” page. If not provided, this is the default.

  • after_due_date: The test results will be shown after the assignment’s due date has passed. If late submission is allowed, the results will be shown only after the late due date.

Type:

str

default_weight

The default weight of the tests. Default is 1.

Type:

int

default_timeout

The default timeout for each test. If None, no timeout will be used beyond Gradescope’s own timeout for the autograder process as a whole. Default is None.

Type:

Optional[int]

leaderboard

A dictionary mapping leaderboard categories to scores for this submission. If None, no leaderboard will be used. Default is None.

Type:

Optional[dict]

failure_message

A function that formats a failure message for a test. If None, a default failure message will be used.

Type:

Optional[Callable]
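Since these attributes correspond to the constructor’s keyword arguments, a test module can also configure everything in one call rather than assigning attributes afterwards; a sketch:

```python
from autogradescope import Settings

# Equivalent to creating Settings() and assigning the attributes one by one.
SETTINGS = Settings(
    default_visibility="after_due_date",
    default_weight=1,
    default_timeout=60,
)
```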

Module: autogradescope.decorators

Decorators that change the behavior of tests.

autogradescope.decorators.timeout(seconds: int)[source]

Changes the timeout (in seconds) from the module default.

autogradescope.decorators.visibility(vis: str)[source]

Changes the visibility from the default of after_published. Validates that the value is one of the allowed visibility options.

autogradescope.decorators.weight(points: Number, extra_credit: bool = False)[source]

Changes the weight from the default of 1.

Parameters:
  • points (Number) – The number of points the test is worth.

  • extra_credit (bool, optional) – Whether the test allows extra credit. Default is False.

Module: autogradescope.doctests

A convenience function for running doctests within autograder tests.

autogradescope.doctests.run(module)[source]

Runs a module’s doctests.

This can be used in autograder tests to ensure that the doctests are correct.

Parameters:

module (module) – The module to run doctests for.

Raises:

DoctestError – If any of the doctests fail.

Example

To run the doctests for a student’s submission in an auto-grader test:

import autogradescope.doctests

import pp01

def test_doctests():
    """Checks that the doctests pass."""
    autogradescope.doctests.run(pp01)
