autogradescope
autogradescope is a Python package for creating Gradescope autograders with pytest. Features include:

- **Convenience**: write autograders in Python using pytest (instead of `unittest`).
- **Informative failure messages**: provides students with detailed feedback on their code, including the source code of failing tests.
- **Weighted tests**: assign more points to important tests.
- **Flexible visibility**: show tests to students only after the deadline, or never.
- **Timeouts**: prevent infinite loops or slow code from hanging the autograder.
- **Leaderboards**: run competitions using Gradescope's leaderboard feature.
- **Extra credit**: award points for additional tests.
Getting Started
Installation
autogradescope can be easily installed with pip:
pip install git+https://github.com/eldridgejm/autogradescope
It is also packaged as a Nix flake. To install it temporarily, run
nix shell github:eldridgejm/autogradescope
You can see the source code on GitHub.
Creating a new autograder
To create a new autograder, run `autogradescope` and follow the instructions. This will create a new directory, `autograder`, with a template autograder. The layout of this directory is:
.
├── data/
├── setup/
├── solution/
├── tests/
│   ├── test_private.py
│   └── test_public.py
├── Makefile
└── requirements.txt
Place your public and private tests in `tests/test_public.py` and `tests/test_private.py`, respectively; the next section describes the structure of these files. Put your solution code in `solution/`, any data files that the submission will need in `data/`, and any Python packages needed to run the autograder (e.g., pandas, numpy) in `requirements.txt`.
Writing test modules
Autograder tests are written as pytest unit tests. That is, each test should be written as a Python function whose name starts with `test_`, in a file whose name starts with `test_` in the `tests/` directory. A test function should use `assert` to check that a particular outcome holds; if it does not, the test fails and no points are awarded. Tests either award their full points or none; to award partial credit, you can use multiple tests.
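For example, one way partial credit falls out of splitting a single behavior across several small tests (the `doubler` function and the one-point-per-test values here are hypothetical):

```python
# A hypothetical student function, as it might appear in pp01.py.
def doubler(x):
    return 2 * x

# Three separate one-point tests. A student whose code handles only
# positive inputs would earn 1 of the 3 points rather than 0.
def test_doubler_positive():
    """Doubles a positive integer."""
    assert doubler(21) == 42

def test_doubler_zero():
    """Doubles zero."""
    assert doubler(0) == 0

def test_doubler_negative():
    """Doubles a negative integer."""
    assert doubler(-3) == -6
```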
Note
Autograder tests are typically divided into two categories: public and private. Public tests are always visible to students, while private tests are visible only after the deadline (or are never visible). Reflecting this convention, the template autograder generated by `autogradescope` includes two test files, `tests/test_public.py` and `tests/test_private.py`. In principle, `autogradescope` is flexible and you do not need to use this structure; you can have as many test files as you like, but it is recommended to keep public and private tests separate.
The test file should import the student's submission as a Python module and then define test functions that call functions in the submission. You will want to communicate to students what to name things; for example, "write a function named doubler in a file named pp01.py that takes an integer and returns twice that integer." If the module cannot be imported (because the student named it incorrectly, or because their code imports a module that is not available), the autograder will fail and print a helpful error message.
If a test function includes a docstring, the docstring will be displayed to students as the name of the test (provided that the test is visible to students). It is therefore recommended that each test have an informative docstring, such as “Checks efficiency of your code on a large input” or “Checks that output is sorted”. If a test function does not include a docstring, the function name will be displayed to students instead.
If a test that is visible to students fails, the test code will also be displayed to students to show them precisely what went wrong. For this reason, it is recommended to include comments in the test code to explain what the test is checking for.
An example of a simple autograder test is shown below:
def test_doubling_21():
    """Doubling 21 makes 42."""
    # When we give your code 21, it should return 42.
    assert pp01.doubler(21) == 42
Configuration
Autograder tests require some configuration: the visibility of the tests to students, the weight of each test, a timeout limiting how long a test may run before it fails, and so on.
Defaults for every test in the file are set by creating an instance of `autogradescope.Settings` and storing it in the global variable `SETTINGS`. For convenience, comments describing the available settings are included in the starter test files generated by `autogradescope`.

To prevent errors, the test file must include a `SETTINGS` object; if it does not, the autograder will not run.
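A minimal start to a test module therefore looks something like the following (a sketch mirroring the generated template; the particular default values are illustrative):

```python
from autogradescope import Settings

# Every test module must define a SETTINGS object at module level.
SETTINGS = Settings()
SETTINGS.default_visibility = "visible"  # show these results to students
SETTINGS.default_weight = 1              # each test is worth 1 point
SETTINGS.default_timeout = 60            # seconds before a test fails
```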
Settings for individual tests can be set using the
autogradescope.decorators.weight()
,
autogradescope.decorators.timeout()
, and
autogradescope.decorators.visibility()
decorators. These decorators can
be used to override the default settings for individual tests, and can be
combined to set multiple settings at once. For example, the following test is
worth 2 points, has a timeout of 10 seconds, and is hidden from students:
@weight(2)
@timeout(10)
@visibility("hidden")
def test_easy_1():
    """Doubling 21 makes 42."""
    assert pp01.doubler(21) == 42
Example
The full example below shows a simple autograder test file with a few tests.
"""Public autograder tests."""
from autogradescope import Settings
from autogradescope.decorators import weight, timeout, visibility
# settings =============================================================================
SETTINGS = Settings()
# default_visibility -------------------------------------------------------------------
# This controls the default visibility of the tests. Valid options are:
#
# - "hidden": The test results are never visible to the students.
#
# - "visible": The test results are always visible to the students.
#
# - "after_published": test case will be shown only when the assignment is
# explicitly published from the "Review Grades" page.
#
# - "after_due_date": test case will be shown after the assignment's due date
# has passed. If late submission is allowed, then test will be shown only after
# the late due date.
SETTINGS.default_visibility = "visible"
# default_weight -----------------------------------------------------------------------
# The number of points each test is worth by default.
SETTINGS.default_weight = 1
# default_timeout ----------------------------------------------------------------------
# The number of seconds before a test times out and no points are awarded.
SETTINGS.default_timeout = 60
# leaderboard --------------------------------------------------------------------------
# A dictionary mapping leaderboard categories to scores for this submission. If
# a leaderboard is used, it can be set to an empty dictionary here, and filled in with
# values in the test functions.
# SETTINGS.leaderboard = {}
# tests ================================================================================
# import the student's submission
import pp01
def test_for_smoke():
    """Checks that the submission runs without error."""
    pp01.doubler(1)

@weight(2)
def test_easy_1():
    """Doubling 21 makes 42."""
    assert pp01.doubler(21) == 42

@timeout(10)
def test_lots_of_doubling():
    """Doubling 1 million gives a big number."""
    assert pp01.doubler(1_000_000) == 2 * 1_000_000
For a full example of an autograder that can be compiled and uploaded to Gradescope, see the example directory in the autogradescope GitHub repository.
Building
Running `make test` within the autograder directory will simulate running the autograder against your solution code. This not only ensures that the autograder is properly configured, but also that the tests pass when run against the correct solution (as they should).

Running `make autograder` (or simply `make`) will build the autograder zip file that you will upload to Gradescope; if the build succeeds, the file will be located at `_build/autograder.zip`. Note that `make autograder` also runs `make test`; if the tests fail, the autograder will not be built.

Running `make clean` will remove the `_build/` directory and its contents.
Features
Timeouts
Gradescope autograders can be configured with a maximum time limit. However, if a submission takes too long, the autograder is killed and the student loses all points. To prevent this, you can use the `autogradescope.decorators.timeout()` decorator to set a timeout for individual tests. If a test takes longer than the specified time, only that test fails, and the student does not lose points for other tests.
Leaderboards
Gradescope assignments can be configured to have leaderboards that rank submissions by performance on a particular metric. To use leaderboards with autogradescope, set the `leaderboard` attribute of the `SETTINGS` object to a dictionary mapping leaderboard categories to scores. There are no restrictions on the leaderboard categories you can use. For example, to create a leaderboard that ranks submissions by accuracy, you could write:
CORRECT_LABELS = np.array([0, 1, 0, 1, 1, 0])
YOUR_ACCURACY = (submission.predict() == CORRECT_LABELS).mean()

SETTINGS.leaderboard = {
    "accuracy": YOUR_ACCURACY
}

@weight(2)
def test_accuracy_above_75():
    """Accuracy is above 75%."""
    assert YOUR_ACCURACY > 0.75
Including doctests
Sometimes you may want to distribute starter code to students that includes doctests, since doctests can be a useful way to provide examples of how their code will be called and what it should return. In such cases, you may want to write an autograder test that checks that the doctests pass (this can be a useful public test). To do this, you can use `autogradescope.doctests.run()`. For example:
from autogradescope import doctests

# import the student's submission
import pp01

def test_doctests():
    """Checks that the doctests pass."""
    doctests.run(pp01)
Warning
This will run the doctests as they appear in the student's submission (which might be different from the doctests in the starter code!). For example, a student can easily make this test pass by deleting the doctests from the starter code. Conversely, this test might fail because the student _added_ more stringent doctests than the starter code contained.
Extra credit
A test can be marked as extra credit by using the `extra_credit` argument to the `autogradescope.decorators.weight()` decorator. For example:
@weight(2, extra_credit=True)
def test_extra_credit():
    """This test is extra credit."""
    assert pp01.doubler(21) == 42
Practically speaking, because Gradescope allows an autograder's maximum score to be lower than the sum of the points awarded by its tests, extra credit can be implemented without this decorator. However, the decorator changes how the results are displayed to the student: with it, the test still shows as "green" (i.e., passed) even if the student does not pass it; without it, the test shows as "red" (i.e., failed).
API
Module: autogradescope
- class autogradescope.Settings(default_visibility='after_published', default_weight=1, default_timeout=None, leaderboard=None, failure_message=None)

  Bases: object

  Stores autograder settings.

  A test module should have a single instance of this object, named `SETTINGS`, that is used to store the settings for the autograder. This object does error checking on the settings to ensure that they are valid, and to prevent typos from causing catastrophes such as students seeing the private tests before the due date.

  - default_visibility

    The default visibility of the tests. Valid options are:

    - `hidden`: The test results are never visible to the students.
    - `visible`: The test results are always visible to the students.
    - `after_published`: The test case will be shown only when the assignment is explicitly published from the "Review Grades" page. If not provided, this is the default.
    - `after_due_date`: The test case will be shown after the assignment's due date has passed. If late submission is allowed, the test will be shown only after the late due date.

    - Type: str

  - default_weight

    The default weight of the tests. Default is 1.

    - Type: int

  - default_timeout

    The default timeout for each test. If None, no timeout will be used beyond Gradescope's own timeout for the autograder process as a whole. Default is None.

    - Type: Optional[int]

  - leaderboard

    A dictionary mapping leaderboard categories to scores for this submission. If None, no leaderboard will be used. Default is None.

    - Type: Optional[dict]

  - failure_message

    A function that formats a failure message for a test. If None, a default failure message will be used.

    - Type: Optional[Callable]
Module: autogradescope.decorators
Decorators that change the behavior of tests.
- autogradescope.decorators.timeout(seconds: int)

  Changes the timeout from the default of 60 seconds.
Module: autogradescope.doctests
A convenience function for running doctests within autograder tests.
- autogradescope.doctests.run(module)

  Runs a module's doctests.

  This can be used in autograder tests to ensure that the doctests are correct.

  - Parameters: module (module) – The module to run doctests for.
  - Raises: DoctestError – If any of the doctests fail.

  Example

  To run the doctests for a student's submission in an autograder test:

  import autogradescope.doctests

  import pp01

  def test_doctests():
      """Checks that the doctests pass."""
      autogradescope.doctests.run(pp01)