.. autogradescope documentation master file, created by
   sphinx-quickstart on Sat Mar 16 09:49:20 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

autogradescope
==============

`autogradescope` is a Python package for creating `Gradescope
<https://www.gradescope.com>`_ autograders with `pytest
<https://docs.pytest.org>`_. Features include:

* **Convenience**: write autograders in Python using `pytest
  <https://docs.pytest.org>`_ (instead of ``unittest``).
* **Informative failure messages**: provides students with detailed feedback
  on their code, including the source code of failing tests.
* **Weighted tests**: assign more points to important tests.
* **Flexible visibility**: show tests to students only after the deadline, or
  never.
* **Timeouts**: prevent infinite loops or slow code from hanging the
  autograder.
* **Leaderboards**: make competitions that use Gradescope's leaderboard
  feature.
* **Extra credit**: award points for additional tests.

Getting Started
===============

Installation
------------

`autogradescope` can be easily installed with `pip`:

.. code-block:: bash

   pip install git+https://github.com/eldridgejm/autogradescope

It is also packaged as a Nix flake. To temporarily install it, run

.. code-block:: bash

   nix shell github:eldridgejm/autogradescope

You can see the source code on `GitHub
<https://github.com/eldridgejm/autogradescope>`_.

Creating a new autograder
-------------------------

To create a new autograder, run ``autogradescope`` and follow the
instructions. This will create a new directory, ``autograder``, with a
template autograder. The layout of this directory is:

.. code-block:: bash

   .
   ├── data/
   ├── setup/
   ├── solution/
   ├── tests/
   │   ├── test_private.py
   │   └── test_public.py
   ├── Makefile
   └── requirements.txt

Place your public and private tests in ``tests/test_public.py`` and
``tests/test_private.py``, respectively; the next section describes the
structure of these files. Put your solution code in ``solution/``, any data
files that the submission will need in ``data/``, and any Python packages
needed to run the autograder (e.g., `pandas`, `numpy`) in
``requirements.txt``.

Writing test modules
--------------------

Autograder tests are written as `pytest` unit tests. That is, each test
should be written as a Python function whose name starts with ``test_`` in a
file whose name starts with ``test_`` in the ``tests/`` directory. A test
function should use ``assert`` to check that a particular outcome holds; if
it does not, the test fails and points are not awarded. Tests either award
their full points or none at all; to award partial credit, use multiple
tests.

.. note::

   Autograder tests are typically divided into two categories: public and
   private. Public tests are always visible to students, while private tests
   are visible only after the deadline (or are never visible). Reflecting
   this convention, the template autograder generated by ``autogradescope``
   includes two test files, ``tests/test_public.py`` and
   ``tests/test_private.py``. In principle, ``autogradescope`` is flexible,
   and you do not need to use this structure; you can have as many test files
   as you like, but it is recommended to keep public and private tests
   separate.

The test file should import the student's submission as a Python module and
then define test functions that call functions in the submission. You will
want to communicate to students that they should, for example, "write a
function named ``doubler`` in a file named ``pp01.py`` that takes an integer
and returns twice that integer."
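For instance, a public test module for this hypothetical assignment might
look like the following minimal sketch (the ``SETTINGS`` object it creates is
required, and is explained under Configuration below). It also shows how
multiple small tests can be used to award partial credit:

.. code-block:: python

   """Public autograder tests for pp01."""

   from autogradescope import Settings

   # every test file must define a SETTINGS object (see Configuration)
   SETTINGS = Settings()

   # import the student's submission as a module
   import pp01


   def test_doubler_on_positive_input():
       """Doubling 21 makes 42."""
       assert pp01.doubler(21) == 42


   def test_doubler_on_negative_input():
       """Doubling -3 makes -6."""
       # a second, separate test awards partial credit if only one case works
       assert pp01.doubler(-3) == -6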
If the module cannot be imported (because the student named it incorrectly,
or because their code imports a module that is not available), the autograder
will fail and print a helpful error message.

If a test function includes a docstring, the docstring will be displayed to
students as the name of the test (provided that the test is visible to
students). It is therefore recommended that each test have an informative
docstring, such as "Checks efficiency of your code on a large input" or
"Checks that output is sorted". If a test function does not include a
docstring, the function name will be displayed to students instead.

If a test that is visible to students fails, the test's source code will also
be displayed to students to show them precisely what went wrong. For this
reason, it is recommended to include comments in the test code explaining
what the test is checking for. An example of a simple autograder test is
shown below:

.. code-block:: python

   def test_doubling_21():
       """Doubling 21 makes 42."""
       # When we give your code 21, it should return 42.
       assert pp01.doubler(21) == 42

Configuration
-------------

Autograder tests require some configuration. For example, the visibility of
the tests to students needs to be specified, as does the weight of each test,
a timeout limiting how long a test can run before it fails, and so on.
Defaults for every test in the file are set by creating an instance of
:class:`autogradescope.Settings` and storing it in the global variable
``SETTINGS``. For convenience, comments describing the available settings are
included in the starter test files generated by ``autogradescope``. To
prevent errors, the test file *must* include a ``SETTINGS`` object; if it
does not, the autograder will not run.

Settings for individual tests can be set using the
:func:`autogradescope.decorators.weight`,
:func:`autogradescope.decorators.timeout`, and
:func:`autogradescope.decorators.visibility` decorators. These decorators
override the default settings for individual tests, and can be combined to
set multiple settings at once. For example, the following test is worth 2
points, has a timeout of 10 seconds, and is hidden from students:

.. code-block:: python

   @weight(2)
   @timeout(10)
   @visibility("hidden")
   def test_easy_1():
       """Doubling 21 makes 42."""
       assert pp01.doubler(21) == 42
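These defaults are typically where public and private test files differ. As a
hedged sketch (using the ``after_due_date`` visibility option described in
the example below), a private test file might begin:

.. code-block:: python

   """Private autograder tests."""

   from autogradescope import Settings

   SETTINGS = Settings()

   # hide these results until the assignment's (late) due date has passed
   SETTINGS.default_visibility = "after_due_date"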
Example
-------

The full example below shows a simple autograder test file with a few tests.

.. code-block:: python

   """Public autograder tests."""

   from autogradescope import Settings
   from autogradescope.decorators import weight, timeout, visibility

   # settings ==========================================================

   SETTINGS = Settings()

   # default_visibility ------------------------------------------------
   # This controls the default visibility of the tests. Valid options are:
   #
   # - "hidden": the test results are never visible to the students.
   #
   # - "visible": the test results are always visible to the students.
   #
   # - "after_published": the test case will be shown only when the
   #   assignment is explicitly published from the "Review Grades" page.
   #
   # - "after_due_date": the test case will be shown after the assignment's
   #   due date has passed. If late submission is allowed, the test will be
   #   shown only after the late due date.

   SETTINGS.default_visibility = "visible"

   # default_weight -----------------------------------------------------
   # The number of points each test is worth by default.

   SETTINGS.default_weight = 1

   # default_timeout ----------------------------------------------------
   # The number of seconds before a test times out and no points are awarded.

   SETTINGS.default_timeout = 60

   # leaderboard ---------------------------------------------------------
   # A dictionary mapping leaderboard categories to scores for this
   # submission. If a leaderboard is used, it can be set to an empty
   # dictionary here and filled in with values in the test functions.
   # SETTINGS.leaderboard = {}

   # tests ================================================================

   # import the student's submission
   import pp01


   def test_for_smoke():
       """Checks that the submission runs without error."""
       pp01.doubler(1)


   @weight(2)
   def test_easy_1():
       """Doubling 21 makes 42."""
       assert pp01.doubler(21) == 42


   @timeout(10)
   def test_lots_of_doubling():
       """Doubling 1 million gives a big number."""
       assert pp01.doubler(1_000_000) == 2 * 1_000_000

For a full example of an autograder that can be compiled and uploaded to
Gradescope, see the `example` directory in the `autogradescope` `GitHub
repository <https://github.com/eldridgejm/autogradescope>`_.

Building
--------

Running ``make test`` within the autograder directory will simulate running
the autograder against your solution code. This not only ensures that the
autograder is properly configured, but also that the tests pass when run
against the correct solution (as they should).

Running ``make autograder`` (or simply ``make``) will build the autograder
zip file that you will upload to Gradescope; if the build succeeds, the file
will be located at ``_build/autograder.zip``. Note that ``make autograder``
also runs ``make test``; if the tests fail, the autograder will not be built.

Running ``make clean`` will remove the ``_build/`` directory and its
contents.

Features
========

Timeouts
--------

Gradescope autograders can be configured with a maximum time limit. However,
if a submission takes too long, the autograder is killed and the student
loses all points. To prevent this, you can use the
:func:`autogradescope.decorators.timeout` decorator to set a timeout for
individual tests. If a test takes longer than the specified time, only that
test fails, and the student does not lose points for other tests.
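For instance, within a test file that defines ``SETTINGS`` as above, a test
guarding against slow submissions might pair a large input with a tight
timeout. This is only a sketch: ``pp01.sort_values`` is a hypothetical
submission function, not part of the examples above.

.. code-block:: python

   from autogradescope.decorators import timeout

   # import the student's submission
   import pp01


   @timeout(5)
   def test_efficiency_on_large_input():
       """Checks efficiency of your code on a large input."""
       # an inefficient (e.g., quadratic-time) implementation should exceed
       # the 5 second limit; only this test fails, and other tests still run
       values = list(range(100_000, 0, -1))
       assert pp01.sort_values(values) == sorted(values)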
Leaderboards
------------

Gradescope assignments can be configured to have leaderboards that rank
submissions by performance on a particular metric. To use leaderboards with
`autogradescope`, set the ``leaderboard`` attribute of the ``SETTINGS``
object to a dictionary mapping leaderboard categories to scores. There are no
restrictions on the leaderboard categories you can use. For example, to
create a leaderboard that ranks submissions by accuracy, you could write:

.. code-block:: python

   import numpy as np

   # import the student's submission
   import submission

   CORRECT_LABELS = np.array([0, 1, 0, 1, 1, 0])
   YOUR_ACCURACY = (submission.predict() == CORRECT_LABELS).mean()

   SETTINGS.leaderboard = {"accuracy": YOUR_ACCURACY}


   @weight(2)
   def test_accuracy_above_75():
       """Accuracy is above 75%."""
       assert YOUR_ACCURACY > 0.75

Including doctests
------------------

Sometimes you may want to distribute starter code to students that includes
doctests, since doctests can be a useful way to provide examples of how their
code will be called and what it should return. In such cases, you may want to
write an autograder test that checks that the doctests pass (this can be a
useful public test). To do this, you can use
:func:`autogradescope.doctests.run`. For example:

.. code-block:: python

   from autogradescope import doctests

   # import the student's submission
   import pp01


   def test_doctests():
       """Check that the doctests pass."""
       doctests.run(pp01)

.. warning::

   This will run the doctests as they appear in the student's submission
   (which might be different from the doctests in the starter code!). For
   example, a student can easily make this test pass by deleting the doctests
   from the starter code. Or, this test might fail because the student
   *added* more stringent doctests than the starter code contains.

Extra credit
------------

A test can be marked as extra credit by using the ``extra_credit`` argument
to the :func:`autogradescope.decorators.weight` decorator. For example:

.. code-block:: python

   @weight(2, extra_credit=True)
   def test_extra_credit():
       """This test is extra credit."""
       assert pp01.doubler(21) == 42

Practically speaking, because Gradescope allows configuring an autograder to
have a maximum number of points that is lower than the sum of the points
awarded by the tests, extra credit can be implemented without this decorator.
However, the decorator changes how the results are displayed to the student:
with it, the test still shows as "green" (i.e., passed) even if the student
does not pass it; without it, the test shows as "red" (i.e., failed) if the
student does not pass it.

API
===

Module: :mod:`autogradescope`
-----------------------------

.. automodule:: autogradescope
   :members:
   :undoc-members:
   :show-inheritance:

Module: :mod:`autogradescope.decorators`
----------------------------------------

.. automodule:: autogradescope.decorators
   :members:
   :undoc-members:
   :show-inheritance:

Module: :mod:`autogradescope.doctests`
--------------------------------------

.. automodule:: autogradescope.doctests
   :members:
   :undoc-members:
   :show-inheritance:

.. toctree::
   :maxdepth: 2
   :caption: Contents:

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`