publish¶
A tool to build and publish certain artifacts at certain times.
publish was designed specifically for the automatic publication of course materials, such as homeworks, lecture slides, etc.
Terminology¶
An artifact is a file – usually one that is generated by some build process.
A publication is a coherent group of one or more artifacts and their metadata.
A collection is a group of publications which all satisfy the same schema.
A schema is a set of constraints on a publication’s artifacts and metadata.
This establishes a collection -> publication -> artifact hierarchy: each artifact belongs to exactly one publication, and each publication belongs to exactly one collection.
An example of such a hierarchy is the following: all homeworks in a course form a collection. Each publication within the collection is an individual homework. Each publication may have several artifacts, such as the PDF of the problem set, the PDF of the solutions, and a .zip containing the homework’s data.
An artifact may have a release time, before which it will not be built or published. Likewise, entire publications can have release times, too.
Discovering, Building, and Publishing¶
When run as a script, this package follows a three-step process of discovering, building, and publishing artifacts.
In the discovery step, the script constructs a collection -> publication -> artifact hierarchy by recursively searching an input directory for artifacts.
In the build step, the script builds every artifact whose release time has passed.
In the publish step, the script copies every released artifact to an output directory.
Discovery¶
In the discovery step, the input directory is recursively searched for collections, publications, and artifacts.
A collection is defined by creating a file named collection.yaml in a directory. The contents of the file describe the artifacts and metadata that are required of each of the publications within the collection. For instance:
# <input_directory>/homeworks/collection.yaml
schema:
    required_artifacts:
        - homework.pdf
        - solution.pdf
    optional_artifacts:
        - template.zip
    metadata_schema:
        name:
            type: string
        due:
            type: datetime
        released:
            type: date
The file above specifies that publications must have homework.pdf and solution.pdf artifacts, and may or may not have a template.zip artifact. The publications must also have name, due, and released fields in their metadata with the listed types. The metadata specification is given in a form recognizable by the cerberus Python package.
A publication and its artifacts are defined by creating a publish.yaml file in the directory containing the publication. For instance, the file below describes how and when to build two artifacts named homework.pdf and solution.pdf, along with metadata:
# <input_directory>/homeworks/01-intro/publish.yaml
metadata:
    name: Homework 01
    due: 2020-09-04 23:59:00
    released: 2020-09-01
artifacts:
    homework.pdf:
        recipe: make homework
    solution.pdf:
        file: ./build/solution.pdf
        recipe: make solution
        release_time: 1 day after metadata.due
        ready: false
        missing_ok: false
The file field tells publish where the file will appear when the recipe is run. If it is omitted, its value is assumed to be the artifact’s key; for instance, homework.pdf’s file field is simply homework.pdf.
The release_time field provides the artifact’s release time. It can be a specific datetime in ISO 8601 format, like 2020-09-18 17:00:00, or a relative date of the form “<number> (hour|day)[s]{0,1} (before|after) metadata.<field>”, in which case the date will be calculated relative to the metadata field. The field it refers to must be a datetime.
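The relative form described above can be resolved with a small amount of parsing. The following is only a sketch of the idea, not the package's own implementation: the helper name resolve_release_time and the regular expression are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern mirroring the documented form:
# "<number> (hour|day)[s]{0,1} (before|after) metadata.<field>"
_RELATIVE = re.compile(r"^(\d+) (hour|day)s? (before|after) metadata\.(\w+)$")


def resolve_release_time(spec, metadata):
    """Return a datetime for an absolute or relative release_time string."""
    match = _RELATIVE.match(spec)
    if match is None:
        # Not a relative date: treat it as an ISO 8601 datetime.
        return datetime.fromisoformat(spec)
    number, unit, direction, field = match.groups()
    base = metadata[field]
    if not isinstance(base, datetime):
        raise ValueError(f"metadata.{field} must be a datetime")
    if unit == "hour":
        delta = timedelta(hours=int(number))
    else:
        delta = timedelta(days=int(number))
    return base - delta if direction == "before" else base + delta


metadata = {"due": datetime(2020, 9, 4, 23, 59)}
release = resolve_release_time("1 day after metadata.due", metadata)
```

With the metadata from the example publish.yaml, the solution would be released one day after the homework's due time.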
The ready field is a manual override which prevents the artifact from being built and published before it is ready. If not provided, the artifact is assumed to be ready.
The missing_ok field is a boolean which, if false, causes an error to be raised if the artifact’s file is missing after the build. This is the default behavior. If set to true, no error is raised. This can be useful when the artifact file is manually placed in the directory and it is undesirable to repeatedly edit publish.yaml to add the artifact.
Publications may also have release_time and ready attributes. If these are provided they will take precedence over the attributes of an individual artifact in the publication. The release time of the publication can be used to control when its metadata becomes available: before the release time, the publication in effect does not exist.
The file hierarchy determines which publications belong to which collections. If a publication file is placed in a directory that is a descendant of a directory containing a collection file, the publication will be placed in that collection and its contents will be validated against the collection’s schema.
Publications which are not under a directory containing a collection.yaml are placed into a “default” collection with no schema. They may contain any number of artifacts and metadata keys.
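The lookup described above amounts to walking up from a publication's directory until a collection file is found. The sketch below illustrates that idea with pathlib only; the function name collection_key_for is hypothetical, and using the directory's name as the key is an assumption based on the "homeworks" example rather than a statement about how the package handles deeper nesting.

```python
import tempfile
from pathlib import Path


def collection_key_for(publication_dir, input_directory):
    """Return the name of the nearest enclosing collection, or "default"."""
    root = Path(input_directory).resolve()
    # Inspect each ancestor of the publication directory, nearest first.
    for ancestor in Path(publication_dir).resolve().parents:
        if (ancestor / "collection.yaml").is_file():
            return ancestor.name
        if ancestor == root:
            break  # do not search above the input directory
    return "default"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "homeworks" / "01-intro").mkdir(parents=True)
    (root / "homeworks" / "collection.yaml").write_text("schema: {}\n")
    (root / "notes").mkdir()
    in_collection = collection_key_for(root / "homeworks" / "01-intro", root)
    no_collection = collection_key_for(root / "notes", root)
```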
Collections, publications, and artifacts all have keys which locate them within the hierarchy. These keys are inferred from their position in the filesystem. For example, a collection file placed at <input_directory>/homeworks/collection.yaml will create a collection keyed “homeworks”. A publication within the collection at <input_directory>/homeworks/01-intro/publish.yaml will be keyed “01-intro”. The keys of the artifacts are simply their keys within the publish.yaml file.
Building¶
Once all collections, publications, and artifacts have been discovered, the script moves to the build phase.
Artifacts are built by running the command given in the artifact’s recipe field within the directory containing the artifact’s publish.yaml file.
Different artifacts should have “orthogonal” build processes so that the order in which the artifacts are built is inconsequential.
If an error occurs during any build the entire process is halted and the program returns without continuing on to the publish phase. An error is considered to occur if the build process returns a nonzero error code, or if the artifact file is missing after the recipe is run.
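As a rough illustration of the build rules above, the sketch below runs a recipe as a shell command inside a working directory and treats a nonzero return code or a missing file as an error. It is not the package's implementation: run_recipe and the BuildFailed exception are hypothetical names, and running the recipe through a shell is an assumption.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class BuildFailed(Exception):
    """Hypothetical stand-in for the package's build error."""


def run_recipe(workdir, recipe, file, missing_ok=False):
    """Run the recipe in workdir and return the path of the built file."""
    workdir = Path(workdir)
    if recipe is not None:
        proc = subprocess.run(
            recipe, shell=True, cwd=workdir, capture_output=True, text=True
        )
        if proc.returncode != 0:
            raise BuildFailed(f"recipe exited with code {proc.returncode}")
    if not (workdir / file).exists():
        if missing_ok:
            return None  # a missing file is tolerated
        raise BuildFailed(f"{file} is missing after the build")
    return workdir / file


with tempfile.TemporaryDirectory() as tmp:
    python = '"' + sys.executable + '"'
    # A recipe that creates the expected file.
    ok_recipe = python + ' -c "open(\'out.txt\', \'w\').close()"'
    built_name = run_recipe(tmp, ok_recipe, "out.txt").name
    # A recipe that exits with a nonzero code.
    try:
        run_recipe(tmp, python + ' -c "raise SystemExit(3)"', "out.txt")
        failed = False
    except BuildFailed:
        failed = True
    # No recipe, no file, but missing_ok is set.
    skipped = run_recipe(tmp, None, "absent.txt", missing_ok=True)
```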
Publishing¶
In the publish phase, all published artifacts – that is, those which are ready and whose release date has passed – are copied to an output directory. Additionally, a JSON file containing information about the collection -> publication -> artifact hierarchy is placed at the root of the output directory.
Artifacts are copied to a location within the output directory according to the following “formula”:
<output_directory>/<collection_key>/<publication_key>/<artifact_key>
For instance, an artifact keyed homework.pdf in the 01-intro publication of the homeworks collection will be copied to:
<output_directory>/homeworks/01-intro/homework.pdf
An artifact which has not been released will not be copied, even if the artifact file exists.
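The path formula above can be sketched in a few lines with pathlib and shutil. This is an illustrative stand-in, not the package's own code; copy_artifact is a hypothetical helper.

```python
import shutil
import tempfile
from pathlib import Path


def copy_artifact(source, output_directory, collection_key, publication_key, artifact_key):
    """Copy source to <output>/<collection>/<publication>/<artifact>."""
    output_directory = Path(output_directory)
    destination = output_directory / collection_key / publication_key / artifact_key
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(source, destination)
    # Report the location relative to the output directory.
    return destination.relative_to(output_directory)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "homework.pdf"
    source.write_bytes(b"%PDF-")
    out = root / "out"
    relative = copy_artifact(source, out, "homeworks", "01-intro", "homework.pdf")
    relative_posix = relative.as_posix()
    copied_exists = (out / relative).exists()
```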
publish will create a JSON file named <output_directory>/published.json. This file contains nested dictionaries describing the structure of the collection -> publication -> artifact hierarchy.
For example, the code below will load the JSON file and print the path of a published artifact relative to the output directory, as well as a publication’s metadata.
>>> import json
>>> with open('published.json') as f:
...     d = json.load(f)
...
>>> d['collections']['homeworks']['publications']['01-intro']['artifacts']['homework.pdf']['path']
'homeworks/01-intro/homework.pdf'
>>> d['collections']['homeworks']['publications']['01-intro']['metadata']['due']
'2020-09-04 23:59:00'
Only those publications and artifacts which have been published appear in the JSON file. In particular, if an artifact has not reached its release time, it will be missing from the JSON representation entirely.
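Because the file is just nested dictionaries, every published path can be listed with three nested loops. The sketch below uses an inline sample that mirrors only the keys shown above (collections, publications, artifacts, path, metadata); the helper name published_paths is hypothetical, and the real file may contain additional keys.

```python
import json

# A small stand-in for the contents of published.json.
sample = json.loads("""
{"collections": {"homeworks": {"publications": {"01-intro": {
    "metadata": {"name": "Homework 01"},
    "artifacts": {"homework.pdf": {"path": "homeworks/01-intro/homework.pdf"}}
}}}}}
""")


def published_paths(published):
    """Yield the path of every artifact in the nested dictionaries."""
    for collection in published["collections"].values():
        for publication in collection["publications"].values():
            for artifact in publication["artifacts"].values():
                yield artifact["path"]


paths = list(published_paths(sample))
```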
API¶
publish can also be used as a Python package. Its behavior when run as a script can be reproduced using three high-level functions: discover(), build(), and publish.publish().
>>> import publish
>>> discovered = publish.discover('path/to/input_directory')
>>> built = publish.build(discovered)
>>> published = publish.publish(built, 'path/to/output/directory')
These functions can be used to build and publish individual collections, publications, and artifacts as well, as described below.
The full API of the package is as follows:
Exceptions
| | Generic error. |
| ValidationError | Publication does not satisfy schema. |
| DiscoveryError | A configuration file is not valid. |
| | Problem while building the artifact. |
Types
| UnbuiltArtifact | The inputs needed to build an artifact. |
| BuiltArtifact | The results of building an artifact. |
| PublishedArtifact | A published artifact. |
| Publication | A publication. |
| Collection | A collection. |
| Universe | Container of all collections. |
| Schema | Rules governing publications. |
Functions
| build | Build a universe/collection/publication/artifact. |
| deserialize | Reconstruct a universe/collection/publication/artifact from JSON. |
| discover | Discover the collections and publications in the filesystem. |
| filter_nodes | Remove nodes from a Universe/Collection/Publication. |
| publish | Publish a universe/collection/publication/artifact by copying it. |
| read_collection_file | Read a Collection from a yaml file. |
| read_publication_file | Read a Publication from a yaml file. |
| serialize | Serialize the universe/collection/publication/artifact to JSON. |
| validate | Make sure that a publication satisfies the schema. |
Types¶
publish provides several types for representing collections, publications, and artifacts.
| UnbuiltArtifact | The inputs needed to build an artifact. |
| BuiltArtifact | The results of building an artifact. |
| PublishedArtifact | A published artifact. |
| Publication | A publication. |
| Collection | A collection. |
| Universe | Container of all collections. |
There are three artifact types, each used to represent artifacts at a different stage of the discover -> build -> publish process. Each is a subclass of typing.NamedTuple.
class publish.UnbuiltArtifact(workdir: pathlib.Path, file: str, recipe: Optional[str] = None, release_time: Optional[datetime.datetime] = None, ready: bool = True, missing_ok: bool = False)
    The inputs needed to build an artifact.
    workdir
        Absolute path to the working directory used to build the artifact.
        Type: pathlib.Path
    file
        Path (relative to the workdir) of the file produced by the build.
        Type: str
    recipe
        Command used to build the artifact. If None, no command is necessary.
        Type: Union[str, None]
    release_time
        Time/date the artifact should be made public. If None, it is always available.
        Type: Union[datetime.datetime, None]
    ready
        Whether or not the artifact is ready for publication. Default: True.
        Type: bool
    missing_ok
        If True and the file is missing after building, then no error is raised and the result of the build is None.
        Type: bool
class publish.BuiltArtifact(workdir: pathlib.Path, file: str, returncode: Optional[int] = None, stdout: Optional[str] = None, stderr: Optional[str] = None)
    The results of building an artifact.
    workdir
        Absolute path to the working directory used to build the artifact.
        Type: pathlib.Path
    file
        Path (relative to the workdir) of the file produced by the build.
        Type: str
    returncode
        The build process’s return code. If None, there was no process.
        Type: int
    stdout
        The build process’s stdout. If None, there was no process.
        Type: str
    stderr
        The build process’s stderr. If None, there was no process.
        Type: str
class publish.PublishedArtifact(path: str)
    A published artifact.
    path
        The path to the artifact’s file relative to the output directory.
        Type: str
For convenience, all three of these types inherit from an Artifact base class. This makes it easy to check whether an object is an artifact of any kind using isinstance(x, publish.Artifact).
Publications and collections are represented with the Publication and Collection types. Furthermore, a set of collections is represented with the Universe type. These three types all inherit from typing.NamedTuple.
class publish.Publication(metadata: Mapping[str, Any], artifacts: Mapping[str, publish.types.Artifact], ready: bool = True, release_time: Optional[datetime.datetime] = None)
    A publication.
    artifacts
        The artifacts contained in the publication.
        Type: Dict[str, Artifact]
    metadata
        The metadata dictionary.
        Type: Dict[str, Any]
    ready
        If False, this publication is not ready and will not be published.
        Type: Optional[bool]
    release_time
        The time before which this publication will not be released.
        Type: Optional[datetime.datetime]
class publish.Collection(schema: Schema, publications: Mapping[str, publish.types.Publication])
    A collection.
    publications
        The publications contained in the collection.
        Type: Mapping[str, Publication]
class publish.Universe(collections: Mapping[str, publish.types.Collection])
    Container of all collections.
    collections
        The collections.
        Type: Dict[str, Collection]
These types exist within a hierarchy: a Universe contains instances of Collection, which contain instances of Publication, which contain instances of Artifact. Universe, Collection, and Publication are internal nodes of the hierarchy, while Artifact instances are leaf nodes.
Internal node types share several methods and attributes, almost as if they were inherited from a parent “InternalNode” base class (which doesn’t exist in actuality):
class publish.InternalNode
    _deep_asdict(self)
        Recursively compute a dictionary representation of the object.
    _replace_children(self, new_children)
        Replace the node’s children with a new set of children.
    _children
        The node’s children.
For instance, the ._children attribute of a Collection returns a dictionary mapping publication keys to Publication instances.
Schemas and Validation¶
Schemas used to validate publications are represented with the Schema class.
class publish.Schema(required_artifacts: Collection[str], optional_artifacts: Optional[Collection[str]] = None, metadata_schema: Optional[Mapping[str, Mapping]] = None, allow_unspecified_artifacts: bool = False, is_ordered: bool = False)
    Rules governing publications.
    required_artifacts
        Names of artifacts that publications must contain.
        Type: typing.Collection[str]
    optional_artifacts
        Names of artifacts that publications are permitted to contain. Default: empty list.
        Type: typing.Collection[str], optional
    metadata_schema
        A dictionary describing a schema used to validate publication metadata, in the style of cerberus. If None, no validation will be performed. Default: None.
        Type: Mapping[str, Any], optional
    allow_unspecified_artifacts
        Is it permissible for a publication to have unknown artifacts? Default: False.
        Type: Optional[Boolean]
    is_ordered
        Should the publications be considered ordered by their keys? Default: False.
        Type: Optional[Boolean]
Validation is performed with the following function:
publish.validate(publication: publish.types.Publication, against: publish.types.Schema)
    Make sure that a publication satisfies the schema.
    This checks the publication’s metadata dictionary against against.metadata_schema. It also verifies that all required artifacts are provided, and that no unknown artifacts are given (unless against.allow_unspecified_artifacts == True).
    Parameters
        publication (Publication) – A fully-specified publication.
        against (Schema) – A schema for validating the publication.
    Raises
        ValidationError – If the publication does not satisfy the schema’s constraints.
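The artifact half of this check is simple set arithmetic. The following sketch shows the idea under stated assumptions: artifact_problems is a hypothetical helper that returns a list of messages rather than raising ValidationError, and it leaves metadata validation (which the package delegates to a cerberus-style schema) out entirely.

```python
def artifact_problems(artifact_names, required, optional=(), allow_unspecified=False):
    """Return messages for missing required or unknown artifacts."""
    names = set(artifact_names)
    problems = [
        f"missing required artifact: {name}"
        for name in required
        if name not in names
    ]
    if not allow_unspecified:
        # Anything that is neither required nor optional is unknown.
        known = set(required) | set(optional)
        problems += [f"unknown artifact: {name}" for name in sorted(names - known)]
    return problems


required = ["homework.pdf", "solution.pdf"]
optional = ["template.zip"]
ok = artifact_problems(["homework.pdf", "solution.pdf", "template.zip"], required, optional)
bad = artifact_problems(["homework.pdf", "notes.txt"], required, optional)
lenient = artifact_problems(
    ["homework.pdf", "solution.pdf", "notes.txt"], required, optional,
    allow_unspecified=True,
)
```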
Discovery¶
The discovery of collections, publications, and artifacts is performed using the discover() function.
publish.discover(input_directory, skip_directories=None, callbacks=None, date_context=None, template_vars=None)
    Discover the collections and publications in the filesystem.
    Parameters
        input_directory (Path) – The path to the directory that will be recursively searched.
        skip_directories (Optional[Collection[str]]) – A collection of directory names that should be skipped if discovered. If None, no directories will be skipped.
        callbacks (Optional[DiscoverCallbacks]) – Callbacks to be invoked during the discovery. If omitted, no callbacks are executed. See DiscoverCallbacks for the possible callbacks and their arguments.
        date_context (Optional[DateContext]) – A date context used to evaluate smart dates. If None, an empty context is used.
    Returns
        The collections and the nested publications and artifacts, contained in a Universe instance.
    Return type
        Universe
Callbacks are invoked at certain points during the discovery. To provide callbacks to the function, subclass and override the desired members of the class below, and provide an instance to discover().
class publish.DiscoverCallbacks
    Callbacks used in discover(). Defaults do nothing.
    on_collection(path)
        When a collection is discovered.
        Parameters
            path (pathlib.Path) – The path of the collection file.
    on_publication(path)
        When a publication is discovered.
        Parameters
            path (pathlib.Path) – The path of the publication file.
    on_skip(path)
        When a directory is skipped.
        Parameters
            path (pathlib.Path) – The path of the directory to be skipped.
Two low-level functions, read_collection_file() and read_publication_file(), are also available for reading individual collection and publication files. Note that they are not recursive: reading a collection file does not load any publications into the collection. Most of the time, you probably want discover().
publish.read_collection_file(path)
    Read a Collection from a yaml file.
    Parameters
        path (pathlib.Path) – Path to the collection file.
    Returns
        The collection object with no attached publications.
    Return type
        Collection
    Notes
    The file should have one key, “schema”, whose value is a dictionary with the following keys/values:
    - required_artifacts
        A list of artifact names that are required.
    - optional_artifacts [optional]
        A list of artifacts that are optional. If not provided, the default value of [] (empty list) will be used.
    - metadata_schema [optional]
        A dictionary describing a schema for validating publication metadata. The dictionary should deserialize to something recognized by the cerberus package. If not provided, the default value of None will be used.
    - allow_unspecified_artifacts [optional]
        Whether or not to allow unspecified artifacts in the publications. Default: False.
publish.read_publication_file(path, schema=None, date_context=None, template_vars=None)
    Read a Publication from a yaml file.
    Parameters
        path (pathlib.Path) – Path to the publication file.
        schema (Optional[Schema]) – A schema for validating the publication. Default: None, in which case the publication’s metadata are not validated.
        date_context (Optional[DateContext]) – A context used to evaluate smart dates. If None, no context is provided.
    Returns
        The publication.
    Return type
        Publication
    Raises
        DiscoveryError – If the publication file’s contents are invalid.
    Notes
    The file should have a “metadata” key whose value is a dictionary of metadata. It should also have an “artifacts” key whose value is a dictionary mapping artifact names to artifact definitions.
    Optionally, the file can have a “release_time” key providing a time at which the publication should be considered released. It may also have a “ready” key; if this is False, the publication will not be considered released.
    If the schema argument is not provided, only very basic validation is performed by this function. Namely, the metadata schema and required/optional artifacts are not enforced. See the validate() function for validating these aspects of the publication. If the schema is provided, validate() is called as a convenience.
Build¶
The building of whole collections, publications, and artifacts is performed with the build() function.
publish.build(parent: Union[publish.types.Universe, publish.types.Collection, publish.types.Publication, publish.types.UnbuiltArtifact], *, ignore_release_time=False, verbose=False, now=<built-in method now of type object>, run=<function run>, exists=<function Path.exists>, callbacks=None)
    Build a universe/collection/publication/artifact.
    Parameters
        parent (Union[Universe, Collection, Publication, UnbuiltArtifact]) – The thing to build. Operates recursively, so if given a Universe, for instance, will build all of the artifacts within.
        ignore_release_time (bool) – If True, all artifacts will be built, even if their release time has not yet passed.
        callbacks (Optional[BuildCallbacks]) – Callbacks to be invoked during the build. If omitted, no callbacks are executed. See BuildCallbacks for the possible callbacks and their arguments.
    Returns
        A copy of the parent where each leaf artifact is replaced with an instance of BuiltArtifact. If the thing to be built is not built due to being unreleased, None is returned.
    Return type
        Optional[type(parent)]
    Note
    If a publication or artifact is not yet released, either due to its release time being in the future or because it is marked as not ready, its recipe will not be run. If the parent node is a publication or artifact that is not built, the result of this function is None. If the parent node is a collection or universe, all of the unbuilt publications and artifacts within are recursively removed from the tree.
Callbacks are invoked at certain points during the build. To provide callbacks to the function, subclass and override the desired members of the class below, and provide an instance to build().
class publish.BuildCallbacks
    Callbacks used by build().
    on_build(key, node)
        Called when building a collection/publication/artifact.
    on_missing(artifact: publish.types.UnbuiltArtifact)
        Called when the artifact file is missing, but missing is OK.
    on_not_ready(artifact: publish.types.UnbuiltArtifact)
        Called when the artifact is not ready.
    on_recipe(artifact: publish.types.UnbuiltArtifact)
        Called when the artifact is being built using its recipe.
    on_success(artifact: publish.types.BuiltArtifact)
        Called when the build succeeded.
    on_too_soon(artifact: publish.types.UnbuiltArtifact)
        Called when it is too soon to release the artifact.
Publish¶
publish.publish(parent, outdir, prefix='', callbacks=None)
    Publish a universe/collection/publication/artifact by copying it.
    Parameters
        parent (Union[Universe, Collection, Publication, BuiltArtifact]) – The thing to publish.
        outdir (pathlib.Path) – Path to the output directory where artifacts will be copied.
        prefix (str) – String to prepend between output directory path and the keys of the children. If the thing being published is a BuiltArtifact, this is simply the filename.
        callbacks (PublishCallbacks) – Callbacks to be invoked during the publication. If omitted, no callbacks are executed. See PublishCallbacks for the possible callbacks and their arguments.
    Returns
        A copy of the parent, but with all leaf artifact nodes replaced by PublishedArtifact instances. Artifacts which have not yet been released are still converted to PublishedArtifact, but their path is set to None.
    Return type
        type(parent)
    Notes
    The prefix is built up recursively, so that calling this function on a universe will publish each artifact to
    <prefix><collection_key>/<publication_key>/<artifact_key>
Callbacks are invoked at certain points during the publication. To provide
callbacks to the function, subclass and override the desired members of the
below class, and provide an instance to publish()
.
Serialization¶
Two functions are provided for serializing and deserializing objects to and from JSON.
publish.serialize(node)
    Serialize the universe/collection/publication/artifact to JSON.
    Parameters
        node (Union[Universe, Collection, Publication, Artifact]) – The thing to serialize as JSON.
    Returns
        The object serialized as JSON.
    Return type
        str
publish.deserialize(s)
    Reconstruct a universe/collection/publication/artifact from JSON.
    Parameters
        s (str) – The JSON to deserialize.
    Returns
        The reconstructed object; its type is inferred from the string.
    Return type
        Universe/Collection/Publication/Artifact
Filtering¶
Collections, publications, and artifacts can be removed using filter_nodes().
publish.filter_nodes(parent, predicate, remove_empty_nodes=False, callbacks=None)
    Remove nodes from a Universe/Collection/Publication.
    Parameters
        parent – The root of the tree.
        predicate (Callable[[node], bool]) – A function which takes in a node and returns True if it should be kept and False if it should be removed.
        remove_empty_nodes (bool) – Whether nodes without children should be removed (True) or preserved (False). Default: False.
    Returns
        An object of the same type as the parent, but with all filtered nodes removed. Furthermore, if remove_empty_nodes is True, any node left with no children after filtering is also removed.
    Return type
        type(parent)
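The filtering behavior can be pictured on a plain nested dictionary. The sketch below is a stand-in, not the package's implementation: it uses dicts for internal nodes and arbitrary values for leaves instead of the Universe/Collection/Publication types, and filter_tree is a hypothetical name. Whether the real predicate is also applied to internal nodes is an assumption here.

```python
def filter_tree(node, predicate, remove_empty_nodes=False):
    """Return a copy of a nested dict keeping only children that pass predicate."""
    if not isinstance(node, dict):
        return node  # leaves have no children to filter
    kept = {}
    for key, child in node.items():
        if not predicate(child):
            continue  # the predicate rejected this child
        child = filter_tree(child, predicate, remove_empty_nodes)
        if remove_empty_nodes and isinstance(child, dict) and not child:
            continue  # drop internal nodes that ended up empty
        kept[key] = child
    return kept


tree = {
    "homeworks": {
        "01-intro": {"homework.pdf": "a.pdf", "solution.pdf": None},
        "02-loops": {"solution.pdf": None},
    }
}


def is_available(node):
    # Keep everything except leaves with no file.
    return node is not None


preserved = filter_tree(tree, is_available)
pruned = filter_tree(tree, is_available, remove_empty_nodes=True)
```

With remove_empty_nodes left at its default, the emptied "02-loops" entry survives as an empty dictionary; with it set to True, the entry disappears.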