yt includes a testing suite which one can run on the codebase to assure that no breaks in functionality have occurred. This testing suite is based on the Nose testing framework. The suite consists of two components, unit tests and answer tests. Unit tests confirm that an isolated piece of functionality runs without failure for inputs with known correct outputs. Answer tests verify the integration and compatibility of the individual code unit by generating output from user-visible yt functions and comparing and matching the results against outputs of the same function produced using older versions of the yt codebase. This ensures consistency in results as development proceeds.
The testing suite should be run locally by developers to make sure they aren’t checking in any code that breaks existing functionality. To further this goal, an automatic buildbot runs the test suite after each code commit to confirm that yt hasn’t broken recently. To supplement this effort, we also maintain a continuous integration server that runs the tests with each commit to the yt version control repository.
What do Unit Tests Do¶
Unit tests are tests that operate on some small set of machinery, and verify that the machinery works. yt uses the Nose framework for running unit tests. In practice, what this means is that we write scripts that assert statements, and Nose identifies those scripts, runs them, and verifies that the assertions are true and the code runs without crashing.
How to Run the Unit Tests¶
One can run the unit tests very straightforwardly from any python interpreter that can import the yt module:
import yt yt.run_nose()
If you are developing new functionality, it is sometimes more convenient to use
the Nose command line interface,
nosetests. You can run the unit tests
nose by navigating to the base directory of the yt git
repository and invoking
$ cd $YT_GIT $ nosetests
$YT_GIT is the path to the root of the yt git repository.
If you want to specify a specific unit test to run (and not run the entire
suite), you can do so by specifying the path of the test relative to the
$YT_GIT/yt directory – note that you strip off one yt more than you
normally would! For example, if you want to run the plot_window tests, you’d
$ nosetests yt/visualization/tests/test_plotwindow.py
Handling yt dependencies¶
We attempt to make yt compatible with a wide variety of upstream software versions. However, sometimes a specific version of a project that yt depends on causes some breakage and must be blacklisted in the tests or a more experimental project that yt depends on optionally might change sufficiently that the yt community decides not to support an old version of that project.
To handle cases like this, the versions of upstream software projects installed on the machines running the yt test suite are pinned to specific version numbers that must be updated manually. This prevents breaking the yt tests when a new version of an upstream dependency is released and allows us to manage updates in upstream projects at our pace.
If you would like to add a new dependency for yt (even an optional dependency)
or would like to update a version of a yt dependency, you must edit the
tests/test_requirements.txt file, this path is relative to the root of the
repository. This file contains an enumerated list of direct dependencies and
pinned version numbers. For new dependencies, simply append the name of the new
dependency to the end of the file, along with a pin to the latest version
number of the package. To update a package’s version, simply update the version
number in the entry for that package.
Finally, we also run a set of tests with “minimal” dependencies installed. Please make sure any new tests you add that depend on an optional dependency are properly set up so that the test is not run if the dependency is not installed. If for some reason you need to update the listing of packages that are installed for the “minimal” dependency tests, you will need to edit
How to Write Unit Tests¶
yt provides several pieces of testing assistance, all in the
module. Describing them in detail is somewhat outside the scope of this
document, as in some cases they belong to other packages. However, a few come
fake_random_ds()provides the ability to create a random dataset, with several fields and divided into several different grids, that can be operated on.
assert_equal()can operate on arrays.
assert_almost_equal()can operate on arrays and accepts a relative allowable difference.
assert_allclose_units()raises an error if two arrays are not equal up to a desired absolute or relative tolerance. This wraps numpy’s assert_allclose to correctly verify unit consistency as well.
amrspace()provides the ability to create AMR grid structures.
expand_keywords()provides the ability to iterate over many values for keywords.
To create new unit tests:
Create a new
tests/directory next to the file containing the functionality you want to test and add an empty
__init__.pyfile to it.
Inside that directory, create a new python file prefixed with
test_and including the name of the functionality.
Inside that file, create one or more routines prefixed with
test_that accept no arguments. The test function should do some work that tests some functionality and should also verify that the results are correct using assert statements or functions.
yielda tuple of the form
argument_two, etc. For example
yield my_test, 'banana', 2.0would be captured by nose and the
my_testfunction will be run with the provided arguments.
fake_random_dsto test on datasets, and be sure to test for several combinations of
nproc, so that domain decomposition can be tested as well.
Test multiple combinations of options by using the
expand_keywords()function, which will enable much easier iteration over options.
For an example of how to write unit tests, look at the file
yt/data_objects/tests/test_covering_grid.py, which covers a great deal of
Debugging failing tests¶
When writing new tests, often one exposes bugs or writes a test incorrectly,
causing an exception to be raised or a failed test. To help debug issues like
nose can drop into a debugger whenever a test fails or raises an
exception. This can be accomplished by passing
nosetests executable. These options will drop into the pdb debugger
whenever an error is raised or a failure happens, respectively. Inside the
debugger you can interactively print out variables and go up and down the call
stack to determine the context for your failure or error.
nosetests --pdb --pdb-failures
In addition, one can debug more crudely using print statements. To do this,
you can add print statements to the code as normal. However, the test runner
will capture all print output by default. To ensure that output gets printed
to your terminal while the tests are running, pass
-s to the
Lastly, to quickly debug a specific failing test, it is best to only run that
one test during your testing session. This can be accomplished by explicitly
passing the name of the test function or class to
nosetests, as in the
$ nosetests yt.visualization.tests.test_plotwindow:TestSetWidth
This nosetests invocation will only run the tests defined by the
Finally, to determine which test is failing while the tests are running, it helps
to run the tests in “verbose” mode. This can be done by passing the
All of the above
nosetests options can be combined. So, for example to run
TestSetWidth tests with verbose output, letting the output of print
statements come out on the terminal prompt, and enabling pdb debugging on errors
or test failures, one would do:
$ nosetests --pdb --pdb-failures -v -s yt.visualization.tests.test_plotwindow:TestSetWidth
What do Answer Tests Do¶
Answer tests test actual data, and many operations on that data, to make sure that answers don’t drift over time. This is how we test frontends, as opposed to operations, in yt.
How to Run the Answer Tests¶
The very first step is to make a directory and copy over the data against which
you want to test. Next, add the config parameter
test_data_dir pointing to
directory with the test data you want to test with, e.g.:
$ yt config set yt test_data_dir /Users/tomservo/src/yt-data
We use a number of real-world datasets for the tests that must be downloaded and
unzipped in the
test_data_dir path you have set. The test datasets, can be
downloaded from https://yt-project.org/data/. We do not explicitly list the
datasets we use in the tests here because the list of necessary datasets changes
regularly, instead you should take a look at the tests you would like to run and
make sure that the necessary data files are downloaded before running the tests.
To run the answer tests, you must first generate a set of test answers locally on a “known good” revision, then update to the revision you want to test, and run the tests again using the locally stored answers.
Let’s focus on running the answer tests for a single frontend. It’s possible to run the answer tests for all the frontends, but due to the large number of test datasets we currently use this is not normally done except on the yt project’s contiguous integration server.
$ cd $YT_GIT $ nosetests --with-answer-testing --local --local-dir $HOME/Documents/test --answer-store --answer-name=local-tipsy yt.frontends.tipsy
This command will create a set of local answers from the tipsy frontend tests
and store them in
$HOME/Documents/test (this can but does not have to be the
same directory as the
test_data_dir configuration variable defined in your
~/.config/yt/ytrc file) in a file named
local-tipsy. To run the tipsy
frontend’s answer tests using a different yt changeset, update to that
changeset, recompile if necessary, and run the tests using the following
$ nosetests --with-answer-testing --local --local-dir $HOME/Documents/test --answer-name=local-tipsy yt.frontends.tipsy
The results from a nose testing session are pretty straightforward to
understand, the results for each test are printed directly to STDOUT. If a test
passes, nose prints a period, F if a test fails, and E if the test encounters an
exception or errors out for some reason. Explicit descriptions for each test
are also printed if you pass
-v to the
nosetests executable. If you
want to also run tests for the ‘big’ datasets, then you will need to pass
nosetests. For example, to run the tests for the
OWLS frontend, do the following:
$ nosetests --with-answer-testing --local --local-dir $HOME/Documents/test --answer-store --answer-big-data yt.frontends.owls
How to Write Answer Tests¶
Tests can be added in the file
You can find examples there of how to write a test. Here is a trivial example:
#!python class MaximumValueTest(AnswerTestingTest): _type_name = "MaximumValue" _attrs = ("field",) def __init__(self, ds_fn, field): super(MaximumValueTest, self).__init__(ds_fn) self.field = field def run(self): v, c = self.ds.find_max(self.field) result = np.empty(4, dtype="float64") result = v result[1:] = c return result def compare(self, new_result, old_result): assert_equal(new_result, old_result)
What this does is calculate the location and value of the maximum of a
field. It then puts that into the variable result, returns that from
run and then in
compare makes sure that all are exactly equal.
To write a new test:
Add the attributes
_type_name(a string) and
_attrs(a tuple of strings, one for each attribute that defines the test – see how this is done for projections, for instance)
Implement the two routines
compareThe first should return a result and the second should compare a result to an old result. Neither should yield, but instead actually return. If you need additional arguments to the test, implement an
Keep in mind that everything returned from
runwill be stored. So if you are going to return a huge amount of data, please ensure that the test only gets run for small data. If you want a fast way to measure something as being similar or different, either an md5 hash (see the grid values test) or a sum and std of an array act as good proxies. If you must store a large amount of data for some reason, try serializing the data to a string (e.g. using
numpy.ndarray.dumps), and then compressing the data stream using
Typically for derived values, we compare to 10 or 12 decimal places. For exact values, we compare exactly.
How To Write Answer Tests for a Frontend¶
To add a new frontend answer test, first write a new set of tests for the data.
The Enzo example in
considered canonical. Do these things:
Create a new directory,
testsinside the frontend’s directory.
Create a new file,
test_outputs.pyin the frontend’s
Create a new routine that operates similarly to the routines you can see in Enzo’s output tests.
This routine should test a number of different fields and data objects.
The test routine itself should be decorated with
@requires_ds(test_dataset_name). This decorator can accept the argument
big_data=Trueif the test is expensive. The
test_dataset_nameshould be a string containing the path you would pass to the
yt.loadfunction. It does not need to be the full path to the dataset, since the path will be automatically prepended with the location of the test data directory. See The Configuration File for more information about the
big_patch_amrroutines that you can yield from to execute a bunch of standard tests. In addition we have created
sph_answerwhich is more suited for particle SPH datasets. This is where you should start, and then yield additional tests that stress the outputs in whatever ways are necessary to ensure functionality.
If you are adding to a frontend that has a few tests already, skip the first two steps.
How to Write Image Comparison Tests¶
We have a number of tests designed to compare images as part of yt. We make use of some functionality from matplotlib to automatically compare images and detect differences, if any. Image comparison tests are used in the plotting and volume rendering machinery.
The easiest way to use the image comparison tests is to make use of the
GenericImageTest class. This class takes three arguments:
A dataset instance (e.g. something you load with
A function the test machinery can call which will save an image to disk. The test class will then find any images that get created and compare them with the stored “correct” answer.
An integer specifying the number of decimal places to use when comparing images. A smaller number of decimal places will produce a less stringent test. Matplotlib uses an L2 norm on the full image to do the comparison tests, so this is not a pixel-by-pixel measure, and surprisingly large variations will still pass the test if the strictness of the comparison is not high enough.
You must decorate your test function with
requires_ds, otherwise the
answer testing machinery will not be properly set up.
Here is an example test function:
from yt.utilities.answer_testing.framework import \ GenericImageTest, requires_ds, data_dir_load from matplotlib import pyplot as plt @requires_ds(my_ds) def test_my_ds(): ds = data_dir_load(my_ds) def create_image(filename_prefix): plt.plot([1, 2], [1, 2]) plt.savefig("%s_lineplot" % filename_prefix) test = GenericImageTest(ds, create_image, 12) # this ensures the test has a unique key in the # answer test storage file test.prefix = "my_unique_name" # this ensures a nice test name in nose's output test_my_ds.__name__ = test.description yield test
The inner function
create_image can create any number of images,
as long as the corresponding filenames conform to the prefix.
Another good example of an image comparison test is the
PlotWindowAttributeTest defined in the answer testing framework and used in
yt/visualization/tests/test_plotwindow.py. This test shows how a new answer
test subclass can be used to programmatically test a variety of different methods
of a complicated class using the same test class. This sort of image comparison
test is more useful if you are finding yourself writing a ton of boilerplate
code to get your image comparison test working. The
more useful if you only need to do a one-off image comparison test.
Enabling Answer Tests on Jenkins¶
Before any code is added to or modified in the yt codebase, each incoming changeset is run against all available unit and answer tests on our continuous integration server. While unit tests are autodiscovered by nose itself, answer tests require definition of which set of tests constitute to a given answer. Configuration for the integration server is stored in tests/tests.yaml in the main yt repository:
answer_tests: local_artio_000: - yt/frontends/artio/tests/test_outputs.py # ... other_tests: unittests: - '-v' - '-s'
Each element under answer_tests defines answer name (local_artio_000 in above snippet) and specifies a list of files/classes/methods that will be validated (yt/frontends/artio/tests/test_outputs.py in above snippet). On the testing server it is translated to:
$ nosetests --with-answer-testing --local --local-dir ... --answer-big-data \ --answer-name=local_artio_000 \ yt/frontends/artio/tests/test_outputs.py
If the answer doesn’t exist on the server yet,
nosetests is run twice and
during first pass
--answer-store is added to the commandline.
In order to regenerate answers for a particular set of tests it is sufficient to change the answer name in tests/tests.yaml e.g.:
--- a/tests/tests.yaml +++ b/tests/tests.yaml @@ -25,7 +25,7 @@ - yt/analysis_modules/halo_finding/tests/test_rockstar.py - yt/frontends/owls_subfind/tests/test_outputs.py - local_owls_000: + local_owls_001: - yt/frontends/owls/tests/test_outputs.py local_pw_000:
would regenerate answers for OWLS frontend.
When adding tests to an existing set of answers (like
it is considered best practice to first submit a pull request adding the tests WITHOUT incrementing
the version number. Then, allow the tests to run (resulting in “no old answer” errors for the missing
answers). If no other failures are present, you can then increment the version number to regenerate
the answers. This way, we can avoid accidentally covering up test breakages.
Adding New Answer Tests¶
In order to add a new set of answer tests, it is sufficient to extend the answer_tests list in tests/tests.yaml e.g.:
--- a/tests/tests.yaml +++ b/tests/tests.yaml @@ -60,6 +60,10 @@ - yt/analysis_modules/absorption_spectrum/tests/test_absorption_spectrum.py:test_absorption_spectrum_non_cosmo - yt/analysis_modules/absorption_spectrum/tests/test_absorption_spectrum.py:test_absorption_spectrum_cosmo + local_gdf_000: + - yt/frontends/gdf/tests/test_outputs.py + + other_tests: unittests:
Restricting Python Versions for Answer Tests¶
If for some reason a test can be run only for a specific version of python it is
possible to indicate this by adding a
[py3] tag. For example:
answer_tests: local_test_000: - yt/test_A.py # [py2] - yt/test_B.py # [py3]
would result in
test_A.py being run only for python2 and
being run only for python3.