Data objects (also called Data Containers) are used in yt as convenience structures for grouping data in logical ways that make sense in the context of the dataset as a whole. Some of the data objects are geometrical groupings of data (e.g. sphere, box, cylinder, etc.). Others represent data products derived from your dataset (e.g. slices, streamlines, surfaces). Still other data objects group multiple objects together or filter them (e.g. data collection, cut region).
To generate standard plots, objects rarely need to be directly constructed. However, for detailed data inspection as well as hand-crafted derived data, objects can be exceptionally useful and even necessary.
To create an object, you usually only need a loaded dataset, the name of
the object type, and the relevant parameters for your object. Here is a common
example for creating a
Region object that covers all of your data volume.
import yt

ds = yt.load("RedshiftOutput0005")
ad = ds.all_data()
Alternatively, we could create a sphere object of radius 1 kpc centered at [0.5, 0.5, 0.5]:
import yt

ds = yt.load("RedshiftOutput0005")
sp = ds.sphere([0.5, 0.5, 0.5], (1, 'kpc'))
After an object has been created, it can be used as a data_source for certain types of plots (e.g. a
ProjectionPlot), one can compute the
bulk quantities associated with that object (see Processing Objects: Derived Quantities),
or the data can be examined directly. For example, if you want to figure out
the temperature at all indexed locations in the central sphere of your
dataset you could:
import yt

ds = yt.load("RedshiftOutput0005")
sp = ds.sphere([0.5, 0.5, 0.5], (1, 'kpc'))

# Show all temperature values
print(sp["temperature"])

# Print things in a more human-friendly manner: one temperature at a time
print("(x, y, z) Temperature")
print("-----------------------")
for i in range(sp["temperature"].size):
    print("(%f, %f, %f) %f" % (sp["x"][i], sp["y"][i], sp["z"][i], sp["temperature"][i]))
yt provides a mechanism for easily selecting data while doing interactive work
on the command line. This allows for region selection based on the full domain
of the object. Selecting in this manner is exposed through a slice-like
syntax. All of these attributes are exposed through a region expression
object, which is an attribute of a Dataset object, called r. The
.r attribute serves as a persistent means of accessing the full data
from a dataset. You can access this shorthand operation by querying any field
on the .r object, like so::
ds = yt.load("RedshiftOutput0005")
rho = ds.r["density"]
This will return a flattened array of data. The region expression object
(ds.r) doesn't have any derived quantities on it. This is completely
equivalent to this set of statements::
ds = yt.load("RedshiftOutput0005")
dd = ds.all_data()
rho = dd["density"]
One thing to keep in mind when accessing data in this way is that it is persistent: it is loaded into memory, and then retained until the dataset is deleted or garbage collected.
To select rectilinear regions, where the data is selected the same way that it is selected in a region object (see 3D Objects), you can utilize slice-like syntax, supplying start and stop but not supplying a step argument. All three components of the slice must be specified. These take a start and a stop, and are for the three axes in simulation order (if your data is ordered z, y, x, for instance, this would be in z, y, x order).
The slices can have both position and, optionally, unit values. These define
the value with respect to the
domain_left_edge of the dataset. So, for
instance, you could specify it like so::

ds.r[(100, 'kpc'):(200, 'kpc'), :, :]

This would return a region that included everything between 100 kpc from the left edge of the dataset to 200 kpc from the left edge of the dataset in the first dimension, and which spans the entire dataset in the second and third dimensions. By default, if the units are unspecified, they are in the "native" code units of the dataset.
This works in all types of datasets as well. For instance, if you have a geographic dataset (which is usually ordered latitude, longitude, altitude) you can easily select, for instance, one hemisphere with a region selection (here, longitudes from -180 to 0)::

ds.r[:, -180:0, :]
If you specify a single slice, it will be repeated along all three dimensions. For instance, this will give all data::

ds.r[:]
And this will select a box running from 0.4 to 0.6 along all three dimensions::

ds.r[0.4:0.6]
yt also provides functionality for selecting regions that have been turned into
voxels. This returns an arbitrary grid object (see Arbitrary Grids Objects). It can be created by
specifying a complex slice "step", where the start and stop follow the same
rules as above. This is similar to how the numpy
mgrid operation works.
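For readers unfamiliar with the numpy convention being borrowed here, the following short example (plain numpy, no yt) shows how an imaginary step changes the meaning of a slice from "stride" to "number of samples":

```python
import numpy as np

# With a real step, np.mgrid behaves like arange: start, stop (exclusive), stride.
coarse = np.mgrid[0.0:1.0:0.25]          # [0.0, 0.25, 0.5, 0.75]

# With an imaginary step, its magnitude is interpreted as a *number of samples*,
# and the stop value becomes inclusive (like linspace).
fine = np.mgrid[0.0:1.0:5j]              # [0.0, 0.25, 0.5, 0.75, 1.0]

# In 3D this produces one coordinate array per axis, each with the requested
# shape -- the same convention a selection like ds.r[::21j, ::35j, ::100j] follows.
x, y, z = np.mgrid[0:1:21j, 0:1:35j, 0:1:100j]
print(coarse)
print(fine)
print(x.shape)  # (21, 35, 100)
```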
For instance, this code block will generate a grid covering the full domain,
but converted to being 21x35x100 dimensions::
region = ds.r[::21j, ::35j, ::100j]
The left and right edges, as above, can be specified to provide bounds as well. For instance, to select a 10 meter cube, with 24 cells in each dimension, we could supply::
region = ds.r[(20, 'm'):(30, 'm'):24j,
              (30, 'm'):(40, 'm'):24j,
              (7, 'm'):(17, 'm'):24j]
This can select both particles and mesh fields. Mesh fields will be 3D arrays, and generated through volume-weighted overlap calculations.
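The volume-weighted overlap idea can be sketched in one dimension with plain numpy. This is an illustration of the concept only, not yt's implementation: each source cell contributes to a destination cell in proportion to the length (in 3D, volume) of their overlap.

```python
import numpy as np

# 1D sketch (not yt's implementation): resample a field defined on source
# cells onto one destination cell, weighting each contribution by the
# length of overlap between source and destination cells.
src_edges = np.array([0.0, 1.0, 2.0, 3.0])   # three source cells
src_field = np.array([10.0, 20.0, 40.0])     # field value in each cell

dst_lo, dst_hi = 0.5, 2.5                    # one destination cell

overlap = np.clip(np.minimum(src_edges[1:], dst_hi)
                  - np.maximum(src_edges[:-1], dst_lo), 0.0, None)
value = (src_field * overlap).sum() / overlap.sum()
print(value)  # (10*0.5 + 20*1.0 + 40*0.5) / 2.0 = 22.5
```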
If one dimension is specified as a single value, that will be the dimension along which a slice is made. This provides a simple means of generating a slice from a subset of the data. For instance, to create a slice of a dataset, you can very simply specify the full domain along two axes::
sl = ds.r[:,:,0.25]
This can also be very easily plotted::
sl = ds.r[:,:,0.25]
sl.plot()
This accepts arguments the same way::
sl = ds.r[(20.1, 'km'):(31.0, 'km'), (504.143, 'm'):(1000.0, 'm'), (900.1, 'm')]
sl.plot()
As noted above, there are numerous types of objects. Here we group them into the categories described below.
If you want to create your own custom data object type, see Creating Data Objects.
For 0D, 1D, and 2D geometric objects, if the extent of the object intersects a grid cell, then the cell is included in the object; however, for 3D objects the center of the cell must be within the object in order for the grid cell to be incorporated.
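The 3D criterion (cell center must lie inside the object) can be sketched with a small numpy example. This is illustrative only, not yt's selection machinery; the grid and sphere here are made up for the demonstration:

```python
import numpy as np

# Sketch (not yt internals): for a 3D selector like a sphere, a cell is
# included only if its *center* falls inside the object.
edges = np.linspace(0.0, 1.0, 5)                 # 4 cells per axis
centers = 0.5 * (edges[:-1] + edges[1:])
cx, cy, cz = np.meshgrid(centers, centers, centers, indexing="ij")

sphere_center = np.array([0.5, 0.5, 0.5])
radius = 0.3

r = np.sqrt((cx - sphere_center[0]) ** 2
            + (cy - sphere_center[1]) ** 2
            + (cz - sphere_center[2]) ** 2)
inside = r <= radius                              # boolean selection mask
print(inside.sum(), "of", inside.size, "cell centers selected")
```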
point(coord, ds=None, field_parameters=None, data_source=None)
slice(axis, coord, center=None, ds=None, field_parameters=None, data_source=None)
cutting(normal, coord, north_vector=None, ds=None, field_parameters=None, data_source=None)
all_data() is a wrapper on the Box Region class which defaults to creating a Region covering the entire dataset domain. It is effectively
ds.region(ds.domain_center, ds.domain_left_edge, ds.domain_right_edge).
region(center, left_edge, right_edge, fields=None, ds=None, field_parameters=None, data_source=None)
box(left_edge, right_edge, fields=None, ds=None, field_parameters=None, data_source=None)
In the box wrapper, the center is assumed to be the midpoint between the left and right edges.
disk(center, normal, radius, height, fields=None, ds=None, field_parameters=None, data_source=None)
ellipsoid(center, semi_major_axis_length, semi_medium_axis_length, semi_minor_axis_length, semi_major_vector, tilt, fields=None, ds=None, field_parameters=None, data_source=None)
sphere(center, radius, ds=None, field_parameters=None, data_source=None)
See also the section on Filtering your Dataset.
slice(axis, coord, ds, data_source=sph)
cut_region(base_object, conditionals, ds=None, field_parameters=None)
cut_region is a filter which can be applied to any other data object. The filter is defined by the conditionals present, which apply cuts to the data in the object. A
cut_region will work for either particle fields or mesh fields, but not on both simultaneously. For more detailed information and examples, see Cut Regions.
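Conceptually, a conditional cut is a boolean mask applied to every field in the object. A minimal numpy sketch (the arrays and threshold are invented for illustration; this is not yt's implementation):

```python
import numpy as np

# Sketch of what a cut_region conditional does conceptually: keep only the
# elements of every field where the condition holds.
temperature = np.array([1e4, 5e5, 2e6, 3e3, 8e5])
density = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

mask = temperature > 1e5          # analogous to a conditional on "temperature"
hot_density = density[mask]       # the same mask is applied to every field
print(hot_density)                # [2. 3. 5.]
```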
data_collection(center, obj_list, ds=None, field_parameters=None)
data_collection is a list of data objects that can be sampled and processed as a whole in a single data object.
smoothed_covering_grid(level, left_edge, dimensions, fields=None, ds=None, num_ghost_zones=0, use_pbar=True, field_parameters=None)
arbitrary_grid(left_edge, right_edge, dimensions, ds=None, field_parameters=None)
proj(field, axis, weight_field=None, center=None, ds=None, data_source=None, method="integrate", field_parameters=None)
The projection can be restricted to a subset of the domain (via the data_source keyword). Alternatively, one can specify a weight_field and different
method values to change the nature of the projection outcome. See Types of Projections for more information.
streamline(coord_list, length, fields=None, ds=None, field_parameters=None)
A streamline can be traced out by identifying a starting coordinate (or list of coordinates) and allowing it to trace a vector field, like gas velocity. See Streamlines: Tracking the Trajectories of Tracers in your Data for more information.
Derived quantities are a way of calculating some bulk quantities associated
with all of the grid cells contained in a data object.
Derived quantities can be accessed via the quantities interface on a data object.
Here is an example of how to get the angular momentum vector calculated from
all the cells contained in a sphere at the center of our dataset.
import yt

ds = yt.load("my_data")
sp = ds.sphere('c', (10, 'kpc'))
print(sp.quantities.angular_momentum_vector())
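For intuition, the quantity being computed here is the sum of each element's angular momentum about the chosen center. A minimal numpy sketch with made-up masses, positions, and velocities (not yt's implementation):

```python
import numpy as np

# Sketch of the underlying calculation: the angular momentum vector is
# sum_i m_i (r_i x v_i) over all cells (or particles), with positions
# taken relative to the chosen center.
mass = np.array([1.0, 2.0])
pos = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])            # positions relative to the center
vel = np.array([[0.0, 1.0, 0.0],
                [-1.0, 0.0, 0.0]])           # both orbiting counter-clockwise

L = (mass[:, None] * np.cross(pos, vel)).sum(axis=0)
print(L)  # [0. 0. 3.]
```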
Most data objects now have multiple numpy-like methods that allow you to quickly process data. More of these methods will be added to this list over time. Most, if not all, of these map to other yt operations and are designed as syntactic sugar to slightly simplify otherwise somewhat obtuse pipelines.
These operations are parallelized.
You can compute the extrema of a field by using the max or min
functions. This will cache the extrema in between, so calling
min right after max will be considerably faster. Here is an example::
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
reg = ds.r[0.3:0.6, 0.2:0.4, 0.9:0.95]
min_rho = reg.min("density")
max_rho = reg.max("density")
This is equivalent to::
min_rho, max_rho = reg.quantities.extrema("density")
The max operation can also compute the maximum intensity projection::
proj = reg.max("density", axis="x")
proj.plot()
This is equivalent to::
proj = ds.proj("density", "x", data_source=reg, method="mip")
proj.plot()
The min operator does not do this, however, as a minimum intensity
projection is not currently implemented.
You can also compute the mean value, which accepts a field, axis, and weight
function. If the axis is not specified, it will return the average value of
the specified field, weighted by the weight argument. The weight argument
defaults to ones, which performs an arithmetic average. For instance::
mean_rho = reg.mean("density")
rho_by_vol = reg.mean("density", weight="cell_volume")
This is equivalent to::
mean_rho = reg.quantities.weighted_average_quantity("density", weight_field="ones")
rho_by_vol = reg.quantities.weighted_average_quantity("density", weight_field="cell_volume")
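The arithmetic underneath is simply sum(field * weight) / sum(weight). A plain numpy sketch with invented values (not yt code):

```python
import numpy as np

# What the weighted average reduces to: sum(field * weight) / sum(weight).
density = np.array([1.0, 2.0, 3.0])
cell_volume = np.array([8.0, 1.0, 1.0])

arithmetic_mean = np.average(density)                      # weight = ones
volume_weighted = np.average(density, weights=cell_volume)
print(arithmetic_mean)   # 2.0
print(volume_weighted)   # (1*8 + 2*1 + 3*1) / 10 = 1.3
```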
If an axis is provided, it will project along that axis and return it to you::
rho_proj = reg.mean("temperature", axis="y", weight="density")
rho_proj.plot()
The sum function will add all the values in the data object. It accepts a
field and, optionally, an axis. If the axis is left unspecified, it will sum
the values in the object::
vol = reg.sum("cell_volume")
If the axis is specified, it will compute a projection using the method sum
(which does not take into account varying path length!) and return that to you::
cell_count = reg.sum("ones", axis="z")
cell_count.plot()
To compute a projection where the path length is taken into account, you can
use the integrate function::
proj = reg.integrate("density", "x")
All of these projections supply the data object as their base input.
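The distinction between sum and integrate can be illustrated with plain numpy on an invented uniform grid (a sketch of the idea, not yt's projection machinery): sum adds raw cell values along the axis, while a path-length-aware integral weights each cell by its width along that axis.

```python
import numpy as np

# Sketch (not yt's implementation): project a 3D field along axis 0.
# "sum" just adds cell values; "integrate" weights each cell by its
# path length (dx) along the projection axis.
density = np.ones((4, 2, 2))          # uniform field
dx = np.full((4, 2, 2), 0.25)         # cell widths along axis 0

summed = density.sum(axis=0)                  # 4.0 everywhere
integrated = (density * dx).sum(axis=0)       # 1.0 everywhere
print(summed[0, 0], integrated[0, 0])
```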
Often, it can be useful to sample a field at the minimum and maximum of a
different field. You can use the argmax and argmin operations to do this::

low_rho_temp = reg.argmin("density", axis="temperature")

This will return the temperature at the minimum density.
If you don't specify an axis, it will return the spatial position of
the minimum value of the queried field. Here is an example::
x, y, z = reg.argmin("density")
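The underlying operation is locating the flat index of the extremum and mapping it back to coordinates. A numpy sketch on an invented grid (not yt internals):

```python
import numpy as np

# Sketch (not yt internals): find the position of the field minimum by
# locating its flat index and converting it back to grid coordinates.
edges = np.linspace(0.0, 1.0, 5)
centers = 0.5 * (edges[:-1] + edges[1:])
x, y, z = np.meshgrid(centers, centers, centers, indexing="ij")

rng = np.random.default_rng(0)
density = rng.random((4, 4, 4))

i, j, k = np.unravel_index(np.argmin(density), density.shape)
print(x[i, j, k], y[i, j, k], z[i, j, k])   # position of the minimum
```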
The covering grid and smoothed covering grid objects mandate that they be
exactly aligned with the mesh. This is a
holdover from the time when yt was used exclusively for data that came in
regularly structured grid patches, and does not necessarily work as well for
data that is composed of discrete objects like particles. To augment this, the
arbitrary_grid object
was created, which enables construction of meshes (onto which particles can be
deposited or smoothed) in arbitrary regions. This eliminates any assumptions
on yt's part about how the data is organized, and will allow for more
fine-grained control over visualizations.
An example of creating an arbitrary grid would be to construct one, then query the deposited particle density, like so:
import yt

ds = yt.load("snapshot_010.hdf5")
obj = ds.arbitrary_grid([0.0, 0.0, 0.0], [0.99, 0.99, 0.99], dims=[128, 128, 128])
print(obj["deposit", "all_density"])
While these cannot yet be used as input to projections or slices, slices and projections can be taken of the data in them and visualized by hand.
These objects, as of yt 3.3, are now also able to “voxelize” mesh fields. This means that you can query the “density” field and it will return the density field as deposited, identically to how it would be deposited in a fixed resolution buffer. Note that this means that contributions from misaligned or partially-overlapping cells are added in a volume-weighted way, which makes it inappropriate for some types of analysis.
Boolean Data Objects have not yet been ported to yt 3.0 from yt 2.x. If you are interested in aiding in this port, please contact the yt-dev mailing list. Until it is ported, the functionality described below will not work.
A special type of data object is the boolean data object. It works only on three-dimensional objects. It is built by relating already existing data objects with boolean operators. The boolean logic may be nested using parentheses, and it supports the standard “AND”, “OR”, and “NOT” operators:
Please see the The Cookbook for some examples of how to use the boolean data object.
The underlying machinery used in Clump Finding is accessible from any data object. This includes the ability to obtain and examine topologically connected sets. These sets are identified by examining cells between two threshold values and connecting them. What is returned to the user is a list of the intervals of values found, and extracted regions that contain only those cells that are connected.
To use this, call extract_connected_sets on
any 3D data object. This function accepts a field, the number of levels of level sets to
extract, the min and the max value between which sets will be identified, and
whether or not to conduct it in log space::
sp = ds.sphere("max", (1.0, 'pc'))
contour_values, connected_sets = sp.extract_connected_sets("density", 3, 1e-30, 1e-20)
The first item,
contour_values, will be an array of the min value for each
set of level sets. The second (
connected_sets) will be a dict of dicts.
The key for the first (outer) dict is the level of the contour, corresponding
to the entries in contour_values. The inner dict returned is keyed by the contour ID and contains
the extracted regions as data objects. These can be queried just as any other data object. The clump finder
(Clump Finding) differs from the above method in that the contour
identification is performed recursively within each individual structure, and
structures can be kept or remerged later based on additional criteria, such as
gravitational boundedness.
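The core idea of connected-set extraction, labeling connected components of cells whose values fall between two thresholds, can be sketched in 2D with a simple breadth-first search. This is an illustration of the concept only; yt's actual contouring algorithm is different and operates on the full 3D mesh.

```python
import numpy as np
from collections import deque

# Sketch (not yt's algorithm): label connected components of cells whose
# values fall between two thresholds, using 4-connectivity and a BFS.
def connected_sets_2d(field, vmin, vmax):
    mask = (field >= vmin) & (field <= vmax)
    labels = np.zeros(field.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                      # already assigned to a component
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if (0 <= ni < field.shape[0] and 0 <= nj < field.shape[1]
                        and mask[ni, nj] and not labels[ni, nj]):
                    labels[ni, nj] = current
                    queue.append((ni, nj))
    return labels, current

field = np.array([[5.0, 0.0, 5.0],
                  [0.0, 0.0, 5.0],
                  [5.0, 0.0, 0.0]])
labels, n = connected_sets_2d(field, 1.0, 10.0)
print(n)  # 3: one lone cell top-left, a connected pair on the right, one lone cell bottom-left
```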
Often, when operating interactively or via the scripting interface, it is convenient to save an object to disk and then restart the calculation later or transfer the data from a container to another filesystem. This can be particularly useful when working with extremely large datasets. Field data can be saved to disk in a format that allows for it to be reloaded just like a regular dataset. For information on how to do this, see Geometric Data Containers.