Filtering your Dataset¶
Large datasets are oftentimes too overwhelming to deal with in their entirety, and it can be useful and faster to analyze subsets of these datasets. Furthermore, filtering the dataset based on some field condition can reveal subtle information not easily accessible by looking at the whole dataset. Filters can be generated based on spatial position, say in a sphere in the center of your dataset space, or more generally they can be defined by the properties of any field in the simulation.
Because mesh fields are internally different from particle fields, there are different ways of filtering each type as indicated below; however, filtering fields by spatial location (i.e. geometric objects) will apply to both types equally.
Filtering Mesh Fields¶
Mesh fields can be filtered by two methods: cut region objects
(YTCutRegion
)
and NumPy boolean masks. Boolean masks are simpler, but they only work
for examining datasets, whereas cut regions objects create wholly new
data objects suitable for full analysis (data examination, image generation,
etc.)
Boolean Masks¶
NumPy boolean masks can be used with any NumPy array simply by passing the array a conditional. As a general example of this:
Similarly, if you’ve created a yt data object (e.g. a region, a sphere), you can examine its field values as a NumPy array by simply indexing it with the field name. Thus, it too can be masked using a NumPy boolean mask. Let’s set a simple mask based on the contents of one of our fields.
This was a simple example, but one can make the conditionals that define a boolean mask have multiple parts, and one can stack masks together to make very complex cuts on one’s data. Once the data is filtered, it can be used if you simply need to access the NumPy arrays:
Cut Regions¶
Cut regions are a more general solution to filtering mesh fields. The output of a cut region is an entirely new data object, which can be treated like any other data object to generate images, examine its values, etc. See this.
In addition to inputting string parameters into cut_region to specify filters,
wrapper functions exist that allow the user to use a simplified syntax for
filtering out unwanted regions. Such wrapper functions are methods of
:func: YTSelectionContainer3D
.
- The following exclude and include functions are supported:
include_equal()
- Only include values equal to given valueexclude_equal()
- Exclude values equal to given valueinclude_inside()
- Only include values inside closed intervalexclude_inside()
- Exclude values inside closed intervalinclude_outside()
- Only include values outside closed intervalexclude_outside()
- Exclude values outside closed intervalexclude_nan()
- Exclude NaN valuesinclude_above()
- Only include values above given valueexclude_above()
- Exclude values above given valueinclude_below()
- Only include values below given balueexclude_below()
- Exclude values below given value
Warning
Cut regions are unstable when used on particle fields. Though you can create a cut region using a mesh field or fields as a filter and then obtain a particle field within that region, you cannot create a cut region using particle fields in the filter, as yt will currently raise an error. If you want to filter particle fields, see the next section Filtering Particle Fields instead.
Filtering Particle Fields¶
Particle filters create new particle fields based on the manipulation and cuts on existing particle fields. You can apply cuts to them to effectively mask out everything except the particles with which you are concerned.
Creating a particle filter takes a few steps. You must first define a function which accepts a data object (e.g. all_data, sphere, etc.) as its argument. It uses the fields and information in this geometric object in order to produce some sort of conditional mask that is then returned to create a new particle type.
Here is a particle filter to create a new star
particle type. For Enzo
simulations, stars have particle_type
set to 2, so our filter will select
only the particles with particle_type
(i.e. field = ('all',
'particle_type')
equal to 2.
@yt.particle_filter(requires=["particle_type"], filtered_type="all")
def stars(pfilter, data):
filter = data[pfilter.filtered_type, "particle_type"] == 2
return filter
The particle_filter()
decorator takes a
few options. You must specify the names of the particle fields that are
required in order to define the filter — in this case the particle_type
field. Additionally, you must specify the particle type to be filtered — in
this case we filter all the particle in dataset by specifying the all
particle type.
In addition, you may specify a name for the newly defined particle type. If no
name is specified, the name for the particle type will be inferred from the name
of the filter definition — in this case the inferred name will be stars
.
As an alternative syntax, you can also define a new particle filter via the
add_particle_filter()
function.
def stars(pfilter, data):
filter = data[pfilter.filtered_type, "particle_type"] == 2
return filter
yt.add_particle_filter(
"stars", function=stars, filtered_type="all", requires=["particle_type"]
)
This is equivalent to our use of the particle_filter
decorator above. The
choice to use either the particle_filter
decorator or the
add_particle_filter
function is a purely stylistic choice.
Lastly, the filter must be applied to our dataset of choice. Note that this filter can be added to as many datasets as we wish. It will only actually create new filtered fields if the dataset has the required fields, though.
import yt
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
ds.add_particle_filter("stars")
And that’s it! We can now access all of the (‘stars’, field) fields from
our dataset ds
and treat them as any other particle field. In addition,
it created some deposit
fields, where the particles were deposited on to
the grid as mesh fields.
We can create additional filters building on top of the filters we have. For example, we can identify the young stars based on their age, which is the difference between current time and their creation_time.
def young_stars(pfilter, data):
age = data.ds.current_time - data[pfilter.filtered_type, "creation_time"]
filter = np.logical_and(age.in_units("Myr") <= 5, age >= 0)
return filter
yt.add_particle_filter(
"young_stars",
function=young_stars,
filtered_type="stars",
requires=["creation_time"],
)
If we properly define all the filters using the decorator yt.particle_filter
or the function yt.add_particle_filter
in advance. We can add the filter
we need to the dataset. If the filtered_type
is already defined but not
added to the dataset, it will automatically add the filter first. For example,
if we add the young_stars
filter, which is filtered from stars
,
to the dataset, it will also add stars
filter to the dataset.
import yt
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
ds.add_particle_filter("young_stars")
Additional example of particle filters can be found in the notebook.
Particle Unions¶
Multiple types of particles can be combined into a single, conceptual type. As
an example, the NMSU-ART code has multiple “species” of dark matter, which we
union into a single darkmatter
field. The all
particle type is a
special case of this.
To create a particle union, you need to import the ParticleUnion
class from
yt.data_objects.unions
, which you then create and pass into
add_particle_union
on a dataset object.
Here is an example, where we union the halo
and disk
particle types
into a single type, star
. yt will then determine which fields are
accessible to this new particle type and it will add them.
from yt.data_objects.unions import ParticleUnion
u = ParticleUnion("star", ["halo", "disk"])
ds.add_particle_union(u)
Filtering Fields by Spatial Location: Geometric Objects¶
Creating geometric objects for a dataset provides a means for filtering
a field based on spatial location. The most commonly used of these are
spheres, regions (3D prisms), ellipsoids, disks, and rays. The all_data
object which gets used throughout this documentation section is an example of
a geometric object, but it defaults to including all the data in the dataset
volume. To see all of the geometric objects available, see
Available Objects.
Consult the object documentation section for all of the different objects one can use, but here is a simple example using a sphere object to filter a dataset. Let’s filter out everything not within 10 Mpc of some random location, say [0.2, 0.5, 0.1], in the simulation volume. The resulting object will only contain grid cells with centers falling inside of our defined sphere, which may look offset based on the presence of different resolution elements distributed throughout the dataset.