Filtering your Dataset

Large datasets are often too unwieldy to analyze in their entirety, and working with subsets of them can be both easier and faster. Filtering a dataset on some field condition can also reveal subtle information that is hard to see when looking at the whole dataset. Filters can be based on spatial position (for example, a sphere at the center of your dataset volume) or, more generally, on the values of any field in the simulation.

Because mesh fields are internally different from particle fields, each type is filtered in a different way, as described below. Filtering fields by spatial location (i.e. with geometric objects), however, applies to both types equally.

Filtering Mesh Fields

Mesh fields can be filtered by two methods: cut region objects (YTCutRegion) and NumPy boolean masks. Boolean masks are simpler, but they are only suitable for examining field values, whereas cut region objects are wholly new data objects suitable for full analysis (data examination, image generation, etc.).

Boolean Masks

NumPy boolean masks can be used with any NumPy array: a conditional applied to the array yields an array of True/False values, and indexing the original array with that mask returns only the elements where the mask is True.
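
As a general illustration, here is a minimal sketch using only NumPy. The array contents and the variable names (a, bigger_than_two) are arbitrary choices made for this example.

```python
import numpy as np

a = np.arange(5)  # the integers 0 through 4

# Applying a conditional to the array yields a boolean mask of the same shape.
bigger_than_two = a > 2

# Indexing with the mask keeps only the elements where the mask is True.
masked = a[bigger_than_two]

print("Original array:", a)
print("Boolean mask:  ", bigger_than_two)
print("Masked array:  ", masked)
```

The mask is an ordinary array, so it can be stored and reused on any other array of the same shape.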

Similarly, if you’ve created a yt data object (e.g. a region or a sphere), you can examine its field values as an array by indexing the object with the field name. That array can be masked with a NumPy boolean mask in the same way, for instance with a mask based on the contents of one of its fields.
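
The sketch below shows the pattern on a stand-in for a data object. The dictionary named ad and its values are invented for illustration so that the example runs without a dataset; with yt you would obtain ad from something like ds.all_data() and index it with the same field tuples.

```python
import numpy as np

# Hypothetical stand-in for a yt data object: a dict keyed by (type, name)
# field tuples. With yt, `ad = ds.all_data()` is indexed the same way.
ad = {
    ("gas", "temperature"): np.array([1.0e4, 5.0e6, 2.0e5, 3.0e7]),  # K
    ("gas", "density"): np.array([1.0e-28, 2.0e-30, 5.0e-29, 4.0e-31]),  # g/cm**3
}

# Build the mask from one field ...
hot = ad["gas", "temperature"] > 1e6

# ... and use it to pick the matching cells out of another field.
hot_density = ad["gas", "density"][hot]

print("Mask of hot cells:", hot)
print("Density of hot cells:", hot_density)
```

With real yt arrays, which carry units, it is safer to convert explicitly before comparing, e.g. ad["gas", "temperature"].in_units("K") > 1e6.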

This was a simple example, but the conditionals that define a boolean mask can have multiple parts, and masks can be stacked together to make very complex cuts on your data. The filtered data can then be used wherever you simply need access to the NumPy arrays.
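
A minimal sketch of multi-part and stacked masks follows. The arrays are invented values standing in for two fields of the same data object; the thresholds are arbitrary.

```python
import numpy as np

# Invented values standing in for two fields of the same data object.
temperature = np.array([1.0e4, 5.0e6, 2.0e5, 3.0e7, 8.0e6, 3.0e5])  # K
density = np.array([1.0e-28, 2.0e-30, 5.0e-29, 4.0e-31, 6.0e-29, 1.0e-31])  # g/cm**3

hot = temperature > 1e6
dense = density > 5e-30

# Multi-part conditionals: combine masks element-wise with & (and), | (or), ~ (not).
hot_and_dense = hot & dense
hot_or_dense = hot | dense
cold = ~hot

# Stacking: apply one cut, then a second cut to the already-filtered array.
hot_density = density[hot]
stacked = hot_density[hot_density > 5e-30]

print("Density of hot, dense cells:", density[hot_and_dense])
print("Same cells via stacked cuts:", stacked)
```

When writing the conditionals inline, wrap each comparison in parentheses, as in (temperature > 1e6) & (density > 5e-30), because & binds more tightly than the comparison operators.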

Cut Regions

Cut regions are a more general solution to filtering mesh fields. The output of a cut region is an entirely new data object, which can be treated like any other data object to generate images, examine its values, etc.
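
As a sketch of how a cut region might be built, the helper below passes a list of conditional strings to the cut_region method of a data object. The helper name hot_dense_region and its default thresholds are hypothetical; inside each string, obj refers to the data object being cut, and a cell is kept only if it satisfies every conditional in the list.

```python
def hot_dense_region(data_object, min_temperature=1e6, min_density=5e-30):
    """Return a cut region holding only cells that are both hot and dense.

    `data_object` is assumed to be a yt data object such as ds.all_data().
    The function name and default thresholds are illustrative only.
    """
    conditionals = [
        f'obj["gas", "temperature"] > {min_temperature}',
        f'obj["gas", "density"] > {min_density}',
    ]
    # All conditionals in the list must hold for a cell to be included.
    return data_object.cut_region(conditionals)


# Usage with yt (not executed here because it needs a dataset on disk):
#   ad = ds.all_data()
#   hot_dense = hot_dense_region(ad)
#   hot_dense["gas", "density"]   # field values restricted to the cut
```

Because the result is itself a data object, it can be used anywhere a region or sphere could be, for example as the data source of a plot.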

In addition to passing conditional strings to cut_region to specify filters, you can use wrapper functions that offer a simplified syntax for filtering out unwanted regions. These wrapper functions are methods of YTSelectionContainer3D.

The following exclude and include functions are supported:
  • include_equal() - Only include values equal to given value

  • exclude_equal() - Exclude values equal to given value

  • include_inside() - Only include values inside closed interval

  • exclude_inside() - Exclude values inside closed interval

  • include_outside() - Only include values outside closed interval

  • exclude_outside() - Exclude values outside closed interval

  • exclude_nan() - Exclude NaN values

  • include_above() - Only include values above given value

  • exclude_above() - Exclude values above given value

  • include_below() - Only include values below given value

  • exclude_below() - Exclude values below given value
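
The wrappers listed above might be used as in the following sketch. The helper name select_hot_gas is hypothetical, and the argument order (field, then value, then optional units) is an assumption that should be checked against the API reference for your yt version.

```python
def select_hot_gas(data_object, min_temperature=1e6):
    """Drop NaN temperatures, then keep only cells above `min_temperature` (K).

    `data_object` is assumed to be a yt 3D data object (e.g. ds.all_data()).
    Assumed signatures: exclude_nan(field) and include_above(field, value, units).
    """
    temperature = ("gas", "temperature")
    # Each wrapper returns a new cut region, so the calls can be chained.
    finite = data_object.exclude_nan(temperature)
    return finite.include_above(temperature, min_temperature, "K")


# Usage with yt (not executed here because it needs a dataset on disk):
#   ad = ds.all_data()
#   hot = select_hot_gas(ad)
#   hot["gas", "density"]
```

Compared with writing the conditional string by hand, the wrapper keeps the field name and threshold as ordinary Python values, which is less error-prone when the cut is built programmatically.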

Warning

Cut regions are unstable when used on particle fields. Though you can create a cut region using a mesh field or fields as a filter and then obtain a particle field within that region, you cannot create a cut region using particle fields in the filter, as yt will currently raise an error. If you want to filter particle fields, see the next section Filtering Particle Fields instead.

Filtering Particle Fields

Particle filters create new particle types by applying cuts to existing particle fields. They effectively mask out everything except the particles you are interested in.

Creating a particle filter takes a few steps. You must first define a function that accepts the filter object and a data object (e.g. all_data, sphere, etc.) as its arguments. The function uses the fields and information in this data object to produce a conditional mask, which it returns and which is then used to create a new particle type.

Here is a particle filter that creates a new star particle type. For Enzo simulations, stars have particle_type set to 2, so our filter selects only the particles whose particle_type field (i.e. ('all', 'particle_type')) is equal to 2.

import yt

# Keep only the particles that Enzo marks as stars (particle_type == 2).
@yt.particle_filter(requires=["particle_type"], filtered_type="all")
def stars(pfilter, data):
    filter = data[pfilter.filtered_type, "particle_type"] == 2
    return filter

The particle_filter() decorator takes a few options. You must specify the names of the particle fields that are required to define the filter, in this case the particle_type field. You must also specify the particle type to be filtered; here we filter all the particles in the dataset by specifying the all particle type.

In addition, you may specify a name for the newly defined particle type. If no name is specified, the name for the particle type will be inferred from the name of the filter definition — in this case the inferred name will be stars.

As an alternative syntax, you can also define a new particle filter via the add_particle_filter() function.

import yt


def stars(pfilter, data):
    filter = data[pfilter.filtered_type, "particle_type"] == 2
    return filter


yt.add_particle_filter(
    "stars", function=stars, filtered_type="all", requires=["particle_type"]
)

This is equivalent to our use of the particle_filter decorator above. Choosing between the particle_filter decorator and the add_particle_filter function is purely a matter of style.

Lastly, the filter must be applied to our dataset of choice. Note that this filter can be added to as many datasets as we wish. It will only actually create new filtered fields if the dataset has the required fields, though.

import yt

ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
ds.add_particle_filter("stars")

And that’s it! We can now access all of the ("stars", field) fields from our dataset ds and treat them like any other particle field. In addition, adding the filter creates some deposit fields, in which the particles are deposited onto the grid as mesh fields.
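
Once the filter has been added, the new particle type is indexed like any other. The sketch below assumes a particle_mass field exists for the filtered type; the helper name stellar_mass_summary is hypothetical, and the dictionary is a stand-in with invented values so that the example runs without a dataset.

```python
import numpy as np


def stellar_mass_summary(data_object, ptype="stars"):
    """Return (number of particles, total mass) for a filtered particle type.

    `data_object` is assumed to be indexable by (particle type, field name),
    as a yt data object is; the "particle_mass" field name is an assumption.
    """
    mass = data_object[ptype, "particle_mass"]
    return len(mass), mass.sum()


# Hypothetical stand-in for ds.all_data(), with invented masses in grams.
fake_ad = {("stars", "particle_mass"): np.array([1.0e38, 2.5e38, 5.0e37])}

count, total = stellar_mass_summary(fake_ad)
print("Number of star particles:", count)
print("Total stellar mass:", total)
```

With yt, the call would be stellar_mass_summary(ds.all_data()) after ds.add_particle_filter("stars"), and the returned total would carry units.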

We can create additional filters building on top of the filters we have. For example, we can identify the young stars based on their age, which is the difference between current time and their creation_time.

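import numpy as np
import yt


def young_stars(pfilter, data):
    # Age of each particle: current simulation time minus its creation time.
    age = data.ds.current_time - data[pfilter.filtered_type, "creation_time"]
    # Keep particles that are at most 5 Myr old and have a non-negative age.
    filter = np.logical_and(age.in_units("Myr") <= 5, age >= 0)
    return filter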


yt.add_particle_filter(
    "young_stars",
    function=young_stars,
    filtered_type="stars",
    requires=["creation_time"],
)

If all the filters are defined in advance, using the yt.particle_filter decorator or the yt.add_particle_filter function, we can add whichever filter we need to the dataset. If the filtered_type of a filter is itself a filter that has been defined but not yet added to the dataset, yt automatically adds that filter first. For example, adding the young_stars filter, which is filtered from stars, also adds the stars filter to the dataset.

import yt

ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
ds.add_particle_filter("young_stars")

Additional examples of particle filters can be found in the notebook.

Particle Unions

Multiple types of particles can be combined into a single, conceptual type. As an example, the NMSU-ART code has multiple “species” of dark matter, which we union into a single darkmatter field. The all particle type is a special case of this.

To create a particle union, you need to import the ParticleUnion class from yt.data_objects.unions, which you then create and pass into add_particle_union on a dataset object.

Here is an example, where we union the halo and disk particle types into a single type, star. yt will then determine which fields are accessible to this new particle type and it will add them.

from yt.data_objects.unions import ParticleUnion

u = ParticleUnion("star", ["halo", "disk"])
ds.add_particle_union(u)

Filtering Fields by Spatial Location: Geometric Objects

Creating geometric objects for a dataset provides a means of filtering a field based on spatial location. The most commonly used of these are spheres, regions (3D prisms), ellipsoids, disks, and rays. The all_data object, which is used throughout this section, is also a geometric object, although it defaults to including all the data in the dataset volume. To see all of the geometric objects available, see Available Objects.

Consult the object documentation section for all of the different objects one can use. As a simple example, a sphere object can be used to filter out everything not within 10 Mpc of some arbitrary location, say [0.2, 0.5, 0.1], in the simulation volume. The resulting object contains only the grid cells whose centers fall inside the defined sphere, so it may look offset depending on how cells of different resolution are distributed throughout the dataset.
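
In yt the sphere itself would be created with a call such as ds.sphere([0.2, 0.5, 0.1], (10, "Mpc")). The selection rule described above, keeping cells whose centers lie inside the sphere, can be illustrated with plain NumPy. This is a toy sketch on a uniform grid with an arbitrary radius in code units, not yt's implementation.

```python
import numpy as np

# Toy uniform grid: cell centers of a 4x4x4 mesh spanning the unit cube.
n = 4
centers_1d = (np.arange(n) + 0.5) / n  # 0.125, 0.375, 0.625, 0.875
x, y, z = np.meshgrid(centers_1d, centers_1d, centers_1d, indexing="ij")

center = np.array([0.2, 0.5, 0.1])
radius = 0.3  # arbitrary, in the same code units as the coordinates

# A cell is selected when its center lies within `radius` of `center`.
r_squared = (x - center[0]) ** 2 + (y - center[1]) ** 2 + (z - center[2]) ** 2
inside = r_squared <= radius**2

# Any field defined on the grid can be restricted with the same mask.
density = np.ones_like(x)
sphere_density = density[inside]

print("Cells selected:", int(inside.sum()), "of", inside.size)
```

Because whole cells are either kept or dropped, the selected volume only approximates the sphere, and the approximation is coarser wherever the cells are larger.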