Filtering your Dataset

Large datasets are often too unwieldy to handle in their entirety, and analyzing subsets of them can be both more useful and faster. Filtering a dataset on a field condition can also reveal subtle features that are hard to see when looking at the whole dataset. Filters can be defined by spatial position (say, a sphere at the center of your simulation domain) or, more generally, by the values of any field in the simulation.

Because mesh fields are internally different from particle fields, there are different ways of filtering each type as indicated below; however, filtering fields by spatial location (i.e. geometric objects) will apply to both types equally.

Filtering Mesh Fields

Mesh fields can be filtered in two ways: with cut region objects (YTCutRegion) or with NumPy boolean masks. Boolean masks are simpler, but they only give you filtered arrays of field values to examine, whereas cut regions create wholly new data objects suitable for full analysis (examining data, generating images, etc.).

Boolean Masks

A NumPy boolean mask is the array of True/False values produced by applying a conditional to an array. Indexing an array with that mask keeps only the elements where the mask is True. As a general example:

import numpy as np

a = np.arange(5)
bigger_than_two = a > 2
print("Original Array: a = \n%s" % a)
print("Boolean Mask: bigger_than_two = \n%s" % bigger_than_two)
print("Masked Array: a[bigger_than_two] = \n%s" % a[bigger_than_two])
Original Array: a = 
[0 1 2 3 4]
Boolean Mask: bigger_than_two = 
[False False False  True  True]
Masked Array: a[bigger_than_two] = 
[3 4]
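Because a boolean mask is just an array of True/False values, a mask built from one array can select the matching elements of any other array with the same shape. The sketch below uses plain NumPy with made-up values; the arrays are named after yt fields only for illustration, on the assumption that, as with fields read from a single yt data object, element i of each array describes the same cell.

```python
import numpy as np

# Made-up "parallel" arrays: element i of each describes the same cell,
# standing in for two fields read from one yt data object.
temperature = np.array([50.0, 2.0e6, 8.0e5, 3.0e6, 1.0e4])  # K (illustrative)
density = np.array([1.0e-27, 2.0e-29, 5.0e-28, 1.0e-29, 3.0e-26])  # g/cm**3 (illustrative)

# Build the mask from one array...
hot = temperature > 1e6

# ...and apply it to any array with the same shape.
print(temperature[hot].tolist())  # [2000000.0, 3000000.0]
print(density[hot].tolist())      # [2e-29, 1e-29]
```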

Similarly, if you’ve created a yt data object (e.g., a region or a sphere), indexing it with a field tuple such as ("gas", "temperature") returns that field's values as an array (a NumPy array with units attached), so it too can be masked with a NumPy boolean mask. Let’s build a simple mask from the values of one of our fields.

import yt

ds = yt.load("Enzo_64/DD0042/data0042")
ad = ds.all_data()
hot = ad["gas", "temperature"].in_units("K") > 1e6
print(
    'Temperature of all data: ad["gas", "temperature"] = \n%s'
    % ad["gas", "temperature"]
)
print("Boolean Mask: hot = \n%s" % hot)
print(
    'Temperature of "hot" data: ad["gas", "temperature"][hot] = \n%s'
    % ad["gas", "temperature"][hot]
)
Temperature of all data: ad["gas", "temperature"] = 
[1.00000000e+00 1.00000000e+00 1.00000000e+00 ... 1.87798863e+07
 1.77985684e+07 1.73020029e+07] K
Boolean Mask: hot = 
[False False False ...  True  True  True]
Temperature of "hot" data: ad["gas", "temperature"][hot] = 
[ 6502480.87990464  7005697.58048854  7453051.94593615 ...
 18779886.27258056 17798568.39391833 17302002.90929025] K
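The mask is also useful on its own. As a minimal NumPy-only sketch (the temperatures are invented stand-ins for ad["gas", "temperature"], not values from the dataset above), counting the mask's True entries tells you how many cells pass the cut, and its mean gives the fraction that do:

```python
import numpy as np

# Invented temperatures in K, standing in for ad["gas", "temperature"].
temperature = np.array([1.0, 6.5e6, 2.0e4, 1.8e7, 3.0e5, 1.7e7])
hot = temperature > 1e6

n_hot = np.count_nonzero(hot)  # number of True entries in the mask
frac_hot = hot.mean()          # True counts as 1 and False as 0
print(n_hot, frac_hot)         # 3 0.5
```

Note that this fraction counts cells, not volume or mass; for a physically weighted fraction you would instead sum a weight field, such as the cell mass, over the masked cells.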

This was a simple example, but the conditional that defines a mask can have several parts, and masks can be combined with the element-wise operators & (and), | (or), and ~ (not) to make very complex cuts on your data. Wrap each comparison in parentheses, since & and | bind more tightly than comparison operators. Once filtered, the data can be used directly if all you need is access to the NumPy arrays:

import yt

ds = yt.load("Enzo_64/DD0042/data0042")
ad = ds.all_data()
overpressure_and_fast = (
    (ad["gas", "pressure"] > 1e-14) &
    (ad["gas", "velocity_magnitude"].in_units("km/s") > 1e2)
)
density = ad["gas", "density"]
print('Density of all data: ad["gas", "density"] = \n%s' % density)
print(
    'Density of "overpressure and fast" data: density[overpressure_and_fast] = \n%s'
    % density[overpressure_and_fast]
)
Density of all data: ad["gas", "density"] = 
[2.05609648e-31 1.87887401e-31 2.73497858e-31 ... 3.17107844e-28
 2.45682636e-28 2.13163618e-28] g/cm**3
Density of "overpressure and fast" data: density[overpressure_and_fast] = 
[9.67936877e-29 1.03145733e-28 1.17144334e-28 ... 3.17107844e-28
 2.45682636e-28 2.13163618e-28] g/cm**3
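A minimal NumPy-only sketch of the three combining operators on a toy array (not yt data). Python's `and`, `or`, and `not` do not work element-wise on arrays, which is why the bitwise operators are used instead:

```python
import numpy as np

a = np.arange(10)

# Parentheses are required: & and | bind more tightly than > and <,
# so "a > 2 & a < 7" would be parsed as the chained comparison
# "a > (2 & a) < 7", which raises a ValueError on arrays.
between = (a > 2) & (a < 7)   # element-wise AND
outside = (a < 2) | (a > 7)   # element-wise OR
not_between = ~between        # element-wise NOT

print(a[between].tolist())      # [3, 4, 5, 6]
print(a[outside].tolist())      # [0, 1, 8, 9]
print(a[not_between].tolist())  # [0, 1, 2, 7, 8, 9]
```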

Cut Regions

Cut regions are a more general solution to filtering mesh fields. The output of a cut region is an entirely new data object, which can be treated like any other data object to generate images, examine its values, etc.
