Filtering your Dataset
Large datasets are often too unwieldy to analyze in their entirety, and working with a subset can be both more convenient and faster. Furthermore, filtering the dataset based on some field condition can reveal subtle information that is not easily seen when looking at the whole dataset. Filters can be based on spatial position, such as a sphere at the center of your dataset's domain, or more generally they can be defined by the properties of any field in the simulation.
Because mesh fields are internally different from particle fields, there are different ways of filtering each type as indicated below; however, filtering fields by spatial location (i.e. geometric objects) will apply to both types equally.
Filtering Mesh Fields
Mesh fields can be filtered by two methods: cut region objects (YTCutRegion) and NumPy boolean masks. Boolean masks are simpler, but they are only useful for examining data values, whereas cut region objects create wholly new data objects suitable for full analysis (data examination, image generation, etc.).
Boolean Masks
NumPy boolean masks can be used with any NumPy array: apply a conditional to the array to produce the mask, then index the array with that mask. As a general example of this:
import numpy as np
a = np.arange(5)
bigger_than_two = a > 2
print("Original Array: a = \n%s" % a)
print("Boolean Mask: bigger_than_two = \n%s" % bigger_than_two)
print("Masked Array: a[bigger_than_two] = \n%s" % a[bigger_than_two])
Similarly, if you’ve created a yt data object (e.g. a region, a sphere), you can examine its field values as a NumPy array by simply indexing it with the field name. Thus, it too can be masked using a NumPy boolean mask. Let’s set a simple mask based on the contents of one of our fields.
import yt
ds = yt.load("Enzo_64/DD0042/data0042")
ad = ds.all_data()
hot = ad["gas", "temperature"].in_units("K") > 1e6
print(
    'Temperature of all data: ad["gas", "temperature"] = \n%s'
    % ad["gas", "temperature"]
)
print("Boolean Mask: hot = \n%s" % hot)
print(
    'Temperature of "hot" data: ad["gas", "temperature"][hot] = \n%s'
    % ad["gas", "temperature"][hot]
)
This was a simple example, but the conditionals that define a boolean mask can have multiple parts, and masks can be combined to make very complex cuts on the data. Filtering this way is sufficient when you only need access to the resulting NumPy arrays:
import yt
ds = yt.load("Enzo_64/DD0042/data0042")
ad = ds.all_data()
overpressure_and_fast = (
    (ad["gas", "pressure"] > 1e-14)
    & (ad["gas", "velocity_magnitude"].in_units("km/s") > 1e2)
)
density = ad["gas", "density"]
print('Density of all data: ad["gas", "density"] = \n%s' % density)
print(
    'Density of "overpressure and fast" data: density[overpressure_and_fast] = \n%s'
    % density[overpressure_and_fast]
)
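The multi-part conditionals described above are built with NumPy's element-wise operators: & (and), | (or), and ~ (not). The following is a minimal sketch using plain NumPy arrays as stand-ins for field data; the array values and the names temperature, density, hot, and dense are made up for illustration and do not come from a real dataset.

```python
import numpy as np

# Hypothetical stand-ins for two field arrays of equal length.
temperature = np.array([5.0e3, 2.0e6, 8.0e6, 1.0e4, 3.0e7])  # K
density = np.array([1.0e-28, 5.0e-27, 2.0e-30, 7.0e-26, 4.0e-27])  # g/cm**3

# Build each mask separately, then combine them element-wise.
hot = temperature > 1e6
dense = density > 1e-27

hot_and_dense = hot & dense    # both conditions hold
hot_or_dense = hot | dense     # at least one condition holds
hot_not_dense = hot & ~dense   # hot, but not dense

# A mask built from one array can index any array of the same shape.
print("Temperature where hot and dense:", temperature[hot_and_dense])
print("Density where hot or dense:", density[hot_or_dense])
print("Temperature where hot but not dense:", temperature[hot_not_dense])
```

When the comparisons are written inline rather than stored in named masks, each one needs its own parentheses, as in (a > 1) & (b < 2), because & and | bind more tightly than the comparison operators.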
Cut Regions
Cut regions are a more general solution to filtering mesh fields. The output of a cut region is an entirely new data object, which can be treated like any other data object to generate images, examine its values, etc.