Filtering & Matching Macros
These macros filter, sort, and manipulate molecular datasets based on structural patterns and properties.
Match SMARTS String
OpenBabel.@match_smarts_string - Macro
@match_smarts_string(expr, pattern)
Keep only molecules that match a SMARTS pattern.
Arguments
expr: Previous command in the chain
pattern: SMARTS pattern string
Example
# Keep only molecules with benzene rings
@chain begin
@read_file("database.smi", "smi")
@match_smarts_string("c1ccccc1")
@output_as("benzene_compounds.smi", "smi")
@execute
end
Usage Examples
Filter molecules containing benzene rings:
@chain begin
@read_file("database.smi", "smi")
@match_smarts_string("c1ccccc1") # Aromatic benzene ring
@output_as("benzene_compounds.smi", "smi")
@execute
end
Common SMARTS patterns:
| Pattern | Description |
|---|---|
| c1ccccc1 | Benzene ring (six aromatic carbons) |
| [OH] | Hydroxyl group |
| [NH2] | Primary amine (nitrogen with two hydrogens) |
| C=O | Carbonyl group (aliphatic carbon and oxygen) |
| [#6]=[#8] | Any carbon double bonded to any oxygen |
| [R] | Any atom in a ring |
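The patterns in the table can be typed directly as Julia string literals, but SMARTS and Julia string syntax collide in one place: recursive SMARTS uses `$(...)`, which Julia would otherwise parse as string interpolation. The plain-Julia sketch below shows the two usual ways to write such a pattern. The pattern `[$(C=O)]` is only an illustration, and this page does not state whether the macros accept a `raw"..."` literal, so the escaped form is probably the safer choice inside a macro call.

```julia
# Recursive SMARTS contains `$(...)`; escape the dollar sign so Julia
# does not try to interpolate the expression C=O.
escaped = "[\$(C=O)]"

# A raw string literal performs no interpolation at all.
rawform = raw"[$(C=O)]"

# Both spellings produce the same pattern text.
@assert escaped == rawform

# `#` inside a string literal is ordinary text, not the start of a comment.
carbonyl = "[#6]=[#8]"

println(escaped)   # prints [$(C=O)]
println(carbonyl)  # prints [#6]=[#8]
```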
Don't Match SMARTS String
OpenBabel.@dont_match_smarts_string - Macro
@dont_match_smarts_string(expr, pattern)
Exclude molecules that match a SMARTS pattern.
Example
# Remove molecules with benzene rings
@chain begin
@read_file("database.smi", "smi")
@dont_match_smarts_string("c1ccccc1")
@output_as("no_benzene.smi", "smi")
@execute
end
Usage Examples
Exclude molecules with specific functional groups:
@chain begin
@read_file("compounds.smi", "smi")
@dont_match_smarts_string("[OH]") # Remove molecules with a hydroxyl group (alcohols, phenols, carboxylic acids)
@output_as("no_alcohols.smi", "smi")
@execute
end
Sort By
OpenBabel.@sort_by - Macro
@sort_by(expr, property)
Sort molecules by a property in ascending order.
Arguments
expr: Previous command in the chain
property: Property name ("MW", "logP", "TPSA", etc.)
Example
@chain begin
@read_file("molecules.sdf", "sdf")
@add_properties(["MW"])
@sort_by("MW")
@output_as("sorted_by_mw.sdf", "sdf")
@execute
end
Usage Examples
@chain begin
@read_file("molecules.sdf", "sdf")
@add_properties(["MW"])
@sort_by("MW") # Ascending order
@output_as("sorted_by_mw.sdf", "sdf")
@execute
end
Sort By Reverse
OpenBabel.@sort_by_reverse - Macro
@sort_by_reverse(expr, property)
Sort molecules by a property in descending order.
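To make the two orderings concrete, here is a plain-Julia sketch that does not use the package at all. The records are hypothetical stand-ins for molecules that already carry an `MW` property; the sketch only illustrates what ascending and descending order mean for the output, not how the macros are implemented.

```julia
# Hypothetical records standing in for molecules with a computed MW property
mols = [
    (name = "ethanol", MW = 46.07),
    (name = "benzene", MW = 78.11),
    (name = "methane", MW = 16.04),
]

# Ascending order, the ordering described for @sort_by
ascending = sort(mols; by = m -> m.MW)

# Descending order, the ordering described for @sort_by_reverse
descending = sort(mols; by = m -> m.MW, rev = true)

println([m.name for m in ascending])   # ["methane", "ethanol", "benzene"]
println([m.name for m in descending])  # ["benzene", "ethanol", "methane"]
```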
Example
@chain begin
@read_file("molecules.sdf", "sdf")
@add_properties(["logP"])
@sort_by_reverse("logP") # Highest logP first
@output_as("sorted_by_logp_desc.sdf", "sdf")
@execute
end
Usage Examples
Sort by logP in descending order:
@chain begin
@read_file("molecules.sdf", "sdf")
@add_properties(["logP"])
@sort_by_reverse("logP") # Descending order
@output_as("high_logp_first.sdf", "sdf")
@execute
end
Remove Duplicate Molecules
OpenBabel.@remove_duplicate_mols - Macro
@remove_duplicate_mols(expr)
Remove duplicate molecules from the dataset.
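This page does not say how two molecules are judged to be duplicates. As a rough mental model only, the plain-Julia sketch below assumes each molecule has a canonical identifier (the hypothetical `key` field, for example a canonical SMILES) and keeps the first record seen for each identifier.

```julia
# Hypothetical records: `id` is an input label, `key` stands in for a
# canonical identifier computed for each structure (an assumption).
mols = [
    (id = "a", key = "CCO"),
    (id = "b", key = "c1ccccc1"),
    (id = "c", key = "CCO"),   # same structure as "a"
]

# Keep the first record for each distinct key, preserving input order
unique_mols = unique(m -> m.key, mols)

println([m.id for m in unique_mols])  # ["a", "b"]
```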
Example
@chain begin
@read_file("database.smi", "smi")
@remove_duplicate_mols()
@output_as("unique_molecules.smi", "smi")
@execute
end
Usage Examples
Remove duplicate structures:
@chain begin
@read_file("raw_data.smi", "smi")
@remove_duplicate_mols()
@output_as("unique_molecules.smi", "smi")
@execute
end
Convert Dative Bonds
OpenBabel.@convert_dative_bonds - Macro
@convert_dative_bonds(expr)
Convert dative bonds (coordinate covalent bonds) to normal bonds in the molecular representation.
Arguments
expr: Previous command in the chain
Example
@chain begin
@read_file("coordination_compounds.mol", "mol")
@convert_dative_bonds()
@output_as("converted_bonds.mol", "mol")
@execute
end
Usage Examples
Convert dative bonds to standard representation:
@chain begin
@read_file("complexes.sdf", "sdf")
@convert_dative_bonds()
@output_as("converted_bonds.sdf", "sdf")
@execute
end
Remove Hydrogens
OpenBabel.@remove_hydrogens - Macro
@remove_hydrogens(expr)
Remove explicit hydrogen atoms from the molecular structure.
Arguments
expr: Previous command in the chain
Example
@chain begin
@read_file("explicit_h_molecules.mol", "mol")
@remove_hydrogens()
@output_as("implicit_h_molecules.mol", "mol")
@execute
end
Usage Examples
Remove explicit hydrogens:
@chain begin
@read_file("molecules.sdf", "sdf")
@remove_hydrogens()
@output_as("implicit_h.sdf", "sdf")
@execute
end
Set Atom Order Canonical
OpenBabel.@set_atom_order_canonical - Macro
@set_atom_order_canonical(expr)
Reorder atoms in molecules to follow a canonical (standardized) ordering.
Arguments
expr: Previous command in the chain
Example
@chain begin
@read_file("unordered_molecules.smi", "smi")
@set_atom_order_canonical()
@output_as("canonical_molecules.smi", "smi")
@execute
end
Usage Examples
Standardize atom ordering:
@chain begin
@read_file("molecules.sdf", "sdf")
@set_atom_order_canonical()
@output_as("canonical_order.sdf", "sdf")
@execute
end
Separate Fragments
OpenBabel.@separate_fragments - Macro
@separate_fragments(expr)
Split multi-fragment molecules so that each fragment becomes a separate molecule.
Arguments
expr: Previous command in the chain
Example
@chain begin
@read_file("salt_complexes.smi", "smi")
@separate_fragments()
@output_as("individual_fragments.smi", "smi")
@execute
end
Usage Examples
Split salts and complexes:
@chain begin
@read_file("salts.sdf", "sdf")
@separate_fragments()
@output_as("fragments.sdf", "sdf")
@execute
end
Ignore Bad Molecules
OpenBabel.@ignore_bad_molecules - Macro
@ignore_bad_molecules(expr)
Skip molecules that cannot be parsed or contain errors, continuing with valid molecules.
Arguments
expr: Previous command in the chain
Example
@chain begin
@read_file("mixed_quality_data.smi", "smi")
@ignore_bad_molecules()
@gen_3D_coords("fast")
@output_as("valid_molecules.sdf", "sdf")
@execute
end
Usage Examples
Skip invalid molecular structures:
@chain begin
@read_file("raw_data.smi", "smi")
@ignore_bad_molecules()
@output_as("valid_molecules.smi", "smi")
@execute
end
Start With Index
OpenBabel.@start_with_index - Macro
@start_with_index(expr, idx)
Start processing molecules from a specific index in the input file.
Arguments
expr: Previous command in the chain
idx: Starting index (1-based indexing)
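Because the index is 1-based, the molecule at position `idx` is the first one processed and everything before it is skipped. The plain-Julia sketch below illustrates that rule on a hypothetical list of record labels; it is not the macro's implementation.

```julia
# Hypothetical record labels standing in for molecules in an input file
records = ["mol$(i)" for i in 1:5]

idx = 3                    # 1-based starting index
subset = records[idx:end]  # records 1 and 2 are skipped, record 3 is kept

println(subset)  # ["mol3", "mol4", "mol5"]
```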
Example
@chain begin
@read_file("large_database.sdf", "sdf")
@start_with_index(100) # Start from the 100th molecule
@add_properties(["MW"])
@output_as("subset_molecules.sdf", "sdf")
@execute
end
Usage Examples
@chain begin
@read_file("huge_database.sdf", "sdf")
@start_with_index(1000) # Begin from molecule 1000
@add_properties(["MW", "logP"])
@sort_by("MW")
@output_as("subset_processed.sdf", "sdf")
@execute
end
Complete Workflows
Data Cleaning Pipeline
@chain begin
@read_file("raw_data.smi", "smi")
@ignore_bad_molecules() # Skip invalid molecules
@remove_duplicate_mols() # Remove duplicates
@remove_hydrogens() # Remove explicit hydrogens
@set_atom_order_canonical() # Standardize atom order
@separate_fragments() # Split salts/complexes
@match_smarts_string("[!#1]") # Keep molecules with at least one non-hydrogen atom
@output_as("cleaned_data.smi", "smi")
@execute
end
Structure-Based Filtering
@chain begin
@read_file("compounds.smi", "smi")
@match_smarts_string("c1ccccc1") # Keep molecules containing a benzene ring
@dont_match_smarts_string("[OH]") # Remove molecules with a hydroxyl group
@add_properties(["MW", "logP"])
@sort_by_reverse("logP")
@output_as("filtered_aromatics.sdf", "sdf")
@execute
end