Filtering & Matching Macros

These macros filter, sort, and manipulate molecular datasets based on structural patterns and properties.

Match SMARTS String

OpenBabel.@match_smarts_string - Macro
@match_smarts_string(expr, pattern)

Filter molecules that match a SMARTS pattern.

Arguments

  • expr: Previous command in the chain
  • pattern: SMARTS pattern string

Example

# Keep only molecules with benzene rings
@chain begin
    @read_file("database.smi", "smi")
    @match_smarts_string("c1ccccc1")
    @output_as("benzene_compounds.smi", "smi")
    @execute
end

Usage Examples

Filter molecules containing benzene rings:

@chain begin
    @read_file("database.smi", "smi")
    @match_smarts_string("c1ccccc1")  # Aromatic benzene ring
    @output_as("benzene_compounds.smi", "smi")
    @execute
end

Common SMARTS patterns:

Pattern      Description
c1ccccc1     Benzene ring
[OH]         Hydroxyl group
[NH2]        Primary amine
C=O          Carbonyl group
[#6]=[#8]    Carbon double bonded to oxygen
[R]          Any atom in a ring
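
Readers who know the Open Babel command line may find it useful to compare the chain above with obabel's `-s` (keep matches) and `-v` (drop matches) options. The sketch below only builds a command object and never runs it. The helper name `obabel_smarts_cmd` is hypothetical, and nothing here claims that the macro is implemented this way.

```julia
# Hypothetical helper, not part of the package: build (but do not run)
# an obabel command that filters by a SMARTS pattern.
# Assumes an `obabel` executable would be on the PATH if the command were run.
function obabel_smarts_cmd(infile, outfile, pattern; invert = false)
    flag = invert ? "-v" : "-s"   # -s keeps matches, -v keeps non-matches
    # Interpolated values are passed as single arguments, so brackets in
    # patterns such as "[OH]" need no shell quoting.
    return `obabel $infile -O $outfile $flag $pattern`
end

cmd = obabel_smarts_cmd("database.smi", "benzene_compounds.smi", "c1ccccc1")
println(cmd)
```

Because Julia passes each interpolated value as its own argument, a pattern containing brackets or parentheses reaches the program unchanged.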

Don't Match SMARTS String

OpenBabel.@dont_match_smarts_string - Macro
@dont_match_smarts_string(expr, pattern)

Exclude molecules that match a SMARTS pattern.

Example

# Remove molecules with benzene rings
@chain begin
    @read_file("database.smi", "smi")
    @dont_match_smarts_string("c1ccccc1")
    @output_as("no_benzene.smi", "smi")
    @execute
end

Usage Examples

Exclude molecules with specific functional groups:

@chain begin
    @read_file("compounds.smi", "smi")
    @dont_match_smarts_string("[OH]")  # Remove molecules containing a hydroxyl group
    @output_as("no_alcohols.smi", "smi")
    @execute
end

Sort By

OpenBabel.@sort_by - Macro
@sort_by(expr, property)

Sort molecules by a property in ascending order.

Arguments

  • expr: Previous command in the chain
  • property: Property name ("MW", "logP", "TPSA", etc.)

Example

@chain begin
    @read_file("molecules.sdf", "sdf")
    @add_properties(["MW"])
    @sort_by("MW")
    @output_as("sorted_by_mw.sdf", "sdf")
    @execute
end

Usage Examples

Sort by molecular weight in ascending order:

@chain begin
    @read_file("molecules.sdf", "sdf")
    @add_properties(["MW"])
    @sort_by("MW")  # Ascending order
    @output_as("sorted_by_mw.sdf", "sdf")
    @execute
end

Sort By Reverse

OpenBabel.@sort_by_reverse - Macro
@sort_by_reverse(expr, property)

Sort molecules by a property in descending order.

Example

@chain begin
    @read_file("molecules.sdf", "sdf")
    @add_properties(["logP"])
    @sort_by_reverse("logP")  # Highest logP first
    @output_as("sorted_by_logp_desc.sdf", "sdf")
    @execute
end

Usage Examples

Sort by logP in descending order:

@chain begin
    @read_file("molecules.sdf", "sdf")
    @add_properties(["logP"])
    @sort_by_reverse("logP")  # Descending order
    @output_as("high_logp_first.sdf", "sdf")
    @execute
end
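
The difference between ascending and descending order can be illustrated in plain Julia, without the package. This is a minimal sketch on in-memory records. The record layout (named tuples with a `name` and an `MW` field) is an assumption made for illustration, and the same idea applies to any numeric property.

```julia
# Toy records standing in for molecules with a computed property.
mols = [
    (name = "ethanol", MW = 46.07),
    (name = "benzene", MW = 78.11),
    (name = "methane", MW = 16.04),
]

# Ascending order: smallest property value first.
ascending = sort(mols; by = m -> m.MW)

# Descending order: the same key, with `rev = true`.
descending = sort(mols; by = m -> m.MW, rev = true)

println([m.name for m in ascending])
println([m.name for m in descending])
```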

Remove Duplicate Molecules

OpenBabel.@remove_duplicate_mols - Macro
@remove_duplicate_mols(expr)

Remove duplicate molecules from the dataset.

Example

@chain begin
    @read_file("database.smi", "smi")
    @remove_duplicate_mols()
    @output_as("unique_molecules.smi", "smi")
    @execute
end

Usage Examples

Remove duplicate structures:

@chain begin
    @read_file("raw_data.smi", "smi")
    @remove_duplicate_mols()
    @output_as("unique_molecules.smi", "smi")
    @execute
end
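
Conceptually, removing duplicates means keeping the first record for each distinct structure key. The sketch below uses the raw SMILES string as that key, which is a simplifying assumption: a real toolkit compares a canonical identifier, so this toy version would not merge "CCO" with "OCC". The record layout is hypothetical.

```julia
# Toy records; the `smiles` field serves as the structure key here.
records = [
    (title = "ethanol_a", smiles = "CCO"),
    (title = "benzene",   smiles = "c1ccccc1"),
    (title = "ethanol_b", smiles = "CCO"),
]

# `unique(f, itr)` keeps the first element for each distinct value of f.
unique_records = unique(r -> r.smiles, records)

println([r.title for r in unique_records])
```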

Convert Dative Bonds

OpenBabel.@convert_dative_bonds - Macro
@convert_dative_bonds(expr)

Convert dative bonds (coordinate covalent bonds) to normal bonds in the molecular representation.

Arguments

  • expr: Previous command in the chain

Example

@chain begin
    @read_file("coordination_compounds.mol", "mol")
    @convert_dative_bonds()
    @output_as("converted_bonds.mol", "mol")
    @execute
end

Usage Examples

Convert dative bonds to standard representation:

@chain begin
    @read_file("complexes.sdf", "sdf")
    @convert_dative_bonds()
    @output_as("converted_bonds.sdf", "sdf")
    @execute
end

Remove Hydrogens

OpenBabel.@remove_hydrogens - Macro
@remove_hydrogens(expr)

Remove explicit hydrogen atoms from the molecular structure.

Arguments

  • expr: Previous command in the chain

Example

@chain begin
    @read_file("explicit_h_molecules.mol", "mol")
    @remove_hydrogens()
    @output_as("implicit_h_molecules.mol", "mol")
    @execute
end

Usage Examples

Remove explicit hydrogens:

@chain begin
    @read_file("molecules.sdf", "sdf")
    @remove_hydrogens()
    @output_as("implicit_h.sdf", "sdf")
    @execute
end

Set Atom Order Canonical

OpenBabel.@set_atom_order_canonical - Macro
@set_atom_order_canonical(expr)

Reorder atoms in molecules to follow a canonical (standardized) ordering.

Arguments

  • expr: Previous command in the chain

Example

@chain begin
    @read_file("unordered_molecules.smi", "smi")
    @set_atom_order_canonical()
    @output_as("canonical_molecules.smi", "smi")
    @execute
end

Usage Examples

Standardize atom ordering:

@chain begin
    @read_file("molecules.sdf", "sdf")
    @set_atom_order_canonical()
    @output_as("canonical_order.sdf", "sdf")
    @execute
end

Separate Fragments

OpenBabel.@separate_fragments - Macro
@separate_fragments(expr)

Separate multi-fragment molecules into individual fragments as separate molecules.

Arguments

  • expr: Previous command in the chain

Example

@chain begin
    @read_file("salt_complexes.smi", "smi")
    @separate_fragments()
    @output_as("individual_fragments.smi", "smi")
    @execute
end

Usage Examples

Split salts and complexes:

@chain begin
    @read_file("salts.sdf", "sdf")
    @separate_fragments()
    @output_as("fragments.sdf", "sdf")
    @execute
end
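
In SMILES notation, disconnected fragments are written side by side, separated by a dot. The plain-Julia sketch below splits on that dot to show the kind of output fragment separation produces. It is a string-level illustration only, it assumes no ring-closure digits pair up across a dot, and the helper name `split_fragments` is hypothetical.

```julia
# Hypothetical helper: split a SMILES string into its dot-separated parts.
function split_fragments(smiles::AbstractString)
    return String.(split(smiles, '.'))
end

salts = ["[Na+].[Cl-]", "CCO", "CC(=O)[O-].[Na+]"]

# Each fragment becomes its own entry; single-fragment inputs pass through.
fragments = reduce(vcat, split_fragments.(salts))

println(fragments)
```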

Ignore Bad Molecules

OpenBabel.@ignore_bad_molecules - Macro
@ignore_bad_molecules(expr)

Skip molecules that cannot be parsed or contain errors, continuing with valid molecules.

Arguments

  • expr: Previous command in the chain

Example

@chain begin
    @read_file("mixed_quality_data.smi", "smi")
    @ignore_bad_molecules()
    @gen_3D_coords("fast")
    @output_as("valid_molecules.sdf", "sdf")
    @execute
end

Usage Examples

Skip invalid molecular structures:

@chain begin
    @read_file("raw_data.smi", "smi")
    @ignore_bad_molecules()
    @output_as("valid_molecules.smi", "smi")
    @execute
end
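
The skip-and-continue behaviour can be pictured as a loop that catches a parsing error for one record and moves on to the next. The sketch below is plain Julia with a toy validity check (balanced parentheses and brackets) standing in for a real parser. Both `parse_record` and `keep_valid` are hypothetical names, not package functions.

```julia
# Toy stand-in for a parser: throws on unbalanced parentheses or brackets.
function parse_record(s::AbstractString)
    count(==('('), s) == count(==(')'), s) || error("unbalanced parentheses in $s")
    count(==('['), s) == count(==(']'), s) || error("unbalanced brackets in $s")
    return s
end

# Keep every record that parses; log and skip the ones that do not.
function keep_valid(records)
    valid = String[]
    for r in records
        try
            push!(valid, parse_record(r))
        catch
            @warn "Skipping record that failed to parse" record = r
        end
    end
    return valid
end

valid = keep_valid(["CCO", "CC(=O", "c1ccccc1", "[Na+"])
println(valid)
```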

Start With Index

OpenBabel.@start_with_index - Macro
@start_with_index(expr, idx)

Start processing molecules from a specific index in the input file.

Arguments

  • expr: Previous command in the chain
  • idx: Starting index (1-based indexing)

Example

@chain begin
    @read_file("large_database.sdf", "sdf")
    @start_with_index(100)  # Start from the 100th molecule
    @add_properties(["MW"])
    @output_as("subset_molecules.sdf", "sdf")
    @execute
end

Usage Examples

Skip the first 999 molecules, then compute properties and sort the rest:

@chain begin
    @read_file("huge_database.sdf", "sdf")
    @start_with_index(1000)  # Begin from molecule 1000
    @add_properties(["MW", "logP"])
    @sort_by("MW")
    @output_as("subset_processed.sdf", "sdf")
    @execute
end
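
With 1-based indexing, starting at index `idx` skips the first `idx - 1` records. The plain-Julia sketch below shows this on a small in-memory list. The record names are made up for illustration, and the lazy variant is one possible way to avoid copying when the input is a stream.

```julia
# Ten toy records standing in for molecules in an input file.
records = ["mol_$i" for i in 1:10]
idx = 4   # 1-based starting index

# Eager version: slice from idx to the end.
subset = records[idx:end]

# Lazy version: drop the first idx - 1 items without indexing.
lazy_subset = collect(Iterators.drop(records, idx - 1))

println(subset)
```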

Complete Workflows

Data Cleaning Pipeline

@chain begin
    @read_file("raw_data.smi", "smi")
    @ignore_bad_molecules()      # Skip invalid molecules
    @remove_duplicate_mols()     # Remove duplicates
    @remove_hydrogens()          # Remove explicit hydrogens
    @set_atom_order_canonical()  # Standardize atom order
    @separate_fragments()        # Split salts/complexes
    @match_smarts_string("[!#1]")  # Keep molecules with at least one non-hydrogen atom
    @output_as("cleaned_data.smi", "smi")
    @execute
end

Structure-Based Filtering

@chain begin
    @read_file("compounds.smi", "smi")
    @match_smarts_string("c1ccccc1")     # Keep compounds containing a benzene ring
    @dont_match_smarts_string("[OH]")    # Remove hydroxyl-containing compounds
    @add_properties(["MW", "logP"])
    @sort_by_reverse("logP")
    @output_as("filtered_aromatics.sdf", "sdf")
    @execute
end