ChemInformant

A Robust Data Acquisition Engine for the Modern Scientific Workflow

ChemInformant is a Python client for the PubChem database. It manages network requests (caching, rate limiting, and retries), validates data at runtime, and returns analysis-ready results, giving computational chemistry projects a dependable data-acquisition layer.

✨ Key Features

Analysis-Ready Pandas/SQL Output: The core API (get_properties) returns either a clean Pandas DataFrame or a direct SQL output, eliminating data wrangling boilerplate and enabling immediate integration with both the Python data science ecosystem and modern database workflows.
Automated Network Reliability: Keeps workflows running through transient network failures with built-in persistent caching, rate limiting, and automatic retries. It also transparently handles API pagination (ListKey) for large-scale queries, delivering complete result sets without manual intervention.
Flexible & Fault-Tolerant Input: Natively accepts mixed lists of identifiers (names, CIDs, SMILES) and intelligently handles any invalid inputs by flagging them with a clear status in the output, ensuring a single bad entry never fails an entire batch operation.
A Dual API for Simplicity and Power: Offers a clear get_<property>() convenience layer for quick lookups, backed by a powerful get_properties engine for high-performance batch operations.
Guaranteed Data Integrity: Employs Pydantic v2 models for rigorous, runtime data validation when using the object-based API, preventing malformed or unexpected data from corrupting your analysis pipeline.
Terminal-Ready CLI Tools: Includes chemfetch and chemdraw for rapid data retrieval and 2D structure visualization directly from your terminal, perfect for quick lookups without writing a script.
Modern and Actively Maintained: Built on a contemporary tech stack for long-term consistency and compatibility, providing a reliable alternative to older or less frequently updated libraries.
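
As a minimal sketch of the fault-tolerant batch behavior described above, the helper below separates successful rows from flagged ones using the `status` column that appears in the Quick Start output. It runs on stand-in rows rather than a live PubChem query. The helper name `partition_by_status` and the failure label `EXAMPLE_FAILURE` are hypothetical and chosen only for illustration; the only status value taken from this README is `OK`.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def partition_by_status(rows: Iterable[Row], ok_status: str = "OK") -> Tuple[List[Row], List[Row]]:
    """Split result rows into (succeeded, failed) based on the 'status' field."""
    succeeded: List[Row] = []
    failed: List[Row] = []
    for row in rows:
        if row["status"] == ok_status:
            succeeded.append(row)
        else:
            failed.append(row)
    return succeeded, failed


# Stand-in rows shaped like the get_properties result table.
# "EXAMPLE_FAILURE" is a placeholder, not a documented status value.
rows = [
    {"input_identifier": "aspirin", "cid": 2244, "status": "OK"},
    {"input_identifier": "not-a-real-compound", "cid": None, "status": "EXAMPLE_FAILURE"},
    {"input_identifier": "caffeine", "cid": 2519, "status": "OK"},
]

succeeded, failed = partition_by_status(rows)
print([row["input_identifier"] for row in succeeded])
print([row["input_identifier"] for row in failed])
```

With a real result DataFrame, `df.to_dict("records")` yields rows in this shape, and the equivalent pandas filter is `df[df["status"] == "OK"]`.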

📦 Installation

Install the library from PyPI:

```
pip install ChemInformant
```

To include plotting capabilities for use with the tutorial, install the [plot] extra:

```
pip install "ChemInformant[plot]"
```

🚀 Quick Start

Retrieve multiple properties for multiple compounds, directly into a Pandas DataFrame, in a single function call:

```python
import ChemInformant as ci

# 1. Define your identifiers
identifiers = ["aspirin", "caffeine", 1983]  # 1983 is paracetamol's CID

# 2. Specify the properties you need
properties = ["molecular_weight", "xlogp", "cas"]

# 3. Call the core function
df = ci.get_properties(identifiers, properties)

# 4. Save the results to an SQL database
ci.df_to_sql(df, "sqlite:///chem_data.db", "results", if_exists="replace")

# 5. Analyze your results!
print(df)
```

Output:

```
  input_identifier   cid status  molecular_weight  xlogp       cas
0          aspirin  2244     OK            180.16    1.2   50-78-2
1         caffeine  2519     OK            194.19   -0.1   58-08-2
2             1983  1983     OK            151.16    0.5  103-90-2
```
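
Once step 4 has written the `results` table, it can be queried with Python's standard `sqlite3` module. The sketch below is self-contained: instead of opening `chem_data.db`, it seeds an in-memory database with the three rows shown above, so it runs without network access. The column types in the `CREATE TABLE` statement are assumptions; the schema that `df_to_sql` actually produces may differ.

```python
import sqlite3

# In-memory stand-in for chem_data.db; replace ":memory:" with "chem_data.db"
# to query a database written by the Quick Start.
conn = sqlite3.connect(":memory:")

# Assumed schema mirroring the columns of the printed DataFrame.
conn.execute(
    "CREATE TABLE results ("
    "input_identifier TEXT, cid INTEGER, status TEXT, "
    "molecular_weight REAL, xlogp REAL, cas TEXT)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("aspirin", 2244, "OK", 180.16, 1.2, "50-78-2"),
        ("caffeine", 2519, "OK", 194.19, -0.1, "58-08-2"),
        ("1983", 1983, "OK", 151.16, 0.5, "103-90-2"),
    ],
)

# Select successfully resolved compounds with a positive XLogP, highest first.
lipophilic = conn.execute(
    "SELECT input_identifier, xlogp FROM results "
    "WHERE status = 'OK' AND xlogp > ? ORDER BY xlogp DESC",
    (0.0,),
).fetchall()
conn.close()

print(lipophilic)  # [('aspirin', 1.2), ('1983', 0.5)]
```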

Convenience API Cheatsheet

| Function | Description |
| --- | --- |
| `get_weight(id)` | Molecular weight (float) |
| `get_formula(id)` | Molecular formula (str) |
| `get_cas(id)` | CAS Registry Number (str) |
| `get_iupac_name(id)` | IUPAC name (str) |
| `get_canonical_smiles(id)` | Canonical SMILES with Canonical→Connectivity fallback (str) |
| `get_isomeric_smiles(id)` | Isomeric SMILES with Isomeric→SMILES fallback (str) |
| `get_xlogp(id)` | XLogP (calculated hydrophobicity) (float) |
| `get_synonyms(id)` | List of synonyms (List[str]) |
| `get_compound(id)` | Full, validated `Compound` object (Pydantic v2 model) |

Note: This table shows key convenience functions for demonstration. ChemInformant provides 22 convenience functions in total, covering molecular descriptors, mass properties, stereochemistry, and more.

All functions accept a CID, name, or SMILES and return None/[] on failure.
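
Because the convenience functions return `None` (or `[]`) instead of raising on failure, callers should check the result before using it. The sketch below shows one way to do that. `format_weight` and the stub lookup are hypothetical helpers written for this example so that it runs offline; in real use, `ci.get_weight` would be passed in place of the stub.

```python
from typing import Callable, Optional, Union

Identifier = Union[int, str]


def format_weight(
    identifier: Identifier,
    get_weight: Callable[[Identifier], Optional[float]],
) -> str:
    """Format a molecular weight, handling the None-on-failure convention."""
    weight = get_weight(identifier)
    if weight is None:
        return f"{identifier}: no result"
    return f"{identifier}: {weight:.2f} g/mol"


def stub_get_weight(identifier: Identifier) -> Optional[float]:
    """Offline stand-in that mimics returning None for unknown identifiers."""
    known = {"aspirin": 180.16}
    return known.get(identifier)


print(format_weight("aspirin", stub_get_weight))         # aspirin: 180.16 g/mol
print(format_weight("not-a-compound", stub_get_weight))  # not-a-compound: no result
```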

ChemInformant also includes handy command-line tools for quick lookups directly from your terminal:

chemfetch: Fetches properties for one or more compounds.

```
chemfetch aspirin --props "cas,molecular_weight,iupac_name"
```

chemdraw: Renders the 2D structure of a compound.
```
chemdraw aspirin
```
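
If the CLI needs to be driven from a script, the command line can be assembled as an argument list rather than a shell string, which avoids quoting problems. This is a rough sketch: `chemfetch_argv` is a hypothetical helper, it uses only the single-identifier form and the `--props` option shown above, and actually running the command assumes `chemfetch` is installed and on the `PATH`.

```python
import shlex
from typing import List, Sequence, Union


def chemfetch_argv(identifier: Union[int, str], props: Sequence[str]) -> List[str]:
    """Build the argument list for a single-compound chemfetch call."""
    return ["chemfetch", str(identifier), "--props", ",".join(props)]


argv = chemfetch_argv("aspirin", ["cas", "molecular_weight", "iupac_name"])

# Show the equivalent shell command without executing anything.
print(shlex.join(argv))

# To run it for real (requires ChemInformant to be installed):
#   import subprocess
#   result = subprocess.run(argv, check=True, capture_output=True, text=True)
#   print(result.stdout)
```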

📚 Documentation & Examples

For a deep dive, please see our detailed guides:

➡️ Online Documentation: The official documentation site contains complete API references, guides, and usage examples. This is the most comprehensive resource.
➡️ Interactive User Manual: Our Jupyter Notebook Tutorial provides a complete, end-to-end walkthrough. This is the best place to start for a hands-on experience.
➡️ Performance Benchmarks: You can review and run our Benchmark Script to see the performance advantages of batching and caching.

🤔 Why ChemInformant?

ChemInformant's core mission is to serve as a high-performance data backbone for the Python cheminformatics ecosystem. By delivering clean, validated, and analysis-ready Pandas DataFrames, it enables researchers to effortlessly pipe PubChem data into powerful toolkits like RDKit, Scikit-learn, or custom machine learning models, transforming multi-step data acquisition and wrangling tasks into single, elegant lines of code.

A detailed comparison with other existing tools is provided in our JOSS paper.

🤝 Contributing

Contributions are welcome! For guidelines on how to get started, please read our contributing guide. You can open an issue to report bugs or suggest features, or submit a pull request to contribute code.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📑 Citation

```bibtex
@article{He2025,
  doi       = {10.21105/joss.08341},
  url       = {https://doi.org/10.21105/joss.08341},
  year      = {2025},
  publisher = {The Open Journal},
  volume    = {10},
  number    = {112},
  pages     = {8341},
  author    = {He, Zhiang},
  title     = {ChemInformant: A Robust and Workflow-Centric Python Client for High-Throughput PubChem Access},
  journal   = {Journal of Open Source Software}
}
```
