Commit dced9e4

Documentation improvements (#134)
* Docstring corrections
* Add PUG REST page to the docs
* Docs tweaks
* docs code blocks
* Improve getting started docs
1 parent b728ffa commit dced9e4

File tree

14 files changed

+117
-76
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -5,7 +5,7 @@
55

66
PubChemPy provides a way to interact with PubChem in Python. It allows chemical searches by name, substructure and similarity, chemical standardization, conversion between chemical file formats, depiction and retrieval of chemical properties.
77

8-
```python
8+
```pycon
99
>>> from pubchempy import get_compounds, Compound
1010
>>> comp = Compound.from_cid(1423)
1111
>>> print(comp.smiles)

docs/guide/advanced.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
(advanced)=
22

3-
# Advanced Usage
3+
# Advanced usage
44

55
This guide covers advanced PubChemPy usage patterns, API best practices, error handling, logging, and low-level request functions.
66

@@ -13,16 +13,16 @@ If there are too many results for a request, you will receive a TimeoutError. Th
1313
If retrieving full compound or substance records, instead request a list of cids or sids for your input, and then request the full records for those identifiers individually or in small groups. For example:
1414

1515
```python
16-
sids = get_sids('Aspirin', 'name')
16+
sids = get_sids("Aspirin", "name")
1717
for sid in sids:
1818
s = Substance.from_sid(sid)
1919
```
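
The "small groups" advice above can be sketched with a small batching helper. This is only an illustration: `chunked` and `example_sids` are hypothetical names, not part of PubChemPy, and the batch size of 2 is arbitrary.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Hypothetical SIDs standing in for identifiers returned by an earlier request.
example_sids = [101, 102, 103, 104, 105]
groups = list(chunked(example_sids, 2))
print(groups)  # [[101, 102], [103, 104], [105]]
```

Each group could then be passed to a function that accepts a list of identifiers, such as `get_substances`, so that no single request asks for too many full records.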
2020

2121
When using the `formula` namespace or a `searchtype`, you can alternatively use the `listkey_count` and `listkey_start` keyword arguments to specify pagination. The `listkey_count` value specifies the number of results per page, and the `listkey_start` value specifies which page to return. For example:
2222

2323
```python
24-
get_compounds('CC', 'smiles', searchtype='substructure', listkey_count=5)
25-
get('C10H21N', 'formula', listkey_count=3, listkey_start=6)
24+
get_compounds("CC", "smiles", searchtype="substructure", listkey_count=5)
25+
get("C10H21N", "formula", listkey_count=3, listkey_start=6)
2626
```
2727

2828
## Logging
@@ -61,8 +61,8 @@ A simple fix is to specify the proxy information via urllib:
6161
```python
6262
import urllib.request
6363
proxy_support = urllib.request.ProxyHandler({
64-
'http': 'http://<proxy.address>:<port>',
65-
'https': 'https://<proxy.address>:<port>'
64+
"http": "http://<proxy.address>:<port>",
65+
"https": "https://<proxy.address>:<port>"
6666
})
6767
opener = urllib.request.build_opener(proxy_support)
6868
urllib.request.install_opener(opener)

docs/guide/compound.md

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
(compound)=
22

3-
# Compound
3+
# Compounds
44

55
The {func}`~pubchempy.get_compounds` function returns a list of {class}`~pubchempy.Compound` objects. You can also instantiate a {class}`~pubchempy.Compound` object directly if you know its CID:
66

@@ -14,9 +14,9 @@ Each {class}`~pubchempy.Compound` has a `record` property, which is a dictionary
1414

1515
Additionally, each {class}`~pubchempy.Compound` provides a {meth}`~pubchempy.Compound.to_dict` method that returns PubChemPy's own dictionary representation of the Compound data. As well as being more concisely formatted than the raw `record`, this method also takes an optional parameter to filter the list of the desired properties:
1616

17-
```python
17+
```pycon
1818
>>> c = pcp.Compound.from_cid(962)
19-
>>> c.to_dict(properties=['atoms', 'bonds', 'inchi'])
19+
>>> c.to_dict(properties=["atoms", "bonds", "inchi"])
2020
{'atoms': [{'aid': 1, 'element': 'o', 'x': 2.5369, 'y': -0.155},
2121
{'aid': 2, 'element': 'h', 'x': 3.0739, 'y': 0.155},
2222
{'aid': 3, 'element': 'h', 'x': 2, 'y': 0.155}],
@@ -25,7 +25,13 @@ Additionally, each {class}`~pubchempy.Compound` provides a {meth}`~pubchempy.Com
2525
'inchi': u'InChI=1S/H2O/h1H2'}
2626
```
2727

28-
## 3D Compounds
28+
## 3D compounds
29+
30+
By default, compounds are returned with 2D coordinates. Use the `record_type` keyword argument to specify otherwise:
31+
32+
```python
33+
pcp.get_compounds("Aspirin", "name", record_type="3d")
34+
```
2935

3036
Many properties are missing from 3D records, and the following properties are *only* available on 3D records:
3137

docs/guide/contribute.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
(contribute)=
22

3-
# Contribute
3+
# Contributing
44

55
The [Issue Tracker] is the best place to post any feature ideas, requests and bug reports.
66

docs/guide/download.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,8 +7,8 @@ The {func}`~pubchempy.download` function is for saving a file to disk. The follo
77
Examples:
88

99
```python
10-
pcp.download('PNG', 'asp.png', 'Aspirin', 'name')
11-
pcp.download('CSV', 's.csv', [1,2,3], operation='property/ConnectivitySMILES,SMILES')
10+
pcp.download("PNG", "asp.png", "Aspirin", "name")
11+
pcp.download("CSV", "s.csv", [1,2,3], operation="property/ConnectivitySMILES,SMILES")
1212
```
1313

1414
For PNG images, the `image_size` argument can be used to specify `large`, `small`

docs/guide/gettingstarted.md

Lines changed: 21 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -10,19 +10,19 @@ Retrieving information about a specific Compound in the PubChem database is simp
1010

1111
Begin by importing PubChemPy:
1212

13-
```python
13+
```pycon
1414
>>> import pubchempy as pcp
1515
```
1616

1717
Let's get the {class}`~pubchempy.Compound` with [CID 5090]:
1818

19-
```python
19+
```pycon
2020
>>> c = pcp.Compound.from_cid(5090)
2121
```
2222

2323
Now we have a {class}`~pubchempy.Compound` object called `c`. We can get all the information we need from this object:
2424

25-
```python
25+
```pycon
2626
>>> print(c.molecular_formula)
2727
C17H14O4S
2828
>>> print(c.molecular_weight)
@@ -43,34 +43,42 @@ All the code examples in this documentation will assume you have imported PubChe
4343
```python
4444
from pubchempy import Compound, get_compounds
4545
c = Compound.from_cid(1423)
46-
cs = get_compounds('Aspirin', 'name')
46+
cs = get_compounds("Aspirin", "name")
4747
```
4848
````
4949

5050
## Searching
5151

52-
What if you don't know the PubChem CID of the Compound you want? Just use the {func}`~pubchempy.get_compounds` function:
52+
What if you don't know the PubChem CID of the Compound you want? Just use the {func}`~pubchempy.get_compounds` function, for example with a compound name input:
5353

54-
```python
55-
>>> results = pcp.get_compounds('Glucose', 'name')
54+
```pycon
55+
>>> results = pcp.get_compounds("Glucose", "name")
5656
>>> print(results)
5757
[Compound(5793)]
5858
```
5959

60-
The first argument is the identifier, and the second argument is the identifier type, which must be one of `name`, `smiles`, `sdf`, `inchi`, `inchikey` or `formula`. It looks like there are 4 compounds in the PubChem Database that have the name Glucose associated with them. Let's take a look at them in more detail:
60+
The first argument is the identifier, and the second argument is the identifier type, which must be one of `name`, `smiles`, `sdf`, `inchi`, `inchikey` or `formula`. More often than not, only a single result will be returned, but sometimes there are multiple results for a given identifier. Therefore, {func}`~pubchempy.get_compounds` returns a list of {class}`~pubchempy.Compound` objects (even if there is only one result).
6161

62-
```python
62+
It is possible to iterate over this list to get the individual {class}`~pubchempy.Compound` objects:
63+
64+
```pycon
6365
>>> for compound in results:
6466
... print(compound.smiles)
6567
C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O
6668
```
6769

68-
It looks like they all have different stereochemistry information.
70+
Or you can access the first result directly:
6971

70-
Retrieving the record for a SMILES string is just as easy:
72+
```pycon
73+
>>> compound = results[0]
74+
>>> print(compound.smiles)
75+
C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O
76+
```
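
Because the result is always a list, indexing with `[0]` raises `IndexError` if the list is empty. A minimal defensive sketch follows, assuming that a search with no matches yields an empty list; `first_or_none` is a hypothetical helper, not a PubChemPy function.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def first_or_none(results: List[T]) -> Optional[T]:
    """Return the first element of a result list, or None if it is empty."""
    return results[0] if results else None


# Plain strings stand in for Compound objects here.
print(first_or_none(["first", "second"]))  # first
print(first_or_none([]))  # None
```

With a real result list, `first_or_none(results)` would give either a `Compound` or `None`, which the caller can check before reading properties such as `smiles`.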
7177

72-
```python
73-
>>> pcp.get_compounds('C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1', 'smiles')
78+
Retrieving the compound record(s) for a SMILES input is just as easy:
79+
80+
```pycon
81+
>>> pcp.get_compounds("C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1", "smiles")
7482
[Compound(1318)]
7583
```
7684

docs/guide/introduction.md

Lines changed: 1 addition & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -8,12 +8,7 @@ PubChemPy relies entirely on the PubChem database and chemical toolkits provided
88

99
This is important to remember when using PubChemPy: Every request you make is transmitted to the PubChem servers, evaluated, and then a response is sent back. There are some downsides to this: It is less suitable for confidential work, it requires a constant internet connection, and some tasks will be slower than if they were performed locally on your own computer. On the other hand, this means we have the vast resources of the PubChem database and chemical toolkits at our disposal. As a result, it is possible to do complex similarity and substructure searching against a database containing tens of millions of compounds in seconds, without needing any of the storage space or computational power on your own local computer.
1010

11-
## The PUG REST web service
12-
13-
You don't need to worry too much about how the PubChem web service works, because PubChemPy handles all of the details for you. But if you want to go beyond the capabilities of PubChemPy, there is some helpful documentation on the PubChem website.
14-
15-
- [PUG REST Tutorial]: Explains how the web service works with a variety of usage examples.
16-
- [PUG REST Specification]: A more comprehensive but dense specification that details every possible way to use the web service.
11+
See the {doc}`pugrest` page for more information about how PubChemPy uses the PubChem web service.
1712

1813
## PubChemPy license
1914

@@ -27,6 +22,4 @@ You don't need to worry too much about how the PubChem web service works, becaus
2722
[^f1]: That's a lot of acronyms! PUG stands for "Power User Gateway", a term used to describe a variety of methods for programmatic access to PubChem data and services. REST stands for [Representational State Transfer], which describes the specific architectural style of the web service.
2823

2924
[pubchem website]: https://pubchem.ncbi.nlm.nih.gov
30-
[pug rest specification]: https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest
31-
[pug rest tutorial]: https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest-tutorial
3225
[representational state transfer]: https://en.wikipedia.org/wiki/Representational_state_transfer

docs/guide/pandas.md

Lines changed: 7 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,31 +2,29 @@
22

33
# *pandas* integration
44

5-
## Getting *pandas*
5+
## Installing *pandas*
66

7-
*pandas* must be installed to use its functionality from within PubChemPy. The easiest way is to use pip:
7+
*pandas* must be installed to use its functionality from within PubChemPy. It is an optional dependency, so it is not installed automatically with PubChemPy. The easiest way is to use pip:
88

99
```bash
1010
pip install pandas
1111
```
1212

13-
See the [pandas documentation] for more information.
13+
See the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/) for more information.
1414

1515
## Usage
1616

1717
It is possible for {func}`~pubchempy.get_compounds`, {func}`~pubchempy.get_substances` and {func}`~pubchempy.get_properties` to return a pandas DataFrame:
1818

1919
```python
20-
df1 = pcp.get_compounds('C20H41Br', 'formula', as_dataframe=True)
20+
df1 = pcp.get_compounds("C20H41Br", "formula", as_dataframe=True)
2121
df2 = pcp.get_substances([1, 2, 3, 4], as_dataframe=True)
22-
df3 = pcp.get_properties(['smiles', 'xlogp', 'rotatable_bond_count'], 'C20H41Br', 'formula', as_dataframe=True)
22+
df3 = pcp.get_properties(["smiles", "xlogp", "rotatable_bond_count"], "C20H41Br", "formula", as_dataframe=True)
2323
```
2424

2525
An existing list of {class}`~pubchempy.Compound` objects can be converted into a dataframe, optionally specifying the desired columns:
2626

2727
```python
28-
cs = pcp.get_compounds('C20H41Br', 'formula')
29-
df4 = pcp.compounds_to_frame(cs, properties=['smiles', 'xlogp', 'rotatable_bond_count'])
28+
cs = pcp.get_compounds("C20H41Br", "formula")
29+
df4 = pcp.compounds_to_frame(cs, properties=["smiles", "xlogp", "rotatable_bond_count"])
3030
```
31-
32-
[pandas documentation]: https://pandas.pydata.org/pandas-docs/stable/

docs/guide/properties.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
The {func}`~pubchempy.get_properties` function allows the retrieval of specific properties without having to deal with entire compound records. This is especially useful for retrieving the properties of a large number of compounds at once:
66

77
```python
8-
p = pcp.get_properties('SMILES', 'CC', 'smiles', searchtype='superstructure')
8+
p = pcp.get_properties("SMILES", "CC", "smiles", searchtype="superstructure")
99
```
1010

1111
Multiple properties may be specified in a list, or in a comma-separated string. The available properties are: MolecularFormula, MolecularWeight, ConnectivitySMILES, SMILES, InChI, InChIKey, IUPACName, XLogP, ExactMass, MonoisotopicMass, TPSA, Complexity, Charge, HBondDonorCount, HBondAcceptorCount, RotatableBondCount, HeavyAtomCount, IsotopeAtomCount, AtomStereoCount, DefinedAtomStereoCount, UndefinedAtomStereoCount, BondStereoCount, DefinedBondStereoCount, UndefinedBondStereoCount, CovalentUnitCount, Volume3D, XStericQuadrupole3D, YStericQuadrupole3D, ZStericQuadrupole3D, FeatureCount3D, FeatureAcceptorCount3D, FeatureDonorCount3D, FeatureAnionCount3D, FeatureCationCount3D, FeatureRingCount3D, FeatureHydrophobeCount3D, ConformerModelRMSD3D, EffectiveRotorCount3D, ConformerCount3D.
@@ -15,8 +15,8 @@ Multiple properties may be specified in a list, or in a comma-separated string.
1515
Get a list of synonyms for a given input using the {func}`~pubchempy.get_synonyms` function:
1616

1717
```python
18-
pcp.get_synonyms('Aspirin', 'name')
19-
pcp.get_synonyms('Aspirin', 'name', 'substance')
18+
pcp.get_synonyms("Aspirin", "name")
19+
pcp.get_synonyms("Aspirin", "name", "substance")
2020
```
2121

2222
Inputs that match more than one SID/CID will have multiple, separate synonyms lists returned.
@@ -26,14 +26,14 @@ Inputs that match more than one SID/CID will have multiple, separate synonyms li
2626
CAS Registry Numbers are not officially supported by PubChem, but they are often present in the synonyms associated with a compound. Therefore it is straightforward to retrieve them by filtering the synonyms to just those with the CAS Registry Number format:
2727

2828
```python
29-
for result in pcp.get_synonyms('Aspirin', 'name'):
30-
cid = result['CID']
29+
for result in pcp.get_synonyms("Aspirin", "name"):
30+
cid = result["CID"]
3131
cas_rns = []
32-
for syn in result.get('Synonym', []):
33-
match = re.match(r'(\d{2,7}-\d\d-\d)', syn)
32+
for syn in result.get("Synonym", []):
33+
match = re.match(r"(\d{2,7}-\d\d-\d)", syn)
3434
if match:
3535
cas_rns.append(match.group(1))
36-
print(f'CAS registry numbers for CID {cid}: {cas_rns}')
36+
print(f"CAS registry numbers for CID {cid}: {cas_rns}")
3737
```
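
The pattern above accepts any synonym that merely starts with something shaped like a CAS Registry Number. As an optional stricter filter, the sketch below also checks the CAS check digit, which goes beyond what the documentation describes. `is_valid_cas_rn` is a hypothetical helper, not part of PubChemPy.

```python
import re

# Same shape as the pattern above, but with one group per hyphen-separated part.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d\d)-(\d)")


def is_valid_cas_rn(text: str) -> bool:
    """Return True if `text` is exactly a CAS RN with a correct check digit."""
    match = CAS_PATTERN.fullmatch(text)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas_rn("50-78-2"))  # True
print(is_valid_cas_rn("50-78-3"))  # False
```

Using `fullmatch` instead of `match` rejects synonyms that only begin with a number in this format, at the cost of missing strings that carry extra text after the number.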
3838

3939
## Identifiers

docs/guide/pugrest.md

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
(pugrest)=
2+
3+
# PUG REST
4+
5+
PUG (Power User Gateway) REST is a web service that PubChem provides for programmatic access to its data. PubChemPy uses this web service to interact with the PubChem database, allowing you to search for compounds, substances, and assays, retrieve their properties, and perform various operations without needing to download or store large datasets locally.
6+
7+
You don't need to worry too much about how the PubChem web service works, because PubChemPy handles all of the details for you. But understanding the underlying architecture can help you use PubChemPy more effectively and troubleshoot issues.
8+
9+
## PUG REST architecture
10+
11+
The PUG REST API is built around a three-part request pattern:
12+
13+
1. **Input**: Specifies which records you're interested in (by CID, name, SMILES, etc.)
14+
2. **Operation**: Defines what to do with those records (retrieve properties, search, etc.)
15+
3. **Output**: Determines the format of the returned data (JSON, XML, CSV, etc.)
16+
17+
This modular design allows for flexible combinations. For example, you can combine structure input via SMILES with property retrieval operations and CSV output - all handled seamlessly by PubChemPy.
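
The three-part pattern can be illustrated by assembling a PUG REST URL by hand. This is a minimal sketch and not how PubChemPy builds its requests internally: `build_pug_rest_url` is a hypothetical helper, and the identifier is simply placed in the URL path.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_rest_url(domain: str, namespace: str, identifier: str,
                       operation: str, output: str) -> str:
    """Join the input, operation and output parts into one URL."""
    # Input: which records (domain, namespace, percent-encoded identifier).
    input_part = "/".join([domain, namespace, quote(identifier, safe=",")])
    # Operation: what to do with them. Output: the response format.
    return "/".join([PUG_REST_BASE, input_part, operation, output])


url = build_pug_rest_url(
    "compound", "cid", "2244", "property/MolecularFormula", "JSON"
)
print(url)
```

Changing only the last argument (for example to `"CSV"`) changes the output format, while the input and operation parts stay the same.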
18+
19+
## Request flow
20+
21+
When you make a request with PubChemPy:
22+
23+
1. Your Python request is translated into a PUG REST URL (and possibly some POST data).
24+
2. The request is sent to PubChem's servers via HTTPS.
25+
3. PubChem processes the request using their chemical databases and toolkits.
26+
4. Results are returned and parsed by PubChemPy into Python objects.
27+
28+
PubChem contains over 300 million substance records, over 100 million standardized compound records, and over 1 million biological assays. All this data may be accessed and processed through PubChemPy without requiring local storage or computational resources.
29+
30+
## When to use alternatives
31+
32+
While PubChemPy and PUG REST are excellent for many tasks, consider alternatives for:
33+
34+
- **Bulk data processing**: Use PubChem's bulk download services for large datasets
35+
- **Confidential work**: Consider local chemical toolkits for sensitive data
36+
- **Offline work**: The PUG REST API requires an internet connection
37+
38+
## Further reading
39+
40+
If you want to go beyond the capabilities of PubChemPy, there is helpful documentation about programmatic access to PubChem data on the PubChem website:
41+
42+
- [Programmatic Access to PubChem](https://pubchem.ncbi.nlm.nih.gov/docs/programmatic-access): Overview of how to access PubChem data programmatically.
43+
- [PUG REST Tutorial](https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest-tutorial): Explains how the web service works with a variety of usage examples.
44+
- [PUG REST Specification](https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest): A more comprehensive but dense specification that details every possible way to use the web service.
