@@ -1671,6 +1671,208 @@ function takes a number of arguments. Only the first is required.
16711671* ``chunksize``: Number of rows to write at a time
16721672* ``date_format``: Format string for datetime objects
16731673
1674+ .. _io.csv_precision:
1675+
1676+ Floating Point Precision in CSV
1677+ ++++++++++++++++++++++++++++++++
1678+
1679+ Floating point values can lose precision in a CSV write/read roundtrip, because CSV
1680+ stores them as decimal text. This section explains why this happens and how to
1681+ control the written precision with the ``float_format`` parameter of ``to_csv``.
1682+
1683+ Understanding Precision Loss
1684+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1685+
1686+ Floating point numbers are stored in binary, and most decimal fractions have no exact
1687+ binary representation. Converting to text with too few digits and back changes the value.
1688+ Consider this example:
1689+
1690+ .. ipython:: python
1691+
1692+    import pandas as pd
1693+    import numpy as np
1694+
1695+    # Create a DataFrame with a problematic floating point value
1696+    df = pd.DataFrame({'value': [0.1 + 0.2]})
1697+    print(f"Original value: {df['value'].iloc[0]!r}")
1698+
1699+    # Save to CSV and read back
1700+    df.to_csv('test_precision.csv', index=False)
1701+    df_read = pd.read_csv('test_precision.csv')
1702+    print(f"After CSV roundtrip: {df_read['value'].iloc[0]!r}")
1703+    print(f"Values are equal: {df['value'].iloc[0] == df_read['value'].iloc[0]}")
1704+
1705+ .. ipython:: python
1706+    :suppress:
1707+
1708+    import os
1709+    if os.path.exists('test_precision.csv'):
1710+        os.remove('test_precision.csv')
1711+
1712+ The sum prints as ``0.30000000000000004`` rather than ``0.3`` because ``0.1``, ``0.2`` and ``0.3``
1713+ have no exact binary representation. The roundtrip is exact only if enough digits are written.
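
As a rough illustration that needs only the standard library, the sketch below prints the exact decimal value Python stores for ``0.1 + 0.2`` and checks which text forms convert back to the same float. The variable names are illustrative and not part of pandas.

```python
from decimal import Decimal

x = 0.1 + 0.2

# Decimal(x) expands the stored binary value exactly, digit for digit.
exact = Decimal(x)
print(exact)

# repr() emits the shortest decimal string that parses back to the same float.
shortest = repr(x)
print(shortest, float(shortest) == x)

# Writing fewer digits than the value needs yields a different float on re-read.
rounded_text = '%.15g' % x
print(rounded_text, float(rounded_text) == x)
```

The same reasoning applies to any text format: the written digits, not the file format itself, decide whether the value survives.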
1714+
1715+ Using float_format for Precision Control
1716+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1717+
1718+ The ``float_format`` parameter controls how floating point numbers are formatted when
1719+ written to CSV. A format with too few digits rounds the stored value, while one with
1720+ enough significant digits keeps the roundtrip exact.
1721+
1722+ .. ipython:: python
1723+
1724+    # Example with high precision number
1725+    df = pd.DataFrame({'precision_test': [123456789.123456789]})
1726+    print(f"Original: {df['precision_test'].iloc[0]}")
1727+
1728+    # Default behavior
1729+    df.to_csv('default.csv', index=False)
1730+    df_default = pd.read_csv('default.csv')
1731+
1732+    # With explicit precision control
1733+    df.to_csv('formatted.csv', index=False, float_format='%.15g')
1734+    df_formatted = pd.read_csv('formatted.csv')
1735+
1736+    print(f"Default read: {df_default['precision_test'].iloc[0]}")
1737+    print(f"Formatted read: {df_formatted['precision_test'].iloc[0]}")
1738+
1739+ .. ipython:: python
1740+    :suppress:
1741+
1742+    for f in ['default.csv', 'formatted.csv']:
1743+        if os.path.exists(f):
1744+            os.remove(f)
1745+
1746+ Format Specifiers
1747+ ~~~~~~~~~~~~~~~~~
1748+
1749+ Different format specifiers have different effects on precision and output format:
1750+
1751+ **Fixed-point notation (f)**:
1752+ - ``'%.6f'`` - 6 decimal places: ``123456789.123457``
1753+ - ``'%.10f'`` - 10 decimal places: ``123456789.1234567910``
1754+ - Best for: Numbers with known decimal precision requirements
1755+
1756+ **General format (g)**:
1757+ - ``'%.6g'`` - 6 significant digits: ``1.23457e+08``
1758+ - ``'%.15g'`` - 15 significant digits: ``123456789.123457``
1759+ - Best for: Preserving significant digits, automatic scientific notation
1760+
1761+ **Scientific notation (e)**:
1762+ - ``'%.6e'`` - Scientific with 6 decimal places: ``1.234568e+08``
1763+ - ``'%.10e'`` - Scientific with 10 decimal places: ``1.2345678912e+08``
1764+ - Best for: Very large or very small numbers
1765+
1766+ .. ipython:: python
1767+
1768+    # Demonstrate different format effects
1769+    df = pd.DataFrame({'number': [123456789.123456789]})
1770+
1771+    formats = {'%.6f': '6 decimal places',
1772+               '%.10g': '10 significant digits',
1773+               '%.6e': 'scientific notation'}
1774+
1775+    for fmt, description in formats.items():
1776+        df.to_csv('temp.csv', index=False, float_format=fmt)
1777+        with open('temp.csv', 'r') as f:
1778+            csv_content = f.read().strip().split('\n')[1]
1779+        print(f"{description:20}: {csv_content}")
1780+
1781+ .. ipython:: python
1782+    :suppress:
1783+
1784+    if os.path.exists('temp.csv'):
1785+        os.remove('temp.csv')
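
The outputs listed under Format Specifiers can also be reproduced without pandas. The sketch below assumes that a ``float_format`` string such as ``'%.6f'`` behaves like Python's printf-style ``%`` operator, which is how the examples in this section read; the ``samples`` dictionary is only an illustrative name.

```python
x = 123456789.123456789  # stored as the nearest float64

# Format the same value with each specifier from the list above.
specifiers = ('%.6f', '%.10f', '%.6g', '%.15g', '%.6e', '%.10e')
samples = {fmt: fmt % x for fmt in specifiers}

for fmt, text in samples.items():
    # Report whether parsing the text recovers the original float exactly.
    print(f"{fmt:>6} -> {text:<22} exact roundtrip: {float(text) == x}")
```

Checking ``float(text) == x`` for each candidate is a quick way to see which specifiers are lossy for a given value before writing a file.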
1786+
1787+ Best Practices
1788+ ~~~~~~~~~~~~~~
1789+
1790+ **For high-precision scientific data**:
1791+ Use ``float_format='%.17g'``; 17 significant digits are enough to round-trip any ``float64``:
1792+
1793+ .. ipython:: python
1794+
1795+    # High precision example
1796+    scientific_data = pd.DataFrame({
1797+        'measurement': [1.23456789012345e-10, 9.87654321098765e15]
1798+    })
1799+    scientific_data.to_csv('scientific.csv', index=False, float_format='%.17g')
1800+
1801+ .. ipython:: python
1802+    :suppress:
1803+
1804+    if os.path.exists('scientific.csv'):
1805+        os.remove('scientific.csv')
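
To see why 17 significant digits is the usual choice for ``float64``, the following sketch formats random floats with 15, 16 and 17 significant digits and counts how many fail to parse back to the same value. It uses only the standard library; ``roundtrips`` is a hypothetical helper written for this illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Sample floats across many orders of magnitude.
values = [random.uniform(-1, 1) * 10.0 ** random.randint(-300, 300)
          for _ in range(10_000)]


def roundtrips(fmt, x):
    """Return True if formatting x with fmt and parsing it back gives x."""
    return float(fmt % x) == x


for fmt in ('%.15g', '%.16g', '%.17g'):
    failures = sum(not roundtrips(fmt, v) for v in values)
    print(f"{fmt}: {failures} of {len(values)} values changed")
```

The trade-off is file size and readability: ``'%.17g'`` may write digits such as ``0.10000000000000001`` where a shorter string would have been enough.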
1806+
1807+ **For financial data**:
1808+ Use a fixed number of decimal places, such as ``float_format='%.2f'``; values are rounded on write:
1809+
1810+ .. ipython:: python
1811+
1812+    # Financial data example
1813+    financial_data = pd.DataFrame({
1814+        'price': [19.99, 1234.56, 0.01]
1815+    })
1816+    financial_data.to_csv('financial.csv', index=False, float_format='%.2f')
1817+
1818+ .. ipython:: python
1819+    :suppress:
1820+
1821+    if os.path.exists('financial.csv'):
1822+        os.remove('financial.csv')
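
One caveat with fixed decimal places is that the rounding applies to the stored binary value, not to the decimal literal typed in the source. The sketch below shows this with plain Python formatting and, as one possible alternative outside pandas, the standard ``decimal`` module; the value ``2.675`` is chosen only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

price = 2.675  # stored in binary as slightly less than 2.675

# '%.2f' rounds the stored binary value, so the result is rounded down.
formatted = '%.2f' % price
print(formatted)

# Decimal built from text applies the rounding rule to 2.675 itself.
as_decimal = Decimal('2.675').quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
print(as_decimal)
```

Whether this difference matters depends on the rounding rules the data is subject to.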
1823+
1824+ **For an exact roundtrip**:
1825+ Test your specific data to find the minimum precision needed:
1826+
1827+ .. ipython:: python
1828+
1829+    def test_roundtrip_precision(df, float_format):
1830+        """Test if a float_format preserves data during CSV roundtrip."""
1831+        df.to_csv('test.csv', index=False, float_format=float_format)
1832+        df_read = pd.read_csv('test.csv')
1833+        return df.equals(df_read)
1834+
1835+    # Test data
1836+    test_df = pd.DataFrame({'values': [123.456789, 0.000123456, 1.23e15]})
1837+
1838+    # Test different precisions
1839+    for fmt in ['%.6g', '%.10g', '%.15g']:
1840+        success = test_roundtrip_precision(test_df, fmt)
1841+        print(f"Format {fmt}: {'✓' if success else '✗'} roundtrip success")
1842+
1843+ .. ipython:: python
1844+    :suppress:
1845+
1846+    if os.path.exists('test.csv'):
1847+        os.remove('test.csv')
1848+
1849+ **dtype Preservation Note**:
1850+ Be aware that CSV format does not preserve NumPy dtypes. Unless ``dtype`` is passed to ``read_csv``,
1851+ numeric data is read back as ``float64`` or ``int64`` regardless of the original dtype:
1852+
1853+ .. ipython:: python
1854+
1855+    # dtype preservation example
1856+    original_df = pd.DataFrame({
1857+        'float32_col': np.array([1.23], dtype=np.float32),
1858+        'float64_col': np.array([1.23], dtype=np.float64)
1859+    })
1860+
1861+    print("Original dtypes:")
1862+    print(original_df.dtypes)
1863+
1864+    original_df.to_csv('dtypes.csv', index=False)
1865+    read_df = pd.read_csv('dtypes.csv')
1866+
1867+    print("\nAfter CSV roundtrip:")
1868+    print(read_df.dtypes)
1869+
1870+ .. ipython:: python
1871+    :suppress:
1872+
1873+    if os.path.exists('dtypes.csv'):
1874+        os.remove('dtypes.csv')
1875+
16741876 Writing a formatted string
16751877++++++++++++++++++++++++++
16761878