PyBCStandardLib
= Batteries Included - The Python Standard Library =
A Tour of the Python Standard Library
A brief tour of the [http://docs.python.org/library/ python standard library]
One of python's more compelling features is its fully featured standard library. In this session we briefly touch on several components of the python standard library that might be of interest to scientifically minded users. The intent is not to give you a deep understanding of how to use each of these libraries, but rather to expose you to the breadth and power of the libraries that come with every python distribution.
== Libraries and python 3.0 ==
Python is in the midst of a (slightly) backwards-compatibility breaking transition between the 2.x series and the 3.x series. As scary as that might sound, this is not cause for concern. Most of the changes revolve around finally removing deprecated features (irix specific libraries removed), cleaning up the standard library (Queue renamed to queue), and fixing conceptual and syntactic inconsistencies (print is now a function, integer division is explicit so 1/2 is now 0.5 and 1//2 is 0). Nearly all of your code will port without modification. Tools exist (2to3) to yell at you for using deprecated features and to automatically convert 2.6 code to 3.0 code. If you want to try out 3.0 features in 2.6, you can always import them from !__future!__.
With all of that said, much of the scientific software stack depends on numpy, which has not been ported to 3.0. Because of this, we will only talk about libraries that exist in relatively recent python versions (2.5) and survived the transition to 3.0. If a library gets renamed or reorganized, we will tell you.
Things completely neglected, but in the standard library:
- [http://docs.python.org/library/markup.html Markup language parsing] (HTML/XML)
- [http://docs.python.org/library/internet.html Networking protocol] (ftp/http/smtp/imap/cgi)
These are somewhat specialized topics, and you should go to the talks by people who use them.
== [http://docs.python.org/library/string.html String Manipulation] ==
{{{
#!python
import string
}}}
Python's string library contains useful constants like string.ascii_lowercase ('abcdefghijklmnopqrstuvwxyz') and string.octdigits ('01234567'). (The locale-dependent string.lowercase was removed in python 3.0.) It used to contain string functions before they were pushed into the string class. In python 2.6 and beyond there are powerful functions and classes for doing string substitution.
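As a minimal sketch of the string substitution tools mentioned above, the following uses string.Template and the str.format method. The placeholder names and values (name, count, 'run42') are made up for illustration.

```python
import string

# Template uses $name placeholders; the field names here are hypothetical.
template = string.Template('$name has $count samples')
message = template.substitute(name='run42', count=3)

# safe_substitute leaves unknown placeholders alone instead of raising KeyError.
partial = template.safe_substitute(name='run42')

# str.format (python 2.6 and later) is the method-based alternative.
formatted = '{0} has {1:.2f} volts'.format('run42', 1.5)

print(message)
print(partial)
print(formatted)
```

Template is handy when the format string comes from a user or a config file, since it supports only simple substitution.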
== [http://docs.python.org/library/re.html Regular Expressions] ==
{{{
#!python
import re
}}}
Regular expressions (RE's) describe patterns in text and are highly useful for text processing. While really useful, they are not part of the base language (as compared to perl). You also need to use raw strings. Python strings interpret the backslash character specially. This is what allows the '\n' syntax for newline. The problem is that backslash also has special meaning in RE's. So to get an RE backslash you have to escape the python backslash with another backslash. If you ever need to match a literal backslash, you need another RE backslash, so two more backslash characters, bringing the total to four. This is obnoxious. The solution is to use "raw strings". Raw strings do not treat the backslash as special. To make a raw string, you prefix an 'r'. r'\n' is the backslash character followed by the 'n' character. '\n' is the newline character.
To be fast, python RE's are handled in a special purpose library written in C. This introduces the unfortunate .compile() syntax to convert python strings into RE's.
[http://docs.python.org/howto/regex.html#regex-howto HOWTO]
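The raw string and .compile() points above can be sketched as follows. The pattern and the sample text are invented for illustration; only re.compile, search, findall and groups are used.

```python
import re

# Raw string: \d and \. reach the RE engine untouched.
pattern = re.compile(r'(\d+)\.(\d+)')

text = 'version 2.6 and version 3.0'
first = pattern.search(text).groups()
all_versions = pattern.findall(text)

# Matching one literal backslash takes two characters in a raw string
# and four in a normal string; both spell the same RE.
raw_form = r'\\'
plain_form = '\\\\'
found = re.search(raw_form, 'C:\\temp') is not None

print(first)
print(all_versions)
```

Compiling once and reusing the pattern object is the usual approach when the same RE is applied many times.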
== [http://docs.python.org/library/optparse.html Command Line Arguments] ==
{{{
#!python
import optparse
}}}
If you spend much time in the shell, you will quickly want the ability to write your own command line callable programs. You will inevitably want to pass options to these programs. optparse does all the heavy lifting of parsing command line options for you. The syntax is simple and intuitive. Play with the following example; it covers most of the common usages.
{{{
#!CodeExample
#!python
#!/usr/bin/env python

from optparse import OptionParser

# The usage string is printed for -h or --help, so set it before parsing
parser = OptionParser(usage='i get printed when you pass -h or --help')

# Options that take a value
parser.add_option('-m')
parser.add_option('-n', '--nameless-option')
parser.add_option('-f', '--file', dest='filename')

# Boolean Flags and actions
parser.add_option('-q', '--quiet', action='store_false', dest='verbose')
parser.add_option('-v', '--verbose', action='store_true', dest='verbose', default=True)

parser.add_option('-a', help='this option has help info')

(options, args) = parser.parse_args()

print 'options', options
print 'args', args
}}}
== [http://docs.python.org/library/datetime.html Dates, Times, and Durations] ==
{{{
#!python
import datetime
}}}
Python provides support for dealing with times, dates and durations. If all you need to do is time manipulation or calendar work, also check out the intuitively named libraries time and calendar.
{{{
#!CodeExample
#!python
import datetime

startdate = datetime.datetime(2010, 1, 12, 14, 0, 0, 0)
duration = datetime.timedelta(days=3, hours=6)
print startdate
print (startdate + duration).weekday()
}}}
datetime deals with dates and times, handles leap years and such automatically, and can work with time zones.
== [http://docs.python.org/library/math.html Mathematical Functions] ==
{{{
#!python
import math
}}}
math has basic mathematical functions like exp, log10, sin and cos. It also has rounding functions like floor and ceil, and assorted other mathematical things like isnan, fmod (modulus for floats) and modf (splits a float into its fractional and integer parts).
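A short sketch of the functions named above; the input values are arbitrary.

```python
import math

whole_down = math.floor(2.7)            # largest whole number <= 2.7
whole_up = math.ceil(2.2)               # smallest whole number >= 2.2
remainder = math.fmod(7.5, 2.0)         # floating point modulus
frac_part, int_part = math.modf(3.25)   # fractional part first, then integer part
not_a_number = math.isnan(float('nan'))

print(remainder)
print(frac_part)
print(int_part)
```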
== [http://docs.python.org/library/cmath.html Complex Numbers] ==
{{{
#!python
import cmath
}}}
cmath has basic mathematical functions for complex numbers. Common functions like polar, rect and phase are in there too.
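As a rough illustration, here is a round trip between rectangular and polar form using polar, rect and phase. The value of z is arbitrary, and the comparisons are done with a tolerance because the conversion goes through floating point.

```python
import cmath

z = complex(0, 1)

# polar returns (modulus, phase); rect goes the other way.
r, phi = cmath.polar(z)
back = cmath.rect(r, phi)

# phase on its own returns just the angle.
angle = cmath.phase(z)

# Unlike math.sqrt, cmath.sqrt accepts negative numbers.
root = cmath.sqrt(-1)

print(r)
print(phi)
print(root)
```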
== [http://docs.python.org/library/decimal.html Decimal Numbers] ==
{{{
#!python
import decimal
}}}
In some cases, the floating point representation of numbers is not sufficient. decimal represents decimal numbers exactly (1.1 is stored as exactly 1.1, not as the nearest binary fraction), propagates significant figures, and handles sums like 0.1 + 0.1 + 0.1 - 0.3 correctly, giving exactly 0.
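The difference from binary floating point can be sketched like this. Note that the Decimal values are built from strings; building them from floats would carry the binary rounding error along.

```python
from decimal import Decimal

# Binary floats leave a tiny residue here.
float_residue = 0.1 + 0.1 + 0.1 - 0.3

# Decimal arithmetic gives exactly zero.
decimal_residue = (Decimal('0.1') + Decimal('0.1') + Decimal('0.1')
                   - Decimal('0.3'))

# Trailing zeros are kept, so significant figures propagate.
total = Decimal('1.30') + Decimal('1.20')

print(float_residue)
print(decimal_residue)
print(total)
```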
== [http://docs.python.org/library/random.html Random numbers] ==
{{{
#!python
import random
}}}
If you need random numbers, this is where you get them. There are functions for uniform selection from ranges and sequences, as well as for triangular, beta, Gaussian, gamma, log-normal, and other distributions. Numbers are generated using the Mersenne twister algorithm.
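A small sketch of a few of these functions. It uses a private random.Random instance with an arbitrary seed (12345) so that runs are repeatable; the module-level functions of the same names behave the same way on a shared generator.

```python
import random

rng = random.Random(12345)   # the seed value is arbitrary

u = rng.uniform(0.0, 10.0)            # uniform float in [0, 10]
g = rng.gauss(0.0, 1.0)               # Gaussian with mean 0 and sigma 1
pick = rng.choice(['a', 'b', 'c'])    # one element of a sequence
sample = rng.sample(range(100), 5)    # five distinct elements

# The same seed reproduces the same sequence.
rng2 = random.Random(12345)
same = rng2.uniform(0.0, 10.0) == u

print(u)
print(pick)
print(sample)
```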
== [http://docs.python.org/library/itertools.html Iteration Tools] ==
{{{
#!python
import itertools
}}}
itertools contains powerful tools for iteration. Many ideas are taken from functional programming languages like SML and Haskell.
{{{
#!CodeExample
#!python
import itertools
import math

a = [1,2,3]
b = [4,5,6]

print 'chain'
for x in itertools.chain(a,b):
    print x

print 'imap'
# imap is python 2 only; in python 3 the built-in map is already lazy
for x in itertools.imap(math.pow, a, b):
    print x

print 'combinations'
for x in itertools.combinations('ABCD', 2):
    print x
}}}
== [http://docs.python.org/library/pickle.html Serialization/Saving/Restoring State] ==
{{{
#!python
import pickle
import cPickle
}}}
Python provides a convenient way of serializing objects, writing them to files and reading them back from files. Effectively this means you can save the state of a program and restore it later. For most purposes, pickle and cPickle do the same thing, and cPickle should be used as it is implemented in C and is up to 1000 times faster. In python 3.0 the two have been unified: pickle is implemented in pure python, but an optimized version written in C is used when it exists.
Shamelessly [http://docs.python.org/library/pickle.html#example stolen] from the python documentation:
{{{
#!CodeExample
#!python
import pickle

data1 = {'a': [1, 2.0, 3, 4+6j],
         'b': ('string', u'Unicode string'),
         'c': None}

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

output = open('data.pkl', 'wb')

# Pickle dictionary using protocol 0.
pickle.dump(data1, output)

# Pickle the list using the highest protocol available.
pickle.dump(selfref_list, output, -1)

output.close()
}}}
{{{
#!CodeExample
#!python
import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

data2 = pickle.load(pkl_file)
pprint.pprint(data2)

pkl_file.close()
}}}
== [http://docs.python.org/library/os.html Interacting with the OS] ==
{{{
#!python
import os
}}}
With os you can change directories (os.chdir()), get environment variables (os.environ['HOME']), make directories (os.mkdir()) and much more. If you're trying to do shell scripting, this is probably the place to start.
== [http://docs.python.org/library/os.path.html Pathname Manipulation] ==
{{{
#!python
import os.path
}}}
Paths are just strings in python, but os.path provides a platform independent way of manipulating paths. You can os.path.join() paths, test if something is a directory (os.path.isdir()), and use lots of related functionality. If you need to traverse a full directory tree and do stuff, investigate os.walk() (the older os.path.walk() was removed in python 3.0).
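A minimal sketch of joining, splitting and walking paths. It works inside a scratch directory made with tempfile.mkdtemp() so nothing outside it is touched; the directory and file names (data, raw, run1.txt) are hypothetical.

```python
import os
import os.path
import shutil
import tempfile

# Build a small tree in a scratch directory.
root = tempfile.mkdtemp()
nested = os.path.join(root, 'data', 'raw')
os.makedirs(nested)
filename = os.path.join(nested, 'run1.txt')
open(filename, 'w').close()

# Split a path into its pieces.
base, ext = os.path.splitext(os.path.basename(filename))
is_dir = os.path.isdir(nested)

# Walk the whole tree and collect every file found.
found = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        found.append(os.path.join(dirpath, name))

# Clean up the scratch tree.
shutil.rmtree(root)

print(base)
print(ext)
```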
== [http://docs.python.org/library/tempfile.html Temporary Files] ==
{{{
#!python
import tempfile
}}}
tempfile securely creates and deletes files for temporary storage. There are options for spooled files and normal files.
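The normal, spooled and named variants can be sketched as follows. The byte strings, the max_size value and the '.txt' suffix are arbitrary choices for illustration.

```python
import os
import tempfile

# A normal temporary file: it is removed as soon as it is closed.
tmp = tempfile.TemporaryFile()
tmp.write(b'scratch data')
tmp.seek(0)
contents = tmp.read()
tmp.close()

# A spooled file stays in memory until it grows past max_size bytes.
spooled = tempfile.SpooledTemporaryFile(max_size=1024)
spooled.write(b'small')
spooled.seek(0)
small = spooled.read()
spooled.close()

# A named temporary file has a path you can inspect while it is open.
named = tempfile.NamedTemporaryFile(suffix='.txt')
path = named.name
existed = os.path.exists(path)
named.close()
gone = not os.path.exists(path)

print(contents)
print(small)
```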
== [http://docs.python.org/library/shutil.html High Level Shell Operations] ==
{{{
#!python
import shutil
}}}
shutil has high level shell manipulation functionality. It works well for copying, moving and deleting multiple files or file trees. os has similar functionality, but at a lower level.
== [http://docs.python.org/library/archiving.html Compression] ==
{{{
#!python
import zlib
import gzip
}}}
[http://docs.python.org/library/zlib.html zlib] is an interface to the zlib library. [http://docs.python.org/library/gzip.html gzip] is the python replacement for gzip/gunzip. Confused? If you're working with *.gz files and really don't care about the underlying library that compresses those files, use gzip. If you are working directly with zlib, you should use zlib. You probably want to use gzip. If you're using either of these, you're probably on a *nix platform where gzip is ubiquitous.
{{{
#!python
import zipfile
}}}
[http://docs.python.org/library/zipfile.html zipfile] handles creation and manipulation of files in the zip format.
{{{
#!python
import tarfile
}}}
[http://docs.python.org/library/tarfile.html tarfile] handles creation of and interaction with tar archives. It can read and write gzip and bz2 compressed archives. If you're dealing with multiple compressed files on *nix, you probably want to be using tarfile.
{{{
#!python
import bz2
}}}
[http://docs.python.org/library/bz2.html bz2] handles creation and manipulation of files compressed using bz2.
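As a small sketch of the gzip case, the following writes a *.gz file and reads it back. The file name and payload are made up, and the file lives in a scratch directory that is removed at the end.

```python
import gzip
import os
import tempfile

directory = tempfile.mkdtemp()
path = os.path.join(directory, 'example.txt.gz')
payload = b'spam and eggs\n' * 100

# gzip.open gives a file-like object that compresses on write.
out = gzip.open(path, 'wb')
out.write(payload)
out.close()

# Reading decompresses transparently.
inp = gzip.open(path, 'rb')
restored = inp.read()
inp.close()

compressed_size = os.path.getsize(path)

# Clean up the scratch directory.
os.remove(path)
os.rmdir(directory)

print(len(payload))
print(compressed_size)
```

The resulting file is an ordinary gzip file, so the gunzip command line tool could read it as well.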
== Databases ([http://docs.python.org/library/sqlite3.html SQLite]) ==
{{{
#!python
import sqlite3
}}}
Python has built-in hooks for working with SQLite databases.
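A minimal sketch of those hooks, using an in-memory database so no file is created. The table and column names (samples, name, mass) and the rows are invented for illustration.

```python
import sqlite3

# ':memory:' creates a throwaway database that lives only in RAM.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

cur.execute('CREATE TABLE samples (name TEXT, mass REAL)')

# Use ? placeholders rather than building SQL strings by hand.
rows = [('a', 1.5), ('b', 2.5), ('c', 4.0)]
cur.executemany('INSERT INTO samples VALUES (?, ?)', rows)
conn.commit()

cur.execute('SELECT name, mass FROM samples WHERE mass > ? ORDER BY mass',
            (2.0,))
heavy = cur.fetchall()
conn.close()

print(heavy)
```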
== [http://docs.python.org/library/csv.html Tabular Data] ==
{{{
#!python
import csv
}}}
Comma Separated Value formats are common in data tables and in the Excel world. As such it is useful to be able to easily parse such files. Numpy has similar functionality in [http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html numpy.loadtxt] that I find more convenient for tabular, space delimited data.
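A short sketch of parsing with csv.reader. The reader accepts any iterable of lines, so a list of strings stands in for an open file here; the column names and values are made up.

```python
import csv

# In practice these lines would come from an open file object.
lines = [
    'sample,mass,comment',
    'a,1.5,"first, noisy"',
    'b,2.5,clean',
]

rows = list(csv.reader(lines))
header, records = rows[0], rows[1:]

# Every field arrives as a string; convert the ones you need.
masses = [float(record[1]) for record in records]

print(header)
print(records)
print(masses)
```

Note how the quoted field keeps its embedded comma, which a plain line.split(',') would get wrong.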
== [http://docs.python.org/library/timeit.html#timeit.timeit Performance Testing] ==
{{{
#!python
import timeit
}}}
timeit provides a convenient way to time the execution of code using the Timer class. Timer takes string arguments for the actual code to be executed and for setup code. The setup code is not part of the timing and is used for, er, setup. A couple of convenience functions (timeit.timeit and timeit.repeat) were introduced in python 2.6.
{{{
#!CodeExample
#!python
import timeit
import sys

def factorial(num):
    # recursive factorial, used here only as something to time
    if num == 1:
        return 1
    return num*factorial(num-1)

if __name__ == '__main__':
    setup = """from __main__ import factorial"""
    stmt = """factorial(14)"""

    print 'using python', sys.version_info
    if sys.version_info >= (2, 6):
        a = timeit.repeat(stmt, setup, repeat=1, number=10)
    elif sys.version_info >= (2, 3):
        T = timeit.Timer(stmt, setup)
        a = T.repeat(repeat=4, number=200)
    else:
        raise RuntimeError("must use python 2.3 or newer")

    print min(a)
}}}
== [http://docs.python.org/library/glob.html Wildcard Filename Completion] ==
{{{
#!python
import glob
}}}
glob does *nix-style pathname pattern matching with *, ? and [].
{{{
#!python
import fnmatch
}}}
fnmatch does the same kind of pattern matching as glob, but against filename strings that you supply rather than pathnames on disk.
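Since fnmatch works on plain strings, it can be sketched without touching the filesystem. The file names below are invented.

```python
import fnmatch

names = ['run1.dat', 'run2.dat', 'run10.dat', 'notes.txt', 'run_a.dat']

# * matches any run of characters.
dat_files = fnmatch.filter(names, '*.dat')

# [] matches one character from a set.
single_digit = fnmatch.filter(names, 'run[0-9].dat')

# ? matches exactly one character.
one_char = fnmatch.fnmatch('run1.dat', 'run?.dat')

print(dat_files)
print(single_digit)
```

glob.glob() takes the same patterns but returns the matching paths that actually exist on disk.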
== [http://docs.python.org/library/collections.html Container Data Types] ==
{{{
#!python
import collections
}}}
collections contains the deque (double ended queue) container type.
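A brief sketch of what a deque offers: cheap appends and pops at both ends, rotation, and (from python 2.6) an optional maximum length. The values are arbitrary.

```python
import collections

d = collections.deque([1, 2, 3])
d.append(4)          # add on the right
d.appendleft(0)      # add on the left

right = d.pop()      # remove from the right
left = d.popleft()   # remove from the left

d.rotate(1)          # shift every element one step to the right

# With maxlen set, old items fall off the far end automatically.
recent = collections.deque(maxlen=3)
for i in range(10):
    recent.append(i)

print(list(d))
print(list(recent))
```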
New to 2.6 are Abstract Base Classes (ABCs). With these you can require that a subclass meets certain requirements. An iterable must define !__iter!__, for example. collections defines several ABCs that you can use to test the functionality of an object. You can also subclass collections.Iterable to require that your class define the needed functionality. (It also gives you the convenience functions for free!) In python 3.3 and later these classes live in collections.abc.
{{{
#!CodeExample
#!python
if isinstance(myvar, collections.Iterable):
    for x in myvar:
        print x
}}}
Other ABCs in collections are: Callable, Container, Hashable, !ItemsView, Iterable, Iterator, !KeysView, Mapping, !MappingView, !MutableMapping, !MutableSequence, !MutableSet, Sequence, Set, Sized, !ValuesView.
== [http://docs.python.org/library/unittest.html Unit Testing] ==
{{{
#!python
import unittest
}}}
Python comes with a unit testing framework. It is the python version of Java's JUnit, which is the Java version of Smalltalk's testing framework. unittest has four important concepts:
- test fixtures - preparation and clean up actions needed for doing actual tests. ex: starting a server or populating a test database
- test cases - actual tests. The base class for these is TestCase
- test suite - logical collections of test cases (or suites)
- test runner - executes the tests and reports back. This is the thing you call as a user.
Let's look at a [http://docs.python.org/library/unittest.html#basic-example simple example].
{{{
#!CodeExample
#!python
import random
import unittest

class TestSequenceFunctions(unittest.TestCase):

    def setUp(self):
        self.seq = range(10)

    def testshuffle(self):
        # make sure the shuffled sequence does not lose any elements
        random.shuffle(self.seq)
        self.seq.sort()
        self.assertEqual(self.seq, range(10))

    def testchoice(self):
        element = random.choice(self.seq)
        self.assert_(element in self.seq)

    def testsample(self):
        self.assertRaises(ValueError, random.sample, self.seq, 20)
        for element in random.sample(self.seq, 5):
            self.assert_(element in self.seq)

if __name__ == '__main__':
    unittest.main()
}}}
- setUp() is a method of TestCase that you override to set things up before each test (tearDown() also exists)
- main() is the command line interface to the test script and runs all the test methods of TestSequenceFunctions
- assertEqual, assert_ and assertRaises test different aspects of expected behavior
== [http://docs.python.org/library/simplehttpserver.html Web Services] ==
{{{
#!python
import SimpleHTTPServer
}}}
Python comes with lots of functionality for doing web stuff. If you attended Nico's talk on Django you probably know more than I do at this point. Just to whet your appetite for web services with python, here is how you run a simple HTTP file server.
{{{
#!python
import SimpleHTTPServer
SimpleHTTPServer.test()
}}}
Run this and then point your web browser to [http://localhost:8000/ http://localhost:8000/]. Note that in python 3, SimpleHTTPServer is merged into http.server.
== [http://docs.python.org/library/urllib.html Using Resources by URL] ==
{{{
#!python
import urllib
}}}