<!DOCTYPE html>
<html>
<head>
<script src="/bjc-r/llab/loader.js"></script>
<title>Introduction to Text Processing</title>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-176402054-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-176402054-1');
</script>
</head>
<body>
<p>
One of the most interesting and important things we can do as programmers is manipulate and analyze real-world data. Python is a particularly good language for data processing because of the powerful open source data manipulation libraries available <a href="https://wiki.python.org/moin/NumericAndScientific">online</a>.
</p>
<p>
Installing such libraries is beyond the scope of this course (see CS61A and/or C8 instead), so we'll be working mostly with libraries that are built into Python. In this lab, we'll work with text data.
</p>
<p>
This lab will also give you a chance to develop a full Python program entirely from scratch (with no starter code).
</p>
<p>
To get started on this lab:
</p>
<ul>
<li>Create a new folder called "datalab" on your computer (the name doesn't actually matter, but I'll assume that's the name you used).
</li>
<li>Download the data for this lab from <a href="/bjc-r/prog/python/text_processing.zip">this link</a>.
</li>
<li>Move the downloaded file into the datalab folder and unzip it there. From the command line, you can do this with the <code>unzip</code> command:
<pre><code>$ unzip text_processing.zip
$ ls
text_processing.zip text_processing
$ cd text_processing
$ ls
beatles.txt nietzsche.txt
democratic_debate_2015.txt presedential_debate_2016.txt
ee_cummings.txt republican_debate_2015.txt
gettysburg.txt savio.txt
horse_ebooks.txt state_of_the_union_2015.txt
i_have_a_dream.txt
jay_z_lyrics.txt
</code></pre>
</li>
</ul>
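<p>If you'd rather not use the command line, Python's built-in <code>zipfile</code> module can do the same job. The following is a minimal sketch, assuming the zip file was downloaded into your current directory. The function name <code>extract_lab_data</code> is hypothetical and is not part of the lab materials.</p>
<pre><code>import zipfile
from pathlib import Path


def extract_lab_data(zip_path, dest_dir):
    """Hypothetical helper (not part of the lab): unpack zip_path into dest_dir.

    Returns the sorted file names of every .txt file found under dest_dir.
    """
    dest = Path(dest_dir)
    # Create the destination folder (e.g. "datalab") if it doesn't exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    # Extract every member of the archive, keeping its folder structure.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # rglob also searches subfolders, such as a text_processing folder.
    return sorted(path.name for path in dest.rglob("*.txt"))
</code></pre>
<p>For example, <code>extract_lab_data("text_processing.zip", "datalab")</code> would create the datalab folder if needed, unpack the archive into it, and return the names of the extracted <code>.txt</code> files. Since the listing above shows the archive unpacking into a <code>text_processing</code> folder, the files should end up in <code>datalab/text_processing</code>.</p>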
<p>If you need help with this process, make sure to ask your lab TA.</p>
</body>
</html>