Skip to content

Files accompanying CIJ course - Finding Stories with Data

License

Notifications You must be signed in to change notification settings

helenalofving/FSwD

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

54 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FINDING STORIES WITH DATA

Files accompanying CIJ course Finding Stories with Data

Module 1

  • You will need this file which contains exercises consolidating the skills taught in session 1.

  • This is the short presentation

    • If you just want to read about pivot tables this pdf is all you need
  • Spreadsheets for journalists is the handout I wrote for a one-day course covering everything we now do in the 4 online sessions of Finding Stories with Data

Module 2

The Awesome Public Datasets repo as the name suggests - is awesome!

These tabs will take you to the information and sites in the main presentation

Module 3 Module 3 is entirely remote - you do the work in your own time, preferably on day 3 in order to bring any questions or issues you encounter to discussion at the beginning of Module 4.

  • Rather than spending a whole live session being bombarded with new information, you can come prepared, possibly having had a chance to try some of the techniques for yourself, or ready to ask questions and solve them in the session.
  • ImportHTML - converting a table from a web page to a spreadsheet, using Googlesheets. NB - since making this video it has become impossible (for reasons we cannot understand!) to scrape the parliament.uk data shown in the video. If you want to work with that data, take the pdf version and convert it to csv (as shown below). Or you can try using the free Chrome extension Instant Data Scraper - for reasons we don't understand, this DOES work!
  • Clean with Open Refine - cleaning names in a dataset where slight differences make the computer see them as different entries
  • Export your OpenRefine project How to convert your data back to Excel/Googlesheets after cleaning it in OpenRefine. The source table is this Wikipedia page if you want to follow the same steps
  • Reconcile with Open Refine Matching names in a column in OpenRefine with company names on OpenCorporates.com. NB – to make use of the OpenCorporates reconciliation service you need to add this address to the reconciliation menu in OpenRefine https://opencorporates.com/reconcile (see 1.18” in this video). See here for documentation. NB In order to make this work with your copy of OpenRefine, you will also need to create an OpenCorporates account, and apply for a (free) API key. OpenCorporates will then add your IP address to their list of users whose computers are allowed to use the reconcile function.
  • Convert tables in pdf files to csv spreadsheets with Tabula another excellent free program
  • Enhance one dataset by merging relevant data from another using the Vlookup formula.
  • Clean names with OpenRefine - Remove honorifics such as “Mr”, “Mrs”, “Dr, while also creating a reproducible script to save time doing the same job on future occasions
  • Power Query A relatively new Excel feature which allows you to merge data without learning VLOOKUP. It also keeps a record of what you have done, so you can check or reproduce your work – eg when you use a later edition of the same data.

Practice material accompanying the videos • To practice “VLookup” you need to download a copy of this worksheet containing the two datasets. • If you want to practise cleaning the MPs names (removing titles etc) using OpenRefine as in the video "Cleaning names with OpenRefine", you will need to download a copy of this dataset • To get a copy of the company donation data demonstrated in the cleaning and reconciling videos (2nd and 4th on the list), open this link in a new tab, and download a copy of the resulting file - using the [Export results] button.

• To practice the Power Query demo with the data used in the video you will need to download copies of these two datasets: – donations to MPs and list of MPs elected in 2019

  • Although OpenRefine contains links to useful instructional videos, this pdf is intended as a useful written guide to get you started

  • Reconciling lists using OpenRefine is covered in this pdf

  • In September 2024 we added an optional presentation on webscraping, which also introduces APIs, and why you need to know about them. The summary of the presentation is in this pdf

  • Scraping the web is an excellent tipsheet from IRE (it's well worth joining IRE. At around $70 a year it's worth joining just for the library of tipsheets!)

When you're ready to do some more practice, read on:

Challenge part 1 (optional) – pdf to spreadsheet. This link takes you so a relatively small list (37 names) of House of Commons staff who have declared receipt of Gifts or Hospitality. As in the real world, you may receive such data as a pdf file. Although you can ask the publisher to send it to you as csv/xls, you may want to convert it to those formats yourself. We recommend Tabula, but other methods are available, including CometDocs, see this review page

If you prefer, you can start here - Challenge part 2 – the pdf conversion should give you this file which has five columns and 37 rows. (NB – some of the entries in column 5 are so long they make the dataset very wide: in order to simplify the conversion process, we restricted the width of this column, so you will see that your conversion isn’t quite the same as this file – Gifts_visits.xlsx This makes no difference to the exercises) Column A contains the staff members’ family names in upper case, and column B their first names. Can you create a new column combining the names, all in one case (ie all upper, or with a capital first letter for each part), with “Firstname Surname”. Hint – spreadsheet programs can do this using formulas such as UPPER, PROPER, and CONCATENATE. You may also want to try it in OpenRefine and see which you prefer. If you want more practice, try combining the MPs’ names into another new column, in one case, “SURNAME, FIRSTNAME” – ie, with a comma and space after the surname.

Module 4

  • Why visualise is the presentation used in module 4.
  • There's a quick introduction to making graphics from spreadsheets in two parts Part 1 takes you raw data to a graph, and introduces you to Datawrapper. Part 2 takes a look at creating a graph from a pivot table.
  • This video looks at how to do the same thing with Googlesheets
  • There's a more detailed step-by-step guide to making a visualisation from an Excel pivot table here and from googlesheets here
  • This video looks at choosing the kind of chart most appropriate to the needs of your story
  • For handouts on choosing the right chart we recommend this interactive page produced by the Financial Times - Visual Vocabulary
  • For those who like a pdf to refer to, this Choosing a Chart is the one referred to during the course.

About

Files accompanying CIJ course - Finding Stories with Data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published