Parliament in public
The Parliamentary Library has digitised decades of old Hansard. We learn how the team got the job done, and how the project has created a new constituency for the record of Australia's parliament.
On June 22, 2011, Bob Katter asked that the House of Representatives move a motion to note a matter of vital national interest.
Footballer Johnathan Thurston, a lynchpin of the Queensland Rugby League team, faced the prospect of a two-match ban for pushing a referee. If enacted, the ban would prevent Thurston from playing in the deciding third match of the annual State of Origin series.
Katter thought the charge was trumped-up.
“In light of the facts of this case,” Katter said, “this citing by the Sydney-based Match Review Committee could be regarded as irresponsible or, even possibly, mischievous.” Katter’s motion did not find a seconder – although Bronwyn Bishop raised a point of order declaring it “highly disorderly” – and the Speaker therefore batted it away.
The brief debate was almost certainly not a highlight of Australian democratic history, but for Roxanne Missingham it was just the kind of exchange that shows how Parliament contributes to and reflects the life of the nation, which makes it worthy of careful preservation and a substantial effort to make it publicly available.
Missingham’s interest comes from her role as Parliamentary Librarian for the Australian Parliament, where she has just led a project to digitise decades of Hansard, the transcription of every word uttered in Parliament, and tried to reproduce the experience of leafing through old volumes recorded during Parliaments past.
“The genesis of the project was our librarians’ gene,” Missingham says. “We are great believers in opening access to information. The Parliament is a wonderful source of discussions about national policies and perspectives. I felt we had an incredible opportunity to make those available and we had the technology to do it and we wanted to open Parliament for the nation.”
Access and history
Missingham says the project therefore had twin goals. The first, she says, was “a philosophical agenda of supporting democracy in a Web 2.0 world”; she describes the second as “a commitment to open Parliament for the nation.”
“To put it at its most basic, there is a little bit of Parliament that touches all Australians. Part of what Australia is built on is decisions made in parliament over 110 years.”
Some of the material has little to do with politics. “Grievance debates have many mentions of people and their stories,” Missingham says.
“You can read about the prices of bread or the importance of telegraph access, or health problems. You can see a remarkable set of stories about the changing of the Australian community. And reading quotes like Prime Minister Andrew Fisher vowing to ‘stand beside our own to help and defend Britain to the last man and the last shilling’ at the start of World War One is still remarkably moving.”
But before Missingham could take Hansard online to make those stories available to everyone, she had to destroy it. Or at least one copy.
“We had to find a ‘sacrificial set’,” she explains.
“To scan Hansard, we had to take the spines off the books and divide them off into days.”
That process was tricky because there are few full sets of Hansard in existence.
Six weeks of searching turned up a set missing only four volumes, and owners willing to have them despined for the digital preservation process.
Missingham says “a team of people in the basement” then removed spines and “put in all the day markers for each day of Parliament.” That process was made difficult by the fact that on two occasions Parliament has not officially risen, an anomaly that meant some speeches officially belong to the day before they were delivered.
Sorting the sacrificial set by date created what Missingham calls “a massive line-up against the walls” waiting to be scanned.
“Some of the pages were on an angle and some were blurred,” Missingham says. “In the war years, paper was precious, so it was made thin: we had bleedthrough of text.”
Those issues meant that a single scan was not enough to take the archive online, so the Parliamentary Library worked with an outside provider that analysed and refined the scans.
“First we used optical character recognition software, then we went through and turned it all into XML. We differentiated by the name of each speaker and the kind of debate or speech: all the structured data that we with our little librarian hearts just love.”
Missingham asked the library’s providers to achieve 95 per cent accuracy and feels that level was achieved, but some peculiar artefacts emerged. “Some of the old fonts printed very heavily,” Missingham says. “‘Mr Speaker’ came out as ‘Mr Spearer’.” Harold Holt sometimes emerged as ‘Harold Hots’, while tough-as-nails Fraser Government Defence Minister Jim Killen became ‘Jim Kitten’.
The Library’s response was to do three rounds of manual error checking of speakers’ names, as Missingham felt that detail was a critical element of the online archive. The team also spent a lot of time checking to ensure that all speeches matched the sitting day on which they were delivered, another critical accuracy issue.
Missingham says that while correcting these minutiae was time-consuming – Library staff worked late into the night on many occasions to finish the job – it was also essential to meet the Library’s aim of creating a resource for the nation.
“One of the things we learned came from a conversation with another Parliament, which asked us how we planned to finish the project. That parliament still had about 10 per cent of Hansard unscanned and unfinished.
“We said you just have to do it. We were absolutely committed to finishing it. It’s never as easy as you think: we got it done and are very proud that we got to as close as possible to 100 per cent.”
The result, now online at parlinfo.aph.gov.au/parlInfo/search/search.w3p, is a colossal collection of documents dating back to 1901, all searchable using a number of criteria and downloadable as PDF files.
“We went with PDF, which we think of as a facsimile, because you can have the experience of looking at the old Hansards,” Missingham says. “It is lovely to look at the old pages and get a sense of the history and the fonts, plus all the standard textual stuff.”
A new audience
The site received a quiet launch, timed to coincide with the 110th anniversary of the opening of Australia’s Parliament, and went live with little fanfare. It has nonetheless attracted a strong audience.
“Hansard gets millions of hits,” Missingham says.
“Estimates hearings mean especially big traffic and Godwin Grech’s testimony created huge traffic: we had a couple of big blips.”
The Hansard archive is not causing traffic spikes of that magnitude, but has brought a new audience.
“Historians love it,” Missingham says. “I have had historians phone from universities and the Department of Foreign Affairs and Trade asking lovely nerdy questions, which seems to me that readers are using it in different ways. That’s a different community to our usual audience of those interested in advocacy on contemporary issues.”
Missingham is excited to have won that new audience, as she sees it as another example of changing the Library’s role and the services it offers to the community.
“In the last two years, we have adopted Facebook and Twitter and started a blog called Flagpost (http://parliamentflagpost.blogspot.com) that has had 38,000 hits. We’ve turned on comments in the blog and for us we really see that the benefit in a 2.0 world is a more informed public with better access to information.”
The new Hansard search is another step in that direction, and Missingham says it has generated “steady and very positive feedback”. The Library will therefore continue to innovate in this vein.
“You need to put info out there, get comfortable with that and then get more interactive