Text analysis: a simple "big data" tool for local government.

Text analysis is an intuitive and low-cost tool that can quickly analyze data. Simple algorithms estimate the readability of a given text passage, helping governments determine how easily staff, officials, and the general public can read their documents. A limited analysis of budget documents from major U.S. cities shows that comprehensive annual financial reports (CAFRs) are often at or above a college reading level, while popular annual financial reports (PAFRs) range from an 8th to 12th grade level.

This article develops suggestions to make documents more accessible and readable, and provides a case study for text readability measurement used by the City of Dubuque, Iowa, in a performance-measurement context.


Text analysis is a broad term that is defined in a variety of ways. It generally refers to a set of procedures that analyze written text and produce scores that capture different dimensions of the text, such as readability. Text analysis examines the structure and length of sentences and words through classification schemes such as an automated count of multisyllabic words and a numeric measure of the text's grade level. A paragraph with many multisyllabic and technical words and a high grade level is challenging for the general population to read, so it may be necessary to change the wording and structure to make interpretation consistent across segments of the population.

A common measure is the Flesch Reading Ease score, which rates text on a scale from 0 to 100, with 0 being very difficult to read and 100 being very easy. It can be run on most types of text, from newspaper articles to technical reports. When it was published in 1949, creator Rudolf Flesch estimated that fewer than 5 percent of all U.S. adults could read at a college level, while 93 percent could read at a 5th grade level. (1) These figures have likely shifted, but the average adult still reads at approximately a 6th to 8th grade level. Microsoft Word can compute the Flesch Reading Ease score for any written document. For example, the score for this article is 35.9, with a grade level of 13.8.
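For readers who want to see what sits behind such a score, the following is a minimal Python sketch of the published Flesch Reading Ease formula and the related Flesch-Kincaid grade level formula. The function names and the syllable counter are assumptions made for illustration. The counter is a rough vowel-group heuristic, so results will differ somewhat from Microsoft Word and other tools, which use their own word, sentence, and syllable rules.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) using simple regular expressions."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not words:
        raise ValueError("text contains no words to score")
    syllables = sum(count_syllables(w) for w in words)
    return max(len(sentences), 1), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Higher scores mean easier text (roughly 0 to 100 for ordinary prose)."""
    sentences, words, syllables = text_counts(text)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate U.S. school grade level needed to read the text."""
    sentences, words, syllables = text_counts(text)
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "The cat sat."
    print(round(flesch_reading_ease(sample), 2))   # 119.19
    print(round(flesch_kincaid_grade(sample), 2))  # -2.62
```

Both formulas rely only on average sentence length and average syllables per word, which is why very short, simple passages can score above 100 on reading ease or below zero on grade level.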

There are many other types of measures and many programs that quickly automate readability measures, as well as more advanced measures such as sentiment analysis, an algorithm that determines whether the writing expresses particular sentiments (e.g., positive or negative). Advanced measures of this kind are highly complex and require an extensive understanding of linguistics and programming, so without expert guidance, governments should probably focus on readability scores rather than trying to conduct sentiment analyses.


An analyst can easily measure the readability of any finance document. PAFRs should be analyzed because they are specifically created for the general public. GASB Statement No. 34, Basic Financial Statements--and Management's Discussion and Analysis--for State and Local Governments, also recommends analyzing the CAFR Management's Discussion and Analysis (MD&A) section to make sure it is readable at an appropriate level. (2) This is a worthwhile goal, although governments need to take care not to change the original intent when simplifying the technical words and complicated sentence structures.

If a PAFR's reading level is too high for its intended audience, readers might misunderstand the information. Readability can be increased by shortening sentence length, avoiding complex and technical accounting phrases, using an active rather than a passive voice, and trying to explain things with simple references and metaphors that anyone can understand. Sometimes, linking budget concepts to actual policy outputs (e.g., saying fiscal reserves are the equivalent of a savings account that can maintain service during financial uncertainty) can be more effective than trying to provide a technical discussion.

These methods can also be used for the PAFR's overall structure. For example, instead of relying on paragraphs, a PAFR can call out facts with an outline of key bullet points. This format can help display key facts about the budget quickly.

Diagrams and graphics that illustrate trends and distributions can also be helpful. For example, it can be easier to explain historical trends in revenues, expenditures, and reserves with a simple trend graph than with text (although text analysis can't be used on graphs). This approach can reduce the wordiness of a PAFR while communicating the data in a more accessible manner. Ideally, a PAFR should combine short and easily understood text with graphics and images to convey the key budgetary information that citizens may be interested in.


To better understand CAFR and PAFR readability, the authors selected reports from a few sample municipalities and ran random paragraphs through text analysis using the online program, which provides individual text analysis scores and an overall composite score. Many other programs can also run these types of measures, with extensions for analytics platforms such as SAS and RapidMiner, among others. Governments should find an analytics platform that best suits their specific needs, since the underlying algorithm for readability scores is the same regardless of the platform.

Some of the CAFRs were very difficult to read, with some paragraphs in the MD&A section at or above a college level. This isn't surprising, since CAFRs communicate technical information. Some of the CAFR paragraphs were nearly a quarter of a page, and they used extremely dense language that only budgeters, accountants, and economists would know. This may be unavoidable, but additional explanation and summary outlines would help non-experts understand the information.

Readability is more of a concern for PAFRs, since they focus on the public rather than technical experts. Scores for the PAFRs tested ranged from an 8th grade to 12th grade reading level. The PAFRs with higher reading levels used long paragraphs instead of short outlines of key facts. Also, longer, less familiar words (e.g., "stewardship") and budgeting terminology (e.g., "unencumbered balance" and "fiscal reserves") should be avoided or explained in terminology that can be easily understood.


Governments can integrate text analysis into their workflow by following several steps. It's helpful to start by running text analysis on the CAFR, PAFR, and perhaps even webpages, looking for general trends. Consider developing a policy that requires certain documents to be below an eighth-grade reading level. Create an accountability system for reducing the complexity of documents or webpages that are above the reading level selected for that type of document.
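The steps above could be automated along the following lines. This is a minimal Python sketch, not a prescribed method: the policy table `MAX_GRADE`, the function names, and the sample documents are all hypothetical, and the grade estimate uses the Flesch-Kincaid formula with a crude syllable count, so it will not match any particular commercial or online tool exactly.

```python
import re


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level with a crude vowel-group syllable estimate."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words to score")
    syllables = sum(max(len(re.findall(r"[aeiouy]+", w.lower())), 1) for w in words)
    words_per_sentence = len(words) / max(len(sentences), 1)
    return 0.39 * words_per_sentence + 11.8 * syllables / len(words) - 15.59


# Hypothetical policy: highest grade level allowed for each document type.
MAX_GRADE = {"pafr": 8.0, "webpage": 8.0}


def flag_documents(docs: dict, limits: dict = MAX_GRADE) -> list:
    """Return (name, grade) pairs for documents above their type's limit,
    hardest first. `docs` maps a name to a (document_type, text) pair."""
    flagged = []
    for name, (doc_type, text) in docs.items():
        grade = fk_grade(text)
        if doc_type in limits and grade > limits[doc_type]:
            flagged.append((name, round(grade, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    docs = {
        "home page": ("webpage", "We fix roads. We keep parks clean."),
        "budget summary": (
            "pafr",
            "Unencumbered appropriations demonstrate institutional fiscal sustainability.",
        ),
    }
    for name, grade in flag_documents(docs):
        print(f"{name}: estimated grade {grade} is above the limit")
```

A list like this could feed the accountability step: each flagged document is assigned to someone for rewording, then rescored until it falls under the limit chosen for its type.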


As the City of Dubuque, Iowa, embarked on a quest to develop performance measures, the question of how to reach the public efficiently and effectively came up. Since the outcomes were developed for public consumption, readability was essential.

To this end, the city formed a group to create the performance measure website. After meeting with the Dubuque Leadership Team, the group identified several outcomes and goals. It then gathered in front of a whiteboard, writing out one outcome at a time to be reworked and analyzed. It used the free Readability Test Tool to analyze each word in an outcome or goal that would be on the website home page. Any words that were beyond an eighth-grade reading level were changed and retested to give residents the best chance of understanding each message the city wanted to convey.

Frequently, words from the city's vision statements would be integrated into the outcomes Dubuque was trying to measure, including "sustainability," "viable," "equitable," and "livable." The reading level of these words scored between grade 21 and 43, and they were in every outcome. The city decided to find other words to tell its story for a website that was geared toward the public. For example, the word "equitable" was used in each of the city's eight goals. To convey the same message to residents at a lower reading grade level, the city substituted "for all." This new phrase scored a grade level of -1, meaning some of the city's tiniest residents, aged four and five, could understand the meaning. Combining too many words in one sentence or phrase also raises the grade-level score, but the team found that integrating very readable words helps keep the entire phrase at or below an eighth-grade reading level.

After analyzing each word on the whiteboard, the team then analyzed the entire phrase. If the readability score rose above an eighth-grade level, the phrase was reworded. The work was completed in a four-hour session, of which approximately one hour was dedicated to assessing readability.
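A two-pass check of this kind, first individual words and then the whole phrase, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the city's actual procedure: the word-level pass simply flags words with three or more syllables as a stand-in for per-word scoring, the phrase-level pass uses the Flesch-Kincaid grade formula, and the syllable counter is a rough heuristic. The function names and example phrases are hypothetical, and the scores will not match those reported by the online tool the city used.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def long_words(phrase: str, min_syllables: int = 3) -> list[str]:
    """First pass: list words that are candidates for a simpler synonym."""
    words = re.findall(r"[A-Za-z]+", phrase)
    return [w for w in words if count_syllables(w) >= min_syllables]


def phrase_grade(phrase: str) -> float:
    """Second pass: Flesch-Kincaid grade level of the whole phrase."""
    sentences = [s for s in re.split(r"[.!?]+", phrase) if s.strip()]
    words = re.findall(r"[A-Za-z]+", phrase)
    if not words:
        raise ValueError("phrase contains no words to score")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / max(len(sentences), 1)
    return 0.39 * words_per_sentence + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Hypothetical draft and rewrite, for illustration only.
    for phrase in ("Equitable and livable streets", "Safe streets for all"):
        print(phrase, long_words(phrase), round(phrase_grade(phrase), 1))
```

Because the grade formula weighs syllables per word so heavily, swapping one long word for a short phrase such as "for all" can lower the estimated grade of a four-to-eight-word outcome statement by several levels.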

The outcomes of this analysis were phrases of four to eight words, whereas the content on subsequent web pages would include paragraphs explaining major projects. A member of the performance measure group was identified to ensure readability throughout 30 additional webpages, a project expected to take an additional four to five hours, before the website goes live in September 2017.

Dubuque has implemented some effective ways of reaching out to citizens in language they can easily understand, which improves fiscal transparency. The city found that it's easier to have a readability project conducted by a small group that can quickly offer synonyms for words that need to be changed. The team members also need to understand exactly how changing a word can change the meaning of the original concept. Dubuque has started issuing a citizen's guide to the budget and will soon produce its first PAFR.


Local governments should consider using text analysis to check the grade level of all documents, making sure they effectively communicate the intended content and message. Increasing the readability of text across government documents can also improve transparency. Even if an organization doesn't have a formal transparency policy, ensuring that all stakeholders--including taxpayers, interest groups, and elected officials--can read government documents should help in communicating facts and building a shared understanding across the community. It might also engender trust and increase citizen participation.


(1.) Rudolf Flesch, The Art of Readable Writing: With the Flesch Readability Formula, 1948.

(2.) Governmental Accounting Standards Board Statement No. 34, Basic Financial Statements--and Management's Discussion and Analysis--for State and Local Governments.

VINCENT REITANO is a public finance associate in GFOA's Research and Consulting Center in Chicago, Illinois. ALEXIS STEGER is senior budget analyst for the City of Dubuque, Iowa.
COPYRIGHT 2017 Government Finance Officers Association

Article Details
Title Annotation:SOLUTIONS
Author:Reitano, Vincent; Steger, Alexis
Publication:Government Finance Review
Geographic Code:1USA
Date:Aug 1, 2017