CD-ROM tax research.
One CD-ROM can store about 660 megabytes of information (660 million bytes, or keystrokes), representing about 275,000 pages of text. To put this in perspective, one CD-ROM can easily replace more than 100 feet of tax reference shelf space. The advantages of CD-ROM include massive data storage, indestructibility, mixed media formats, inexpensive drives, ease of use, low duplication cost and unlimited usage.(3)
CD-ROMs typically include electronic versions of vendor tax services, full text of the Code and regulations, revenue rulings and procedures from 1954 on, and IRS publications, among others. Further, depending on the vendor, the full text of tax cases, letter rulings and other tax authority may be available. Current CD-ROM tax vendors include (alphabetically): CCH Inc., Kleinrock Publishing, Matthew Bender & Company, Practitioners Publishing Company, Research Institute of America (RIA), Tax Analysts, Tax Management and West Publishing.
Part I of this article focuses on tips, tricks and traps in using CD-ROM databases; it is not acceptable to merely know how to search a database--a researcher must also develop the expertise to avoid pitfalls and establish techniques to ensure that the right answer is obtained in the least amount of time. Part II, in a future issue of The Tax Adviser, will address analysis of search results and search modification techniques.
CD-ROM vs. Hard Copy Tax Research
CD-ROM is not a substitute for printed materials, although it clearly may replace some (and eventually, almost all) hard copy.(4) When used without the corresponding hard copy counterpart, CD-ROM materials represent a significant savings in library costs (in terms of space saved and lack of need to file paper updates); the major vendors usually provide new discs each month.
The most common approach to CD-ROM research is to start with a client issue(s) and usually proceed to extract certain "keywords." For this purpose, keywords are words typically used in a tax service index or in other tax materials to access relevant materials for answering questions. This hard copy method, however, has at least three critical limitations. The results of an index search depend on (1) the finite number of words that the publisher has chosen to index; (2) the extent to which all occurrences of a word are indexed in the text; and (3) the ability to indicate word combinations and their proximity to each other in constructing the index. In electronic research, the use of Boolean logic(5) and proximity connectors provides a powerful tool for connecting indexed words to create a search request.
Tax research with CD-ROMs uses "hypertext" links. Hypertext is a tool by which certain words, phrases, titles or citations referenced in one piece of text may be linked to their full-text counterpart in another part of a text, which could be located in an entirely different volume. By linking these references, they can be accessed immediately to transfer the user to the full text of the referenced item.
For instance, most CD-ROM searches for a Code section yield not just that section, but easily accessible cross-reference links to, for example, the related regulations, definitions of certain words in a tax glossary, the corresponding portions of a multivolume tax service in which the section is discussed, or other Code sections cited within the searched section. After using a hypertext link to get to a second level, a user can return to the original document or explore new hypertext links in the second-level document to get to third- and fourth-level (or higher) hypertext-linked documents. Most CD-ROM tax products permit a user to backtrack through these levels and may even provide the means for any of them to be reviewed again at a later time.
Mechanics of Computerized Searching
Computerized database searches operate by taking the user's keyword(s) and searching the selected database index for occurrences of the word in the specific documents (or document segments) previously selected. This is similar to the index one might find in a tax service or treatise, except that almost every single word (i.e., potential keyword) is indexed (sometimes, "noise" words such as "and," "the," "are," etc. are not indexed).
Once the location of each keyword is identified, the search software begins to limit the relevant "hits" in a database search to those instances in which each set of keywords meets the proximity conditions (the restrictions of the keyword connector(s), explained below) included in the search request.(6)
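The two steps just described (index almost every word, then keep only the hits that satisfy a proximity condition) can be illustrated with a minimal Python sketch. It is not any vendor's actual search engine; the noise-word list, the sample documents and the function names build_index and within are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed noise-word list; real products choose their own.
NOISE_WORDS = {"and", "the", "are", "of", "a", "to"}

def build_index(documents):
    """Map each non-noise word to {doc_id: [word positions in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            if word not in NOISE_WORDS:
                index[word][doc_id].append(position)
    return index

def within(index, first, second, max_distance):
    """Return ids of documents where both words occur within max_distance words."""
    first_hits = index.get(first, {})
    second_hits = index.get(second, {})
    hits = set()
    # Only documents containing both keywords can satisfy the condition.
    for doc_id in first_hits.keys() & second_hits.keys():
        if any(abs(p - q) <= max_distance
               for p in first_hits[doc_id]
               for q in second_hits[doc_id]):
            hits.add(doc_id)
    return hits

# Hypothetical sample documents.
DOCUMENTS = {
    "doc1": "The consolidated group computes the alternative minimum tax jointly.",
    "doc2": "Consolidated returns are discussed here. "
            + "Filler word " * 10 + "The minimum tax applies.",
    "doc3": "Depreciation recapture rules.",
}
INDEX = build_index(DOCUMENTS)

print(within(INDEX, "consolidated", "minimum", 15))  # {'doc1'}
```

Both doc1 and doc2 contain the two keywords, but only doc1 has them within 15 words of each other, so the proximity condition removes doc2 from the hits.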
Planning a Search Strategy
When a research problem arises, the first task is to define the problem to determine the focus of the research. When formulating a research problem, the researcher should identify it as a compliance problem or a planning problem. Compliance problems are usually easier to research because they generally occur after the fact (i.e., the facts have already been determined and usually are not subject to change). In contrast, researching a planning problem requires knowledge of both known facts (e.g., corporate taxpayer, fiscal year) and the range of possible facts. Planning problems thus involve before-the-facts research, and the facts controlling the result may be created or altered to obtain the desired tax result.
* Understand the facts
To adequately research a tax problem, one must have a good understanding of the facts.(7) The absence of such knowledge may lead to making certain assumptions about what the facts are. To the extent such assumptions are critical to the results, they may need to be verified later.
Review of the facts generates keywords and relationships to be written down, a tactic particularly useful when one is new to searching; nevertheless, even experienced researchers use it when appropriate. Diagramming the relationships and transactions, especially when the facts are complex, can also be fruitful in understanding the facts and developing a search strategy.
* Evaluate the issue
The research question(s) previously formulated need to be refined to construct search requests. The first question is, what is the real issue? The problem should be observed from many different angles; insight may be gained by viewing the problem from the client's or the IRS's perspective. The most relevant facts should be distilled and assessed, and any need for additional information should be noted.
Of course, knowledge of the subject matter and its jargon and expertise in the specific field aid greatly when doing a search. The idea is to be able to recognize the answer if it turns up. It is better not to do a search than to do a bad one. One may have to review the research subject area before commencing a search. Printed materials (e.g., tax research services, tax articles or treatises) may be the easiest to review, but the electronic sources on CD-ROMs may be more readily accessible. Such reviews are also excellent vehicles for discovering related issues.
This research should be accompanied by the noting of concepts and keywords. At the same time, it should be decided whether the research area is within the researcher's expertise or whether the matter should be referred to another practitioner within or outside of the firm. Alternatively, another practitioner (in or out of the firm) might review the conclusions reached based on the research. The risks (i.e., potential liability) of not compensating for a lack of knowledge can far outweigh the cost of using a specialist when appropriate.
* Construct keyword searches
To produce keyword terms for searching a database, divide the concepts into subconcepts, then extract the most relevant keywords from this set. Some researchers use a highly structured (methodical) approach for this purpose, while others rely on an intuitive ("hop around") approach.
Notwithstanding these ad hoc techniques, there are three main approaches(8) to constructing electronic research requests:
1. Using building blocks. This involves breaking down concepts into their logical groupings, and creating relationships between the groups using OR, AND, or NOT connectors. Synonyms or equivalent terms are then incorporated into the evolving search request.
2. Growing pearls. This concerns aiming for the perfect result. The researcher envisions the on-target conclusion sought--i.e., the pearl to be grown--and builds outward in the searches from these leads to support the desired conclusion. For instance, a restricted keyword search may lead to more expansive results in a follow-up search based on the prior results (e.g., a cited case in the initial search can be used as a search term that leads to other related references, which can also be searched).
3. Using successive fractions. The researcher starts very broadly, then modifies the initial search terms by using what he has learned from the results. This approach casts a wide net by pursuing every possible reference to the topic. The results are then interpreted and followed by focusing on the more specific terms, by adding successively more restrictive research terms to the original search request.
Of course, the results of these research approaches may suggest an alternative or expanded methodology incorporating one or more of these strategies; for example, the building block approach may be modified to change the connectors of several terms from AND to OR to yield more results (addressed later). Similarly, the pearl growing method can be used to sample the database and then create a follow-up search request constructed from the most promising results of the initial request. Finally, if one approach does not work, one of the others may.
Some researchers use a heuristic approach to develop search requests. The set of search terms most relevant to the research question is tried first. If that search is unproductive, a broad set of search terms that might be useful is tried. If that does not work, a third set of search terms attempts to use the more remotely relevant possibilities.
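This tiered fallback can be sketched in a few lines of Python. The sketch assumes a hypothetical run_search function that returns a list of hits for a request; the canned requests and results below are invented purely to make the example runnable.

```python
def tiered_search(run_search, tiers):
    """Try each search request in order; return the first one that yields hits."""
    for request in tiers:
        hits = run_search(request)
        if hits:
            return request, hits
    return None, []

# Toy stand-in for a real search engine (hypothetical requests and results).
CANNED_RESULTS = {
    '"passive activity" AND "rental real estate"': [],
    '"passive activity"': ["Document A", "Document B"],
    "passive": ["Document A", "Document B", "Document C"],
}

def toy_search(request):
    return CANNED_RESULTS.get(request, [])

# Most relevant terms first, then broader, then more remote possibilities.
TIERS = [
    '"passive activity" AND "rental real estate"',
    '"passive activity"',
    "passive",
]

used_request, results = tiered_search(toy_search, TIERS)
print(used_request, results)
```

Here the most specific request finds nothing, so the second, broader request is the one whose results are returned.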
The CD-ROM databases selected for searching are a key component of a search strategy and may even influence the structuring of a search request. Unless specific, relevant authority is sought (e.g., a Code section), most likely, a tax service database will be used first.
However, the selection of a CD-ROM database may be affected when the researcher thinks he knows the answer and merely seeks confirming authority. In such case, if the researcher knows the authority sought (e.g., a ruling, case, regulation, etc.), this is the database to start with.
In selecting any database, the researcher should find the highest level of authority related to the research problem. Accordingly, to the extent the researcher does not know the form such authority will take, he should choose the database of highest authority (e.g., final regulations versus district court cases).
As with hard copy tax research, a Code section approach is valuable if the researcher is aware of the principal Code section(s) involved. This may be as simple as reviewing the Code section at the start of the research. Such a review familiarizes the researcher with the problem area and provides relevant terminology to use in formulating a search request.
Before actually using a CD-ROM database, the researcher should be familiar with its contents; the database should be browsed to encourage familiarity. In most instances, the database contents are most easily reviewed by accessing a table of contents (if any). The scope of the database should be ascertained: the period covered (e.g., all Tax Court cases, or only those from 1990 on), whether the full text of all documents is included, whether documents have been excluded (e.g., withdrawn revenue rulings), and whether all documents have been indexed. The user manual may be instructive in this regard.
Some vendors' discs include their single-volume tax handbook with direct hypertext links to the references that are also on the disc. In many instances, this is a very useful starting point, because it provides a brief overview of what is available. In addition, this type of limited search may actually yield a preliminary conclusion.
Search Terms and Connectors
Keyword searches depend on the nature of the CD-ROM database and the search engine employed. Historically, computerized databases were only searchable by designated keywords from a list constructed by the database administrators and a hit occurred when the keyword was found in a document's keyword field. Subsequently, databases began to permit keyword searches of document titles, whether or not contained in the keyword field; eventually, the abstract of each document was searchable.(9) Presently, full-text searches are accomplished by the use of keywords and the specific search engine's syntax.(10)
* Searching with wildcards
CD-ROM search software uses different wildcard characters (one or more special characters that can stand in for other character(s) in a keyword); this is analogous to the use of the "joker" in card games. CD-ROM-based searches use fairly standard wildcards. For instance, the software may permit an asterisk (*) anywhere in a term to find all derivations (e.g., tax* yields hits for tax, taxes, taxed, taxation, taxpayer, etc.), or a question mark (?) to replace a character anywhere in a word (e.g., taxe? finds taxes or taxed, but not tax or taxation). The wildcards may be used as a prefix, suffix or anywhere else in a term. However, while the asterisk can represent any number of characters in the word(s) found, each question mark stands for exactly one character, in the exact location of that question mark in the search term.
Finally, both wildcards can be used in the same term and at the same time (e.g., ?ax* will retrieve any word in the database starting with any single letter followed by "ax" and then any number of other characters in the term--e.g., tax, taxes, taxed, taxation; or fax, sax, saxophone, Saxon). Thus, use of wildcards can produce documents totally unrelated to the objective of a search request.
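The behavior of the two wildcards can be tried out with Python's standard fnmatch module, whose * and ? happen to follow the same convention described here (it also gives special meaning to square brackets, which are not used below). This is only an illustration; the word list is invented, and a given CD-ROM product's wildcard rules may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical list of indexed words.
WORDS = ["tax", "taxes", "taxed", "taxation", "taxpayer",
         "fax", "sax", "saxophone", "saxon"]

def wildcard_hits(pattern, words=WORDS):
    """Return the indexed words matched by a pattern using * and ?."""
    return [word for word in words if fnmatchcase(word, pattern)]

print(wildcard_hits("tax*"))   # ['tax', 'taxes', 'taxed', 'taxation', 'taxpayer']
print(wildcard_hits("taxe?"))  # ['taxes', 'taxed']
print(wildcard_hits("?ax*"))   # every word in WORDS, relevant or not
```

The last request shows the trap: ?ax* matches fax, sax, saxophone and saxon as readily as the tax-related words.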
Creating Keyword Search Terms
The next step in building a keyword search request is to:
1. Divide the problem into its subparts.
2. Consider the hierarchy of how the problem's components relate to each other.
3. Identify words and phrases that express the key concepts.
4. Evaluate synonyms for the subconcepts.
5. If there are only one or two keywords, use them and build on the results to revise the search.
6. Consider the variations, possibilities and related terms.
Each CD-ROM product has specific rules or syntax as to how its search engine operates. Some common techniques follow:
 Any phrase (i.e., sequence of words) to be searched must be enclosed in quotation marks (e.g., "net operating loss"); otherwise, the spaces between words are treated as ANDs (e.g., net operating loss will be searched as net AND operating AND loss). Similarly, when connectors (e.g., AND, OR, NOT) are contained in the search phrase, the search software will require the phrase to be included in quotes (e.g., "research and development," "trade or business").
 Many potential keywords are in a transition state to a single-word contraction (e.g., taxpayer, yearend) or acronym (e.g., NOL). Thus, any search request should contain any known contracted spellings or acronyms.
 Similarly, consider that a revenue ruling from the 1950s might phrase a term a little differently than would be used today (e.g., deferred-compensation arrangement).
 A related problem occurs when terms have alternative meanings and/or spellings depending on whether they are nouns or verbs (e.g., carry-forward, carryforward, carry forward).
 Another problem is that certain words are commonly misspelled in tax documents. Thus, consider including misspellings or wildcards.
 Any word about which the researcher is unsure of the spelling should be looked up in a hard-cover or electronic dictionary. Many words have alternative spellings (e.g., theater, theatre) that should be included in a search request.
 Dictionaries and thesauruses should be reviewed to expand search terms to include synonyms and other related words that may exist in the database; however, any synonyms that have alternative meanings and are likely to retrieve irrelevant material should be avoided.
 A CD-ROM researcher must determine whether the software searches for variations on the root search word. For example, with some CD-ROM software, a singular term does not automatically search for the plural. Consequently, it is crucial when constructing a search request to know the software's plural syntax or use wildcards when appropriate (e.g., a search request for "net operating loss" should be phrased "net operating loss*" to pick up "net operating losses").
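The first technique in the list above, quoting a phrase so that its words are not treated as separate ANDed terms, can be illustrated with a short Python sketch. It assumes a simplified engine that treats every unquoted space as AND and matches by plain substring; parse_request and matches are hypothetical names, and shlex.split is used only as a convenient way to keep quoted phrases together (it would need more care with apostrophes).

```python
import shlex

def parse_request(request):
    """Split a request into terms; a quoted phrase stays together as one term."""
    return shlex.split(request)

def matches(text, request):
    """True if every term appears in the text (unquoted spaces act as AND)."""
    lowered = text.lower()
    # Simplification: substring matching, so "net" would also match "internet".
    return all(term.lower() in lowered for term in parse_request(request))

# Hypothetical sample sentences.
RELEVANT = "A net operating loss may be carried forward."
IRRELEVANT = "The loss from operating the net lease was deducted."

print(parse_request('"net operating loss" carryforward'))
# ['net operating loss', 'carryforward']
print(matches(IRRELEVANT, 'net operating loss'))    # True: three separate words
print(matches(IRRELEVANT, '"net operating loss"'))  # False: phrase is absent
```

Without the quotation marks, the unrelated sentence is a hit because it happens to contain all three words somewhere.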
* Using keyword connectors
In using connectors, keep in mind that sometimes people mean "or" when they say "and." For example, a search request might state Canadian import AND export, but the researcher might only want information on Canadian import; thus, the connector should have been OR.(11) In addition, the use of NOT can also be important. For example, the search request IBM NOT Apple will avoid referring to documents that contain both IBM and Apple; likewise, IRA NOT Army refers one to references on individual retirement accounts, but not the Irish Republican Army.
In a similar vein, some search engines include the connector XOR (exclusive OR). The goal of this connector is to obtain search results containing either of two terms, but not both of them (e.g., IBM XOR Apple would yield documents containing either term, but not documents containing both).
Proximity connectors are used to find the use of two words within a certain number of words of each other, or in the same sentence, paragraph or segment of a document (e.g., "consolidated" within 15 words of "alternative minimum tax"). Such requests must be used carefully so as not to be overbroad or too narrow.
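The Boolean connectors discussed above map directly onto set operations on the lists of documents in which each keyword occurs. The following Python sketch uses invented document numbers; it illustrates the logic only, not any particular product's syntax, and it includes an exclusive-or connector of the "either, but not both" kind.

```python
# Hypothetical sets of document numbers in which each keyword occurs.
POSTINGS = {
    "ibm": {1, 2, 3},
    "apple": {3, 4},
    "ira": {5, 6},
    "army": {6},
}

def combine(left, connector, right):
    """Apply a Boolean connector to the document sets of two keywords."""
    a, b = POSTINGS[left], POSTINGS[right]
    if connector == "AND":
        return a & b   # documents containing both terms
    if connector == "OR":
        return a | b   # documents containing either term, or both
    if connector == "NOT":
        return a - b   # documents containing the first term but not the second
    if connector == "XOR":
        return a ^ b   # documents containing exactly one of the two terms
    raise ValueError(f"unknown connector: {connector}")

print(combine("ibm", "AND", "apple"))  # {3}
print(combine("ibm", "OR", "apple"))   # {1, 2, 3, 4}
print(combine("ibm", "NOT", "apple"))  # {1, 2}
print(combine("ibm", "XOR", "apple"))  # {1, 2, 4}
print(combine("ira", "NOT", "army"))   # {5}
```

Note how much larger the OR result is than the AND result; choosing the wrong connector changes the size of the result set dramatically.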
* Order of processing connectors
It is often helpful to be able to control the order in which operators are evaluated. CD-ROM software usually permits the use of parentheses to control this function. Otherwise, the software may simply read the search request from left to right and evaluate the operators in the order in which they appear in the sentence. For instance, the search request ACRS OR depreciation AND recapture will cause the program to search for all the documents in a database that contain "ACRS" or "depreciation," and "recapture." The final set of results will contain some documents with "ACRS" and "recapture," some will contain "depreciation" and "recapture" and some will contain all three words.
On the other hand, use of ACRS OR (depreciation AND recapture) yields all the documents that contain either "ACRS" or both "depreciation" and "recapture." The program first evaluates the words inside the parentheses as one search term.
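The difference between the two requests can be made concrete with a small Python sketch of a strictly left-to-right evaluator that treats a parenthesized group as a single term. The document numbers are invented, and a real product may apply different precedence rules; this only mirrors the behavior described above.

```python
import re

# Hypothetical sets of document numbers in which each keyword occurs.
POSTINGS = {
    "acrs": {1, 2},
    "depreciation": {2, 3, 4},
    "recapture": {2, 4, 5},
}
OPERATIONS = {
    "AND": set.intersection,
    "OR": set.union,
    "NOT": set.difference,
}

def tokenize(request):
    """Split a request into words and parentheses."""
    return re.findall(r"\(|\)|[A-Za-z]+", request)

def evaluate(tokens):
    """Evaluate connectors left to right; a parenthesized group is one term."""
    def operand():
        token = tokens.pop(0)
        if token == "(":
            result = evaluate(tokens)  # evaluate the group first
            tokens.pop(0)              # discard the closing ")"
            return result
        return POSTINGS.get(token.lower(), set())

    result = operand()
    while tokens and tokens[0] != ")":
        connector = tokens.pop(0).upper()
        result = OPERATIONS[connector](result, operand())
    return result

def search(request):
    return evaluate(tokenize(request))

print(search("ACRS OR depreciation AND recapture"))    # {2, 4}
print(search("ACRS OR (depreciation AND recapture)"))  # {1, 2, 4}
```

Document 1 mentions only "ACRS." It is dropped by the first request, because "recapture" is required of everything, but is retrieved by the second.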
The usefulness of parentheses to a researcher is greater than simply ordering the search process; thus, they should be used in search requests with any degree of complexity.
Part I of this article has discussed formulating the search request. Part II will explore evaluating, refining and modifying the search request following the initial results and address when to conclude the research, plus other techniques.
RELATED ARTICLE: Tax Information Phone Service
For members of the Tax Section and Private Companies Practice Section
On Nov. 15, 1995, the AICPA will commence a two-year pilot program to provide tax information phone service (TIPS) to members of the Tax Section and the Private Companies Practice Section (PCPS). If successful, the program will then be expanded to include all AICPA members, with a discounted rate for these section members.
For many years, members have identified tax information phone service as one of the benefits they would most like the AICPA to provide. Because of the breadth, depth and seasonal volume of potential tax questions, providing a quality service at a reasonable price did not appear to be possible. However, with technology changes in tax research and with some innovative ideas from a working group of the Tax Section and PCPS, a workable program was developed and unanimously approved by the Tax Section Executive Committee, the Private Companies Practice Executive Committee and the AICPA Board of Directors.
The program will feature two types of service. The first will be provided by a professional staff of up to 13, who will be selected for tax technical and practice experience, research skills and communications ability. They will help with framing issues, locating general references and basic research, using a state-of-the-art electronic research facility that will be located in the Institute's Jersey City, N.J., offices. No written responses will be provided. The second type of service will take over when the caller needs more than the staff can or should provide, such as when a member needs a written opinion in an area involving the sophisticated interplay of different parts of the Internal Revenue Code. In this situation, the staff will give the caller the names and phone numbers of three practitioners from the Tax Section or PCPS who have knowledge and experience in the area of needed assistance, and the caller will contact them to select an individual for a compensated consultation.
TIPS will greatly augment the resources available to members in providing tax services. On demand, and with little extra cost, users will have a knowledgeable and experienced staff person, fully equipped with reference material, and an organized network of tax professionals for consultations.
The TIPS program will be funded by the AICPA, the Tax Section and the PCPS and by user charges to cover costs. The user charges will be assessed by 900 service telephone charges or by major credit cards; the cost will be $2 for most of the year and $3 from January 1 to April 15. The TIPS service will operate on extended hours during busy season to serve members in all parts of the country, and may be available on Saturdays during busy season and on extended hours during the rest of the year, depending on demand.
Tax Section and PCPS members have already received registration forms for the referral network, and additional information will be sent out shortly to tell members how to use the TIPS program for answers to tax questions.
If you are not a member and would like to use this service, please call (202) 434-9270 to join the Tax Section or (800) 272-3476 to join the PCPS. If you have questions about the program, please call Bill Stromsem at (202) 434-9227.
Editor's note: Mr. Black is a member of the AICPA Tax Division Tax Computer Applications Committee. This article is adapted from the author's AICPA self-study CPE course, "Tax Research Using CD-ROM Services" (which includes actual tax research CD-ROMs).
(1) See Khani and Zarowin, "A Journal Survey: Counting on Technology," 179 Journal of Accountancy 59 (May 1995); "Technology: Spotlight on Tax Software," 178 Journal of Accountancy 49 (Oct. 1994).
(2) Hoglund and Hicks, "Computer Usage and Tax Software in a Tax Practice: AICPA Tax Division Survey Results," 25 The Tax Adviser 46 (July 1994).
(3) See Danzinger, "CD-ROM: Is the Future Now?," 14 Bulletin of the American Society for Information Science 19 (Oct./Nov. 1987).
(4) This is already the case for some publications; see, e.g., Holzberg, "Making a Case for CD-ROM," 9 CD-ROM World 60 (Feb. 1994), concerning how the Bureau of National Affairs (Tax Management) is phasing out publication of the printed version of its Tax Practice Series. Similarly, RIA announced that its state and local tax reporters will be offered only through RIA's OnPoint System of CD-ROMs. See Scott, "RIA Scraps Paper Version of Product," 9 Accounting Today 1 (4/24/95). Moreover, in 1993, CCH Inc. indicated that within five years, only a very small portion of its products would be in both CD-ROM and print. See Scott, "Tax Software Vendors Battle to Capture CD-ROM Market," 7 Accounting Today 18 (11/1/93).
(5) Boolean logic is derived from George Boole, a mid-1800s mathematician, who developed a form of algebra dealing with logical (i.e., true/false) values rather than calculating numeric values.
(6) In this article, keyword connectors are capitalized. However, most CD-ROM databases generally disregard any differences in a search request between capitalized and uncapitalized letters.
(7) For an excellent discussion of the importance of knowing the facts, see Gardner and Stewart, "The Critical Role of Facts," Tax Research Techniques [formerly Sommerfeld and Streuling, AICPA Study No. 5] [AICPA, 1993], p. 11.
(8) See Hawkins and Wagers, "Online Bibliographic Search Strategy Development," 6 Online 12 (May 1982).
(9) The fourth stage of the evolution in searching has not yet been achieved, but we are moving into an era when natural language searches will be available. See, e.g., "Headline: Cascade Systems' Mediasphere Database Engine Targets Publishers with Promise of Easy Data Searchers," Computergram International (11/11/94). Natural language searches avoid the need to use the specific keyword-search syntax required by each search engine, and use conversational language instead.
(10) The next stage of evolution may use software that will rely not only on natural language, but also will aim to combine one or more of the following: (1) selecting the most relevant databases to search, (2) evaluating the potential value of each document retrieved and (3) recommending solutions on a ranked basis that incorporates both the strength of the tax authority and the risk of an IRS audit on the issue researched. Obviously, such artificial intelligence ability would not be limited to these results. See, e.g., Black, Carroll and Rex, "Expert Systems: A New Tool to Enhance a Tax Practice," 20 The Tax Adviser 13 (Jan. 1990).
(11) See Ojala, "Knowing Your `ands' from Your `ors'; Tried and True Methods Can Help You Get a Jump on the Searching Learning Curve," 8 Link-Up 8 (Nov. 1991).
|Title Annotation:||part 1|
|Author:||Black, Robert L.|
|Publication:||The Tax Adviser|
|Date:||Oct 1, 1995|