
SYNC OR SWIM: Captions.

There are some questions that we don't like to ask ourselves--some topics that we don't like to broach: things we don't like to look more closely at, because of how they shift the lens on the way we view ourselves. No-one likes to think of themselves as the bad guy... but sometimes we are. The Time's Up and #MeToo movements have forced us to confront not only how entrenched discrimination and harassment are in society, but also how we fit into that dynamic. Have we always been above reproach? How do we feel about the times when we weren't? Respect Victoria's 'Respect Women: Call It Out' campaign focuses on the idea that not being a perpetrator of sexual harassment isn't enough--that inaction can also be harmful. The lies we tell ourselves--that we don't have to step up, that we don't have to act--aren't good enough. That's not a message everyone wants to hear.

This April's Australia-wide vegan protests (1) provoked a backlash for the same reason. Humans have a complex relationship with meat, one that we don't like to examine in too much detail (our--at the time of writing, current--Prime Minister, Scott Morrison, was so outraged that he referred to the protesters as 'green-collared criminals', (2) with all of the overzealousness of a school bully who has just realised that your name rhymes with a bodily expulsion). When we are forced to look closely at why we eat meat, it's hard to reconcile our view of ourselves, our relationships with animals and the death, waste and cruelty involved in the industry. The protests and issues currently under the spotlight force us to look at a lot more of the grey in our lives--that is, they make us confront our own guilt and consider the difference between 'not bad' and 'good'.

The same is true of accessibility. 'How accessible is your content?' is, at both a personal and an organisational level, a very jarring question. And, if your answer is anything other than 'very', how do you reconcile that? Where is the line? What is 'accessible enough'? And what does your response to any of this say about you?

Technology is shifting to become more accessible. There has been an increase in capability and ease, but design and automation have played a big role here too. As I write this article in Microsoft Word, I can run an accessibility check on the document to assess whether people with disabilities would likely have difficulty reading it. Placing an image in the document, for instance, changes how accessible the document is. Microsoft Office's support page states that 'if the image or object is not inline, it may be difficult for screen reader users to interact with the object', and that 'it may also be difficult to know where the object is relative to the text'. (3) The Accessibility Checker will also suggest that I include 'alt' text to describe the image for situations in which the image cannot be visually identified by a person (or screen reader).

This sort of automation for inclusivity is becoming more and more common across websites and applications. We've got responsive web design, inclusively designed fonts, the aforementioned screen readers and a range of other assistive technology, but there is still a big human element in all of these. And that is where this gets hairy, because it's at this point that we have to decide how much accessibility is worth to us. Of course, we would never tell ourselves that we are okay with knowingly excluding people, in much the same way that we distance ourselves from our decisions about what we eat and where it comes from. It's hard to put ourselves in a position where, morally, we aren't entirely steady.

One of the most difficult (or expensive) areas to ensure accessibility is in video. It's becoming easier to find accessible forms of media; but, at the same time, the media that we create exponentially expands the problem. So how do you make the content you create accessible? One method is captioning.

What is it?

Captioning, not to be confused with subtitling, is the addition of text on top of moving images. This text is descriptive of the audio, but, because of the nuanced ways in which we use audio to tell stories, these descriptions go beyond just what is said: they can include references to music and descriptions of sounds, so that a viewer with hearing loss is able to get a comparable viewing experience to anybody else's. Subtitling, on the other hand, is a translation of spoken audio (and, sometimes, written text) into the viewer's native language. In both cases, the text is anchored to parts of the video time code, so that it appears on screen at a particular time in relation to what it is referring to.

Captioning can be either 'open' or 'closed'. Closed captions can be turned on or off when needed; they are a separate component that works only if the media player or device supports it. Open captions, on the other hand, are hard-coded into the media and cannot be turned off; they're a part of the video file. (4) Both types of captioning have their positives depending on the situation and who the audience for the captions is.

Why use it?

While society has come a long way in how we view difference, it's still primarily built around what we can see and categorise. While that in itself is problematic, one of the issues that comes with it is that accessibility only tends to be prioritised as a reactive measure when people's opportunities to engage with culture and services have been hampered or blocked (and they are forced to either bring awareness to this, adapt or be excluded (5)). In the case of captions, that's not as simple as the level or nature of hearing loss. While there is a broad spectrum among people with such conditions--categories include Deaf, deaf and hard of hearing, distinctions that are worth taking some time to investigate yourself (6)--these are not the only people for whom the decision to not caption can be a barrier. Closed captioning can also be beneficial to those with autism and intellectual developmental disorders, as well as those who are not native speakers of the language spoken. (7) Further, having a transcript on screen can assist children in learning and understanding new words and solidifying correct spelling. (8) And if you have ever been to a gym, bar or restaurant, you yourself would have experienced the difference closed captioning can make to engagement and entry. In scenarios in which audio is muted, whether captions are on or off can play a big role in your choices as an audience member. How often, at the gym, have you engaged with the TV displaying captions rather than the one that isn't (regardless of your genre preferences)?

And, of course, in case doing what's right isn't its own reward, captioning also results in searchability and marketing gains--what is a barrier for access is also a barrier for entry. Social media platforms often utilise an opt-in approach for sound, which means that, if a viewer isn't in a position where they can have sound playing, they might not engage with that particular piece of content. (9)

A captioned video lessens that entry barrier--YouTube's auto-previewing feature utilises closed captions too, so that, even if a user hasn't selected your video, they could still be watching and engaging with it (and those stats still count). The captions also factor into the searchability of your content, which further widens your reach.

How does it work?

As a concept, captioning is really simple--it's just blocks of text with specific time-code markers as to where they start and end. Live captioning is more complicated, as it requires a stenographer who uses a special keyboard to write out words as they are spoken--they write on a stenotype machine, which is designed and used very differently from a computer's QWERTY keyboard, and they use shorthand (which, in the case of live TV captioning, is translated into regular text for the on-screen captions). (10) In a world of cost-cutting, it was inevitable that this would eventually be automated using some sort of speech-to-text technology (a concept explored on the amazing British comedy series W1A (11)), but it's not the same.
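That simple underlying structure--blocks of text with start and end time codes--is easy to see in the SubRip (.srt) caption format, one of the common subtitle file formats. Here's a minimal Python sketch that builds one; the cue text and timings are invented for illustration, and real caption files involve more conventions (line lengths, positioning, styling) than this:

```python
# Build a minimal SubRip (.srt) caption file: numbered cues, each with a
# start/end time code and one or more lines of caption text.

def srt_timestamp(seconds):
    """Format a time in seconds as an SRT time code: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def build_srt(cues):
    """cues: a list of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"

# Invented example cues: a sound description and a line of dialogue.
cues = [
    (0.0, 2.5, "[police siren wails]"),
    (2.5, 5.0, "NICHOLAS ANGEL: I'd like to talk to the inspector."),
]
print(build_srt(cues))
```

Note that the sound description and the speaker label are part of the cue text itself--the file format doesn't distinguish dialogue from description; the captioner does.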

Live captioning is an art. If you've watched someone do it, the act of writing on a stenotype machine looks more like playing a piano than it does typing. While this is in itself an important service and fascinating area to explore (and you should check out Stanley Sakai's blog on the subject (12) if you'd like to learn more), what we're looking at here is more of the after-the-fact captioning--that is, the captioning that occurs in post-production to package a product up. The closed captions you otherwise see on TV or YouTube tend to involve a pre-prepared transcription being synced to the video.

In recent years, advancements in speech-to-text capabilities have made captioning on a smaller scale much more possible--YouTube's current speech-to-text capabilities are particularly impressive. But (and it is a very big but) they are not perfect, and simply outputting videos with 'probably close enough' captions is, in itself, a less-than-ideal approach to accessibility.

For hearing people, garbled captions are somewhere in the range of amusing to distracting--but, if those captions are your only means of discerning what is being said, those transcription errors are something else entirely. In terms of accessibility, what you deem to be 'good enough' says a lot about you and your organisation. Where do you draw the line on the extent to which it's acceptable to exclude people? (13)


YouTube's auto-generated captions aren't perfect, but they are an amazing starting point for making your content accessible. It doesn't matter what privacy setting your video is on either--YouTube will auto-generate captions for your video whether it is public, private or unlisted. Their caption editor is simple to use, and allows you to download a text file that you can then use wherever else you may need captions (spoiler alert: it's everywhere).

When you upload a video (as long as it's under fifteen minutes), YouTube will auto-generate captions for it--they will be published as 'English (Automatic)' (Figure 1). Select this to begin editing--the lines are greyed out until you select EDIT (Figure 2).

You'll notice that the text is broken into pieces and that each line has two time codes--one for when the caption first appears on screen, and one for when it stops (Figure 3). You can move through the text by pressing play and checking how the auto-generated text lines up with the audio, correcting mistakes and inserting some punctuation along the way. When you come to a mistake, select the line and you will be able to start editing (Figure 4).

As you make corrections, they are automatically changed on the caption display. Spelling mistakes will also be highlighted with a red squiggly underline to help you catch as many errors as possible (Figure 5). If you find yourself with a line that is too long and thus potentially too hard to read, you can fix that too: take text from one line and move it to the other (Figure 6). Otherwise, moving your mouse over the bookends will allow you to click and drag the 'in' and 'out' points for those captions (Figure 7).

It's important to make sure that there is a way to distinguish who's speaking (if there are multiple voices). In the example in Figure 7, the line 'I'd like to talk to the inspector' is spoken by the Hot Fuzz (Edgar Wright, 2007) character Nicholas Angel (Simon Pegg). Simultaneously press the 'Shift' and 'Enter' keys to push the start of that sentence to a new line, and, at the beginning, you can write the character's name (Figure 8).

The process of captioning will probably take you about twice as long as the length of the video--I find it takes me about twenty minutes to caption a ten-minute video. But undertaking this process provides you with a great opportunity to take one final pass on your piece--and because the context you are watching it in is so different, you are more likely to find mistakes here that you've overlooked while editing (Figure 9). When you finish this process, you can select PUBLISH EDITS to ensure that your updated captions are visible to your audience (Figure 10).

Once you've published your edits, if you reselect your captions, you can go back in and, under the ACTIONS menu, you'll have the option to download the subtitle file (Figure 11). Once you've downloaded this file, you can hard-code the captions into your video for sharing on social media or uploading to another video platform, or even just use it as a transcript for future reference. It'll also work on smart TVs and on supported players--you usually just have to ensure that the video and captions have the same filename (just with different extensions).
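If you do want to repurpose the downloaded subtitle file as a plain transcript, the conversion is simple: strip out the cue numbers and time-code lines and keep the text. A minimal Python sketch, using invented caption content:

```python
def srt_to_transcript(srt_text):
    """Strip cue numbers and time codes from SubRip text, leaving a plain transcript."""
    lines = []
    for block in srt_text.strip().split("\n\n"):
        for line in block.splitlines():
            # Skip the cue index line and the "start --> end" time-code line.
            if line.strip().isdigit() or "-->" in line:
                continue
            lines.append(line)
    return "\n".join(lines)

# Hypothetical content of a downloaded caption file.
srt = """1
00:00:00,000 --> 00:00:02,500
[police siren wails]

2
00:00:02,500 --> 00:00:05,000
NICHOLAS ANGEL: I'd like to talk to the inspector."""

print(srt_to_transcript(srt))
```

Because the speaker labels and sound descriptions live in the cue text, they survive into the transcript--which is usually what you want for future reference.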

Kevin Lavery was an Art and Media teacher in Melbourne for nearly ten years before packing up for life in Brisbane. He is now the training & liaison officer for TAFE Queensland's eLearning Services. YouTube's auto-generated captions have offered up the following transcriptions of his standard 'Hi I'm Kev Lavery' intro: 'Hi I'm Kelly Marie'; 'Hi I'm Kip Larry'; 'Hi I'm Kevlar Brae'; and 'Hi I'm kill everyone'.


(1) See 'Vegan Protesters Halt Traffic, Block Abattoirs in National Protests', The New Daily, 8 April 2019, <>, accessed 24 April 2019.

(2) See Latika Bourke, 'PM Lashes "Un-Australian" Vegan Protesters', The Canberra Times, 8 April 2019, <>, accessed 24 April 2019.

(3) 'Rules for the Accessibility Checker', Microsoft Office Support website, <>, accessed 24 April 2019.

(4) Gemma Matheson, 'The Difference Between Open and Closed Captions', Access Innovation Media blog, <>, accessed 24 April 2019.

(5) Northwest Health Foundation, 'Confronting Our Ableism', Striving for Disability Equity, 17 May 2016, <>, accessed 24 April 2019.

(6) See 'Terminology for Deafness', Aussie Deaf Kids website, <>, accessed 24 April 2019.

(7) Lydia Callis, 'Stop Making Excuses and Start Captioning Your Videos', HuffPost, 21 April 2015, <>, accessed 24 April 2019.

(8) '7 Unexpected Beneficiaries of Captioning', Access Innovation Media blog, <>, accessed 24 April 2019.

(9) 'The Complete Guide to Closed Captioning', Access Innovation Media blog, <>, accessed 24 April 2019.

(10) 'Stan's Quick and Dirty: How Stenography Works', YouTube, 26 April 2015, <>, accessed 24 April 2019.

(11) See Brendan O'Regan, 'BBC's Satirical Offering W1A Is Just A1', The Irish Catholic, 12 October 2017, <>, accessed 24 April 2019.

(12) See The Stanographer, <>, accessed 24 April 2019.

(13) While it's a little beyond the scope of this article, there is another really interesting aspect of captioning in relation to censorship and ensuring captions are a true representation of what has transpired. As an example, it's worth having a look into the furore surrounding Netflix's decision to rather 'creatively' revise their Queer Eye captions. See Ace Ratcliff, 'I Rely on Closed Captions to Enjoy a Show and I Don't Appreciate Netflix's Way of Censoring Them', SELF, 10 July 2018, <>, accessed 24 April 2019.
COPYRIGHT 2019 Australian Teachers of Media

Author: Kevin Lavery
Publication: Screen Education
Date: June 2019