Looking together: word and image interplay in picture books

May 19, 2020 by s.nagele

Picture books are rich resources for language development as well as visual literacy development, regardless of the age of your students. In a previous post we discussed some of the key elements of visual storytelling. Now we turn to the written text and its relationship with the images in the books. Please note, in this post we are focusing on the written text in contrast to the running commentary, oral storytelling or labelling that can also accompany the reading of picture books.

Three layers of text

The picture books in The Thinking Train series have three layers of text. The first one is a narrative, which tells us about what is happening and creates a context for the action: ‘It’s Friday. Charlie’s in the playground with the other children’. In this double page from The bully below, the narrative layer sets the context. The narrative layer may differ in each book, sometimes telling us more about the sequence of events or the feelings of the characters.

The second layer of text consists of the speech bubbles, which have a communicative function and make the text more realistic and interactive. They also serve as a model for pupils practising speech.

The third layer is the text present in the small orange activity box. On this page below the box says ‘Speak. Are the children happy about Lisa’s party?’ which directs pupils’ attention to specific details, here making them think about the feelings which are not described in the text. The questions in these activity boxes often highlight aspects of the story which can only be understood through a detailed understanding of the images and their relation to the text. This requires slow looking and discussion in the classroom, which leads to excellent learning opportunities plus the development of higher order thinking skills. Let’s turn to two possible ways the image and these texts can interact.

Double page from the reader The bully written by Herbert Puchta and Günter Gerngross. Illustrated by Lorenzo Sabbatini. © Helbling Languages

Word and image

The relationship between written text and images on the same page has been described as ‘word and image interplay’ (Salisbury and Styles, 2012), or as intersemiosis or intermodal relations (e.g. Painter et al., 2013). Both sources describe the relationship between words and images in a larger context. Different analysts approach this interplay from different perspectives and with different degrees of specificity. Some studies map out detailed taxonomies to describe the relationship between word and image in picture books or comics (Agosto, 1999; Doonan, 1999; Nodelman, 1988; Schwarz, 1982; Styles and Arizpe, 2001). Famous classifications include Scott McCloud’s description of word-image combinations in his study Understanding Comics: The Invisible Art (1993). You can read a short overview of his proposed relationships here.

In general, the most practical way to look at word and image on a page is to ask how far each one tells us the same story, or, in other words, to look at divergence and convergence (Painter et al., 2013). This also tells us how far word and image convey the same attitudes, how well different meanings can be recovered directly from the text, and how much what we read depends on the images.

The two most widely used approaches to word and image interplay examine whether the images and the text tell us similar or different stories and how well they support each other’s meanings. This is called convergence/divergence (Painter et al., 2013) or ‘counterpoint’ (Nikolajeva and Scott, 2000). When text and image are convergent, they have the same meanings; when they are divergent, they tell us different stories. Similarly, ‘counterpoint’ informs us about the differences between the meanings words and images have and whether they contradict or support each other.

A more detailed analysis explains ‘equal and unequal’ relations (Martinec and Salway, 2013). Martinec and Salway further specify that when image and text are equal they can be either independent or complementary, and when they are unequal, then either the text or the image is subordinate to the other.

In what follows, we will show different examples from The Thinking Train books to illustrate these different possibilities and other examples where the text highlights meanings in the images without explicitly saying them.

The sick dragon

Double page from The sick dragon written by Herbert Puchta and Gavin Biggs. Illustrated by Andrea Alemanno. © Helbling Languages

In this double spread, the images take up the larger part of the pages, and the strong colours of the dragons (red, blue, purple and green) with the one dragon breathing fire direct our attention to the visual part of the narrative. When we read the text, we have information about the relationship between the dragons, something we would only be able to guess from the pictures. In this sense, the text and image complement each other in the overall meaning of the double page. The text does not become redundant or repetitive, but we could not say that it is independent from the images as they are in a strong relationship with each other.

→ Activity tip: Ask students to first guess who is who in the images. Then read the text and, as you are reading, ask them to specify the relationship between the dragons. The ‘Point’ activity in the orange box is a reminder of this activity. It focuses on vocabulary practice and reinforces the relationship between text and images by asking the students to create links with the images through their understanding of the text.

The desert race

Double page from The desert race written by Herbert Puchta and Gavin Biggs. Illustrated by Lorenzo Sabbatini. © Helbling Languages

In this double page, the visual narrative is just as significant as in the previous example, with bright colours and beautiful animals. What we can see in this scene only focuses on one moment from the narrative. However, it is a key element of the story, the moment when Mayar achieves the goal of reaching the oasis and filling the jar with water. It allows us to become part of the moment and empathise with Mayar. The lack of traditional perspective also underlines the fact that all of the natural elements in the story are as important and beautiful as each other. Although the text simply gives a sequence of events in the present simple tense, what we learn about the story from the text gives us information around the illustrated event, filling us in on details from before and after it happened. The text on the right side of the double page points beyond the scene, making the reader curious to turn the page and find out whether there really are a king and people waiting in the palace.

→ Activity tip: Ask students to tell you what they can see in the pictures that the text does not say.

Football fury

Double page from Football fury written by Herbert Puchta and Gavin Biggs. Illustrated by Manuela Scarfò. © Helbling Languages

In this double page, the image focuses on a single moment in the narrated event, but it also tells us a lot more about the main character’s feelings. The illustration shows him kicking a ball, and we also learn from the text that when Jin kicks the ball, he is angry, and he discovers that he is a strong kicker, much stronger than other children. What we see in the image is his loneliness in a huge field (something that has been introduced in the previous pages) and the newly discovered skill. As we move our eyes from the left side to the right, we can follow the trajectory of the kicked ball and its landing in the net: the ball is big and with the net it takes up a quarter of the double page. The children in the background, although not clearly visible, are cheering. The image tells us a lot more here than the text alone. The text says that Jin is angry, but this anger is felt more powerfully in the context of the large and empty football pitch. The image also gives importance to the power of the kick and how it can deform the ball (and how anger can deform those who are subjected to it) and it tells us about the children’s reactions.

→ Activity tip: Ask students to tell you how the space in the picture makes them feel and how they think Jin is feeling.


These examples show us that image and text together always create a different meaning than the two separately. Sometimes the image tells us more, and sometimes the text specifies the image. In other cases, the text points forward or gives us more information about the background, the events or what comes next. However, the image has the power to tell us about emotions through the character’s gestures and details about their immediate environment. It also shows us more details about how people are related to each other and how the place where they are makes them feel and it has the power to fix our attention on one key event and get us to think about it.

As teachers, we can help students explore these new meanings through guided reading sessions and asking students to find specific information or talk about things they observe in the images. The activity boxes (Look, Point, Speak, Think and Act) will help teachers to access these meanings and use them in the classroom.

Reading list

