Using a Google Search Algorithm to Make Sense of the Bible

by Justin O. Barber

Human beings have a difficult time making sense of large corpora such as the Bible. No one can possibly maintain in their awareness at a single point in time all the different themes and points of view expressed therein. How should a reader make sense of it all? Which texts or statements are most important? What should a reader do when conflicting viewpoints arise? An algorithm used by Google may help answer these questions (and simultaneously provide a sort of CliffsNotes to this sometimes daunting corpus!).

This plot shows the most important sentences when TextRank is run against the New Testament as a whole. Of course, people usually read one text at a time. Still, for communities that tend to read the entire collection as a single work, the above plot might prove informative.

In 1996, Google co-founders Sergey Brin and Lawrence Page developed an algorithm called PageRank to determine the importance of a web page. In essence, it determines the importance of a web page based upon “votes”—in the form of links—it receives from other web pages. When page A links to page B, it in effect casts a vote for page B. A variation on this algorithm, called TextRank, has proved useful for summarizing texts. It analyzes sentences and words instead of web pages and links.
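For readers who want to see the voting idea in concrete terms, here is a minimal sketch of a TextRank-style sentence ranker. It is not the code behind the plots in this post. The function names (tokenize, similarity, textrank), the word-overlap similarity measure, the damping value of 0.85, and the fixed iteration count are all assumptions chosen for illustration. Each sentence "votes" for the sentences with which it shares words, and the votes are passed around repeatedly until the scores settle.

```python
import math
import re


def tokenize(sentence):
    """Return the set of lowercased word tokens in a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def similarity(a, b):
    """Word overlap between two token sets, normalized by their log lengths."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences contain exactly one word
        return 0.0
    return len(a & b) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by iterating a weighted PageRank-style update."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # weights[i][j] is the strength of the "vote" sentence i casts for j.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbor j passes on a share of its own score,
            # proportional to how strongly it is tied to sentence i.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) + damping * incoming)
        scores = updated
    return scores


if __name__ == "__main__":
    sample = [
        "faith without works is dead",
        "faith and works belong together",
        "the sea was calm that night",
    ]
    for sentence, score in zip(sample, textrank(sample)):
        print(f"{score:.3f}  {sentence}")
```

In this toy sample, the first two sentences share words and so vote for each other, while the third shares nothing with either and receives no votes at all. A real run over Greek text would presumably need lemmatization and stop-word handling, which this sketch leaves out.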

In the images that follow, I will use the TextRank algorithm to rank the importance of each sentence in the (Greek) New Testament. (I will leave aside the Hebrew Bible for now, although I have worked with that corpus as well.) The pink labels and data points indicate the most important sentences in each book. The blue lines through the middle of these data points illustrate the average importance of the sentences in each part of a book. The zenith of that blue line, then, roughly corresponds to the most important part of a book, whereas the nadir correlates with the least important.
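To give a sense of how a line like the blue one could be derived from per-sentence scores, here is a small sketch that uses a centered moving average. This is a stand-in technique: the plots in this post may well use a different smoother. The names moving_average and peak_index, the window size, and the sample scores are all hypothetical.

```python
def moving_average(scores, window=5):
    """Average each score with its neighbors inside a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        chunk = scores[lo:hi]  # the window shrinks at either end of the book
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def peak_index(smoothed):
    """Return the position where the smoothed line reaches its highest point."""
    return max(range(len(smoothed)), key=smoothed.__getitem__)


if __name__ == "__main__":
    # Made-up scores for seven consecutive sentences, for illustration only.
    scores = [0.1, 0.2, 0.9, 1.0, 0.8, 0.2, 0.1]
    line = moving_average(scores, window=3)
    print("zenith at sentence index", peak_index(line))
```

A wider window gives a smoother line at the cost of blurring short, sharp peaks, so the apparent "most important part" of a book depends somewhat on that choice.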

I want to limit my comments to a couple of observations pertaining to the plot above. First, perhaps the most explicit statement of purpose of any of the New Testament texts shows up among the most important sentences in the Gospel of John:

. . . ταῦτα δὲ γέγραπται ἵνα πιστεύ[σ]ητε ὅτι Ἰησοῦς ἐστιν ὁ χριστὸς ὁ υἱὸς τοῦ θεοῦ, καὶ ἵνα πιστεύοντες ζωὴν ἔχητε ἐν τῷ ὀνόματι αὐτοῦ. (20:30-31)
. . . Now these things have been written in order that you might believe that Jesus is the Christ, the son of God, and in order that—by believing—you might have life in his name. (Translation my own.)

That seems like a success. Although this statement of purpose is explicit enough that most readers will not need help identifying it, it would have presented a major problem for this algorithm had it shown up among the less important sentences. (That was a close one! Google may be on to something here.)

Second, the important sentences for many of the non-narrative texts have received a lot of discussion for their importance within the thought of their respective authors. For example, both Galatians 2:15-16 and Romans 3:21-22 show up among the most important sentences for their respective texts. For those of you who do not know, those verses have generated a tremendous amount of literature over a particular Greek construction. The question basically comes down to the following: does Paul understand a person to be justified through faith in Christ or through the faith of Christ (that is, Christ’s faith)? (Thanks, Richard B. Hays!)

By way of one final example, consider the important sentences in the Epistle of James. Readers of James often have difficulty identifying a cogent structure for the epistle; it appears to be a collection of admonitions. In the face of this lack of structure, the TextRank algorithm might offer some aid. The most important sentences appear in 2:14a and 2:18.


As you can see from the blue line, the most important part of the letter (again, according to our algorithm) also corresponds to the section of the letter containing these verses. The sentence with the highest TextRank score is this one:

Τί ὄφελος, ἀδελφοί μου, ἐὰν πίστιν λέγῃ τις ἔχειν ἔργα δὲ μὴ ἔχῃ; (2:14a)

What use is it, my brothers, if someone claims to have faith but does not have actions? (Translation my own.)

This section, of course, contrasts with Paul’s view of faith, which maintains that a person is justified by faith apart from actions (compare Romans 3:28; 4:1, 6). In fact, it prompted Martin Luther at one point to claim that James was an epistle of straw (an assertion he elucidates here). Is it possible that this dissimilarity is the very heart of the Epistle of James?

My point here is that computers help us encounter texts more fully than we can encounter them when left to our own devices. They can help us take in huge swaths of data all at once, and they can help us analyze that data in sophisticated ways. They are, in short, excellent conversation partners that come to texts with presuppositions of a kind entirely different from our own. No one has taught them to read texts from a liberal or conservative standpoint (although one theoretically could) or any other point of view. They offer us the opportunity to read with something wholly other.

You can find a more detailed version of this post here, where I elucidate some of my methodology.