Documentation

IBF -- Tagging Specification

The Iapetos Bible Format (IBF) is a tagging specification for marking elements of Bible texts such as books, chapters, verses and paragraphs.

IBF is Semantic

An important consideration in the design of IBF was the decision to make IBF semantic. By "semantic" we mean tags mark what something is, not how you may choose to render it in your web page or application.

A classic example is the old formatting trick used by publishers of printed Bibles. They commonly italicize words in the translation that were not present in the source language but were added to help the translation's readability. The convention of denoting inserted words with italicized text is useful to readers, and a good tagging specification should be able to communicate the same information.

A non-semantic approach would tag inserted words as "italicize," with the intention of italicizing the words in the final product. The semantic approach, used by IBF, tags these words as "inserted," thus allowing the editorial decision of how the words are displayed to be made later. You might decide to turn inserted words grey or render them with a smaller font instead of italicizing them. The point of a semantic approach is that they are "inserted" words regardless of how you communicate that fact.
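To illustrate the semantic approach, the sketch below renders the same inserted-text markup two different ways. The <TI> and <Ti> tags come from the Text Tags table in this document. The function name render_inserted, the sample sentence and the HTML it emits are assumptions made for illustration; they are not part of IBF or of the iapetos/cms tools.

```python
def render_inserted(text: str, open_html: str, close_html: str) -> str:
    """Replace the IBF inserted-text tags with caller-chosen HTML."""
    return text.replace("<TI>", open_html).replace("<Ti>", close_html)


# Illustrative sample sentence, not taken from any IBF file.
verse = "And God saw the light, that <TI>it was<Ti> good."

# Traditional print convention: italics.
italic = render_inserted(verse, "<i>", "</i>")

# An alternative editorial choice: grey text through a CSS class.
grey = render_inserted(verse, '<span class="inserted">', "</span>")
```

The tagged text is identical in both cases; only the rendering decision changes.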

IBF Files

IBF files are UTF-8 encoded text files, with no limits on line lengths. Lines are terminated with the standard Linux newline character (LF). The GBF format used DOS CR-LF pairs, which added considerable byte counts to large files. GBF also used a DOS code page, which has been obsoleted by industry-wide adoption of UTF-8 text encoding. Use of UTF-8 allows Bible texts from any natural language to be easily marked using IBF.
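The encoding and line-ending rules can be checked when a file is loaded. The following is a minimal sketch, assuming that a stray CR byte should be treated as an error; the function names decode_ibf and read_ibf are hypothetical and are not part of the iapetos/cms tools.

```python
from pathlib import Path


def decode_ibf(raw: bytes) -> str:
    """Decode IBF bytes as UTF-8 and reject DOS-style line endings."""
    # Work on raw bytes so that no newline translation hides a stray CR.
    if b"\r" in raw:
        raise ValueError("CR character found; IBF lines end with LF only")
    # Raises UnicodeDecodeError if the data is not valid UTF-8.
    return raw.decode("utf-8")


def read_ibf(path: str) -> str:
    """Read one IBF file from disk and validate it."""
    return decode_ibf(Path(path).read_bytes())
```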

IBF tags have been selected, and have conventions on their use, so that IBF files can be efficiently converted into browser-ready HTML in a single pass. This allows IBF files to support heavily loaded web servers without the need for additional format conversions. The standard iapetos/cms tools do perform a compilation step from .ibf files to .pyc files as a convenience to further speed run-time processing, though this step is not required by the format. This compilation phase is also single-pass in its design, and creates HTML generation code which is also an efficient single-pass system.
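As a rough illustration of single-pass conversion, the sketch below walks the text once from left to right, escaping plain text and replacing each tag through a lookup table. It is not the iapetos/cms implementation. The tag-to-HTML table, the function name to_html and the decision to drop unknown tags are all assumptions.

```python
import html
import re

# Hypothetical mapping; IBF does not prescribe the output markup.
HTML_FOR_TAG = {
    "HB": "<h1>", "Hb": "</h1>",
    "TI": "<i>", "Ti": "</i>",
    "TN": '<span class="name">', "Tn": "</span>",
}

# "<", two identifying letters, an optional parameter, then ">".
TAG = re.compile(r"<([A-Z][A-Za-z])([^<>]*)>")


def to_html(ibf: str) -> str:
    """Convert IBF text to HTML in one left-to-right pass."""
    out = []
    pos = 0
    for match in TAG.finditer(ibf):
        # Escape the plain text that precedes this tag.
        out.append(html.escape(ibf[pos:match.start()], quote=False))
        # Tags missing from the table are dropped in this sketch.
        out.append(HTML_FOR_TAG.get(match.group(1), ""))
        pos = match.end()
    out.append(html.escape(ibf[pos:], quote=False))
    return "".join(out)
```

Because each tag is handled where it is found, no tree is built and no second pass over the text is needed.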

As an historical note, we considered, and used, an XML file format for our own internal use, and considered OSIS as a potential format. OSIS is also XML. As many now realize, XML cannot be parsed efficiently, and the files themselves cannot be easily edited by humans directly. (Even though human readability was a specific original design feature of XML.)

XML has lost a certain amount of favor, being replaced by protocol buffers and/or .json files in many contexts, but neither of these common alternatives to XML was applicable to tagging Bible texts. It was the potential for efficient processing and human readability that took us back for a second look at GBF, and that is why we eventually developed IBF as a better-designed replacement.

IBF files are simple enough that human editors can realistically edit them directly. The major reason this works is that IBF tags are short, usually only 4 characters in length, so they do not severely hamper readability. Using a powerful editor, such as Eclipse, with its extensive regular expression support, makes conversion into IBF format from other formats readily possible. Eclipse also makes maintenance of IBF texts relatively easy.

Split Files

The IBF format makes no demands on the use of a single large file or multiple smaller files to contain the entire text of a Bible. Originally, GBF required a single large monolithic file that contained the entire text of the Bible. Typically this meant that the file sizes ran over 5,000,000 bytes.

We found when editing files of this size that getting lost was easy. Most file editors can open more than one file at a time and they provide navigation aids for keeping track of multiple files. Those aids provide handy navigation between Bible books but only when those books are in separate files. So, as a convention, an entire Bible in IBF format is a collection of individual files, typically named 01.ibf through 66.ibf, which form an entire Bible.

The iapetos/cms tools typically expect their input as a collection of separate files and not one large monolithic file. Logically, though, the Bible text can be thought of as the concatenation of all of the specific Bible books.
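The split-file convention can be sketched as follows. The two-digit names 01.ibf through 66.ibf come from the convention described above; the function names book_filenames and read_whole_bible are hypothetical, and the sketch assumes all files sit in one directory.

```python
from pathlib import Path


def book_filenames(count: int = 66) -> list[str]:
    """Return the conventional file names 01.ibf, 02.ibf, ... in book order."""
    return [f"{number:02d}.ibf" for number in range(1, count + 1)]


def read_whole_bible(directory: str) -> str:
    """Concatenate the per-book files into one logical text."""
    folder = Path(directory)
    return "".join(
        (folder / name).read_text(encoding="utf-8") for name in book_filenames()
    )
```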

Tags

There are several types of tags in IBF files including heading tags, paragraph tags, quote tags, text tags and sync marks. All tags start with the less-than symbol "<" and two letters that identify the tag and end with the greater-than symbol ">". Tags are case sensitive. In tags with opening and closing versions, the second letter of the tag is upper case in the opening tag, and lower case in the closing tag. Some tags take a parameter, such as sync marks. This parameter is inserted just before the ending ">".
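The tag syntax described above is regular enough to recognize with one pattern. The sketch below is one possible reading of it, assuming the first letter is always upper case and that a parameter never contains "<" or ">". The names TAG, Tag and find_tags are hypothetical.

```python
import re
from typing import NamedTuple

# "<", an upper-case letter, a second letter of either case,
# an optional parameter, then ">".
TAG = re.compile(r"<([A-Z])([A-Za-z])([^<>]*)>")


class Tag(NamedTuple):
    letters: str     # the two identifying letters, e.g. "SV" or "Hb"
    parameter: str   # e.g. "16"; empty when the tag takes no parameter
    closing: bool    # True when the second letter is lower case


def find_tags(text: str) -> list[Tag]:
    """Return every tag in the text, in the order it appears."""
    return [
        Tag(m.group(1) + m.group(2), m.group(3), m.group(2).islower())
        for m in TAG.finditer(text)
    ]
```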

Heading Tags

Heading tags mark different types of headings in Bible texts. Heading tags come in pairs with an opening tag and closing tag that surround the heading.

Type | Opening Tag | Closing Tag | Comment
Heading Book | <HB> | <Hb> | Marks book name, such as <HB>Genesis<Hb>.
Heading Psalms | <HP> | <Hp> | Marks the five books in Psalms, such as <HP>Book One<Hp>.
Heading Hebrew | <HH> | <Hh> | Marks Hebrew letters, such as <HH>Aleph<Hh>, in Psalm 119.
Heading Section | <HS> | <Hs> | Marks translator's or publisher's section headings (optional).

Paragraph Tags

Paragraph tags cover the needs of poetry in the Bible as well as regular prose paragraphs. Paragraphs begin with a paragraph tag, <PP>, as do stanzas of poetry, <PV>. Open tags close when a new paragraph tag is used, so there's no need for closing paragraph tags. The exceptions to this rule are the <VS> and <VE> tags, which begin and end a section of poetry inside a surrounding prose paragraph.

Type | Tag | Comment
Paragraph Bullet | <PB> | Starts a paragraph that is a bullet in a series of paragraphs. <PB> is to a series of paragraphs as <LI> is to items in a list.
Paragraph Intro | <PI> | Starts a paragraph introducing a passage. Intended for Psalm Intros.
Paragraph List | <PL> | Starts a paragraph that forms a list. This tag is always followed by a <LI> tag defining how deep the list begins.
List Item | <LIn> | Starts a list item inside either a <PL> tag or a <PP> tag. The n can be replaced with a number from 0 through 1000 for nesting.
List End | <LE> | Ends a list and returns to the surrounding prose paragraph.
Paragraph Nest | <PN> | Starts a paragraph that is nested inside a bullet paragraph.
Paragraph Prose | <PP> | Starts a paragraph of prose.
Paragraph Salutation | <PS> | Starts a paragraph containing salutations.
Salutation Item | <SI> | Starts a salutation item inside a salutation paragraph.
Paragraph Verse | <PV> | Starts a stanza of poetry.
Verse Stanza | <VS> | Starts a stanza of poetry inside a prose paragraph. Use a <VS> for each stanza.
Verse Lyric | <VL> | Starts a new line of poetry inside either a <PV> tag or a <VS> tag. (Typically aligned left.)
Verse Refrain | <VR> | Starts a second or third line of poetry inside either a <PV> tag or a <VS> tag. (Typically indented.)
Verse End | <VE> | Ends a section of poetry and returns to the surrounding prose paragraph.
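The rule that a new paragraph tag closes the one before it can be sketched with a single piece of state: the closing markup owed to the currently open paragraph. The example handles only <PP> and <PV>; the HTML chosen for each and the function name close_paragraphs are assumptions for illustration.

```python
import re

# Hypothetical HTML for two of the paragraph tags: (opening, closing).
PARAGRAPH_HTML = {
    "PP": ("<p>", "</p>"),
    "PV": ('<div class="stanza">', "</div>"),
}

PARAGRAPH_TAG = re.compile(r"<(PP|PV)>")


def close_paragraphs(ibf: str) -> str:
    """Emit explicit closing markup for implicitly closed paragraphs."""
    out = []
    pending_close = ""  # closing HTML owed to the paragraph currently open
    pos = 0
    for match in PARAGRAPH_TAG.finditer(ibf):
        out.append(ibf[pos:match.start()])
        out.append(pending_close)  # a new paragraph tag closes the open one
        open_html, pending_close = PARAGRAPH_HTML[match.group(1)]
        out.append(open_html)
        pos = match.end()
    out.append(ibf[pos:])
    out.append(pending_close)  # end of input closes the last paragraph
    return "".join(out)
```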

Quote Tags

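Quote tags mark quoted material: source documents reproduced in the text, and scripture that is quoted elsewhere in the Bible. Quote tags come in pairs with an opening tag and closing tag that surround the quote.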

Type | Opening Tag | Closing Tag | Comment
Quote Source | <QS> | <Qs> | Marks quotes of documents, records, letters, genealogies, prayers and other source material in the Bible. (Typically formatted as a blockquote with margin on all sides.)
Quote(d) Text | <QTfrom> | <Qtto> | Marks both ends of scripture quotes. The opening tag takes a parameter that is a series of three numbers, separated by commas, that identify the address(es) for the scripture being quoted. Similarly, the closing tag takes the same kind of numeric parameter, but supplies the address(es) where the source is eventually quoted.
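The numeric parameter on the quoted-text tags can be pulled apart as sketched below. The source says only that the parameter is three comma-separated numbers; reading them as book, chapter and verse would be a guess, so the sketch simply returns the three integers. It also assumes exactly one address per tag. The function name quote_addresses and the sample numbers are made up for illustration.

```python
import re

# Matches <QTa,b,c> and <Qta,b,c> where a, b and c are whole numbers.
QUOTE_TAG = re.compile(r"<(QT|Qt)(\d+),(\d+),(\d+)>")


def quote_addresses(text: str) -> list[tuple[str, tuple[int, int, int]]]:
    """Return (tag letters, three-number address) for each quote tag found."""
    return [
        (m.group(1), (int(m.group(2)), int(m.group(3)), int(m.group(4))))
        for m in QUOTE_TAG.finditer(text)
    ]
```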

Text Tags

Text tags mark high value words in the text like digits, measurements, references to time, names of people, places and words of God. Text tags have an opening and closing tag surrounding the word(s) in question.

Type | Opening Tag | Closing Tag | Comment
Text Currency | <TC> | <Tc> | Marks currency like shekel and <TC>denarius<Tc>.
Text Digit | <TD> | <Td> | Marks digits like 5, 10, 15 and <TD>1335<Td>.
Text Inserted | <TI> | <Ti> | Marks text inserted by the translators. (Typically italicized.)
Text Language | <TL> | <Tl> | Marks words in another language like "Mene Mene Tekel Parsin" in Daniel 5.
Text Measure | <TM> | <Tm> | Marks measurements like distance, size and weight, <TM>ephah<Tm>.
Text Name | <TN> | <Tn> | Marks a name like Abraham, Moses and <TN>Jesus<Tn>.
Text Place | <TP> | <Tp> | Marks a place like Babylon, Egypt, Ashdod and <TP>Jerusalem<Tp>.
Text Selah | <TS> | <Ts> | Marks "Selah" in a passage. Selah is common in Psalms, but also occurs in Habakkuk 3. (Typically right justified and italicized.)
Text Time | <TT> | <Tt> | Marks units of time like hour, day, month and <TT>year<Tt>.
Text Word of God | <TW> | <Tw> | Marks words spoken by God. (Typically used for the words of Jesus and formatted red, but can be used anywhere God speaks.)
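Because text tags say what a word is, they can be used to pull information out of a text as well as to format it. The sketch below collects everything enclosed by one kind of text tag, for example every marked name. The function name marked_text and the sample sentence are hypothetical, and the sketch assumes tags of the same kind are not nested.

```python
import re


def marked_text(ibf: str, open_letters: str) -> list[str]:
    """Return the text enclosed by each pair of the given text tag."""
    # The closing tag lower-cases the second letter: "TN" closes with "Tn".
    close_letters = open_letters[0] + open_letters[1].lower()
    pattern = re.compile(
        "<" + re.escape(open_letters) + ">(.*?)<" + re.escape(close_letters) + ">",
        re.DOTALL,
    )
    return pattern.findall(ibf)


# Illustrative sample, not taken from any IBF file.
sample = "<TN>Abraham<Tn> went up from <TP>Egypt<Tp> with <TN>Lot<Tn>."
names = marked_text(sample, "TN")
places = marked_text(sample, "TP")
```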

Sync Marks

Sync marks are placed at the beginning of books, chapters and verses and take a parameter, which is a number, identifying the current book, chapter or verse. Depending upon the versification scheme you use, the verse numbers may differ from text to text. We recommend using verse 0 for Psalm Introductions.

Type | Tag | Comment
Sync Book | <SBn> | All books are synced by number as per the table below. John is rendered, <SB40>.
Sync Chapter | <SCn> | Chapter 3, <SC3>.
Sync Verse | <SVn> | Verse 16, <SV16>.
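Sync marks let a reader of the file keep track of its current position. The sketch below remembers the most recent book and chapter numbers and reports a full reference at each verse mark. The function name verse_references and the decision to reset the chapter to 0 at a new book are assumptions.

```python
import re

# Matches <SBn>, <SCn> and <SVn>, capturing the kind letter and the number.
SYNC = re.compile(r"<S([BCV])(\d+)>")


def verse_references(ibf: str) -> list[tuple[int, int, int]]:
    """Return (book, chapter, verse) for every <SVn> mark, in order."""
    book = chapter = 0
    refs = []
    for kind, number in SYNC.findall(ibf):
        if kind == "B":
            book, chapter = int(number), 0  # a new book restarts chapters
        elif kind == "C":
            chapter = int(number)
        else:
            refs.append((book, chapter, int(number)))
    return refs
```

Other tags whose first letter is S, such as <SI>, are ignored because they carry no number.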

Sync marks for books take the book's number from the following table. The following book order may be new to you. It's our book order of choice, based on recent research. You can rearrange the books differently if necessary.

Book Number Book Name
1 Genesis
2 Exodus
3 Leviticus
4 Numbers
5 Deuteronomy
6 Joshua
7 Judges
8 First Samuel
9 Second Samuel
10 First Kings
11 Second Kings
12 Ezekiel
13 Isaiah
14 Jeremiah
15 Lamentations
16 Ezra
17 Nehemiah
18 Esther
19 Zechariah
20 Micah
21 Jonah
22 Amos
23 Hosea
24 Haggai
25 Zephaniah
26 Psalms
27 Job
28 Ecclesiastes
29 Proverbs of Solomon
30 Ruth
31 Song of Solomon
32 Joel
33 Obadiah
34 Malachi
35 Nahum
36 Habakkuk
37 First Chronicles
38 Second Chronicles
39 Daniel
40 John
41 Matthew
42 Mark
43 Luke
44 Acts
45 Philippians
46 First Thessalonians
47 Second Thessalonians
48 Second John
49 Second Timothy
50 First Corinthians
51 Galatians
52 Ephesians
53 Romans
54 Hebrews
55 Jacob
56 First Timothy
57 Jude
58 Second Corinthians
59 Philemon
60 Colossians
61 First John
62 First Peter
63 Second Peter
64 Titus
65 Third John
66 Revelation

If you are working with the Deuterocanonical Books use this additional table to map the numbers to the books.

Conventional File and Tag Layout

IBF files have a specific style that helps human editors look at and easily understand the text. The style is readily apparent. Here is a look at the first 4 lines of text found in 01.ibf, the book of Genesis:

<SB1>
<HB>Genesis<Hb>

<SC1><PP>
<SV1>In the beginning God created the heavens and the earth.

The <SB1> tag is a "Sync Book" tag that marks this as the start of book number 1.

The <HB> tag is a "Heading Book" tag that marks the heading for this book. This is the name of the book as used in this translation, which may differ from the conventional names for this book.

The <SC1> tag is a "Sync Chapter" tag that marks the start of a chapter. The chapter number is contained inside the tag. There is usually a blank line placed above <SCn> tags to make visually spotting the chapter break easy.

The <PP> tag is the "Paragraph Prose" tag, which marks the start of a paragraph that should be formatted as prose.

The <SV1> tag is the "Sync Verse" tag which marks the start of verse 1. The text following this tag is the text for verse number 1.

Note that the Sync tags are always placed at the start of a new line. The format does not demand this, but it helps considerably with letting humans find their way around in the file. As a consequence of this style, the Paragraph family of tags are usually placed at the end of the preceding line. This is why the <PP> is located in what looks like an odd position.
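Since the placement of sync marks is a convention and not a requirement, an editor may want a quick check that a file follows it. The sketch below reports every sync mark that does not sit at the start of its line. The function name misplaced_sync_marks is hypothetical, and this is not a feature of the iapetos/cms tools.

```python
import re

SYNC = re.compile(r"<S[BCV]\d+>")


def misplaced_sync_marks(ibf: str) -> list[tuple[int, str]]:
    """Return (line number, tag) for sync marks not at the start of a line."""
    problems = []
    for line_number, line in enumerate(ibf.split("\n"), start=1):
        for match in SYNC.finditer(line):
            if match.start() != 0:
                problems.append((line_number, match.group(0)))
    return problems


# The opening lines of 01.ibf as shown above follow the convention.
genesis = (
    "<SB1>\n"
    "<HB>Genesis<Hb>\n"
    "\n"
    "<SC1><PP>\n"
    "<SV1>In the beginning God created the heavens and the earth.\n"
)
report = misplaced_sync_marks(genesis)
```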