XML Tagging Specifications
ACLS Humanities E-Book - XML Tagging Specifications (last update: 2/20/08)

The following specifications include detailed instructions on preparing files for ACLS Humanities E-Book. Publishers should provide these specifications to their conversion or composition vendors. ACLS can also provide a list of experienced vendors.

1. Files and Resources
1.1. DTD/Character Entities
1.2. XML Template
1.3. XSLT Style Sheet
2. Encoding, Special Characters, Styling, Hyphenation
3. Tagging Text
A General Note on XML Formatting
3.1. Text Structure
3.1.1. Front, Body, Back
3.1.2. Divisions
3.1.3. Text Chunks
3.1.4. Milestone Section Breaks
3.1.5. Heads
3.2. Front Matter
3.2.1. Series Title List
3.2.2. Title Page
3.2.3. Copyright and Permissions
3.2.4. Table of Contents
3.2.5. List of Illustrations
3.2.6. List of Audio / Film Clips
3.2.7. Dedication, Acknowledgments, etc.
3.2.8. Preface to the Electronic Edition
3.3. General Elements
3.3.1. Paragraphs
3.3.2. Page Breaks
3.3.3. Links: Notes, Internal Links, URLs
3.3.4. Extracts: Quotations, Epigraphs
3.3.5. Figures
3.3.6. Audio and Video Files
3.3.7. Tables and Inserts
3.3.8. Lists
3.3.9. Salute, Signed, Dateline
3.4. Back Matter
3.4.1. Notes
3.4.2. Bibliography
3.4.3. Index
3.4.4. About the Author
4. Proofing and Quality Control
5. List of Elements and Attribute Chart
6. Cover Image
7. Technical Contact at ACLS
8. ACLS Specifications - Log of Specifications Updates
1. Files and Resources

1.1. DTD/Character Entities

All books must be tagged in XML using the acls-hebook.dtd. All elements and attributes are listed below. The DTD and specifications are periodically updated, so always check for the latest versions before beginning a new book. For examples on tagging and usage, refer to the XML template.

1.3. XSLT Style Sheet

An HEB-specific XSLT style sheet may be used to transform an XML file into HTML to check formatting, links, etc. (The resulting single-page HTML document differs slightly from the version that will ultimately appear online, but will be identical in all relevant aspects.) We strongly recommend using this tool for e-book testing prior to submitting the completed XML file. The file Please contact the HEB staff with any questions.

XSLT: acls-hebook.xsl ver. 1.3; date: 02-20-2008

2. Encoding, Special Characters, Styling, Hyphenation

Encoding

Save files in US-ASCII encoding. Add the encoding type to the XML declaration:

<?xml version="1.0" encoding="us-ascii"?>

Special Characters

To ensure that our e-books display properly across a range of standard web browsers, our system can currently index and display only the following characters:
If your book includes special or foreign characters outside this range, please add additional entities at the top of your XML file. Example of added entity at top of an XML file:

<!ENTITY oelig "&#339;"> <!-- LATIN SMALL LIGATURE OE -->

We will need to work with the publisher and our programmers on how to index (for searching) and render the additional characters. Note that this should not affect the way you tag or prepare your text for print or other electronic versions. We just want to make it clear that our system can currently handle the entities listed above, and any additional characters will need to be dealt with as needed.

Styling

To tag italic, bold, struck out, underlined, superscript, or subscripted type, use the <hi1> tag with the appropriate rend attribute value (see list below). Please note that note reference numbers should NOT be tagged using superscript (see 3.3.3. Links: Notes, Internal Links, URLs for tagging instructions).

<hi1 rend="italic">text</hi1>

When adding styles to words/phrases followed by punctuation (e.g., "served as editor of the <hi1 rend="italic">Times</hi1>."), please make an effort to be consistent in either including or excluding punctuation from the <hi1></hi1> tag, provided this doesn't conflict with the print version or publishing house style.

Small caps cannot be easily rendered in HTML browsers, so you must set small cap text in all caps, or convert text to title case. Publisher should advise conversion vendor on how to handle small caps. (It is usually preferable to convert to title case for <div> heads that will appear in the TOC.)

Hyphenation

When converting titles from the print version, hyphens inserted to break a word at the end of a line, as well as forced end-of-line breaks, must be removed. For example, if
3. Tagging Text

A General Note on XML Formatting

When tagging titles, please avoid elaborate formatting involving indents, tabs, and extra spaces, if possible. Please also avoid forced line breaks within tags. (This will facilitate further processing of files at HEB.)

3.1. Text Structure

3.1.1. Front, Body, Back

Break down the text using <front>, <body>, and <back> tags. Add mandatory HEB number heb900xx (last two digits are specific to title) and ISBN (E-Book) attributes to <text> tag. (Get this info from publisher/production editor.)

<text id="heb90001" isbn="1-234-5678-9">

3.1.2. Divisions

Use division tags (<div1>, <div2>, <div3>, <div4>) to subdivide text within <front>, <body>, and <back>. For example, text with chapters and sections can be broken down as follows:

<div1 type="chapter" id="div1_c01">

Every division must have a type and id attribute, and include a <head> tag (see below). Note that division tags can be used for any type of text subdivision, not necessarily just the traditional chapter and section. For many "born-digital" works or new projects that are developed simultaneously with print editions, the publisher may choose to break down text in other ways (e.g., smaller units).

3.1.3. Text Chunks

E-books in our system will be delivered in text chunks by the lowest available division level. For example:

<div1 type="chapter" id="div1_c01" status="hidden">

(To check if each <div> head has been properly tagged, transform XML documents using XSLT. All heads will be listed sequentially in the TOC. Those <div>s including the status="hidden" attribute will not be hyperlinked, whereas those without the attribute will appear linked.)

2. For books without clear section breaks, you can break the chapter into smaller chunks (e.g., by tagging every 10 paragraphs in a separate division).

<div1 type="chapter" id="div1_c01" status="hidden">

3.1.4. Milestone Section Breaks

If you want to add a separation between a series of paragraphs with a simple skipped line space or asterisks, you can put a milestone tag between paragraphs.

<p>[text text text]</p>

3.1.5. Heads

Every division must include a <head> tag. Subparts of heads should be broken down by type in <bibl> tags. Paragraph number ranges for text chunks must also be placed inside a <bibl> tag. It is only necessary to add paragraph-number ranges for the division level at which text chunks will be delivered, not for higher-level divisions that include the status="hidden" attribute (i.e., if text in e-book will be delivered by section, only add paragraph-number ranges to section heads, not part or chapter heads). For sections that include a byline containing specific author information (e.g., Introduction by ...), this info should also be added to the head in a separate <bibl> tag. See example:

<div1 type="chapter" id="div1_c01" status="hidden">

(Note: No period or other punctuation should be added after the final number appearing in <bibl type="number"> before the </bibl> closing tag; a colon is added here automatically by our processing script, and adding anything else will result in redundant punctuation. For example, tag as <bibl type="number">1.1</bibl>, not <bibl type="number">1.1.</bibl>.)

Heads, bylines, and paragraph-number ranges (e.g., [para 1-10]) will appear in a hyperlinked Table of Contents. For new e-books in development, the publisher must make sure that all divisions have heads. In print version books, some sections do not have heads (e.g., first section at the beginning of a chapter, dedication page, etc.). Publishers should inform their vendors of the text to be inserted, for example, [Dedication], [Intro], or [No head in print version].
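The structure rules in 3.1 can be pictured together in a short fragment. The following is a minimal sketch, not taken from the HEB template: the type value "section", the id "div2_c01.1", the head text, and the bibl type values "title" and "para" are hypothetical placeholders (only type="number" is named in these specifications), so check the DTD and template for the actual values.

```xml
<!-- Sketch only: type="section", the div2 id, the head text, and the bibl
     type values "title" and "para" are hypothetical placeholders. -->
<text id="heb90001" isbn="1-234-5678-9">
  <front>
    <!-- front matter divisions go here -->
  </front>
  <body>
    <!-- Chapter level: status="hidden", so the head is listed in the TOC but not linked -->
    <div1 type="chapter" id="div1_c01" status="hidden">
      <head>
        <bibl type="number">1</bibl>
        <bibl type="title">Chapter Title</bibl>
      </head>
      <!-- Section level: the lowest division, so text is delivered in these chunks -->
      <div2 type="section" id="div2_c01.1">
        <head>
          <!-- No punctuation after the number; the processing script adds the colon -->
          <bibl type="number">1.1</bibl>
          <bibl type="title">First Section</bibl>
          <bibl type="para">[para 1-2]</bibl>
        </head>
        <p n="1" id="p_1">[text text text]</p>
        <p n="2" id="p_2">[text text text]</p>
      </div2>
    </div1>
  </body>
  <back>
    <!-- back matter divisions go here -->
  </back>
</text>
```

Because the paragraph-number range belongs only to the level at which text is delivered, it appears in the section head here and not in the hidden chapter head.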
3.2. Front Matter

3.2.1. Series Title List

If your book is one in a series of titles, and the print version contains a list of already published titles in the series, please do NOT include this list in the front matter. (Instead, if desired, add information on the series to the Copyright and Permissions section; see below.)

3.2.2. Title Page

Begin tagging the <front> section with <titlepage> information. (The information contained within the <titlepage></titlepage> tags is metadata and will not appear as a separate section within the book.) Do not tag the blank and half-title pages.

<front>

This should be followed by a regular <div1> titlepage section (which may be derived from the print version) that will be visible to readers in the TOC. Tag as follows:

<div1 type="titlepage" id="div1_tpg">

3.2.3. Copyright and Permissions

Continue tagging all other front matter material in <div1> sections. Create a section with copyright and permissions information. Include:

3.2.4. Table of Contents

The Table of Contents does not need to be tagged because a TOC will be generated and linked dynamically from the heads within each division.

3.2.5. List of Illustrations

A List of Illustrations is NOT generated dynamically, so you must tag this in a <div1> section. Publishers can choose which text portions to include here (we recommend using the <bibl type="figcap"> portion of each <figure> only, omitting text tagged as <bibl type="figsrc">, and urge consideration of abbreviated descriptions for long figure captions). To create a basic List of Illustrations with links to each individual figure, tag as follows:

<div1>

This section should follow Copyright and Permissions by default, unless there is a pre-existing List of Illustrations which appears in a different order in the print version of the book, and the publisher wishes to retain this order.
If the figures are not interspersed in the book text, but rather displayed as a separate section (plates), you do not need to create a separate List of Illustrations. You can simply create a <div1> with the head "Illustrations" that contains all the figures. (See 3.3.5. Figures for specific tagging information.)

3.2.6. List of Audio / Film Clips

Publishers may choose to provide a List of Audio or Film Clips for titles including such media. Formatting should match the List of Illustrations, but link to a clip's container element (e.g., <p>) rather than the clip's (generally id-less) <ref>. (For more on tagging audio and video, see 3.3.6. Audio and Video Files.) Example:

<div1>

3.2.7. Dedication, Acknowledgments, etc.

Tag other front-matter text in separate <div1> sections within the <front> section. For example:

<div1 type="dedication" id="div1_ded">

3.2.8. Preface to the Electronic Edition

Including a brief Preface to the Electronic Edition, containing such information as a description of elements specific to the e-book, links to external resources associated with the e-book, etc., may be helpful to readers. (Samples provided by HEB upon request.)
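As an illustration of the List of Illustrations described in 3.2.5, the fragment below sketches one way the linked entries might look. It is an assumption-laden sketch rather than the HEB template: the div1 type and id values, the list id, the plain-text head, and the captions are hypothetical; only the <ptr> attributes and the fg_ figure id pattern come from these specifications.

```xml
<!-- Sketch only: type="illustrations", id="div1_ill", the list id, the
     plain-text head, and the caption wording are hypothetical placeholders. -->
<div1 type="illustrations" id="div1_ill">
  <head>List of Illustrations</head>
  <list id="ls_1">
    <!-- Each ptr targets a figure id; its n value becomes the visible link -->
    <item><ptr type="txt" target="fg_heb90001.0001" n="1"/> Short caption for the first figure</item>
    <item><ptr type="txt" target="fg_heb90001.0002" n="2"/> Short caption for the second figure</item>
  </list>
</div1>
```

A List of Audio or Film Clips would presumably follow the same pattern, with each target pointing at the id of the clip's container element (for example a p_ id) instead of a figure id.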
3.3. General Elements

3.3.1. Paragraphs

Use <p> to tag paragraphs. Paragraphs in the text should generally be assigned unique number and id values. In the e-book, the paragraph number (n value) will appear in the left margin next to the paragraph and will be used for identification and citation. Number each paragraph sequentially, beginning with the first paragraph of Acknowledgments (or the first significant front-matter text chunk) and continue numbering throughout the main text. You can also continue numbering through the back matter if it contains paragraphs in sections such as Appendices or About the Author. Unnumbered <p> tags should be used in the following contexts:

<p n="40" id="p_40"></p>

Numbered paragraphs will be rendered with justified margins and a text block width of 530 pixels.

3.3.2. Page Breaks

For titles that are also published in print, tag print-version page breaks with <pb> tags. Tagging page breaks gives readers a way to find citations based on the print-version pages. It is normally preferable to leave out <pb> tags for blank print pages, unless that page happens to be referenced elsewhere in the book. Note that page breaks must be placed within the main text, not in or above head tags. Generally, a <pb> tag should be placed as if to open a new text portion rather than close the preceding one (e.g., place the tag at the beginning of the first <p> of the new page rather than at the end of the last <p> of the previous page). In some cases, proper line flow may be disrupted by the insertion of page numbers, and the placement of <pb> tags occurring within or preceding certain elements, such as epigraphs or tables, might have to be adjusted, e.g., on/in a separate verse line, table row, or unnumbered paragraph (<p><pb n="10" id="pb_10"/></p>). Please confer with the HEB staff at ACLS if any questions arise about <pb> placement.
<div1 type="chapter" id="div1_c01" status="hidden">

Page breaks within text should follow this format:

Beginning of sentence[space]<pb n="121" id="pb_121"/>rest of sentence.

3.3.3. Links: Notes, Internal Links, URLs

Our system currently features the following types of links.

Link to note pop-up window

To tag a note number that will link to a note pop-up window, use the empty <ptr> tag, and add the note-reference number in an n attribute and the note id in a target attribute. Our system will take the n attribute value and put brackets around the number and hyperlink it to a pop-up window that will contain the text of the targeted note.

<ptr n="1" target="nt_c01.n1"/>

Note: Do not style note reference numbers as superscript. See 3.4.1. Notes for more information about tagging the text of a note.

Internal links

Presses should consider taking advantage of the opportunity to insert links to cross-referenced elements (such as chapters or figures) in the electronic version of their title. To tag a link to a specific element, use the empty <ptr> tag, and add "txt" to a type attribute, the number in an n attribute, and the id in a target attribute. Our system will take the n attribute value and hyperlink it to the targeted element. You can link to the following elements: <p>, <pb>, <div1>, <div2>, <div3>, <div4>, <figure>, <table>, <list>, <bibl> (bibliography items). Please note that it is not possible to link to divisions including the status="hidden" attribute. (Instead, link to a <div> at the next level down.)

To link to a figure, tag as follows:

To wrap a link around a specific word or phrase, use a <ref> tag instead of <ptr>.
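To make the <ptr> rules above concrete, here is a hedged sketch of a single paragraph that carries a note link, a figure link, and a page link. The paragraph number, the ids, and the wording are hypothetical; the attributes follow the descriptions above.

```xml
<!-- Sketch only: paragraph number, ids, and sentence text are hypothetical. -->
<p n="12" id="p_12">The treaty was signed the following year.<ptr n="3" target="nt_c01.n3"/>
A facsimile appears in figure <ptr type="txt" target="fg_heb90001.0004" n="4"/>,
and the background is discussed on page <ptr type="txt" target="pb_121" n="121"/>.</p>
```

The first <ptr> has no type attribute and points at a note id, so it should open the note pop-up; the other two carry type="txt" and resolve to a figure and a page break, with the n value serving as the visible link text in each case.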
To link to a div, tag as follows:

Link to external URL

To link to external URLs, use <ref> tags with the following attributes:

<ref type="url" url="http://www.url.com">http://www.url.com</ref>

Multiple links for the same term can be handled as follows:

Several sites feature an in-depth discussion of insomnia (<ref type="url" url="http://www.url1.com">link 1</ref> | <ref type="url" url="http://www.url2.com">link 2</ref>).

3.3.4. Extracts: Quotations, Epigraphs

Use <q1> tags to tag extracts formatted as block quotations within a paragraph. Paragraphs within <q1> tags should NOT be numbered.

<p n="20" id="p_20">Paragraph text.

If quote is formatted in lines (e.g., verse), use <l> or <lg> with <l>:

<q1>

To tag epigraphs, place <epigraph> around the <q1> tag and add <bibl type="epi"> to tag epigraph author and source. Epigraphs should follow opening tags of whichever divisional level text has been chunked by; e.g., if text is delivered by section, an epigraph opening a chapter should appear within the first <div2> (section-level) tags rather than <div1> (chapter-level) tags.

<div1 type="chapter" id="div1_c01" status="hidden">

3.3.5. Figures

Image files should be named in accordance with the title's HEB number, followed by the figure number. (Note that the letters "heb" must appear in lower case in figure entity names in order to be processed by our system. Other letters appearing in entities, e.g., heb90001.001a, will NOT be processed by our system and should be avoided.) Keep in mind that figure entity names don't necessarily correspond with figure numbers as they appear in the text (e.g., Figure 2.1, the first figure appearing in Chapter 2 and 12th figure total, might be heb90001.0012 or heb90001.0201).

Please submit all of the following image files:

1. High-Resolution TIFF files

Small jpegs will appear embedded within the text wherever a <figure> tag is inserted. Users will be able to click on these to open a pop-up window showing a larger version of the image.
For each title, publishers can select several ways for users to enlarge an image:

1. Simple pop-up: Pop-up window brings up a larger version of the image (large jpeg). Recommended for most images.
2. Image viewer: Pop-up window that shows an image viewer that allows users to zoom in and pan on images (tiff option). Recommended for titles with high-resolution art images or detailed line drawings or maps.
3. External image: Pop-up window opens external URL showing enlarged image. (Used for images housed within other online collections.)

Tagging Figures in XML

Place figure entity declarations at the top of the XML file:

<!ENTITY heb90001.0001 SYSTEM "heb90001.0001.jpg" NDATA jpeg>

Tag figures within the text as described below. All figure tags must be placed within <p> tags. Break down caption information by figure number, caption, and source/permissions; in some cases, source information will be incorporated into the main caption text, and it may be desirable to omit <bibl type="figsrc"> altogether.

<p>[text text text]

Note: Please make sure each <figure> includes a <head> tag. This is necessary for further processing at HEB, but the tag may be left blank if inclusion of a caption is not desired.

In addition to the above, for images to be viewed with the image-viewer tool, type attribute type="ic" should be added to the <figure> tag (<figure entity="heb90001.0001" id="fg_heb90001.0001" type="ic">). For external images, type attribute type="ext" should be added (<figure entity="heb90001.0001" id="fg_heb90001.0001" type="ext">). All external images will also require delivery to HEB of target URLs in a spreadsheet. (This information is not tagged in the XML file itself, but housed in a separate database.) Please contact HEB for more information if external images will be used in your e-book.

(A third attribute that may be added to the <figure> tag is type="imagemap" (<figure entity="heb90001.0001" id="fg_heb90001.0001" type="imagemap">).
This applies only to figures classified as interactive images; an additional requirement for these is supplying HEB with coordinates for points in the image from which interactive links will originate. Please contact HEB for details before proceeding with this type of figure.)

If figures appear in a separate section (plates), then just tag as a <div1> section and tag each figure in a <p> tag. It is recommended to move such sections to the front matter (where normally the List of Illustrations would appear) for organizational purposes.

Note: jpg extensions should NOT be included for figure entity names listed within <figure> tags.

3.3.6. Audio and Video Files

Audio and video clips in several standard formats may be included (.mp3 and .mov are currently considered optimal for HEB and are preferred). HEB recommends file size on individual clips be kept to 20 MB and under to minimize download times (longer clips may need to be broken down into subcomponents prior to submission). Dimensions for video should ideally be 320 x 240 pixels to match the default pop-up window in which clips will be displayed. (If these guidelines present production problems, please contact HEB.)

Clips formatted to the above specifications should be named in accordance with the title's unique HEB number. Tag clips using <ref> tags, as follows:

Listen to music clip 1, <ref type="audio" filename="heb90001.0001.mp3">"Title of Clip"</ref>.

In order to reference clips within a text, link to the container element (e.g., <p> or <table>).

(Also see film clip <ptr type="txt" target="p_158" n="2"/>.)

The same principle applies to creating a List of Audio or Film Clips (see 3.2.6. List of Audio / Film Clips).

3.3.7. Tables and Inserts

Use <table> tags for actual tables, or for text formatted as a table. (Adding heads is optional; the id attribute is necessary if a table will be subsequently referenced/linked.)
The border attribute may be used for formatting if desired, as well as colspan and rowspan attributes for cells. If a table spans several pages, do not repeat column or row header cells (as the print version often does in such cases).

<table id="tb_1" border="1"> (id and border optional)

The attribute type="insert" may be used with the <table> element to section off a specific text portion (e.g., letter, historical document, or text box) from the main text as a text block with a border around it. The insert text should be placed within a single <cell> with attribute type="letter". Often, using extracts or regular paragraphs may be preferable to using this type of formatting.

<table type="insert" id="in_1"> (id optional)

3.3.8. Lists

Use <list> tag for lists. (Heads are optional; the id attribute is required if a list will be subsequently referenced/linked.) You can also nest lists within list items. Items in a list will be formatted with a hanging indent.

<list id="ls_1"> (id optional)

3.3.9. Salute, Signed, Dateline

Text constituting a salutation or signature (e.g., a greeting prefixed to a letter or the closing salutation appended to a foreword) should be tagged using the elements <salute> and <signed>. The element <dateline> should be used to tag the date and/or location prefixed or appended to a letter, transcript, or other document, as well as any text appearing outside this context that serves a similar purpose (e.g., a location/date functioning as a heading of sorts but which is NOT tagged as an actual <head>). The attributes align="center" or align="right" may be added to all three elements for formatting purposes (left alignment is default).

<dateline>Location, Date</dateline>
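The list and letter elements described in 3.3.8 and 3.3.9 might be combined as in the fragment below. This is a sketch under stated assumptions: the ids and all text are hypothetical, the two blocks are independent fragments rather than one document, and where the letter elements may sit inside their parent (a division, extract, or insert) is not shown here and should be checked against the DTD.

```xml
<!-- Sketch only: ids, head text, and item text are hypothetical placeholders. -->
<list id="ls_2">
  <head>Archives Consulted</head>
  <item>National archives
    <!-- A nested list sits inside the item it belongs to -->
    <list>
      <item>Correspondence files</item>
      <item>Parish registers</item>
    </list>
  </item>
  <item>Private collections</item>
</list>

<!-- Separate fragment: opening and closing of a letter; left alignment is the default -->
<dateline align="right">Location, Date</dateline>
<salute>Dear Sir,</salute>
<p>[text text text]</p>
<signed align="right">Name of Signatory</signed>
```

The id on the outer list matters only if the list will be linked from elsewhere; the nested list here omits it.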
3.4. Back Matter

3.4.1. Notes

In the <back> section, create a <div1> with type="notes". Place the notes for each chapter in a separate <div2>. Each note id should follow this format: "[xxx].n[notenumber]" where xxx is [int] or [c01], [c02], etc.

<div1 type="notes" id="div1_nts">

Handling "Ibid." in Notes

Since end notes will appear as pop-up windows, for notes that include the word "Ibid." users will not see the referenced note in the pop-up window. We suggest you replace the word "Ibid." with the referenced text, commenting out "Ibid." and commenting where inserted text begins and ends. Note that often it's not such a clear-cut copy and paste replacement, because the referenced note can include lengthy text or multiple books. The question the publisher will need to work out is which portion of the previous note should replace the "Ibid." (If this is too difficult, it may be preferable to leave "Ibid." in place.)

<note1 n="10" id="nt_c01.n10"> <p>Jones and Smith, <hi1 rend="italic">History of the United States</hi1>, Chapters 1 and 2.</p></note1>

For new online titles, publishers may want to consider ending the usage of terms such as "Ibid." and "Op. cit." in their house style, so that notes can be more efficiently processed in the electronic version.

3.4.2. Bibliography

Within the <back> section, create a <div1> section with type="bibliography". If there are multiple sections in the bibliography, create subsections using <div2>, etc. All <bibl> tags (starting with the Bibliography, not those appearing in epigraphs) should be sequentially id'd. Remove any 3-em dashes used in print instead of repeated authors' names and repeat actual names instead.

<div1 type="bibliography" id="div1_bib">

3.4.3. Index

To tag an index, create a <div1> section with type="index". Create a main <list>, then put each letter into a separate sub-nested list (placed within the main list's individual <item> tags).
Sub-nested lists for terms should be placed within yet another <list> under the term's <item> tag. To link a page number to a specific page in the text, use the empty <ptr> tag, and add "txt" in a type attribute, the page number in an n attribute, and the page break id in a target attribute. (If an index is being created for a born-digital book, paragraph links can be used instead.) Our system will take the n attribute value, put brackets around the number and hyperlink it to the targeted page break in the text. For page ranges (e.g., "30-35"), only the first page should be tagged.

Note references in Index: For note references, the page number rather than the note number should be tagged. (There may be some exceptions to this rule, such as e-books derived from print titles using footnotes rather than endnotes, in which case page numbers for notes are rendered obsolete. In this event, please confer with HEB staff prior to tagging.)

Figure references in Index: Instead of <hi1 rend="italic">, use [fig.] after a page link to designate a figure reference. Alternatively, such page links may be converted to direct figure links (see 3.2.5. List of Illustrations for tagging instructions).

<div1 type="index" id="div1_ind">

NOTE: Indices containing a large number of <ptr> links may lead to very long load times. It is therefore advisable to break overlong indices down into several <div2> subsections (one option is to create a new subsection roughly every 1000 <ptr>s, by letter: e.g., the result might be a 3-section index with the sub-headings "A-G", "H-O", "P-Z").

3.4.4. About the Author

As the final <back> matter section, create a <div1> section with information about the author(s). (An author's photo may be included, if desired.)

<div1 type="aboutauthor" id="div1_aut">
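The nesting described for the index in 3.4.3 is easier to follow in a worked fragment. The sketch below is an illustration under assumptions: the index terms, page numbers, plain-text head, and the choice to place the letter as plain text at the start of its item are hypothetical; the div1 attributes, the list nesting, and the <ptr> attributes follow the description in 3.4.3.

```xml
<!-- Sketch only: terms, page numbers, the plain-text head, and the placement
     of the letter inside its item are hypothetical. -->
<div1 type="index" id="div1_ind">
  <head>Index</head>
  <list>
    <item>A
      <!-- One sub-nested list per letter -->
      <list>
        <!-- Page range 30-35: only the first page is tagged -->
        <item>Abbeys, <ptr type="txt" target="pb_30" n="30"/>-35
          <!-- Sub-terms go in a further list under the term's item -->
          <list>
            <item>dissolution of, <ptr type="txt" target="pb_112" n="112"/></item>
          </list>
        </item>
        <!-- [fig.] after the page link marks a figure reference -->
        <item>Archives, <ptr type="txt" target="pb_8" n="8"/> [fig.]</item>
      </list>
    </item>
    <item>B
      <list>
        <item>Boroughs, <ptr type="txt" target="pb_47" n="47"/></item>
      </list>
    </item>
  </list>
</div1>
```

Each target must match the id of a <pb> tag that actually exists in the text, so page breaks referenced only from the index still need to be tagged.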
4. Proofing and Quality Control

XML files must be quality checked and proofread before submission to ACLS Humanities E-Book. We have provided a proofing XSLT style sheet to help view XML in a format closer to the final online version.

5. List of Elements

The following list shows all the elements defined for the ACLS Humanities E-Book acls-hebook.dtd. This list is an edited subset of the elements in the TEI Lite XML DTD.
Attribute Chart

For processing purposes, it is useful to make the order in which attributes appear for any given element consistent throughout the text.

Note: For id attribute values, prefixes with underscores (e.g., p_, pb_) must be used, as listed below; our system will recognize only ids with these prefixes.
6. Cover Image

You must submit a high-resolution TIFF file and two JPEG files for the cover image, which will appear on the Title Record Page of each book.

1. High-resolution TIFF image: size: variable; format: TIFF, 300 dpi; image quality: high

7. Technical Contact at ACLS

Nina Gielen (ngielen@hebook.org)

8. ACLS Specifications - Log of Specifications Updates

02-20-08 Updates: New rules and additions:

04-30-07 Updates: New rules and additions:

03-12-07 Update: 1. Copy throughout updated from History E-Book to Humanities E-Book.

09-05-06 Updates: New rules and additions:

12-01-04 Updates: 1. Updated DTD, template, XSLT.

09-22-04 Updates: New rules and additions:

09-30-03 Updates: 1. Cover specs: Submission of cover image as TIFF file (in addition to JPEGs) now required.

09-18-03 Updates: New rules and additions:

05-13-03 Updates: 1. Epigraph source: bibl tag in epigraph source should include attribute type="epi" (<bibl type="epi">).

02-20-03 Updates: 1. Figures: clarify image size specs. Small images, maximum size 530.