annotate src/org/nwoca/ssdt/tools/html2wiki/Html2Wiki.java @ 6:99f293bd507f

Add "reflow" transformer to reflow paragraphs, list items, etc.
author smith@nwoca.org
date Thu, 27 Jan 2011 16:37:27 -0500
parents d34f4d408ef9
children a634b4d554d4
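
A minimal sketch of how a chain of regex-based transformers, like the one assembled in the constructor of the listing below, might operate on a shared buffer. The `Transformer` interface shape and the `ReplaceAll` and `WrapGroup` classes are hypothetical stand-ins: the sources of `DeleteTransformer`, `ReplaceTransformer`, and `TagTransformer` are not part of this file, so their real behavior may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; not the project's actual transformer classes.
class TransformerChainSketch {

    // Assumed contract: each transformer rewrites the shared buffer in place.
    interface Transformer {
        void transform(StringBuffer buffer);
    }

    // Replaces every match of a regex with a literal replacement string.
    static class ReplaceAll implements Transformer {
        private final Pattern pattern;
        private final String replacement;

        ReplaceAll(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            // quoteReplacement keeps '\' and '$' in the replacement literal.
            this.replacement = Matcher.quoteReplacement(replacement);
        }

        public void transform(StringBuffer buffer) {
            String result = pattern.matcher(buffer).replaceAll(replacement);
            buffer.setLength(0);
            buffer.append(result);
        }
    }

    // Wraps capture group 1 of each match in an opening and a closing marker.
    static class WrapGroup implements Transformer {
        private final Pattern pattern;
        private final String open;
        private final String close;

        WrapGroup(String regex, String open, String close) {
            this.pattern = Pattern.compile(regex);
            this.open = open;
            this.close = close;
        }

        public void transform(StringBuffer buffer) {
            Matcher m = pattern.matcher(buffer);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String wrapped = open + m.group(1) + close;
                m.appendReplacement(out, Matcher.quoteReplacement(wrapped));
            }
            m.appendTail(out);
            buffer.setLength(0);
            buffer.append(out);
        }
    }

    // Applies the transformers in registration order; later ones see earlier output.
    static String convert(String html) {
        StringBuffer buffer = new StringBuffer(html);
        List<Transformer> transformers = new ArrayList<Transformer>();
        transformers.add(new ReplaceAll("<html>|</html>|<body>|</body>", ""));
        transformers.add(new WrapGroup("<em>(.*?)</em>", "*", "*"));
        transformers.add(new WrapGroup("<h1>(.*)</h1>", "h1. ", ""));
        for (Transformer t : transformers) {
            t.transform(buffer);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("<html><body><h1>Title</h1>\nSome <em>text</em></body></html>"));
    }
}
```

Because each transformer sees the output of the previous one, registration order matters: in the listing, for example, the `{note}` replacements can only match after the table and center tags have been rewritten to `{table}`, `{tr}`, `{td}`, and `{center}`.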
rev   line source
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
1 package org.nwoca.ssdt.tools.html2wiki;
2 /*
3 * Html2Wiki.java
4 *
5 * Created on May 9, 2006, 3:22 PM
6 *
7 */
8
9 import java.io.*;
10 import java.util.Collection;
11 import java.util.ArrayList;
12 import java.util.List;
13 import org.apache.commons.io.FileUtils;
14 import java.util.regex.*;
15 import org.apache.commons.io.FilenameUtils;
16
17 /**
18 * Converter to convert HTML documents into MediaWiki text.
19 *
20 * Heavily customized to handle HTML produced by DEC DOCUMENT
21 * SOFTWARE doctype. Breaks file into Chapters in the manner done
22 * by Document. Needs modification to work with other HTML files.
23 *
24 * @author SMITH
25 */
26 public class Html2Wiki {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
27
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
28 private StringBuffer buffer;
29 private Collection<Transformer> transformers;
30 private boolean converted = false;
31 private static String category;
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
32
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
33 /** Creates a new instance of Html2Wiki. */
34 public Html2Wiki(String html) {
35 buffer = new StringBuffer(html);
36 transformers = new ArrayList<Transformer>();
4
22ed6d93442c Start modifying transformers to Confluence wiki syntax
smith@nwoca.org
parents: 2
diff changeset
37 // transformers.add(new PreTagTransformer());
38 // transformers.add(new DeleteTransformer("^\\s",true));
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
39 transformers.add(new DeleteTransformer("<html>|</html>|<body>|</body>"));
40 transformers.add(new DeleteTransformer("<!--.*-->(\\n|\\r)*",true));
41 transformers.add(new DeleteTransformer("<a .*?>|</a>"));
42 transformers.add(new DeleteTransformer("(?m)^\\*"));
4
22ed6d93442c Start modifying transformers to Confluence wiki syntax
smith@nwoca.org
parents: 2
diff changeset
43 // transformers.add(new DeleteTransformer("<blockquote>|</blockquote>"));
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
44 transformers.add(new DeleteTransformer("(?m)<br>$"));
45 transformers.add(new DeleteTransformer("<font .*?>|</font>"));
4
22ed6d93442c Start modifying transformers to Confluence wiki syntax
smith@nwoca.org
parents: 2
diff changeset
46 transformers.add(new CloseTagTransformer("<li>","(\n|\r)*(<li>|</ul>|</ol>|<ul>|<ol>)","</li>"));
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
47 transformers.add(new BadTableDataTransformer());
48 transformers.add(new BadTableRowTransformer());
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
49 transformers.add(new ReflowTransformer());
50 transformers.add(new DeleteTransformer("<p>"));
4
22ed6d93442c Start modifying transformers to Confluence wiki syntax
smith@nwoca.org
parents: 2
diff changeset
51 // transformers.add(new ReplaceTransformer("</td>","\n</td>"));
52 transformers.add(new ReplaceTransformer("\\{","\\{"));
53 transformers.add(new ReplaceTransformer("\\}","\\}"));
54 // transformers.add(new ReplaceTransformer("\\[","\\["));
55 // transformers.add(new ReplaceTransformer("\\]","\\]"));
56 transformers.add(new ReplaceTransformer("<br>","\\\\"));
57 transformers.add(new ReplaceTransformer("<table.*?>|</table>","{table}"));
58 transformers.add(new ReplaceTransformer("<tr>|</tr>","{tr}"));
5
d34f4d408ef9 [no commit message]
ferrall@nwoca.org
parents: 4
diff changeset
59 transformers.add(new ReplaceTransformer("<td.*?>|</td>","{td}"));
60 transformers.add(new ReplaceTransformer("<th.*?>|</th>","{th}"));
4
22ed6d93442c Start modifying transformers to Confluence wiki syntax
smith@nwoca.org
parents: 2
diff changeset
61 transformers.add(new ReplaceTransformer("<ol.*?>|</ol>","{ol}"));
62 transformers.add(new ReplaceTransformer("<ul.*?>|</ul>","{ul}"));
63 transformers.add(new ReplaceTransformer("<li>","{li}"));
64 transformers.add(new ReplaceTransformer("</li>","{li}\n"));
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
65
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
66 transformers.add(new ChapterTransformer(category));
4
22ed6d93442c Start modifying transformers to Confluence wiki syntax
smith@nwoca.org
parents: 2
diff changeset
67 transformers.add(new TagTransformer("<pre>(.*?)</pre>", true, "{code}","{code}"));
68 transformers.add(new TagTransformer("<center>(.*?)</center>", true, "{center}","{center}"));
69 transformers.add(new TagTransformer("<em>(.*?)</em>", "*","*"));
70 transformers.add(new TagTransformer("<strong>(.*?)</strong>", "*","*"));
71 transformers.add(new TagTransformer("(?s)<kbd>(.*?)</kbd>", "{{", "}}"));
72 transformers.add(new TagTransformer("<h1>(.*)</h1>", "h1. ", ""));
73 transformers.add(new TagTransformer("<h2>(.*)</h2>", "h2. ", ""));
74 transformers.add(new TagTransformer("<h3>(accessing the program|sample run|sample screens?|sample reports?)</[hH]3>","h3.", ""));
75 transformers.add(new TagTransformer("<h3>(.*)</H3>", "h3. ", ""));
76 transformers.add(new TagTransformer("<h3>(.*)</h3>", "h3. ", ""));
77 transformers.add(new TagTransformer("<h4>(.*)</h4>", "h4. ", ""));
78 transformers.add(new TagTransformer("<h5>(.*)</h5>", "h5. ", ""));
79 transformers.add(new TagTransformer("<h6>(.*)</h6>", "h6. ", ""));
5
d34f4d408ef9 [no commit message]
ferrall@nwoca.org
parents: 4
diff changeset
80 transformers.add(new ReplaceTransformer("\\{center}\\n\\{table}\\n\\{tr\\}\\n\\s{2}\\{td\\}\\{center\\}\\*Note\\*\\{center\\}","{note}"));
81 transformers.add(new ReplaceTransformer("\\{td\\}\\n\\s{2}\\{tr\\}\\n\\{table\\}\\n\\{center\\}","{note}"));
82
83 // transformers.add(new TagTransformer("\\{center}\\n\\{table}\\n\\{tr\\}\\n\\s{2}\\{td\\}\\{center\\}\\*Note\\*\\{center\\}(.*?)\\s\\{td\\}\\n\\s{2}\\{tr\\}\\{table\\}", "{note}", "{note}"));
84 // transformers.add(new TagTransformer("(\\S)\\s\\n", "", " "));
85 transformers.add(new TagTransformer("<blockquote>(.*)</blockquote>", "{quote}", "{quote}"));
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
86 transformers.add(new DeleteTransformer("(?s)<hr.*?>"));
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
87 transformers.add(new ReflowTransformer("(\\{note\\})([^\\{]*)(\\{note\\})"));
88
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
89 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
90
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
91 /**
92 * @param args the command line arguments
93 */
94 public static void main(String[] args) throws IOException {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
95
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
96 if (args.length == 0) {
97 System.out.println("Usage:");
98 System.out.println(" Html2Wiki {inputDirectory} [Category]");
99 System.out.println(" default is current directory");
100 System.out.println(" Processes all *.html files. ");
101 System.out.println(" Each 'chapter' written to *.wiki");
102 return;
103 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
104
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
105 File inputs = new File(args[0]);
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
106
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
107 if (args.length > 1) {
108 category = args[1];
109 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
110
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
111 File[] inputFiles = inputs.listFiles(new HtmlFileFilter());
112 for (int i = 0; i < inputFiles.length; i++) {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
113
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
114 process(inputFiles[i]);
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
115
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
116 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
117
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
118 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
119
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
120 protected static void process(File input) throws IOException {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
121
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
122 System.out.println(input.getAbsoluteFile());
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
123
124 Html2Wiki converter = new Html2Wiki(FileUtils.readFileToString(input, null));
125
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
126 WikiChapter[] chapters = converter.getWikiChapters();
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
127
128 System.out.format("Writing %d wiki files...\n", chapters.length);
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
129
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
130
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
131 for (int i = 0; i < chapters.length; i++) {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
132
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
133 FileUtils.writeStringToFile(new File(input.getParent(),
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
134 generateFilename(chapters[i].getChapterName()) + ".wiki"),
135 chapters[i].getContents().toString(),
136 null);
137
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
138 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
139
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
140 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
141
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
142 public static String generateFilename(String input) {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
143 return input.replaceAll("\\\\|/|:|\\(|\\)", "-").replace("<br>", "");
144
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
145 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
146
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
147 public String getWikiText() {
148 convert();
149 return buffer.toString();
150 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
151
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
152 public WikiChapter[] getWikiChapters() {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
153
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
154 convert();
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
155
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
156 List<WikiChapter> chapters = new ArrayList<WikiChapter>();
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
157
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
158 Pattern chapterPat = Pattern.compile("<chapter>");
159 Matcher begin = chapterPat.matcher(buffer);
160 Matcher end = chapterPat.matcher(buffer);
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
161
162 while (begin.find()) {
163
164
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
165 end.find(begin.end());
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
166
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
167 Pattern chapterNamePat = Pattern.compile("<chapter>(.*?)</chapter>");
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
168
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
169 Matcher chapterNameMatcher = chapterNamePat.matcher(buffer);
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
170
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
171 String chapterName = chapterNameMatcher.find(begin.start()) ? chapterNameMatcher.group(1) : null;
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
172
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
173 CharSequence contents = buffer.subSequence(chapterName == null ? begin.start() : chapterNameMatcher.end(), end.hitEnd() ? buffer.length() : end.start());
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
174
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
175 chapters.add(new WikiChapter(chapterName, contents));
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
176
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
177 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
178 return (WikiChapter[]) chapters.toArray(new WikiChapter[]{});
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
179 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
180
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
181 private void convert() {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
182
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
183 if (!converted) {
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
184 for (Transformer t : transformers) {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
185
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
186 System.out.println(".Applying: " + t);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
187 t.apply(buffer);
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
188
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
189 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
190 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
191 converted = true;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
192 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
193
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
194 private static class HtmlFileFilter implements FileFilter {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
195
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
196 public boolean accept(File pathname) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
197 return pathname.getName().toLowerCase().matches("^.*\\.html$");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
198 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
199 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
200
2
5da2e67620f9 Upgrade to Ivy configuration and begin clean up of tests. Added FreeBSD license.
smith@nwoca.org
parents: 0
diff changeset
201 protected static class WikiChapter {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
202
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
203 private String chapterName;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
204 private CharSequence contents;
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
205
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
206 public WikiChapter(String chapterName, CharSequence contents) {
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
207 this.chapterName = chapterName == null ? null : chapterName.replaceAll("\\\\|/|:|\\(|\\)", "-").replaceAll("\\s+", " ").replaceAll("&amp;", "and");
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
208
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
209 this.contents = contents;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
210 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
211
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
212 public String getChapterName() {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
213 return chapterName;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
214 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
215
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
216 public CharSequence getContents() {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
217 return contents;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
218 }
6
99f293bd507f Add "reflow" transformer to reflow paragraphs, list items, etc.
smith@nwoca.org
parents: 5
diff changeset
219
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
220 public String toString() {
2
5da2e67620f9 Upgrade to Ivy configuration and begin clean up of tests. Added FreeBSD license.
smith@nwoca.org
parents: 0
diff changeset
221 return "Chapter: " + chapterName + " Content length: " + contents.length();
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
222 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
223 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
224 }
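
Note on the chapter-splitting logic in the listing above: the sketch below is a simplified, standalone variant, not the code from this changeset. It assumes each chapter starts at a `<chapter>…</chapter>` title element and runs to the next title or to the end of the buffer. It uses a single `Matcher` instead of the `begin`/`end` pair, because `chapterPat` is defined outside this excerpt. The names `ChapterSplitSketch`, `Chapter`, `split`, and `sanitize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChapterSplitSketch {

    // Same title pattern as chapterNamePat in the listing; reused here as the
    // chapter boundary as well, which is an assumption of this sketch.
    private static final Pattern CHAPTER_NAME =
            Pattern.compile("<chapter>(.*?)</chapter>");

    /** Immutable holder for one chapter; name is null when no title exists. */
    static final class Chapter {
        public final String name;
        public final String contents;

        Chapter(String name, String contents) {
            // Guard against a missing title instead of calling replaceAll on null.
            this.name = (name == null) ? null : sanitize(name);
            this.contents = contents;
        }
    }

    /** Applies the same three substitutions as the WikiChapter constructor. */
    static String sanitize(String raw) {
        return raw.replaceAll("[\\\\/:()]", "-")   // characters unsafe in page names
                  .replaceAll("\\s+", " ")          // collapse runs of whitespace
                  .replaceAll("&amp;", "and");      // spell out the escaped ampersand
    }

    /** Splits the buffer into chapters, one per title element found. */
    static List<Chapter> split(CharSequence buffer) {
        List<Chapter> chapters = new ArrayList<>();
        Matcher m = CHAPTER_NAME.matcher(buffer);
        String name = null;
        int bodyStart = -1;            // -1 means no chapter has been opened yet

        while (m.find()) {
            if (bodyStart >= 0) {
                // The previous chapter ends where this title begins.
                chapters.add(new Chapter(name,
                        buffer.subSequence(bodyStart, m.start()).toString()));
            }
            name = m.group(1);
            bodyStart = m.end();
        }
        if (bodyStart >= 0) {
            // The last chapter runs to the end of the buffer.
            chapters.add(new Chapter(name,
                    buffer.subSequence(bodyStart, buffer.length()).toString()));
        }
        return chapters;
    }
}
```

Tracking the previous title's end position avoids relying on `Matcher.hitEnd()` to detect the last chapter; text before the first title is ignored in this variant.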