annotate src/org/nwoca/ssdt/tools/html2wiki/Html2Wiki.java @ 0:f8b1ea49d065

Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
author smith@nwoca.org
date Fri, 12 May 2006 16:45:42 -0400
parents
children 5da2e67620f9
rev   line source
0
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
1 package org.nwoca.ssdt.tools.html2wiki;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
2 /*
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
3 * Html2Wiki.java
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
4 *
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
5 * Created on May 9, 2006, 3:22 PM
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
6 *
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
7 */
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
8
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
9 import java.io.*;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
10 import java.util.Collection;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
11 import java.util.ArrayList;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
12 import java.util.List;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
13 import java.util.Iterator;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
14 import org.apache.commons.io.FileSystemUtils;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
15 import org.apache.commons.io.FileUtils;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
16 import java.util.regex.*;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
17 import org.apache.commons.io.FilenameUtils;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
18
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
19 /**
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
20 * Converter to convert HTML documents into MediaWiki test.
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
21 *
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
22 * Heavily customized to handle HTML produced by DEC DOCUMENT
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
23 * SOFTARE doctype. Breaks file into Chapters in the manner done
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
24 * by Document. Needs modification to work with other HTML files.
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
25 *
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
26 * @author SMITH
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
27 */
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
28 public class Html2Wiki {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
29
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
30 private StringBuffer buffer;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
31 private Collection<Transformer> transformers;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
32 private boolean converted = false;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
33 private static String category;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
34
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
35 /** Creates a new instance of Html2Wiki. */
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
36 public Html2Wiki(String html) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
37 buffer = new StringBuffer(html);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
38 transformers = new ArrayList<Transformer>();
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
39 transformers.add(new PreTagTransformer());
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
40 transformers.add(new DeleteTransformer("^\\s",true));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
41 transformers.add(new DeleteTransformer("<html>|</html>|<body>|</body>"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
42 transformers.add(new DeleteTransformer("<!--.*-->(\\n|\\r)*",true));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
43 transformers.add(new DeleteTransformer("<a .*?>|</a>"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
44 transformers.add(new DeleteTransformer("(?m)^\\*"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
45 transformers.add(new DeleteTransformer("<blockquote>|</blockquote>"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
46 transformers.add(new DeleteTransformer("<p>"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
47 transformers.add(new DeleteTransformer("(?m)<br>$"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
48 transformers.add(new DeleteTransformer("<font .*?>|</font>"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
49 transformers.add(new CloseTagTransformer("<li>","(\n|\r)*(<li>|</ul>|</ol>|<ul>|<ol>)","\n</li>"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
50 transformers.add(new BadTableDataTransformer());
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
51 transformers.add(new BadTableRowTransformer());
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
52 transformers.add(new ReplaceTransformer("</td>","\n</td>"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
53 transformers.add(new ChapterTransformer(category));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
54 transformers.add(new TagTransformer("<em>(.*?)</em>", "''"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
55 transformers.add(new TagTransformer("<strong>(.*?)</strong>", "'''"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
56 transformers.add(new TagTransformer("(?s)<kbd>(.*?)</kbd>", "<tt>", "</tt>"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
57 transformers.add(new TagTransformer("<h1>(.*)</h1>", "== ", " =="));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
58 transformers.add(new TagTransformer("<h2>(.*)</h2>", "=== ", " ==="));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
59 transformers.add(new TagTransformer("<h3>(accessing the program|sample run|sample screens?|sample reports?)</[h|H]3>","=== ", " ==="));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
60 transformers.add(new TagTransformer("<h3>(.*)</H3>", "", ""));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
61 transformers.add(new TagTransformer("<h3>(.*)</h3>", "==== ", " ===="));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
62 transformers.add(new TagTransformer("<h4>(.*)</h4>", "===== ", " ====="));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
63 transformers.add(new TagTransformer("<h5>(.*)</h5>", "====== ", " ======"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
64 transformers.add(new TagTransformer("<h6>(.*)</h6>", "======= ", " ======="));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
65 transformers.add(new DeleteTransformer("(?s)<hr.*?>"));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
66
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
67 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
68
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
69 /**
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
70 * @param args the command line arguments
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
71 */
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
72 public static void main(String[] args) throws IOException {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
73
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
74 if (args.length == 0) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
75 System.out.println("Usage:");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
76 System.out.println(" Html2Wiki {inputDirectory} [Category]");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
77 System.out.println(" default is current directory");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
78 System.out.println(" Processes all *.html files. ");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
79 System.out.println(" Each 'chapter' written to *.wiki");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
80 return;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
81 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
82
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
83 File inputs = new File(args[0]);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
84
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
85 if (args.length > 1) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
86 category = args[1];
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
87 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
88
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
89 File[] inputFiles = inputs.listFiles(new HtmlFileFilter());
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
90 for (int i = 0; i < inputFiles.length; i++) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
91
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
92 process(inputFiles[i]);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
93
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
94 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
95
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
96 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
97
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
98 protected static void process(File input) throws IOException {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
99
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
100 System.out.println(input.getAbsoluteFile());
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
101
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
102 Html2Wiki converter = new Html2Wiki(FileUtils.readFileToString(input,null));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
103
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
104
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
105 WikiChapter[] chapters = converter.getWikiChapters();
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
106
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
107 System.out.format("Writing %d wiki files...\n",chapters.length);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
108
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
109 StringBuffer wikiIndex = new StringBuffer();
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
110 wikiIndex.append("Contents:\n\n");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
111
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
112 for (int i = 0; i < chapters.length; i++) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
113
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
114 wikiIndex.append("# [[" + chapters[i].getChapterName() + "]]\n");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
115 FileUtils.writeStringToFile(new File(input.getParent(),
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
116 generateFilename(chapters[i].getChapterName())+".wiki"),
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
117 chapters[i].getContents().toString(),
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
118 null);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
119
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
120 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
121 System.out.println("Writing wikiIndex...");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
122
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
123 FileUtils.writeStringToFile(new File(FilenameUtils.removeExtension(input.getPath())+".wikiIndex"),wikiIndex.toString(),null);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
124 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
125
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
126 public static String generateFilename(String input) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
127 return input.replaceAll("\\\\|/|:|\\(|\\)","-");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
128
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
129 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
130 public String getWikiText() {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
131 convert();
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
132 return buffer.toString();
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
133 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
134
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
135 public WikiChapter[] getWikiChapters() {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
136
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
137 convert();
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
138
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
139 List<WikiChapter> chapters = new ArrayList<WikiChapter>();
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
140
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
141 Pattern chapterPat = Pattern.compile("<chapter>");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
142 Matcher begin = chapterPat.matcher(buffer);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
143 Matcher end = chapterPat.matcher(buffer);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
144
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
145 while(begin.find()) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
146
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
147
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
148 end.find(begin.end());
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
149
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
150 Pattern chapterNamePat = Pattern.compile("<chapter>(.*?)</chapter>");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
151
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
152 Matcher chapterNameMatcher = chapterNamePat.matcher(buffer);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
153
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
154 String chapterName = chapterNameMatcher.find(begin.start()) ? chapterNameMatcher.group(1) : null;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
155
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
156 CharSequence contents = buffer.subSequence(chapterName == null ? begin.start() : chapterNameMatcher.end()
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
157 ,end.hitEnd() ? buffer.length() : end.start());
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
158
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
159 chapters.add(new WikiChapter(chapterName,contents));
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
160
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
161 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
162 return (WikiChapter[])chapters.toArray(new WikiChapter[]{});
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
163 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
164
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
165 private void convert() {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
166
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
167 if(!converted) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
168 for (Transformer t : transformers) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
169
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
170 System.out.println(".Applying: " + t);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
171 t.apply(buffer);
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
172
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
173 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
174 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
175 converted = true;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
176 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
177
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
178 private static class HtmlFileFilter implements FileFilter {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
179 public boolean accept(File pathname) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
180 return pathname.getName().toLowerCase().matches("^.*\\.html$");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
181 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
182
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
183 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
184 private static class WikiChapter {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
185 private String chapterName;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
186 private CharSequence contents;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
187
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
188 public WikiChapter(String chapterName, CharSequence contents) {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
189 this.chapterName = chapterName.replaceAll("\\\\|/|:|\\(|\\)","-").replaceAll("\\s+"," ").replaceAll("&amp;","and");
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
190
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
191 this.contents = contents;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
192 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
193
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
194 public String getChapterName() {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
195 return chapterName;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
196 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
197
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
198 public CharSequence getContents() {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
199 return contents;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
200 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
201
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
202 public String toString() {
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
203 return "Chapter: " + chapterName + "\nContents: " + contents;
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
204 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
205 }
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
206
f8b1ea49d065 Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff changeset
207 }