Mercurial > public > html2wiki
annotate src/org/nwoca/ssdt/tools/html2wiki/Html2Wiki.java @ 0:f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
author | smith@nwoca.org |
---|---|
date | Fri, 12 May 2006 16:45:42 -0400 |
parents | |
children | 5da2e67620f9 |
rev | line source |
---|---|
0
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
1 package org.nwoca.ssdt.tools.html2wiki; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
2 /* |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
3 * Html2Wiki.java |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
4 * |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
5 * Created on May 9, 2006, 3:22 PM |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
6 * |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
7 */ |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
8 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
9 import java.io.*; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
10 import java.util.Collection; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
11 import java.util.ArrayList; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
12 import java.util.List; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
13 import java.util.Iterator; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
14 import org.apache.commons.io.FileSystemUtils; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
15 import org.apache.commons.io.FileUtils; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
16 import java.util.regex.*; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
17 import org.apache.commons.io.FilenameUtils; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
18 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
19 /** |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
20 * Converter to convert HTML documents into MediaWiki test. |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
21 * |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
22 * Heavily customized to handle HTML produced by DEC DOCUMENT |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
23 * SOFTARE doctype. Breaks file into Chapters in the manner done |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
24 * by Document. Needs modification to work with other HTML files. |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
25 * |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
26 * @author SMITH |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
27 */ |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
28 public class Html2Wiki { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
29 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
30 private StringBuffer buffer; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
31 private Collection<Transformer> transformers; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
32 private boolean converted = false; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
33 private static String category; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
34 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
35 /** Creates a new instance of Html2Wiki. */ |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
36 public Html2Wiki(String html) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
37 buffer = new StringBuffer(html); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
38 transformers = new ArrayList<Transformer>(); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
39 transformers.add(new PreTagTransformer()); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
40 transformers.add(new DeleteTransformer("^\\s",true)); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
41 transformers.add(new DeleteTransformer("<html>|</html>|<body>|</body>")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
42 transformers.add(new DeleteTransformer("<!--.*-->(\\n|\\r)*",true)); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
43 transformers.add(new DeleteTransformer("<a .*?>|</a>")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
44 transformers.add(new DeleteTransformer("(?m)^\\*")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
45 transformers.add(new DeleteTransformer("<blockquote>|</blockquote>")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
46 transformers.add(new DeleteTransformer("<p>")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
47 transformers.add(new DeleteTransformer("(?m)<br>$")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
48 transformers.add(new DeleteTransformer("<font .*?>|</font>")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
49 transformers.add(new CloseTagTransformer("<li>","(\n|\r)*(<li>|</ul>|</ol>|<ul>|<ol>)","\n</li>")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
50 transformers.add(new BadTableDataTransformer()); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
51 transformers.add(new BadTableRowTransformer()); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
52 transformers.add(new ReplaceTransformer("</td>","\n</td>")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
53 transformers.add(new ChapterTransformer(category)); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
54 transformers.add(new TagTransformer("<em>(.*?)</em>", "''")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
55 transformers.add(new TagTransformer("<strong>(.*?)</strong>", "'''")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
56 transformers.add(new TagTransformer("(?s)<kbd>(.*?)</kbd>", "<tt>", "</tt>")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
57 transformers.add(new TagTransformer("<h1>(.*)</h1>", "== ", " ==")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
58 transformers.add(new TagTransformer("<h2>(.*)</h2>", "=== ", " ===")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
59 transformers.add(new TagTransformer("<h3>(accessing the program|sample run|sample screens?|sample reports?)</[h|H]3>","=== ", " ===")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
60 transformers.add(new TagTransformer("<h3>(.*)</H3>", "", "")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
61 transformers.add(new TagTransformer("<h3>(.*)</h3>", "==== ", " ====")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
62 transformers.add(new TagTransformer("<h4>(.*)</h4>", "===== ", " =====")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
63 transformers.add(new TagTransformer("<h5>(.*)</h5>", "====== ", " ======")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
64 transformers.add(new TagTransformer("<h6>(.*)</h6>", "======= ", " =======")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
65 transformers.add(new DeleteTransformer("(?s)<hr.*?>")); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
66 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
67 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
68 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
69 /** |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
70 * @param args the command line arguments |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
71 */ |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
72 public static void main(String[] args) throws IOException { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
73 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
74 if (args.length == 0) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
75 System.out.println("Usage:"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
76 System.out.println(" Html2Wiki {inputDirectory} [Category]"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
77 System.out.println(" default is current directory"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
78 System.out.println(" Processes all *.html files. "); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
79 System.out.println(" Each 'chapter' written to *.wiki"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
80 return; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
81 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
82 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
83 File inputs = new File(args[0]); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
84 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
85 if (args.length > 1) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
86 category = args[1]; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
87 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
88 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
89 File[] inputFiles = inputs.listFiles(new HtmlFileFilter()); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
90 for (int i = 0; i < inputFiles.length; i++) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
91 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
92 process(inputFiles[i]); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
93 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
94 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
95 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
96 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
97 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
98 protected static void process(File input) throws IOException { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
99 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
100 System.out.println(input.getAbsoluteFile()); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
101 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
102 Html2Wiki converter = new Html2Wiki(FileUtils.readFileToString(input,null)); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
103 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
104 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
105 WikiChapter[] chapters = converter.getWikiChapters(); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
106 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
107 System.out.format("Writing %d wiki files...\n",chapters.length); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
108 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
109 StringBuffer wikiIndex = new StringBuffer(); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
110 wikiIndex.append("Contents:\n\n"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
111 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
112 for (int i = 0; i < chapters.length; i++) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
113 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
114 wikiIndex.append("# [[" + chapters[i].getChapterName() + "]]\n"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
115 FileUtils.writeStringToFile(new File(input.getParent(), |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
116 generateFilename(chapters[i].getChapterName())+".wiki"), |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
117 chapters[i].getContents().toString(), |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
118 null); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
119 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
120 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
121 System.out.println("Writing wikiIndex..."); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
122 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
123 FileUtils.writeStringToFile(new File(FilenameUtils.removeExtension(input.getPath())+".wikiIndex"),wikiIndex.toString(),null); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
124 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
125 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
126 public static String generateFilename(String input) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
127 return input.replaceAll("\\\\|/|:|\\(|\\)","-"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
128 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
129 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
130 public String getWikiText() { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
131 convert(); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
132 return buffer.toString(); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
133 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
134 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
135 public WikiChapter[] getWikiChapters() { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
136 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
137 convert(); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
138 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
139 List<WikiChapter> chapters = new ArrayList<WikiChapter>(); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
140 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
141 Pattern chapterPat = Pattern.compile("<chapter>"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
142 Matcher begin = chapterPat.matcher(buffer); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
143 Matcher end = chapterPat.matcher(buffer); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
144 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
145 while(begin.find()) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
146 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
147 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
148 end.find(begin.end()); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
149 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
150 Pattern chapterNamePat = Pattern.compile("<chapter>(.*?)</chapter>"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
151 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
152 Matcher chapterNameMatcher = chapterNamePat.matcher(buffer); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
153 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
154 String chapterName = chapterNameMatcher.find(begin.start()) ? chapterNameMatcher.group(1) : null; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
155 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
156 CharSequence contents = buffer.subSequence(chapterName == null ? begin.start() : chapterNameMatcher.end() |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
157 ,end.hitEnd() ? buffer.length() : end.start()); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
158 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
159 chapters.add(new WikiChapter(chapterName,contents)); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
160 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
161 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
162 return (WikiChapter[])chapters.toArray(new WikiChapter[]{}); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
163 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
164 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
165 private void convert() { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
166 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
167 if(!converted) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
168 for (Transformer t : transformers) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
169 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
170 System.out.println(".Applying: " + t); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
171 t.apply(buffer); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
172 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
173 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
174 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
175 converted = true; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
176 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
177 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
178 private static class HtmlFileFilter implements FileFilter { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
179 public boolean accept(File pathname) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
180 return pathname.getName().toLowerCase().matches("^.*\\.html$"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
181 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
182 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
183 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
184 private static class WikiChapter { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
185 private String chapterName; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
186 private CharSequence contents; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
187 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
188 public WikiChapter(String chapterName, CharSequence contents) { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
189 this.chapterName = chapterName.replaceAll("\\\\|/|:|\\(|\\)","-").replaceAll("\\s+"," ").replaceAll("&","and"); |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
190 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
191 this.contents = contents; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
192 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
193 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
194 public String getChapterName() { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
195 return chapterName; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
196 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
197 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
198 public CharSequence getContents() { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
199 return contents; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
200 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
201 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
202 public String toString() { |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
203 return "Chapter: " + chapterName + "\nContents: " + contents; |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
204 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
205 } |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
206 |
f8b1ea49d065
Initial version of crude HTML to WikiText converter. Customized for converting HTML files from DEC Document into Wiki markup.
smith@nwoca.org
parents:
diff
changeset
|
207 } |