1. /**************************************************************
  2. JLex: A Lexical Analyzer Generator for Java(TM)
  3. Written by Elliot Berk <ejberk@cs.princeton.edu>. Copyright 1996.
  4. Maintained by C. Scott Ananian <cananian@alumni.princeton.edu>.
  5. See below for copyright notice, license, and disclaimer.
  6. New releases from http://www.cs.princeton.edu/~appel/modern/java/JLex/
  7. Version 1.2.5, 7/25/99-5/16/00, [C. Scott Ananian]
  8. Stomped on one more 8-bit character bug. Should work now (really!).
  9. Added unicode support, including unicode escape sequences.
  10. Rewrote internal JavaLexBitSet class as SparseBitSet for efficient
  11. unicoding.
  12. Added an NFA character class simplification pass for unicode efficiency.
  13. Changed byte- and stream-oriented I/O routines to use characters and
  14. java.io.Reader and java.io.Writer instead --- which means we read in
  15. unicode specifications correctly and write out a proper unicode java
  16. source file. As a happy side-effect, the output java file is written
  17. with your platform's preferred newline character(s).
  18. Rewrote CInput to fix bugs with line-counting in the specification file
  19. and "unusual behaviour" when the last line of the specification wasn't
  20. terminated with a newline. Thanks to Matt Hanna <mhanna@cs.caltech.edu>
  21. for pointing out the bug.
  22. Fixed a bug that would cause JLex not to terminate given certain input
  23. specifications. Thanks to Mark Greenstreet <mrg@cs.ubc.ca> and
  24. Frank B. Brokken <frank@suffix.icce.rug.nl> for reporting this.
  25. CUP parser integration improved according to suggestions made by
  26. David MacMahon <davidm@smartsc.com>. The %cup directive now tells
  27. JLex to generate a parser conforming to the java_cup.runtime.Scanner
  28. interface; see manual for more details.
  29. Fixed bug with null string literals ("") in regexps. Reported by
  30. Charles Fischer <fischer@cs.wisc.edu>.
  31. Rewrote start-of-line and end-of-line handling, closing active bug #5.
  32. Also fixed line-counting code, closing active bug #12. All
  33. new-line handling is now platform-independent.
  34. Used unpackFromString more extensively to allow larger cmap, etc,
  35. tables. This helps unicode support work reliably. It's also
  36. prettier now if you happen to read the source to the generated
  37. lexer.
  38. Generated lexer now accepts unicode LS (U+2028) and PS (U+2029) as
  39. line separators for strict unicode compliance; see
  40. http://www.unicode.org/unicode/reports/tr18/
  41. Fixed bug with character constants in action strings. Reported by
  42. Andrew Appel against 1.2.5b3.
  43. Fixed bug with illegal \^C-style escape sequences. Reported by
  44. Toshiya Iwai <iwai@isdnet.co.jp> against 1.2.5b4.
  45. Fixed "newline in quoted string" error when unpaired single- or
  46. double-quotes were present in comments in the action phrase.
  47. Reported by Stephen Ostermiller <1010JLex@ostermiller.com>
  48. against 1.2.5b4. Reported by Eric Esposito <eric.esposito@unh.edu>
  49. against 1.2.4 and 1.2.5b2.
  50. Fixed "newline in quoted string" error when /* or // appeared
  51. in quoted strings in the action phrase. Reported by
  52. David Eichmann <david-eichmann@uiowa.edu> against 1.2.5b5.
  53. Fixed 'illegal constant' errors in case statements caused by
  54. Sun's JDK 1.3 more closely adhering to the Java Language
  55. Specification. Reported by a number of people, but
  56. Harold Grovesteen <hgrovesteen@home.com> was the first to direct me to
  57. a Sun bug report (4119776) which quoted the relevant section of the
  58. JLS (15.27) to convince me that the JLex construction actually was
  59. illegal. Reported against 1.2.5b6, but this bit of code has been
  60. present since the very first version of JLex (1.1.1).
  61. Version 1.2.4, 7/24/99, [C. Scott Ananian]
  62. Correct the parsing of '-' in character classes, closing active
  63. bug #1. Behaviour follows egrep: leading and trailing dashes in
  64. a character class lose their special meaning, so [-+] and [+-] do
  65. what you would expect them to.
  66. New %ignorecase directive for generating case-insensitive lexers by
  67. expanding matched character classes in a unicode-friendly way.
  68. Handle unmatched braces in quoted strings or comments within
  69. action code blocks.
  70. Fixed input lexer to allow whitespace in character classes, closing
  71. active bug #9. Whitespace in quotes had been previously fixed.
  72. Made Yylex.YYEOF and %yyeof work like the manual says they should.
  73. Version 1.2.3, 6/26/97, [Raimondas Lencevicius]
  74. Fixed the yy_nxt[][] assignment that has generated huge code
  75. exceeding 64K method size limit. Now the assignment
  76. is handled by unpacking a string encoding of integer array.
  77. To achieve that, added
  78. "private int [][] unpackFromString(int size1, int size2, String st)"
  79. function and coded the yy_nxt[][] values into a string
  80. by printing integers into a string and representing
  81. integer sequences as "value:length" pairs.
  82. Improvement: generated .java file reduced 2 times, .class file
  83. reduced 6 times for sample grammar. No 64K errors.
  84. Possible negatives: Some editors and OSs may not be able to handle
  85. the huge one-line generated string. String unpacking may be slower
  86. than direct array initialization.
  87. Version 1.2.2, 10/24/97, [Martin Dirichs]
  88. Notes:
  89. Changed yy_instream to yy_reader of type BufferedReader. This reflects
  90. the improvements in the JDK 1.1 concerning InputStreams. As a
  91. consequence, changed yy_buffer from byte[] to char[].
  92. The lexer can now be initialized with either an InputStream
  93. or a Reader. A third, private constructor is called by the other
  94. two to execute user specified constructor code.
  95. Version 1.2.1, 9/15/97 [A. Appel]
  96. Fixed bugs 6 (character codes > 127) and 10 (deprecated String constructor).
  97. Version 1.2, 5/5/97, [Elliot Berk]
  98. Notes:
  99. Simply changed the name from JavaLex to JLex. No other changes.
  100. Version 1.1.5, 2/25/97, [Elliot Berk]
  101. Notes:
  102. Simple optimization to the creation of the source files.
  103. Added a BufferedOutputStream in the creation of the DataOutputStream
  104. field m_outstream of the class CLexGen. This helps performance by
  105. doing some buffering, and was suggested by Max Hailperin,
  106. Associate Professor of Computer Science, Gustavus Adolphus College.
  107. Version 1.1.4, 12/12/96, [Elliot Berk]
  108. Notes:
  109. Added %public directive to make generated class public.
  110. Version 1.1.3, 12/11/96, [Elliot Berk]
  111. Notes:
  112. Converted assertion failure on invalid character class
  113. when a dash '-' is not preceded with a start-of-range character.
  114. Converted this into parse error E_DASH.
  115. Version 1.1.2, October 30, 1996 [Elliot Berk]
  116. Fixed BitSet bugs by installing a BitSet class of my own,
  117. called JavaLexBitSet. Fixed support for '\r', non-UNIX
  118. sequences. Added try/catch block around lexer generation
  119. in main routine to moderate error information presented
  120. to user. Fixed macro expansion, so that macros following
  121. quotes are expanded correctly in regular expressions.
  122. Fixed dynamic reallocation of accept action buffers.
  123. Version 1.1.1, September 3, 1996 [Andrew Appel]
  124. Made the class "Main" instead of "JavaLex",
  125. improved the installation instructions to reflect this.
  126. Version 1.1, August 15, 1996 [Andrew Appel]
  127. Made yychar, yyline, yytext global to the lexer so that
  128. auxiliary functions can access them.
  129. **************************************************************/
  130. /***************************************************************
  131. JLEX COPYRIGHT NOTICE, LICENSE, AND DISCLAIMER
  132. Copyright 1996-2000 by Elliot Joel Berk and C. Scott Ananian
  133. Permission to use, copy, modify, and distribute this software and its
  134. documentation for any purpose and without fee is hereby granted,
  135. provided that the above copyright notice appear in all copies and that
  136. both the copyright notice and this permission notice and warranty
  137. disclaimer appear in supporting documentation, and that the name of
  138. the authors or their employers not be used in advertising or publicity
  139. pertaining to distribution of the software without specific, written
  140. prior permission.
  141. The authors and their employers disclaim all warranties with regard to
  142. this software, including all implied warranties of merchantability and
  143. fitness. In no event shall the authors or their employers be liable
  144. for any special, indirect or consequential damages or any damages
  145. whatsoever resulting from loss of use, data or profits, whether in an
  146. action of contract, negligence or other tortious action, arising out
  147. of or in connection with the use or performance of this software.
  148. **************************************************************/
  149. /***************************************************************
  150. Package Declaration
  151. **************************************************************/
  152. package com.sun.jlex.internal;
  153. /***************************************************************
  154. Imported Packages
  155. **************************************************************/
  156. import java.lang.System;
  157. import java.lang.Integer;
  158. import java.lang.Character;
  159. import java.util.Enumeration;
  160. import java.util.Stack;
  161. import java.util.Hashtable;
  162. import java.util.Vector;
  163. /******************************
  164. Questions:
  165. 2) How should I use the Java package system
  166. to make my tool more modularized and
  167. coherent?
  168. Unimplemented:
  169. !) Fix BitSet issues -- expand only when necessary.
  170. 2) Repeated accept rules.
  171. 6) Clean up the CAlloc class and use buffered
  172. allocation.
  173. 9) Add to spec about extending character set.
  174. 11) m_verbose -- what should be done with it?
  175. 12) turn lexical analyzer into a coherent
  176. Java package
  177. 13) turn lexical analyzer generator into a
  178. coherent Java package
  179. 16) pretty up generated code
  180. 17) make it possible to have white space in
  181. regular expressions
  182. 18) clean up all of the class files the lexer
  183. generator produces when it is compiled,
  184. and reduce this number in some way.
  185. 24) character format to and from file: writeup
  186. and implementation
  187. 25) Debug by testing all arcane regular expression cases.
  188. 26) Look for and fix all UNDONE comments below.
  189. 27) Fix package system.
  190. 28) Clean up unnecessary classes.
  191. *****************************/
  192. /***************************************************************
  193. Class: CSpec
  194. **************************************************************/
  195. class CSpec
  196. {
  197. /***************************************************************
  198. Member Variables
  199. **************************************************************/
  200. /* Lexical States. */
  201. Hashtable m_states; /* Hashtable taking state indices (Integer)
  202. to state name (String). */
  203. /* Regular Expression Macros. */
  204. Hashtable m_macros; /* Hashtable taking macro name (String)
  205. to corresponding char buffer that
  206. holds macro definition. */
  207. /* NFA Machine. */
  208. CNfa m_nfa_start; /* Start state of NFA machine. */
  209. Vector m_nfa_states; /* Vector of states, with index
  210. corresponding to label. */
  211. Vector m_state_rules[]; /* An array of Vectors of Integers.
  212. The ith Vector represents the lexical state
  213. with index i. The contents of the ith
  214. Vector are the indices of the NFA start
  215. states that can be matched while in
  216. the ith lexical state. */
  217. int m_state_dtrans[];
  218. /* DFA Machine. */
  219. Vector m_dfa_states; /* Vector of states, with index
  220. corresponding to label. */
  221. Hashtable m_dfa_sets; /* Hashtable taking set of NFA states
  222. to corresponding DFA state,
  223. if the latter exists. */
  224. /* Accept States and Corresponding Anchors. */
  225. Vector m_accept_vector;
  226. int m_anchor_array[];
  227. /* Transition Table. */
  228. Vector m_dtrans_vector;
  229. int m_dtrans_ncols;
  230. int m_row_map[];
  231. int m_col_map[];
  232. /* Special pseudo-characters for beginning-of-line and end-of-file. */
  233. static final int NUM_PSEUDO=2;
  234. int BOL; // beginning-of-line
  235. int EOF; // end-of-line
  236. /** NFA character class minimization map. */
  237. int m_ccls_map[];
  238. /* Regular expression token variables. */
  239. int m_current_token;
  240. char m_lexeme;
  241. boolean m_in_quote;
  242. boolean m_in_ccl;
  243. /* Verbose execution flag. */
  244. boolean m_verbose;
  245. /* JLex directives flags. */
  246. boolean m_integer_type;
  247. boolean m_intwrap_type;
  248. boolean m_yyeof;
  249. boolean m_count_chars;
  250. boolean m_count_lines;
  251. boolean m_cup_compatible;
  252. boolean m_unix;
  253. boolean m_public;
  254. boolean m_ignorecase;
  255. char m_init_code[];
  256. int m_init_read;
  257. char m_init_throw_code[];
  258. int m_init_throw_read;
  259. char m_class_code[];
  260. int m_class_read;
  261. char m_eof_code[];
  262. int m_eof_read;
  263. char m_eof_value_code[];
  264. int m_eof_value_read;
  265. char m_eof_throw_code[];
  266. int m_eof_throw_read;
  267. char m_yylex_throw_code[];
  268. int m_yylex_throw_read;
  269. /* Class, function, type names. */
  270. char m_class_name[] = {
  271. 'Y', 'y', 'l',
  272. 'e', 'x'
  273. };
  274. char m_implements_name[] = {};
  275. char m_function_name[] = {
  276. 'y', 'y', 'l',
  277. 'e', 'x'
  278. };
  279. char m_type_name[] = {
  280. 'Y', 'y', 't',
  281. 'o', 'k', 'e',
  282. 'n'
  283. };
  284. /* Lexical Generator. */
  285. private CLexGen m_lexGen;
  286. /***************************************************************
  287. Constants
  288. ***********************************************************/
  289. static final int NONE = 0;
  290. static final int START = 1;
  291. static final int END = 2;
  292. /***************************************************************
  293. Function: CSpec
  294. Description: Constructor.
  295. **************************************************************/
  296. CSpec
  297. (
  298. CLexGen lexGen
  299. )
  300. {
  301. m_lexGen = lexGen;
  302. /* Initialize regular expression token variables. */
  303. m_current_token = m_lexGen.EOS;
  304. m_lexeme = '\0';
  305. m_in_quote = false;
  306. m_in_ccl = false;
  307. /* Initialize hashtable for lexer states. */
  308. m_states = new Hashtable();
  309. m_states.put(new String("YYINITIAL"),new Integer(m_states.size()));
  310. /* Initialize hashtable for lexical macros. */
  311. m_macros = new Hashtable();
  312. /* Initialize variables for lexer options. */
  313. m_integer_type = false;
  314. m_intwrap_type = false;
  315. m_count_lines = false;
  316. m_count_chars = false;
  317. m_cup_compatible = false;
  318. m_unix = true;
  319. m_public = false;
  320. m_yyeof = false;
  321. m_ignorecase = false;
  322. /* Initialize variables for JLex runtime options. */
  323. m_verbose = true;
  324. m_nfa_start = null;
  325. m_nfa_states = new Vector();
  326. m_dfa_states = new Vector();
  327. m_dfa_sets = new Hashtable();
  328. m_dtrans_vector = new Vector();
  329. m_dtrans_ncols = CUtility.MAX_SEVEN_BIT + 1;
  330. m_row_map = null;
  331. m_col_map = null;
  332. m_accept_vector = null;
  333. m_anchor_array = null;
  334. m_init_code = null;
  335. m_init_read = 0;
  336. m_init_throw_code = null;
  337. m_init_throw_read = 0;
  338. m_yylex_throw_code = null;
  339. m_yylex_throw_read = 0;
  340. m_class_code = null;
  341. m_class_read = 0;
  342. m_eof_code = null;
  343. m_eof_read = 0;
  344. m_eof_value_code = null;
  345. m_eof_value_read = 0;
  346. m_eof_throw_code = null;
  347. m_eof_throw_read = 0;
  348. m_state_dtrans = null;
  349. m_state_rules = null;
  350. }
  351. }
  352. /***************************************************************
  353. Class: CEmit
  354. **************************************************************/
  355. class CEmit
  356. {
  357. /***************************************************************
  358. Member Variables
  359. **************************************************************/
  360. private CSpec m_spec;
  361. private java.io.PrintWriter m_outstream;
  362. /***************************************************************
  363. Constants: Anchor Types
  364. **************************************************************/
  365. private final int START = 1;
  366. private final int END = 2;
  367. private final int NONE = 4;
  368. /***************************************************************
  369. Constants
  370. **************************************************************/
  371. private final boolean EDBG = true;
  372. private final boolean NOT_EDBG = false;
  373. /***************************************************************
  374. Function: CEmit
  375. Description: Constructor.
  376. **************************************************************/
  377. CEmit
  378. (
  379. )
  380. {
  381. reset();
  382. }
  383. /***************************************************************
  384. Function: reset
  385. Description: Clears member variables.
  386. **************************************************************/
  387. private void reset
  388. (
  389. )
  390. {
  391. m_spec = null;
  392. m_outstream = null;
  393. }
  394. /***************************************************************
  395. Function: set
  396. Description: Initializes member variables.
  397. **************************************************************/
  398. private void set
  399. (
  400. CSpec spec,
  401. java.io.PrintWriter outstream
  402. )
  403. {
  404. if (CUtility.DEBUG)
  405. {
  406. CUtility._assert(null != spec);
  407. CUtility._assert(null != outstream);
  408. }
  409. m_spec = spec;
  410. m_outstream = outstream;
  411. }
  412. /***************************************************************
  413. Function: emit_imports
  414. Description: Emits import packages at top of
  415. generated source file.
  416. **************************************************************/
  417. /*void emit_imports
  418. (
  419. CSpec spec,
  420. OutputStream outstream
  421. )
  422. throws java.io.IOException
  423. {
  424. set(spec,outstream);
  425. if (CUtility.DEBUG)
  426. {
  427. CUtility._assert(null != m_spec);
  428. CUtility._assert(null != m_outstream);
  429. }*/
  430. /*m_outstream.println("import java.lang.String;");
  431. m_outstream.println("import java.lang.System;");
  432. m_outstream.println("import java.io.BufferedReader;");
  433. m_outstream.println("import java.io.InputStream;");*/
  434. /*
  435. reset();
  436. }*/
  437. /***************************************************************
  438. Function: print_details
  439. Description: Debugging output.
  440. **************************************************************/
  441. private void print_details
  442. (
  443. )
  444. {
  445. int i;
  446. int j;
  447. int next;
  448. int state;
  449. CDTrans dtrans;
  450. CAccept accept;
  451. boolean tr;
  452. System.out.println("---------------------- Transition Table "
  453. + "----------------------");
  454. for (i = 0; i < m_spec.m_row_map.length; ++i)
  455. {
  456. System.out.print("State " + i);
  457. accept = (CAccept) m_spec.m_accept_vector.elementAt(i);
  458. if (null == accept)
  459. {
  460. System.out.println(" [nonaccepting]");
  461. }
  462. else
  463. {
  464. System.out.println(" [accepting, line "
  465. + accept.m_line_number
  466. + " <"
  467. + (new java.lang.String(accept.m_action,0,
  468. accept.m_action_read))
  469. + ">]");
  470. }
  471. dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(m_spec.m_row_map[i]);
  472. tr = false;
  473. state = dtrans.m_dtrans[m_spec.m_col_map[0]];
  474. if (CDTrans.F != state)
  475. {
  476. tr = true;
  477. System.out.print("\tgoto " + state + " on [" + ((char) 0));
  478. }
  479. for (j = 1; j < m_spec.m_dtrans_ncols; ++j)
  480. {
  481. next = dtrans.m_dtrans[m_spec.m_col_map[j]];
  482. if (state == next)
  483. {
  484. if (CDTrans.F != state)
  485. {
  486. System.out.print((char) j);
  487. }
  488. }
  489. else
  490. {
  491. state = next;
  492. if (tr)
  493. {
  494. System.out.println("]");
  495. tr = false;
  496. }
  497. if (CDTrans.F != state)
  498. {
  499. tr = true;
  500. System.out.print("\tgoto " + state + " on [" + ((char) j));
  501. }
  502. }
  503. }
  504. if (tr)
  505. {
  506. System.out.println("]");
  507. }
  508. }
  509. System.out.println("---------------------- Transition Table "
  510. + "----------------------");
  511. }
  512. /***************************************************************
  513. Function: emit
  514. Description: High-level access function to module.
  515. **************************************************************/
  516. void emit
  517. (
  518. CSpec spec,
  519. java.io.PrintWriter outstream
  520. )
  521. throws java.io.IOException
  522. {
  523. set(spec,outstream);
  524. if (CUtility.DEBUG)
  525. {
  526. CUtility._assert(null != m_spec);
  527. CUtility._assert(null != m_outstream);
  528. }
  529. if (CUtility.OLD_DEBUG) {
  530. print_details();
  531. }
  532. emit_header();
  533. emit_construct();
  534. emit_helpers();
  535. emit_driver();
  536. emit_footer();
  537. reset();
  538. }
  539. /***************************************************************
  540. Function: emit_construct
  541. Description: Emits constructor, member variables,
  542. and constants.
  543. **************************************************************/
  544. private void emit_construct
  545. (
  546. )
  547. throws java.io.IOException
  548. {
  549. if (CUtility.DEBUG)
  550. {
  551. CUtility._assert(null != m_spec);
  552. CUtility._assert(null != m_outstream);
  553. }
  554. /* Constants */
  555. m_outstream.println("\tprivate final int YY_BUFFER_SIZE = 512;");
  556. m_outstream.println("\tprivate final int YY_F = -1;");
  557. m_outstream.println("\tprivate final int YY_NO_STATE = -1;");
  558. m_outstream.println("\tprivate final int YY_NOT_ACCEPT = 0;");
  559. m_outstream.println("\tprivate final int YY_START = 1;");
  560. m_outstream.println("\tprivate final int YY_END = 2;");
  561. m_outstream.println("\tprivate final int YY_NO_ANCHOR = 4;");
  562. // internal
  563. m_outstream.println("\tprivate final int YY_BOL = "+m_spec.BOL+";");
  564. m_outstream.println("\tprivate final int YY_EOF = "+m_spec.EOF+";");
  565. // external
  566. if (m_spec.m_integer_type || true == m_spec.m_yyeof)
  567. m_outstream.println("\tpublic final int YYEOF = -1;");
  568. /* User specified class code. */
  569. if (null != m_spec.m_class_code)
  570. {
  571. m_outstream.print(new String(m_spec.m_class_code,0,
  572. m_spec.m_class_read));
  573. }
  574. /* Member Variables */
  575. m_outstream.println("\tprivate java.io.BufferedReader yy_reader;");
  576. m_outstream.println("\tprivate int yy_buffer_index;");
  577. m_outstream.println("\tprivate int yy_buffer_read;");
  578. m_outstream.println("\tprivate int yy_buffer_start;");
  579. m_outstream.println("\tprivate int yy_buffer_end;");
  580. m_outstream.println("\tprivate char yy_buffer[];");
  581. if (m_spec.m_count_chars)
  582. {
  583. m_outstream.println("\tprivate int yychar;");
  584. }
  585. if (m_spec.m_count_lines)
  586. {
  587. m_outstream.println("\tprivate int yyline;");
  588. }
  589. m_outstream.println("\tprivate boolean yy_at_bol;");
  590. m_outstream.println("\tprivate int yy_lexical_state;");
  591. /*if (m_spec.m_count_lines || true == m_spec.m_count_chars)
  592. {
  593. m_outstream.println("\tprivate int yy_buffer_prev_start;");
  594. }*/
  595. m_outstream.println();
  596. /* Function: first constructor (Reader) */
  597. m_outstream.print("\t");
  598. if (true == m_spec.m_public) {
  599. m_outstream.print("public ");
  600. }
  601. m_outstream.print(new String(m_spec.m_class_name));
  602. m_outstream.print(" (java.io.Reader reader)");
  603. if (null != m_spec.m_init_throw_code)
  604. {
  605. m_outstream.println();
  606. m_outstream.print("\t\tthrows ");
  607. m_outstream.print(new String(m_spec.m_init_throw_code,0,
  608. m_spec.m_init_throw_read));
  609. m_outstream.println();
  610. m_outstream.println("\t\t{");
  611. }
  612. else
  613. {
  614. m_outstream.println(" {");
  615. }
  616. m_outstream.println("\t\tthis ();");
  617. m_outstream.println("\t\tif (null == reader) {");
  618. m_outstream.println("\t\t\tthrow (new Error(\"Error: Bad input "
  619. + "stream initializer.\"));");
  620. m_outstream.println("\t\t}");
  621. m_outstream.println("\t\tyy_reader = new java.io.BufferedReader(reader);");
  622. m_outstream.println("\t}");
  623. m_outstream.println();
  624. /* Function: second constructor (InputStream) */
  625. m_outstream.print("\t");
  626. if (true == m_spec.m_public) {
  627. m_outstream.print("public ");
  628. }
  629. m_outstream.print(new String(m_spec.m_class_name));
  630. m_outstream.print(" (java.io.InputStream instream)");
  631. if (null != m_spec.m_init_throw_code)
  632. {
  633. m_outstream.println();
  634. m_outstream.print("\t\tthrows ");
  635. m_outstream.println(new String(m_spec.m_init_throw_code,0,
  636. m_spec.m_init_throw_read));
  637. m_outstream.println("\t\t{");
  638. }
  639. else
  640. {
  641. m_outstream.println(" {");
  642. }
  643. m_outstream.println("\t\tthis ();");
  644. m_outstream.println("\t\tif (null == instream) {");
  645. m_outstream.println("\t\t\tthrow (new Error(\"Error: Bad input "
  646. + "stream initializer.\"));");
  647. m_outstream.println("\t\t}");
  648. m_outstream.println("\t\tyy_reader = new java.io.BufferedReader(new java.io.InputStreamReader(instream));");
  649. m_outstream.println("\t}");
  650. m_outstream.println();
  651. /* Function: third, private constructor - only for internal use */
  652. m_outstream.print("\tprivate ");
  653. m_outstream.print(new String(m_spec.m_class_name));
  654. m_outstream.print(" ()");
  655. if (null != m_spec.m_init_throw_code)
  656. {
  657. m_outstream.println();
  658. m_outstream.print("\t\tthrows ");
  659. m_outstream.println(new String(m_spec.m_init_throw_code,0,
  660. m_spec.m_init_throw_read));
  661. m_outstream.println("\t\t{");
  662. }
  663. else
  664. {
  665. m_outstream.println(" {");
  666. }
  667. m_outstream.println("\t\tyy_buffer = new char[YY_BUFFER_SIZE];");
  668. m_outstream.println("\t\tyy_buffer_read = 0;");
  669. m_outstream.println("\t\tyy_buffer_index = 0;");
  670. m_outstream.println("\t\tyy_buffer_start = 0;");
  671. m_outstream.println("\t\tyy_buffer_end = 0;");
  672. if (m_spec.m_count_chars)
  673. {
  674. m_outstream.println("\t\tyychar = 0;");
  675. }
  676. if (m_spec.m_count_lines)
  677. {
  678. m_outstream.println("\t\tyyline = 0;");
  679. }
  680. m_outstream.println("\t\tyy_at_bol = true;");
  681. m_outstream.println("\t\tyy_lexical_state = YYINITIAL;");
  682. /*if (m_spec.m_count_lines || true == m_spec.m_count_chars)
  683. {
  684. m_outstream.println("\t\tyy_buffer_prev_start = 0;");
  685. }*/
  686. /* User specified constructor code. */
  687. if (null != m_spec.m_init_code)
  688. {
  689. m_outstream.print(new String(m_spec.m_init_code,0,
  690. m_spec.m_init_read));
  691. }
  692. m_outstream.println("\t}");
  693. m_outstream.println();
  694. }
  695. /***************************************************************
  696. Function: emit_states
  697. Description: Emits constants that serve as lexical states,
  698. including YYINITIAL.
  699. **************************************************************/
  700. private void emit_states
  701. (
  702. )
  703. throws java.io.IOException
  704. {
  705. Enumeration states;
  706. String state;
  707. int index;
  708. states = m_spec.m_states.keys();
  709. /*index = 0;*/
  710. while (states.hasMoreElements())
  711. {
  712. state = (String) states.nextElement();
  713. if (CUtility.DEBUG)
  714. {
  715. CUtility._assert(null != state);
  716. }
  717. m_outstream.println("\tprivate final int "
  718. + state
  719. + " = "
  720. + (m_spec.m_states.get(state)).toString()
  721. + ";");
  722. /*++index;*/
  723. }
  724. m_outstream.println("\tprivate final int yy_state_dtrans[] = {");
  725. for (index = 0; index < m_spec.m_state_dtrans.length; ++index)
  726. {
  727. m_outstream.print("\t\t" + m_spec.m_state_dtrans[index]);
  728. if (index < m_spec.m_state_dtrans.length - 1)
  729. {
  730. m_outstream.println(",");
  731. }
  732. else
  733. {
  734. m_outstream.println();
  735. }
  736. }
  737. m_outstream.println("\t};");
  738. }
  739. /***************************************************************
  740. Function: emit_helpers
  741. Description: Emits helper functions, particularly
  742. error handling and input buffering.
  743. **************************************************************/
  744. private void emit_helpers
  745. (
  746. )
  747. throws java.io.IOException
  748. {
  749. if (CUtility.DEBUG)
  750. {
  751. CUtility._assert(null != m_spec);
  752. CUtility._assert(null != m_outstream);
  753. }
  754. /* Function: yy_do_eof */
  755. m_outstream.println("\tprivate boolean yy_eof_done = false;");
  756. if (null != m_spec.m_eof_code)
  757. {
  758. m_outstream.print("\tprivate void yy_do_eof ()");
  759. if (null != m_spec.m_eof_throw_code)
  760. {
  761. m_outstream.println();
  762. m_outstream.print("\t\tthrows ");
  763. m_outstream.println(new String(m_spec.m_eof_throw_code,0,
  764. m_spec.m_eof_throw_read));
  765. m_outstream.println("\t\t{");
  766. }
  767. else
  768. {
  769. m_outstream.println(" {");
  770. }
  771. m_outstream.println("\t\tif (false == yy_eof_done) {");
  772. m_outstream.print(new String(m_spec.m_eof_code,0,
  773. m_spec.m_eof_read));
  774. m_outstream.println("\t\t}");
  775. m_outstream.println("\t\tyy_eof_done = true;");
  776. m_outstream.println("\t}");
  777. }
  778. emit_states();
  779. /* Function: yybegin */
  780. m_outstream.println("\tprivate void yybegin (int state) {");
  781. m_outstream.println("\t\tyy_lexical_state = state;");
  782. m_outstream.println("\t}");
  783. /* Function: yy_initial_dtrans */
  784. /*m_outstream.println("\tprivate int yy_initial_dtrans (int state) {");
  785. m_outstream.println("\t\treturn yy_state_dtrans[state];");
  786. m_outstream.println("\t}");*/
  787. /* Function: yy_advance */
  788. m_outstream.println("\tprivate int yy_advance ()");
  789. m_outstream.println("\t\tthrows java.io.IOException {");
  790. /*m_outstream.println("\t\t{");*/
  791. m_outstream.println("\t\tint next_read;");
  792. m_outstream.println("\t\tint i;");
  793. m_outstream.println("\t\tint j;");
  794. m_outstream.println();
  795. m_outstream.println("\t\tif (yy_buffer_index < yy_buffer_read) {");
  796. m_outstream.println("\t\t\treturn yy_buffer[yy_buffer_index++];");
  797. /*m_outstream.println("\t\t\t++yy_buffer_index;");*/
  798. m_outstream.println("\t\t}");
  799. m_outstream.println();
  800. m_outstream.println("\t\tif (0 != yy_buffer_start) {");
  801. m_outstream.println("\t\t\ti = yy_buffer_start;");
  802. m_outstream.println("\t\t\tj = 0;");
  803. m_outstream.println("\t\t\twhile (i < yy_buffer_read) {");
  804. m_outstream.println("\t\t\t\tyy_buffer[j] = yy_buffer[i];");
  805. m_outstream.println("\t\t\t\t++i;");
  806. m_outstream.println("\t\t\t\t++j;");
  807. m_outstream.println("\t\t\t}");
  808. m_outstream.println("\t\t\tyy_buffer_end = yy_buffer_end - yy_buffer_start;");
  809. m_outstream.println("\t\t\tyy_buffer_start = 0;");
  810. m_outstream.println("\t\t\tyy_buffer_read = j;");
  811. m_outstream.println("\t\t\tyy_buffer_index = j;");
  812. m_outstream.println("\t\t\tnext_read = yy_reader.read(yy_buffer,");
  813. m_outstream.println("\t\t\t\t\tyy_buffer_read,");
  814. m_outstream.println("\t\t\t\t\tyy_buffer.length - yy_buffer_read);");
  815. m_outstream.println("\t\t\tif (-1 == next_read) {");
  816. m_outstream.println("\t\t\t\treturn YY_EOF;");
  817. m_outstream.println("\t\t\t}");
  818. m_outstream.println("\t\t\tyy_buffer_read = yy_buffer_read + next_read;");
  819. m_outstream.println("\t\t}");
  820. m_outstream.println();
  821. m_outstream.println("\t\twhile (yy_buffer_index >= yy_buffer_read) {");
  822. m_outstream.println("\t\t\tif (yy_buffer_index >= yy_buffer.length) {");
  823. m_outstream.println("\t\t\t\tyy_buffer = yy_double(yy_buffer);");
  824. m_outstream.println("\t\t\t}");
  825. m_outstream.println("\t\t\tnext_read = yy_reader.read(yy_buffer,");
  826. m_outstream.println("\t\t\t\t\tyy_buffer_read,");
  827. m_outstream.println("\t\t\t\t\tyy_buffer.length - yy_buffer_read);");
  828. m_outstream.println("\t\t\tif (-1 == next_read) {");
  829. m_outstream.println("\t\t\t\treturn YY_EOF;");
  830. m_outstream.println("\t\t\t}");
  831. m_outstream.println("\t\t\tyy_buffer_read = yy_buffer_read + next_read;");
  832. m_outstream.println("\t\t}");
  833. m_outstream.println("\t\treturn yy_buffer[yy_buffer_index++];");
  834. m_outstream.println("\t}");
  835. /* Function: yy_move_end */
  836. m_outstream.println("\tprivate void yy_move_end () {");
  837. m_outstream.println("\t\tif (yy_buffer_end > yy_buffer_start &&");
  838. m_outstream.println("\t\t '\\n' == yy_buffer[yy_buffer_end-1])");
  839. m_outstream.println("\t\t\tyy_buffer_end--;");
  840. m_outstream.println("\t\tif (yy_buffer_end > yy_buffer_start &&");
  841. m_outstream.println("\t\t '\\r' == yy_buffer[yy_buffer_end-1])");
  842. m_outstream.println("\t\t\tyy_buffer_end--;");
  843. m_outstream.println("\t}");
  844. /* Function: yy_mark_start */
  845. m_outstream.println("\tprivate boolean yy_last_was_cr=false;");
  846. m_outstream.println("\tprivate void yy_mark_start () {");
  847. if (m_spec.m_count_lines || true == m_spec.m_count_chars)
  848. {
  849. if (m_spec.m_count_lines)
  850. {
  851. m_outstream.println("\t\tint i;");
  852. m_outstream.println("\t\tfor (i = yy_buffer_start; "
  853. + "i < yy_buffer_index; ++i) {");
  854. m_outstream.println("\t\t\tif ('\\n' == yy_buffer[i] && !yy_last_was_cr) {");
  855. m_outstream.println("\t\t\t\t++yyline;");
  856. m_outstream.println("\t\t\t}");
  857. m_outstream.println("\t\t\tif ('\\r' == yy_buffer[i]) {");
  858. m_outstream.println("\t\t\t\t++yyline;");
  859. m_outstream.println("\t\t\t\tyy_last_was_cr=true;");
  860. m_outstream.println("\t\t\t} else yy_last_was_cr=false;");
  861. m_outstream.println("\t\t}");
  862. }
  863. if (m_spec.m_count_chars)
  864. {
  865. m_outstream.println("\t\tyychar = yychar");
  866. m_outstream.println("\t\t\t+ yy_buffer_index - yy_buffer_start;");
  867. }
  868. }
  869. m_outstream.println("\t\tyy_buffer_start = yy_buffer_index;");
  870. m_outstream.println("\t}");
  871. /* Function: yy_mark_end */
  872. m_outstream.println("\tprivate void yy_mark_end () {");
  873. m_outstream.println("\t\tyy_buffer_end = yy_buffer_index;");
  874. m_outstream.println("\t}");
  875. /* Function: yy_to_mark */
  876. m_outstream.println("\tprivate void yy_to_mark () {");
  877. m_outstream.println("\t\tyy_buffer_index = yy_buffer_end;");
  878. m_outstream.println("\t\tyy_at_bol = "+
  879. "(yy_buffer_end > yy_buffer_start) &&");
  880. m_outstream.println("\t\t "+
  881. "('\\r' == yy_buffer[yy_buffer_end-1] ||");
  882. m_outstream.println("\t\t "+
  883. " '\\n' == yy_buffer[yy_buffer_end-1] ||");
  884. m_outstream.println("\t\t "+ /* unicode LS */
  885. " 2028/*LS*/ == yy_buffer[yy_buffer_end-1] ||");
  886. m_outstream.println("\t\t "+ /* unicode PS */
  887. " 2029/*PS*/ == yy_buffer[yy_buffer_end-1]);");
  888. m_outstream.println("\t}");
  889. /* Function: yytext */
  890. m_outstream.println("\tprivate java.lang.String yytext () {");
  891. m_outstream.println("\t\treturn (new java.lang.String(yy_buffer,");
  892. m_outstream.println("\t\t\tyy_buffer_start,");
  893. m_outstream.println("\t\t\tyy_buffer_end - yy_buffer_start));");
  894. m_outstream.println("\t}");
  895. /* Function: yylength */
  896. m_outstream.println("\tprivate int yylength () {");
  897. m_outstream.println("\t\treturn yy_buffer_end - yy_buffer_start;");
  898. m_outstream.println("\t}");
  899. /* Function: yy_double */
  900. m_outstream.println("\tprivate char[] yy_double (char buf[]) {");
  901. m_outstream.println("\t\tint i;");
  902. m_outstream.println("\t\tchar newbuf[];");
  903. m_outstream.println("\t\tnewbuf = new char[2*buf.length];");
  904. m_outstream.println("\t\tfor (i = 0; i < buf.length; ++i) {");
  905. m_outstream.println("\t\t\tnewbuf[i] = buf[i];");
  906. m_outstream.println("\t\t}");
  907. m_outstream.println("\t\treturn newbuf;");
  908. m_outstream.println("\t}");
  909. /* Function: yy_error */
  910. m_outstream.println("\tprivate final int YY_E_INTERNAL = 0;");
  911. m_outstream.println("\tprivate final int YY_E_MATCH = 1;");
  912. m_outstream.println("\tprivate java.lang.String yy_error_string[] = {");
  913. m_outstream.println("\t\t\"Error: Internal error.\\n\",");
  914. m_outstream.println("\t\t\"Error: Unmatched input.\\n\"");
  915. m_outstream.println("\t};");
  916. m_outstream.println("\tprivate void yy_error (int code,boolean fatal) {");
  917. m_outstream.println("\t\tjava.lang.System.out.print(yy_error_string[code]);");
  918. m_outstream.println("\t\tjava.lang.System.out.flush();");
  919. m_outstream.println("\t\tif (fatal) {");
  920. m_outstream.println("\t\t\tthrow new Error(\"Fatal Error.\\n\");");
  921. m_outstream.println("\t\t}");
  922. m_outstream.println("\t}");
  923. /* Function: yy_next */
  924. /*m_outstream.println("\tprivate int yy_next (int current,char lookahead) {");
  925. m_outstream.println("\t\treturn yy_nxt[yy_rmap[current]][yy_cmap[lookahead]];");
  926. m_outstream.println("\t}");*/
  927. /* Function: yy_accept */
  928. /*m_outstream.println("\tprivate int yy_accept (int current) {");
  929. m_outstream.println("\t\treturn yy_acpt[current];");
  930. m_outstream.println("\t}");*/
  931. // Function: private int [][] unpackFromString(int size1, int size2, String st)
  932. // Added 6/24/98 Raimondas Lencevicius
  933. // May be made more efficient by replacing String operations
  934. // Assumes correctly formed input String. Performs no error checking
  935. if (Main.staticFlag) { // S.M.Pericas
  936. m_outstream.println("\tstatic private int[][] unpackFromString"+
  937. "(int size1, int size2, String st) {");
  938. }
  939. else {
  940. m_outstream.println("\tprivate int[][] unpackFromString"+
  941. "(int size1, int size2, String st) {");
  942. }
  943. m_outstream.println("\t\tint colonIndex = -1;");
  944. m_outstream.println("\t\tString lengthString;");
  945. m_outstream.println("\t\tint sequenceLength = 0;");
  946. m_outstream.println("\t\tint sequenceInteger = 0;");
  947. m_outstream.println();
  948. m_outstream.println("\t\tint commaIndex;");
  949. m_outstream.println("\t\tString workString;");
  950. m_outstream.println();
  951. m_outstream.println("\t\tint res[][] = new int[size1][size2];");
  952. m_outstream.println("\t\tfor (int i= 0; i < size1; i++) {");
  953. m_outstream.println("\t\t\tfor (int j= 0; j < size2; j++) {");
  954. m_outstream.println("\t\t\t\tif (sequenceLength != 0) {");
  955. m_outstream.println("\t\t\t\t\tres[i][j] = sequenceInteger;");
  956. m_outstream.println("\t\t\t\t\tsequenceLength--;");
  957. m_outstream.println("\t\t\t\t\tcontinue;");
  958. m_outstream.println("\t\t\t\t}");
  959. m_outstream.println("\t\t\t\tcommaIndex = st.indexOf(',');");
  960. m_outstream.println("\t\t\t\tworkString = (commaIndex==-1) ? st :");
  961. m_outstream.println("\t\t\t\t\tst.substring(0, commaIndex);");
  962. m_outstream.println("\t\t\t\tst = st.substring(commaIndex+1);");
  963. m_outstream.println("\t\t\t\tcolonIndex = workString.indexOf(':');");
  964. m_outstream.println("\t\t\t\tif (colonIndex == -1) {");
  965. m_outstream.println("\t\t\t\t\tres[i][j]=Integer.parseInt(workString);");
  966. m_outstream.println("\t\t\t\t\tcontinue;");
  967. m_outstream.println("\t\t\t\t}");
  968. m_outstream.println("\t\t\t\tlengthString =");
  969. m_outstream.println("\t\t\t\t\tworkString.substring(colonIndex+1);");
  970. m_outstream.println("\t\t\t\tsequenceLength="+
  971. "Integer.parseInt(lengthString);");
  972. m_outstream.println("\t\t\t\tworkString="+
  973. "workString.substring(0,colonIndex);");
  974. m_outstream.println("\t\t\t\tsequenceInteger="+
  975. "Integer.parseInt(workString);");
  976. m_outstream.println("\t\t\t\tres[i][j] = sequenceInteger;");
  977. m_outstream.println("\t\t\t\tsequenceLength--;");
  978. m_outstream.println("\t\t\t}");
  979. m_outstream.println("\t\t}");
  980. m_outstream.println("\t\treturn res;");
  981. m_outstream.println("\t}");
  982. }
  983. /***************************************************************
  984. Function: emit_header
  985. Description: Emits class header.
  986. **************************************************************/
  987. private void emit_header
  988. (
  989. )
  990. throws java.io.IOException
  991. {
  992. if (CUtility.DEBUG)
  993. {
  994. CUtility._assert(null != m_spec);
  995. CUtility._assert(null != m_outstream);
  996. }
  997. m_outstream.println();
  998. m_outstream.println();
  999. if (true == m_spec.m_public) {
  1000. m_outstream.print("public ");
  1001. }
  1002. m_outstream.print("class ");
  1003. m_outstream.print(new String(m_spec.m_class_name,0,
  1004. m_spec.m_class_name.length));
  1005. if (m_spec.m_implements_name.length > 0) {
  1006. m_outstream.print(" implements ");
  1007. m_outstream.print(new String(m_spec.m_implements_name,0,
  1008. m_spec.m_implements_name.length));
  1009. }
  1010. m_outstream.println(" {");
  1011. }
  1012. /***************************************************************
  1013. Function: emit_table
  1014. Description: Emits transition table.
  1015. **************************************************************/
  1016. private void emit_table
  1017. (
  1018. )
  1019. throws java.io.IOException
  1020. {
  1021. int i;
  1022. int elem;
  1023. int size;
  1024. CDTrans dtrans;
  1025. boolean is_start;
  1026. boolean is_end;
  1027. CAccept accept;
  1028. if (CUtility.DEBUG)
  1029. {
  1030. CUtility._assert(null != m_spec);
  1031. CUtility._assert(null != m_outstream);
  1032. }
  1033. m_outstream.println("\tprivate int yy_acpt[] = {");
  1034. size = m_spec.m_accept_vector.size();
  1035. for (elem = 0; elem < size; ++elem)
  1036. {
  1037. accept = (CAccept) m_spec.m_accept_vector.elementAt(elem);
  1038. m_outstream.print("\t\t/* "+elem+" */ ");
  1039. if (null != accept)
  1040. {
  1041. is_start = (0 != (m_spec.m_anchor_array[elem] & CSpec.START));
  1042. is_end = (0 != (m_spec.m_anchor_array[elem] & CSpec.END));
  1043. if (is_start && true == is_end)
  1044. {
  1045. m_outstream.print("YY_START | YY_END");
  1046. }
  1047. else if (is_start)
  1048. {
  1049. m_outstream.print("YY_START");
  1050. }
  1051. else if (is_end)
  1052. {
  1053. m_outstream.print("YY_END");
  1054. }
  1055. else
  1056. {
  1057. m_outstream.print("YY_NO_ANCHOR");
  1058. }
  1059. }
  1060. else
  1061. {
  1062. m_outstream.print("YY_NOT_ACCEPT");
  1063. }
  1064. if (elem < size - 1)
  1065. {
  1066. m_outstream.print(",");
  1067. }
  1068. m_outstream.println();
  1069. }
  1070. m_outstream.println("\t};");
  1071. // CSA: modified yy_cmap to use string packing 9-Aug-1999
  1072. int[] yy_cmap = new int[m_spec.m_ccls_map.length];
  1073. for (i = 0; i < m_spec.m_ccls_map.length; ++i)
  1074. yy_cmap[i] = m_spec.m_col_map[m_spec.m_ccls_map[i]];
  1075. if (Main.staticFlag) {
  1076. m_outstream.print("\tstatic private int yy_cmap[] = unpackFromString(");
  1077. }
  1078. else {
  1079. m_outstream.print("\tprivate int yy_cmap[] = unpackFromString(");
  1080. }
  1081. emit_table_as_string(new int[][] { yy_cmap });
  1082. m_outstream.println(")[0];");
  1083. m_outstream.println();
  1084. // CSA: modified yy_rmap to use string packing 9-Aug-1999
  1085. if (Main.staticFlag) {
  1086. m_outstream.print("\tstatic private int yy_rmap[] = unpackFromString(");
  1087. }
  1088. else {
  1089. m_outstream.print("\tprivate int yy_rmap[] = unpackFromString(");
  1090. }
  1091. emit_table_as_string(new int[][] { m_spec.m_row_map });
  1092. m_outstream.println(")[0];");
  1093. m_outstream.println();
  1094. // 6/24/98 Raimondas Lencevicius
  1095. // modified to use
  1096. // int[][] unpackFromString(int size1, int size2, String st)
  1097. size = m_spec.m_dtrans_vector.size();
  1098. int[][] yy_nxt = new int[size][];
  1099. for (elem=0; elem<size; elem++) {
  1100. dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(elem);
  1101. CUtility._assert(dtrans.m_dtrans.length==m_spec.m_dtrans_ncols);
  1102. yy_nxt[elem] = dtrans.m_dtrans;
  1103. }
  1104. if (Main.staticFlag) {
  1105. m_outstream.print
  1106. ("\tstatic private int yy_nxt[][] = unpackFromString(");
  1107. }
  1108. else {
  1109. m_outstream.print
  1110. ("\tprivate int yy_nxt[][] = unpackFromString(");
  1111. }
  1112. emit_table_as_string(yy_nxt);
  1113. m_outstream.println(");");
  1114. m_outstream.println();
  1115. }
  1116. /***************************************************************
  1117. Function: emit_driver
  1118. Description: Output an integer table as a string. Written by
  1119. Raimondas Lencevicius 6/24/98; reorganized by CSA 9-Aug-1999.
  1120. From his original comments:
  1121. yy_nxt[][] values are coded into a string
  1122. by printing integers and representing
  1123. integer sequences as "value:length" pairs.
  1124. **************************************************************/
  1125. private void emit_table_as_string(int[][] ia) {
  1126. int sequenceLength = 0; // RL - length of the number sequence
  1127. boolean sequenceStarted = false; // RL - has number sequence started?
  1128. int previousInt = -20; // RL - Bogus -20 state.
  1129. // RL - Output matrix size
  1130. m_outstream.print(ia.length);
  1131. m_outstream.print(",");
  1132. m_outstream.print(ia.length>0?ia[0].length:0);
  1133. m_outstream.println(",");
  1134. StringBuffer outstr = new StringBuffer();
  1135. // RL - Output matrix
  1136. for (int elem = 0; elem < ia.length; ++elem)
  1137. {
  1138. for (int i = 0; i < ia[elem].length; ++i)
  1139. {
  1140. int writeInt = ia[elem][i];
  1141. if (writeInt == previousInt) // RL - sequence?
  1142. {
  1143. if (sequenceStarted)
  1144. {
  1145. sequenceLength++;
  1146. }
  1147. else
  1148. {
  1149. outstr.append(writeInt);
  1150. outstr.append(":");
  1151. sequenceLength = 2;
  1152. sequenceStarted = true;
  1153. }
  1154. }
  1155. else // RL - no sequence or end sequence
  1156. {
  1157. if (sequenceStarted)
  1158. {
  1159. outstr.append(sequenceLength);
  1160. outstr.append(",");
  1161. sequenceLength = 0;
  1162. sequenceStarted = false;
  1163. }
  1164. else
  1165. {
  1166. if (previousInt != -20)
  1167. {
  1168. outstr.append(previousInt);
  1169. outstr.append(",");
  1170. }
  1171. }
  1172. }
  1173. previousInt = writeInt;
  1174. // CSA: output in 75 character chunks.
  1175. if (outstr.length() > 75) {
  1176. String s = outstr.toString();
  1177. m_outstream.println("\""+s.substring(0,75)+"\" +");
  1178. outstr = new StringBuffer(s.substring(75));
  1179. }
  1180. }
  1181. }
  1182. if (sequenceStarted)
  1183. {
  1184. outstr.append(sequenceLength);
  1185. }
  1186. else
  1187. {
  1188. outstr.append(previousInt);
  1189. }
  1190. // CSA: output in 75 character chunks.
  1191. if (outstr.length() > 75) {
  1192. String s = outstr.toString();
  1193. m_outstream.println("\""+s.substring(0,75)+"\" +");
  1194. outstr = new StringBuffer(s.substring(75));
  1195. }
  1196. m_outstream.print("\""+outstr+"\"");
  1197. }
  1198. /***************************************************************
  1199. Function: emit_driver
  1200. Description:
  1201. **************************************************************/
  1202. private void emit_driver
  1203. (
  1204. )
  1205. throws java.io.IOException
  1206. {
  1207. if (CUtility.DEBUG)
  1208. {
  1209. CUtility._assert(null != m_spec);
  1210. CUtility._assert(null != m_outstream);
  1211. }
  1212. emit_table();
  1213. if (m_spec.m_integer_type)
  1214. {
  1215. m_outstream.print("\tpublic int ");
  1216. m_outstream.print(new String(m_spec.m_function_name));
  1217. m_outstream.println(" ()");
  1218. }
  1219. else if (m_spec.m_intwrap_type)
  1220. {
  1221. m_outstream.print("\tpublic java.lang.Integer ");
  1222. m_outstream.print(new String(m_spec.m_function_name));
  1223. m_outstream.println(" ()");
  1224. }
  1225. else
  1226. {
  1227. m_outstream.print("\tpublic ");
  1228. m_outstream.print(new String(m_spec.m_type_name));
  1229. m_outstream.print(" ");
  1230. m_outstream.print(new String(m_spec.m_function_name));
  1231. m_outstream.println(" ()");
  1232. }
  1233. /*m_outstream.println("\t\tthrows java.io.IOException {");*/
  1234. m_outstream.print("\t\tthrows java.io.IOException");
  1235. if (null != m_spec.m_yylex_throw_code)
  1236. {
  1237. m_outstream.print(", ");
  1238. m_outstream.print(new String(m_spec.m_yylex_throw_code,0,
  1239. m_spec.m_yylex_throw_read));
  1240. m_outstream.println();
  1241. m_outstream.println("\t\t{");
  1242. }
  1243. else
  1244. {
  1245. m_outstream.println(" {");
  1246. }
  1247. m_outstream.println("\t\tint yy_lookahead;");
  1248. m_outstream.println("\t\tint yy_anchor = YY_NO_ANCHOR;");
  1249. /*m_outstream.println("\t\tint yy_state "
  1250. + "= yy_initial_dtrans(yy_lexical_state);");*/
  1251. m_outstream.println("\t\tint yy_state "
  1252. + "= yy_state_dtrans[yy_lexical_state];");
  1253. m_outstream.println("\t\tint yy_next_state = YY_NO_STATE;");
  1254. /*m_outstream.println("\t\tint yy_prev_stave = YY_NO_STATE;");*/
  1255. m_outstream.println("\t\tint yy_last_accept_state = YY_NO_STATE;");
  1256. m_outstream.println("\t\tboolean yy_initial = true;");
  1257. m_outstream.println("\t\tint yy_this_accept;");
  1258. m_outstream.println();
  1259. m_outstream.println("\t\tyy_mark_start();");
  1260. /*m_outstream.println("\t\tyy_this_accept = yy_accept(yy_state);");*/
  1261. m_outstream.println("\t\tyy_this_accept = yy_acpt[yy_state];");
  1262. m_outstream.println("\t\tif (YY_NOT_ACCEPT != yy_this_accept) {");
  1263. m_outstream.println("\t\t\tyy_last_accept_state = yy_state;");
  1264. m_outstream.println("\t\t\tyy_mark_end();");
  1265. m_outstream.println("\t\t}");
  1266. if (NOT_EDBG)
  1267. {
  1268. m_outstream.println("\t\tjava.lang.System.out.println(\"Begin\");");
  1269. }
  1270. m_outstream.println("\t\twhile (true) {");
  1271. m_outstream.println("\t\t\tif (yy_initial && yy_at_bol) "+
  1272. "yy_lookahead = YY_BOL;");
  1273. m_outstream.println("\t\t\telse yy_lookahead = yy_advance();");
  1274. m_outstream.println("\t\t\tyy_next_state = YY_F;");
  1275. /*m_outstream.println("\t\t\t\tyy_next_state = "
  1276. + "yy_next(yy_state,yy_lookahead);");*/
  1277. m_outstream.println("\t\t\tyy_next_state = "
  1278. + "yy_nxt[yy_rmap[yy_state]][yy_cmap[yy_lookahead]];");
  1279. if (NOT_EDBG)
  1280. {
  1281. m_outstream.println("java.lang.System.out.println(\"Current state: \""
  1282. + " + yy_state");
  1283. m_outstream.println("+ \"\tCurrent input: \"");
  1284. m_outstream.println(" + ((char) yy_lookahead));");
  1285. }
  1286. if (NOT_EDBG)
  1287. {
  1288. m_outstream.println("\t\t\tjava.lang.System.out.println(\"State = \""
  1289. + "+ yy_state);");
  1290. m_outstream.println("\t\t\tjava.lang.System.out.println(\"Accepting status = \""
  1291. + "+ yy_this_accept);");
  1292. m_outstream.println("\t\t\tjava.lang.System.out.println(\"Last accepting state = \""
  1293. + "+ yy_last_accept_state);");
  1294. m_outstream.println("\t\t\tjava.lang.System.out.println(\"Next state = \""
  1295. + "+ yy_next_state);");
  1296. m_outstream.println("\t\t\tjava.lang.System.out.println(\"Lookahead input = \""
  1297. + "+ ((char) yy_lookahead));");
  1298. }
  1299. // handle bare EOF.
  1300. m_outstream.println("\t\t\tif (YY_EOF == yy_lookahead "
  1301. + "&& true == yy_initial) {");
  1302. if (null != m_spec.m_eof_code)
  1303. {
  1304. m_outstream.println("\t\t\t\tyy_do_eof();");
  1305. }
  1306. if (true == m_spec.m_integer_type)
  1307. {
  1308. m_outstream.println("\t\t\t\treturn YYEOF;");
  1309. }
  1310. else if (null != m_spec.m_eof_value_code)
  1311. {
  1312. m_outstream.print(new String(m_spec.m_eof_value_code,0,
  1313. m_spec.m_eof_value_read));
  1314. }
  1315. else
  1316. {
  1317. m_outstream.println("\t\t\t\treturn null;");
  1318. }
  1319. m_outstream.println("\t\t\t}");
  1320. m_outstream.println("\t\t\tif (YY_F != yy_next_state) {");
  1321. m_outstream.println("\t\t\t\tyy_state = yy_next_state;");
  1322. m_outstream.println("\t\t\t\tyy_initial = false;");
  1323. /*m_outstream.println("\t\t\t\tyy_this_accept = yy_accept(yy_state);");*/
  1324. m_outstream.println("\t\t\t\tyy_this_accept = yy_acpt[yy_state];");
  1325. m_outstream.println("\t\t\t\tif (YY_NOT_ACCEPT != yy_this_accept) {");
  1326. m_outstream.println("\t\t\t\t\tyy_last_accept_state = yy_state;");
  1327. m_outstream.println("\t\t\t\t\tyy_mark_end();");
  1328. m_outstream.println("\t\t\t\t}");
  1329. /*m_outstream.println("\t\t\t\tyy_prev_state = yy_state;");*/
  1330. /*m_outstream.println("\t\t\t\tyy_state = yy_next_state;");*/
  1331. m_outstream.println("\t\t\t}");
  1332. m_outstream.println("\t\t\telse {");
  1333. m_outstream.println("\t\t\t\tif (YY_NO_STATE == yy_last_accept_state) {");
  1334. /*m_outstream.println("\t\t\t\t\tyy_error(YY_E_MATCH,false);");
  1335. m_outstream.println("\t\t\t\t\tyy_initial = true;");
  1336. m_outstream.println("\t\t\t\t\tyy_state "
  1337. + "= yy_state_dtrans[yy_lexical_state];");
  1338. m_outstream.println("\t\t\t\t\tyy_next_state = YY_NO_STATE;");*/
  1339. /*m_outstream.println("\t\t\t\t\tyy_prev_state = YY_NO_STATE;");*/
  1340. /*m_outstream.println("\t\t\t\t\tyy_last_accept_state = YY_NO_STATE;");
  1341. m_outstream.println("\t\t\t\t\tyy_mark_start();");*/
  1342. /*m_outstream.println("\t\t\t\t\tyy_this_accept = yy_accept(yy_state);");*/
  1343. /*m_outstream.println("\t\t\t\t\tyy_this_accept = yy_acpt[yy_state];");
  1344. m_outstream.println("\t\t\t\t\tif (YY_NOT_ACCEPT != yy_this_accept) {");
  1345. m_outstream.println("\t\t\t\t\t\tyy_last_accept_state = yy_state;");
  1346. m_outstream.println("\t\t\t\t\t}");*/
  1347. m_outstream.println("\t\t\t\t\tthrow (new Error(\"Lexical Error: Unmatched Input.\"));");
  1348. m_outstream.println("\t\t\t\t}");
  1349. m_outstream.println("\t\t\t\telse {");
  1350. m_outstream.println("\t\t\t\t\tyy_anchor = yy_acpt[yy_last_accept_state];");
  1351. /*m_outstream.println("\t\t\t\t\tyy_anchor "
  1352. + "= yy_accept(yy_last_accept_state);");*/
  1353. m_outstream.println("\t\t\t\t\tif (0 != (YY_END & yy_anchor)) {");
  1354. m_outstream.println("\t\t\t\t\t\tyy_move_end();");
  1355. m_outstream.println("\t\t\t\t\t}");
  1356. m_outstream.println("\t\t\t\t\tyy_to_mark();");
  1357. m_outstream.println("\t\t\t\t\tswitch (yy_last_accept_state) {");
  1358. emit_actions("\t\t\t\t\t");
  1359. m_outstream.println("\t\t\t\t\tdefault:");
  1360. m_outstream.println("\t\t\t\t\t\tyy_error(YY_E_INTERNAL,false);");
  1361. /*m_outstream.println("\t\t\t\t\t\treturn null;");*/
  1362. m_outstream.println("\t\t\t\t\tcase -1:");
  1363. m_outstream.println("\t\t\t\t\t}");
  1364. m_outstream.println("\t\t\t\t\tyy_initial = true;");
  1365. m_outstream.println("\t\t\t\t\tyy_state "
  1366. + "= yy_state_dtrans[yy_lexical_state];");
  1367. m_outstream.println("\t\t\t\t\tyy_next_state = YY_NO_STATE;");
  1368. /*m_outstream.println("\t\t\t\t\tyy_prev_state = YY_NO_STATE;");*/
  1369. m_outstream.println("\t\t\t\t\tyy_last_accept_state = YY_NO_STATE;");
  1370. m_outstream.println("\t\t\t\t\tyy_mark_start();");
  1371. /*m_outstream.println("\t\t\t\t\tyy_this_accept = yy_accept(yy_state);");*/
  1372. m_outstream.println("\t\t\t\t\tyy_this_accept = yy_acpt[yy_state];");
  1373. m_outstream.println("\t\t\t\t\tif (YY_NOT_ACCEPT != yy_this_accept) {");
  1374. m_outstream.println("\t\t\t\t\t\tyy_last_accept_state = yy_state;");
  1375. m_outstream.println("\t\t\t\t\t\tyy_mark_end();");
  1376. m_outstream.println("\t\t\t\t\t}");
  1377. m_outstream.println("\t\t\t\t}");
  1378. m_outstream.println("\t\t\t}");
  1379. m_outstream.println("\t\t}");
  1380. m_outstream.println("\t}");
  1381. /*m_outstream.println("\t\t\t\t");
  1382. m_outstream.println("\t\t\t");
  1383. m_outstream.println("\t\t\t");
  1384. m_outstream.println("\t\t\t");
  1385. m_outstream.println("\t\t\t");
  1386. m_outstream.println("\t\t}");*/
  1387. }
  1388. /***************************************************************
  1389. Function: emit_actions
  1390. Description:
  1391. **************************************************************/
  1392. private void emit_actions
  1393. (
  1394. String tabs
  1395. )
  1396. throws java.io.IOException
  1397. {
  1398. int elem;
  1399. int size;
  1400. int bogus_index;
  1401. CAccept accept;
  1402. if (CUtility.DEBUG)
  1403. {
  1404. CUtility._assert(m_spec.m_accept_vector.size()
  1405. == m_spec.m_anchor_array.length);
  1406. }
  1407. bogus_index = -2;
  1408. size = m_spec.m_accept_vector.size();
  1409. for (elem = 0; elem < size; ++elem)
  1410. {
  1411. accept = (CAccept) m_spec.m_accept_vector.elementAt(elem);
  1412. if (null != accept)
  1413. {
  1414. m_outstream.println(tabs + "case " + elem
  1415. + ":");
  1416. m_outstream.print(tabs + "\t");
  1417. m_outstream.print(new String(accept.m_action,0,
  1418. accept.m_action_read));
  1419. m_outstream.println();
  1420. m_outstream.println(tabs + "case " + bogus_index + ":");
  1421. m_outstream.println(tabs + "\tbreak;");
  1422. --bogus_index;
  1423. }
  1424. }
  1425. }
  1426. /***************************************************************
  1427. Function: emit_footer
  1428. Description:
  1429. **************************************************************/
  1430. private void emit_footer
  1431. (
  1432. )
  1433. throws java.io.IOException
  1434. {
  1435. if (CUtility.DEBUG)
  1436. {
  1437. CUtility._assert(null != m_spec);
  1438. CUtility._assert(null != m_outstream);
  1439. }
  1440. m_outstream.println("}");
  1441. }
  1442. }
  1443. /***************************************************************
  1444. Class: CBunch
  1445. **************************************************************/
  1446. class CBunch
  1447. {
  1448. /***************************************************************
  1449. Member Variables
  1450. **************************************************************/
  1451. Vector m_nfa_set; /* Vector of CNfa states in dfa state. */
  1452. SparseBitSet m_nfa_bit; /* BitSet representation of CNfa labels. */
  1453. CAccept m_accept; /* Accepting actions, or null if nonaccepting state. */
  1454. int m_anchor; /* Anchors on regular expression. */
  1455. int m_accept_index; /* CNfa index corresponding to accepting actions. */
  1456. /***************************************************************
  1457. Function: CBunch
  1458. Description: Constructor.
  1459. **************************************************************/
  1460. CBunch
  1461. (
  1462. )
  1463. {
  1464. m_nfa_set = null;
  1465. m_nfa_bit = null;
  1466. m_accept = null;
  1467. m_anchor = CSpec.NONE;
  1468. m_accept_index = -1;
  1469. }
  1470. }
  1471. /***************************************************************
  1472. Class: CMakeNfa
  1473. **************************************************************/
  1474. class CMakeNfa
  1475. {
  1476. /***************************************************************
  1477. Member Variables
  1478. **************************************************************/
  1479. private CSpec m_spec;
  1480. private CLexGen m_lexGen;
  1481. private CInput m_input;
  1482. /***************************************************************
  1483. Function: CMakeNfa
  1484. Description: Constructor.
  1485. **************************************************************/
  1486. CMakeNfa
  1487. (
  1488. )
  1489. {
  1490. reset();
  1491. }
  1492. /***************************************************************
  1493. Function: reset
  1494. Description: Resets CMakeNfa member variables.
  1495. **************************************************************/
  1496. private void reset
  1497. (
  1498. )
  1499. {
  1500. m_input = null;
  1501. m_lexGen = null;
  1502. m_spec = null;
  1503. }
  1504. /***************************************************************
  1505. Function: set
  1506. Description: Sets CMakeNfa member variables.
  1507. **************************************************************/
  1508. private void set
  1509. (
  1510. CLexGen lexGen,
  1511. CSpec spec,
  1512. CInput input
  1513. )
  1514. {
  1515. if (CUtility.DEBUG)
  1516. {
  1517. CUtility._assert(null != input);
  1518. CUtility._assert(null != lexGen);
  1519. CUtility._assert(null != spec);
  1520. }
  1521. m_input = input;
  1522. m_lexGen = lexGen;
  1523. m_spec = spec;
  1524. }
  1525. /***************************************************************
  1526. Function: allocate_BOL_EOF
  1527. Description: Expands character class to include special BOL and
  1528. EOF characters. Puts numeric index of these characters in
  1529. input CSpec.
  1530. **************************************************************/
  1531. void allocate_BOL_EOF
  1532. (
  1533. CSpec spec
  1534. )
  1535. {
  1536. CUtility._assert(CSpec.NUM_PSEUDO==2);
  1537. spec.BOL = spec.m_dtrans_ncols++;
  1538. spec.EOF = spec.m_dtrans_ncols++;
  1539. }
  1540. /***************************************************************
  1541. Function: thompson
  1542. Description: High level access function to module.
  1543. Deposits result in input CSpec.
  1544. **************************************************************/
  1545. void thompson
  1546. (
  1547. CLexGen lexGen,
  1548. CSpec spec,
  1549. CInput input
  1550. )
  1551. throws java.io.IOException
  1552. {
  1553. int i;
  1554. CNfa elem;
  1555. int size;
  1556. /* Set member variables. */
  1557. reset();
  1558. set(lexGen,spec,input);
  1559. size = m_spec.m_states.size();
  1560. m_spec.m_state_rules = new Vector[size];
  1561. for (i = 0; i < size; ++i)
  1562. {
  1563. m_spec.m_state_rules[i] = new Vector();
  1564. }
  1565. /* Initialize current token variable
  1566. and create nfa. */
  1567. /*m_spec.m_current_token = m_lexGen.EOS;
  1568. m_lexGen.advance();*/
  1569. m_spec.m_nfa_start = machine();
  1570. /* Set labels in created nfa machine. */
  1571. size = m_spec.m_nfa_states.size();
  1572. for (i = 0; i < size; ++i)
  1573. {
  1574. elem = (CNfa) m_spec.m_nfa_states.elementAt(i);
  1575. elem.m_label = i;
  1576. }
  1577. /* Debugging output. */
  1578. if (CUtility.DO_DEBUG)
  1579. {
  1580. m_lexGen.print_nfa();
  1581. }
  1582. if (m_spec.m_verbose)
  1583. {
  1584. System.out.println("NFA comprised of "
  1585. + (m_spec.m_nfa_states.size() + 1)
  1586. + " states.");
  1587. }
  1588. reset();
  1589. }
  1590. /***************************************************************
  1591. Function: discardCNfa
  1592. Description:
  1593. **************************************************************/
  1594. private void discardCNfa
  1595. (
  1596. CNfa nfa
  1597. )
  1598. {
  1599. m_spec.m_nfa_states.removeElement(nfa);
  1600. }
  1601. /***************************************************************
  1602. Function: processStates
  1603. Description:
  1604. **************************************************************/
  1605. private void processStates
  1606. (
  1607. SparseBitSet states,
  1608. CNfa current
  1609. )
  1610. {
  1611. int size;
  1612. int i;
  1613. size = m_spec.m_states.size();
  1614. for (i = 0; i < size; ++i)
  1615. {
  1616. if (states.get(i))
  1617. {
  1618. m_spec.m_state_rules[i].addElement(current);
  1619. }
  1620. }
  1621. }
  1622. /***************************************************************
  1623. Function: machine
  1624. Description: Recursive descent regular expression parser.
  1625. **************************************************************/
  1626. private CNfa machine
  1627. (
  1628. )
  1629. throws java.io.IOException
  1630. {
  1631. CNfa start;
  1632. CNfa p;
  1633. SparseBitSet states;
  1634. if (CUtility.DESCENT_DEBUG)
  1635. {
  1636. CUtility.enter("machine",m_spec.m_lexeme,m_spec.m_current_token);
  1637. }
  1638. start = CAlloc.newCNfa(m_spec);
  1639. p = start;
  1640. states = m_lexGen.getStates();
  1641. /* Begin: Added for states. */
  1642. m_spec.m_current_token = m_lexGen.EOS;
  1643. m_lexGen.advance();
  1644. /* End: Added for states. */
  1645. if (m_lexGen.END_OF_INPUT != m_spec.m_current_token) // CSA fix.
  1646. {
  1647. p.m_next = rule();
  1648. processStates(states,p.m_next);
  1649. }
  1650. while (m_lexGen.END_OF_INPUT != m_spec.m_current_token)
  1651. {
  1652. /* Make state changes HERE. */
  1653. states = m_lexGen.getStates();
  1654. /* Begin: Added for states. */
  1655. m_lexGen.advance();
  1656. if (m_lexGen.END_OF_INPUT == m_spec.m_current_token)
  1657. {
  1658. break;
  1659. }
  1660. /* End: Added for states. */
  1661. p.m_next2 = CAlloc.newCNfa(m_spec);
  1662. p = p.m_next2;
  1663. p.m_next = rule();
  1664. processStates(states,p.m_next);
  1665. }
  1666. // CSA: add pseudo-rules for BOL and EOF
  1667. SparseBitSet all_states = new SparseBitSet();
  1668. for (int i = 0; i < m_spec.m_states.size(); ++i)
  1669. all_states.set(i);
  1670. p.m_next2 = CAlloc.newCNfa(m_spec);
  1671. p = p.m_next2;
  1672. p.m_next = CAlloc.newCNfa(m_spec);
  1673. p.m_next.m_edge = CNfa.CCL;
  1674. p.m_next.m_next = CAlloc.newCNfa(m_spec);
  1675. p.m_next.m_set = new CSet();
  1676. p.m_next.m_set.add(m_spec.BOL);
  1677. p.m_next.m_set.add(m_spec.EOF);
  1678. p.m_next.m_next.m_accept = // do-nothing accept rule
  1679. new CAccept(new char[0], 0, m_input.m_line_number+1);
  1680. processStates(all_states,p.m_next);
  1681. // CSA: done.
  1682. if (CUtility.DESCENT_DEBUG)
  1683. {
  1684. CUtility.leave("machine",m_spec.m_lexeme,m_spec.m_current_token);
  1685. }
  1686. return start;
  1687. }
  1688. /***************************************************************
  1689. Function: rule
  1690. Description: Recursive descent regular expression parser.
  1691. **************************************************************/
  1692. private CNfa rule
  1693. (
  1694. )
  1695. throws java.io.IOException
  1696. {
  1697. CNfaPair pair;
  1698. CNfa p;
  1699. CNfa start = null;
  1700. CNfa end = null;
  1701. int anchor = CSpec.NONE;
  1702. if (CUtility.DESCENT_DEBUG)
  1703. {
  1704. CUtility.enter("rule",m_spec.m_lexeme,m_spec.m_current_token);
  1705. }
  1706. pair = CAlloc.newCNfaPair();
  1707. if (m_lexGen.AT_BOL == m_spec.m_current_token)
  1708. {
  1709. anchor = anchor | CSpec.START;
  1710. m_lexGen.advance();
  1711. expr(pair);
  1712. // CSA: fixed beginning-of-line operator. 8-aug-1999
  1713. start = CAlloc.newCNfa(m_spec);
  1714. start.m_edge = m_spec.BOL;
  1715. start.m_next = pair.m_start;
  1716. end = pair.m_end;
  1717. }
  1718. else
  1719. {
  1720. expr(pair);
  1721. start = pair.m_start;
  1722. end = pair.m_end;
  1723. }
  1724. if (m_lexGen.AT_EOL == m_spec.m_current_token)
  1725. {
  1726. m_lexGen.advance();
  1727. // CSA: fixed end-of-line operator. 8-aug-1999
  1728. CNfaPair nlpair = CAlloc.newNLPair(m_spec);
  1729. end.m_next = CAlloc.newCNfa(m_spec);
  1730. end.m_next.m_next = nlpair.m_start;
  1731. end.m_next.m_next2 = CAlloc.newCNfa(m_spec);
  1732. end.m_next.m_next2.m_edge = m_spec.EOF;
  1733. end.m_next.m_next2.m_next = nlpair.m_end;
  1734. end = nlpair.m_end;
  1735. anchor = anchor | CSpec.END;
  1736. }
  1737. /* Check for null rules. Charles Fischer found this bug. [CSA] */
  1738. if (end==null)
  1739. CError.parse_error(CError.E_ZERO, m_input.m_line_number);
  1740. /* Handle end of regular expression. See page 103. */
  1741. end.m_accept = m_lexGen.packAccept();
  1742. end.m_anchor = anchor;
  1743. /* Begin: Removed for states. */
  1744. /*m_lexGen.advance();*/
  1745. /* End: Removed for states. */
  1746. if (CUtility.DESCENT_DEBUG)
  1747. {
  1748. CUtility.leave("rule",m_spec.m_lexeme,m_spec.m_current_token);
  1749. }
  1750. return start;
  1751. }
  1752. /***************************************************************
  1753. Function: expr
  1754. Description: Recursive descent regular expression parser.
  1755. **************************************************************/
  1756. private void expr
  1757. (
  1758. CNfaPair pair
  1759. )
  1760. throws java.io.IOException
  1761. {
  1762. CNfaPair e2_pair;
  1763. CNfa p;
  1764. if (CUtility.DESCENT_DEBUG)
  1765. {
  1766. CUtility.enter("expr",m_spec.m_lexeme,m_spec.m_current_token);
  1767. }
  1768. if (CUtility.DEBUG)
  1769. {
  1770. CUtility._assert(null != pair);
  1771. }
  1772. e2_pair = CAlloc.newCNfaPair();
  1773. cat_expr(pair);
  1774. while (m_lexGen.OR == m_spec.m_current_token)
  1775. {
  1776. m_lexGen.advance();
  1777. cat_expr(e2_pair);
  1778. p = CAlloc.newCNfa(m_spec);
  1779. p.m_next2 = e2_pair.m_start;
  1780. p.m_next = pair.m_start;
  1781. pair.m_start = p;
  1782. p = CAlloc.newCNfa(m_spec);
  1783. pair.m_end.m_next = p;
  1784. e2_pair.m_end.m_next = p;
  1785. pair.m_end = p;
  1786. }
  1787. if (CUtility.DESCENT_DEBUG)
  1788. {
  1789. CUtility.leave("expr",m_spec.m_lexeme,m_spec.m_current_token);
  1790. }
  1791. }
  1792. /***************************************************************
  1793. Function: cat_expr
  1794. Description: Recursive descent regular expression parser.
  1795. **************************************************************/
  1796. private void cat_expr
  1797. (
  1798. CNfaPair pair
  1799. )
  1800. throws java.io.IOException
  1801. {
  1802. CNfaPair e2_pair;
  1803. if (CUtility.DESCENT_DEBUG)
  1804. {
  1805. CUtility.enter("cat_expr",m_spec.m_lexeme,m_spec.m_current_token);
  1806. }
  1807. if (CUtility.DEBUG)
  1808. {
  1809. CUtility._assert(null != pair);
  1810. }
  1811. e2_pair = CAlloc.newCNfaPair();
  1812. if (first_in_cat(m_spec.m_current_token))
  1813. {
  1814. factor(pair);
  1815. }
  1816. while (first_in_cat(m_spec.m_current_token))
  1817. {
  1818. factor(e2_pair);
  1819. /* Destroy */
  1820. pair.m_end.mimic(e2_pair.m_start);
  1821. discardCNfa(e2_pair.m_start);
  1822. pair.m_end = e2_pair.m_end;
  1823. }
  1824. if (CUtility.DESCENT_DEBUG)
  1825. {
  1826. CUtility.leave("cat_expr",m_spec.m_lexeme,m_spec.m_current_token);
  1827. }
  1828. }
  1829. /***************************************************************
  1830. Function: first_in_cat
  1831. Description: Recursive descent regular expression parser.
  1832. **************************************************************/
  1833. private boolean first_in_cat
  1834. (
  1835. int token
  1836. )
  1837. {
  1838. switch (token)
  1839. {
  1840. case CLexGen.CLOSE_PAREN:
  1841. case CLexGen.AT_EOL:
  1842. case CLexGen.OR:
  1843. case CLexGen.EOS:
  1844. return false;
  1845. case CLexGen.CLOSURE:
  1846. case CLexGen.PLUS_CLOSE:
  1847. case CLexGen.OPTIONAL:
  1848. CError.parse_error(CError.E_CLOSE,m_input.m_line_number);
  1849. return false;
  1850. case CLexGen.CCL_END:
  1851. CError.parse_error(CError.E_BRACKET,m_input.m_line_number);
  1852. return false;
  1853. case CLexGen.AT_BOL:
  1854. CError.parse_error(CError.E_BOL,m_input.m_line_number);
  1855. return false;
  1856. default:
  1857. break;
  1858. }
  1859. return true;
  1860. }
  1861. /***************************************************************
  1862. Function: factor
  1863. Description: Recursive descent regular expression parser.
  1864. **************************************************************/
  1865. private void factor
  1866. (
  1867. CNfaPair pair
  1868. )
  1869. throws java.io.IOException
  1870. {
  1871. CNfa start = null;
  1872. CNfa end = null;
  1873. if (CUtility.DESCENT_DEBUG)
  1874. {
  1875. CUtility.enter("factor",m_spec.m_lexeme,m_spec.m_current_token);
  1876. }
  1877. term(pair);
  1878. if (m_lexGen.CLOSURE == m_spec.m_current_token
  1879. || m_lexGen.PLUS_CLOSE == m_spec.m_current_token
  1880. || m_lexGen.OPTIONAL == m_spec.m_current_token)
  1881. {
  1882. start = CAlloc.newCNfa(m_spec);
  1883. end = CAlloc.newCNfa(m_spec);
  1884. start.m_next = pair.m_start;
  1885. pair.m_end.m_next = end;
  1886. if (m_lexGen.CLOSURE == m_spec.m_current_token
  1887. || m_lexGen.OPTIONAL == m_spec.m_current_token)
  1888. {
  1889. start.m_next2 = end;
  1890. }
  1891. if (m_lexGen.CLOSURE == m_spec.m_current_token
  1892. || m_lexGen.PLUS_CLOSE == m_spec.m_current_token)
  1893. {
  1894. pair.m_end.m_next2 = pair.m_start;
  1895. }
  1896. pair.m_start = start;
  1897. pair.m_end = end;
  1898. m_lexGen.advance();
  1899. }
  1900. if (CUtility.DESCENT_DEBUG)
  1901. {
  1902. CUtility.leave("factor",m_spec.m_lexeme,m_spec.m_current_token);
  1903. }
  1904. }
  1905. /***************************************************************
  1906. Function: term
  1907. Description: Recursive descent regular expression parser.
  1908. **************************************************************/
  1909. private void term
  1910. (
  1911. CNfaPair pair
  1912. )
  1913. throws java.io.IOException
  1914. {
  1915. CNfa start;
  1916. boolean isAlphaL;
  1917. int c;
  1918. if (CUtility.DESCENT_DEBUG)
  1919. {
  1920. CUtility.enter("term",m_spec.m_lexeme,m_spec.m_current_token);
  1921. }
  1922. if (m_lexGen.OPEN_PAREN == m_spec.m_current_token)
  1923. {
  1924. m_lexGen.advance();
  1925. expr(pair);
  1926. if (m_lexGen.CLOSE_PAREN == m_spec.m_current_token)
  1927. {
  1928. m_lexGen.advance();
  1929. }
  1930. else
  1931. {
  1932. CError.parse_error(CError.E_SYNTAX,m_input.m_line_number);
  1933. }
  1934. }
  1935. else
  1936. {
  1937. start = CAlloc.newCNfa(m_spec);
  1938. pair.m_start = start;
  1939. start.m_next = CAlloc.newCNfa(m_spec);
  1940. pair.m_end = start.m_next;
  1941. if (m_lexGen.L == m_spec.m_current_token &&
  1942. Character.isLetter(m_spec.m_lexeme))
  1943. {
  1944. isAlphaL = true;
  1945. }
  1946. else
  1947. {
  1948. isAlphaL = false;
  1949. }
  1950. if (false == (m_lexGen.ANY == m_spec.m_current_token
  1951. || m_lexGen.CCL_START == m_spec.m_current_token
  1952. || (m_spec.m_ignorecase && isAlphaL)))
  1953. {
  1954. start.m_edge = m_spec.m_lexeme;
  1955. m_lexGen.advance();
  1956. }
  1957. else
  1958. {
  1959. start.m_edge = CNfa.CCL;
  1960. start.m_set = new CSet();
  1961. /* Match case-insensitive letters using character class. */
  1962. if (m_spec.m_ignorecase && isAlphaL)
  1963. {
  1964. start.m_set.addncase(m_spec.m_lexeme);
  1965. }
  1966. /* Match dot (.) using character class. */
  1967. else if (m_lexGen.ANY == m_spec.m_current_token)
  1968. {
  1969. start.m_set.add('\n');
  1970. start.m_set.add('\r');
  1971. // CSA: exclude BOL and EOF from character classes
  1972. start.m_set.add(m_spec.BOL);
  1973. start.m_set.add(m_spec.EOF);
  1974. start.m_set.complement();
  1975. }
  1976. else
  1977. {
  1978. m_lexGen.advance();
  1979. if (m_lexGen.AT_BOL == m_spec.m_current_token)
  1980. {
  1981. m_lexGen.advance();
  1982. // CSA: exclude BOL and EOF from character classes
  1983. start.m_set.add(m_spec.BOL);
  1984. start.m_set.add(m_spec.EOF);
  1985. start.m_set.complement();
  1986. }
  1987. if (false == (m_lexGen.CCL_END == m_spec.m_current_token))
  1988. {
  1989. dodash(start.m_set);
  1990. }
  1991. /*else
  1992. {
  1993. for (c = 0; c <= ' '; ++c)
  1994. {
  1995. start.m_set.add((byte) c);
  1996. }
  1997. }*/
  1998. }
  1999. m_lexGen.advance();
  2000. }
  2001. }
  2002. if (CUtility.DESCENT_DEBUG)
  2003. {
  2004. CUtility.leave("term",m_spec.m_lexeme,m_spec.m_current_token);
  2005. }
  2006. }
  2007. /***************************************************************
  2008. Function: dodash
  2009. Description: Recursive descent regular expression parser.
  2010. **************************************************************/
  2011. private void dodash
  2012. (
  2013. CSet set
  2014. )
  2015. throws java.io.IOException
  2016. {
  2017. int first = -1;
  2018. if (CUtility.DESCENT_DEBUG)
  2019. {
  2020. CUtility.enter("dodash",m_spec.m_lexeme,m_spec.m_current_token);
  2021. }
  2022. while (m_lexGen.EOS != m_spec.m_current_token
  2023. && m_lexGen.CCL_END != m_spec.m_current_token)
  2024. {
  2025. // DASH loses its special meaning if it is first in class.
  2026. if (m_lexGen.DASH == m_spec.m_current_token && -1 != first)
  2027. {
  2028. m_lexGen.advance();
  2029. // DASH loses its special meaning if it is last in class.
  2030. if (m_spec.m_current_token == m_lexGen.CCL_END)
  2031. {
  2032. // 'first' already in set.
  2033. set.add('-');
  2034. break;
  2035. }
  2036. for ( ; first <= m_spec.m_lexeme; ++first)
  2037. {
  2038. if (m_spec.m_ignorecase)
  2039. set.addncase((char)first);
  2040. else
  2041. set.add(first);
  2042. }
  2043. }
  2044. else
  2045. {
  2046. first = m_spec.m_lexeme;
  2047. if (m_spec.m_ignorecase)
  2048. set.addncase(m_spec.m_lexeme);
  2049. else
  2050. set.add(m_spec.m_lexeme);
  2051. }
  2052. m_lexGen.advance();
  2053. }
  2054. if (CUtility.DESCENT_DEBUG)
  2055. {
  2056. CUtility.leave("dodash",m_spec.m_lexeme,m_spec.m_current_token);
  2057. }
  2058. }
  2059. }
  2060. /**
  2061. * Extract character classes from NFA and simplify.
  2062. * @author C. Scott Ananian 25-Jul-1999
  2063. */
  2064. class CSimplifyNfa
  2065. {
  2066. private int[] ccls; // character class mapping.
  2067. private int original_charset_size; // original charset size
  2068. private int mapped_charset_size; // reduced charset size
  2069. void simplify(CSpec m_spec) {
  2070. computeClasses(m_spec); // initialize fields.
  2071. // now rewrite the NFA using our character class mapping.
  2072. for (Enumeration e=m_spec.m_nfa_states.elements(); e.hasMoreElements(); ) {
  2073. CNfa nfa = (CNfa) e.nextElement();
  2074. if (nfa.m_edge==CNfa.EMPTY || nfa.m_edge==CNfa.EPSILON)
  2075. continue; // no change.
  2076. if (nfa.m_edge==CNfa.CCL) {
  2077. CSet ncset = new CSet();
  2078. ncset.map(nfa.m_set, ccls); // map it.
  2079. nfa.m_set = ncset;
  2080. } else { // single character
  2081. nfa.m_edge = ccls[nfa.m_edge]; // map it.
  2082. }
  2083. }
  2084. // now update m_spec with the mapping.
  2085. m_spec.m_ccls_map = ccls;
  2086. m_spec.m_dtrans_ncols = mapped_charset_size;
  2087. }
  2088. /** Compute minimum set of character classes needed to disambiguate
  2089. * edges. We optimistically assume that every character belongs to
  2090. * a single character class, and then incrementally split classes
  2091. * as we see edges that require discrimination between characters in
  2092. * the class. [CSA, 25-Jul-1999] */
  2093. private void computeClasses(CSpec m_spec) {
  2094. this.original_charset_size = m_spec.m_dtrans_ncols;
  2095. this.ccls = new int[original_charset_size]; // initially all zero.
  2096. int nextcls = 1;
  2097. SparseBitSet clsA = new SparseBitSet(), clsB = new SparseBitSet();
  2098. Hashtable h = new Hashtable();
  2099. System.out.print("Working on character classes.");
  2100. for (Enumeration e=m_spec.m_nfa_states.elements(); e.hasMoreElements(); ) {
  2101. CNfa nfa = (CNfa) e.nextElement();
  2102. if (nfa.m_edge==CNfa.EMPTY || nfa.m_edge==CNfa.EPSILON)
  2103. continue; // no discriminatory information.
  2104. clsA.clearAll(); clsB.clearAll();
  2105. for (int i=0; i<ccls.length; i++)
  2106. if (nfa.m_edge==i || // edge labeled with a character
  2107. nfa.m_edge==CNfa.CCL && nfa.m_set.contains(i)) // set of characters
  2108. clsA.set(ccls[i]);
  2109. else
  2110. clsB.set(ccls[i]);
  2111. // now figure out which character classes we need to split.
  2112. clsA.and(clsB); // split the classes which show up on both sides of edge
  2113. System.out.print(clsA.size()==0?".":":");
  2114. if (clsA.size()==0) continue; // nothing to do.
  2115. // and split them.
  2116. h.clear(); // h will map old to new class name
  2117. for (int i=0; i<ccls.length; i++)
  2118. if (clsA.get(ccls[i])) // a split class
  2119. if (nfa.m_edge==i ||
  2120. nfa.m_edge==CNfa.CCL && nfa.m_set.contains(i)) { // on A side
  2121. Integer split = new Integer(ccls[i]);
  2122. if (!h.containsKey(split))
  2123. h.put(split, new Integer(nextcls++)); // make new class
  2124. ccls[i] = ((Integer)h.get(split)).intValue();
  2125. }
  2126. }
  2127. System.out.println();
  2128. System.out.println("NFA has "+nextcls+" distinct character classes.");
  2129. this.mapped_charset_size = nextcls;
  2130. }
  2131. }
  2132. /***************************************************************
  2133. Class: CMinimize
  2134. **************************************************************/
  2135. class CMinimize
  2136. {
  2137. /***************************************************************
  2138. Member Variables
  2139. **************************************************************/
  2140. CSpec m_spec;
  2141. Vector m_group;
  2142. int m_ingroup[];
  2143. /***************************************************************
  2144. Function: CMinimize
  2145. Description: Constructor.
  2146. **************************************************************/
  2147. CMinimize
  2148. (
  2149. )
  2150. {
  2151. reset();
  2152. }
  2153. /***************************************************************
  2154. Function: reset
  2155. Description: Resets member variables.
  2156. **************************************************************/
  2157. private void reset
  2158. (
  2159. )
  2160. {
  2161. m_spec = null;
  2162. m_group = null;
  2163. m_ingroup = null;
  2164. }
  2165. /***************************************************************
  2166. Function: set
  2167. Description: Sets member variables.
  2168. **************************************************************/
  2169. private void set
  2170. (
  2171. CSpec spec
  2172. )
  2173. {
  2174. if (CUtility.DEBUG)
  2175. {
  2176. CUtility._assert(null != spec);
  2177. }
  2178. m_spec = spec;
  2179. m_group = null;
  2180. m_ingroup = null;
  2181. }
  2182. /***************************************************************
  2183. Function: min_dfa
  2184. Description: High-level access function to module.
  2185. **************************************************************/
  2186. void min_dfa
  2187. (
  2188. CSpec spec
  2189. )
  2190. {
  2191. set(spec);
  2192. /* Remove redundant states. */
  2193. minimize();
  2194. /* Column and row compression.
  2195. Save accept states in auxilary vector. */
  2196. reduce();
  2197. reset();
  2198. }
  2199. /***************************************************************
  2200. Function: col_copy
  2201. Description: Copies source column into destination column.
  2202. **************************************************************/
  2203. private void col_copy
  2204. (
  2205. int dest,
  2206. int src
  2207. )
  2208. {
  2209. int n;
  2210. int i;
  2211. CDTrans dtrans;
  2212. n = m_spec.m_dtrans_vector.size();
  2213. for (i = 0; i < n; ++i)
  2214. {
  2215. dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
  2216. dtrans.m_dtrans[dest] = dtrans.m_dtrans[src];
  2217. }
  2218. }
  2219. /***************************************************************
  2220. Function: trunc_col
  2221. Description: Truncates each column to the 'correct' length.
  2222. **************************************************************/
  2223. private void trunc_col
  2224. (
  2225. )
  2226. {
  2227. int n;
  2228. int i;
  2229. CDTrans dtrans;
  2230. n = m_spec.m_dtrans_vector.size();
  2231. for (i = 0; i < n; ++i)
  2232. {
  2233. int[] ndtrans = new int[m_spec.m_dtrans_ncols];
  2234. dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
  2235. System.arraycopy(dtrans.m_dtrans, 0, ndtrans, 0, ndtrans.length);
  2236. dtrans.m_dtrans = ndtrans;
  2237. }
  2238. }
  2239. /***************************************************************
  2240. Function: row_copy
  2241. Description: Copies source row into destination row.
  2242. **************************************************************/
  2243. private void row_copy
  2244. (
  2245. int dest,
  2246. int src
  2247. )
  2248. {
  2249. CDTrans dtrans;
  2250. dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(src);
  2251. m_spec.m_dtrans_vector.setElementAt(dtrans,dest);
  2252. }
  2253. /***************************************************************
  2254. Function: col_equiv
  2255. Description:
  2256. **************************************************************/
  2257. private boolean col_equiv
  2258. (
  2259. int col1,
  2260. int col2
  2261. )
  2262. {
  2263. int n;
  2264. int i;
  2265. CDTrans dtrans;
  2266. n = m_spec.m_dtrans_vector.size();
  2267. for (i = 0; i < n; ++i)
  2268. {
  2269. dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
  2270. if (dtrans.m_dtrans[col1] != dtrans.m_dtrans[col2])
  2271. {
  2272. return false;
  2273. }
  2274. }
  2275. return true;
  2276. }
  2277. /***************************************************************
  2278. Function: row_equiv
  2279. Description:
  2280. **************************************************************/
  2281. private boolean row_equiv
  2282. (
  2283. int row1,
  2284. int row2
  2285. )
  2286. {
  2287. int i;
  2288. CDTrans dtrans1;
  2289. CDTrans dtrans2;
  2290. dtrans1 = (CDTrans) m_spec.m_dtrans_vector.elementAt(row1);
  2291. dtrans2 = (CDTrans) m_spec.m_dtrans_vector.elementAt(row2);
  2292. for (i = 0; i < m_spec.m_dtrans_ncols; ++i)
  2293. {
  2294. if (dtrans1.m_dtrans[i] != dtrans2.m_dtrans[i])
  2295. {
  2296. return false;
  2297. }
  2298. }
  2299. return true;
  2300. }
  2301. /***************************************************************
  2302. Function: reduce
  2303. Description:
  2304. **************************************************************/
  2305. private void reduce
  2306. (
  2307. )
  2308. {
  2309. int i;
  2310. int j;
  2311. int k;
  2312. int nrows;
  2313. int reduced_ncols;
  2314. int reduced_nrows;
  2315. SparseBitSet set;
  2316. CDTrans dtrans;
  2317. int size;
  2318. set = new SparseBitSet();
  2319. /* Save accept nodes and anchor entries. */
  2320. size = m_spec.m_dtrans_vector.size();
  2321. m_spec.m_anchor_array = new int[size];
  2322. m_spec.m_accept_vector = new Vector();
  2323. for (i = 0; i < size; ++i)
  2324. {
  2325. dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
  2326. m_spec.m_accept_vector.addElement(dtrans.m_accept);
  2327. m_spec.m_anchor_array[i] = dtrans.m_anchor;
  2328. dtrans.m_accept = null;
  2329. }
  2330. /* Allocate column map. */
  2331. m_spec.m_col_map = new int[m_spec.m_dtrans_ncols];
  2332. for (i = 0; i < m_spec.m_dtrans_ncols; ++i)
  2333. {
  2334. m_spec.m_col_map[i] = -1;
  2335. }
  2336. /* Process columns for reduction. */
  2337. for (reduced_ncols = 0; ; ++reduced_ncols)
  2338. {
  2339. if (CUtility.DEBUG)
  2340. {
  2341. for (i = 0; i < reduced_ncols; ++i)
  2342. {
  2343. CUtility._assert(-1 != m_spec.m_col_map[i]);
  2344. }
  2345. }
  2346. for (i = reduced_ncols; i < m_spec.m_dtrans_ncols; ++i)
  2347. {
  2348. if (-1 == m_spec.m_col_map[i])
  2349. {
  2350. break;
  2351. }
  2352. }
  2353. if (i >= m_spec.m_dtrans_ncols)
  2354. {
  2355. break;
  2356. }
  2357. if (CUtility.DEBUG)
  2358. {
  2359. CUtility._assert(false == set.get(i));
  2360. CUtility._assert(-1 == m_spec.m_col_map[i]);
  2361. }
  2362. set.set(i);
  2363. m_spec.m_col_map[i] = reduced_ncols;
  2364. /* UNDONE: Optimize by doing all comparisons in one batch. */
  2365. for (j = i + 1; j < m_spec.m_dtrans_ncols; ++j)
  2366. {
  2367. if (-1 == m_spec.m_col_map[j] && true == col_equiv(i,j))
  2368. {
  2369. m_spec.m_col_map[j] = reduced_ncols;
  2370. }
  2371. }
  2372. }
  2373. /* Reduce columns. */
  2374. k = 0;
  2375. for (i = 0; i < m_spec.m_dtrans_ncols; ++i)
  2376. {
  2377. if (set.get(i))
  2378. {
  2379. ++k;
  2380. set.clear(i);
  2381. j = m_spec.m_col_map[i];
  2382. if (CUtility.DEBUG)
  2383. {
  2384. CUtility._assert(j <= i);
  2385. }
  2386. if (j == i)
  2387. {
  2388. continue;
  2389. }
  2390. col_copy(j,i);
  2391. }
  2392. }
  2393. m_spec.m_dtrans_ncols = reduced_ncols;
  2394. /* truncate m_dtrans at proper length (freeing extra) */
  2395. trunc_col();
  2396. if (CUtility.DEBUG)
  2397. {
  2398. CUtility._assert(k == reduced_ncols);
  2399. }
  2400. /* Allocate row map. */
  2401. nrows = m_spec.m_dtrans_vector.size();
  2402. m_spec.m_row_map = new int[nrows];
  2403. for (i = 0; i < nrows; ++i)
  2404. {
  2405. m_spec.m_row_map[i] = -1;
  2406. }
  2407. /* Process rows to reduce. */
  2408. for (reduced_nrows = 0; ; ++reduced_nrows)
  2409. {
  2410. if (CUtility.DEBUG)
  2411. {
  2412. for (i = 0; i < reduced_nrows; ++i)
  2413. {
  2414. CUtility._assert(-1 != m_spec.m_row_map[i]);
  2415. }
  2416. }
  2417. for (i = reduced_nrows; i < nrows; ++i)
  2418. {
  2419. if (-1 == m_spec.m_row_map[i])
  2420. {
  2421. break;
  2422. }
  2423. }
  2424. if (i >= nrows)
  2425. {
  2426. break;
  2427. }
  2428. if (CUtility.DEBUG)
  2429. {
  2430. CUtility._assert(false == set.get(i));
  2431. CUtility._assert(-1 == m_spec.m_row_map[i]);
  2432. }
  2433. set.set(i);
  2434. m_spec.m_row_map[i] = reduced_nrows;
  2435. /* UNDONE: Optimize by doing all comparisons in one batch. */
  2436. for (j = i + 1; j < nrows; ++j)
  2437. {
  2438. if (-1 == m_spec.m_row_map[j] && true == row_equiv(i,j))
  2439. {
  2440. m_spec.m_row_map[j] = reduced_nrows;
  2441. }
  2442. }
  2443. }
  2444. /* Reduce rows. */
  2445. k = 0;
  2446. for (i = 0; i < nrows; ++i)
  2447. {
  2448. if (set.get(i))
  2449. {
  2450. ++k;
  2451. set.clear(i);
  2452. j = m_spec.m_row_map[i];
  2453. if (CUtility.DEBUG)
  2454. {
  2455. CUtility._assert(j <= i);
  2456. }
  2457. if (j == i)
  2458. {
  2459. continue;
  2460. }
  2461. row_copy(j,i);
  2462. }
  2463. }
  2464. m_spec.m_dtrans_vector.setSize(reduced_nrows);
  2465. if (CUtility.DEBUG)
  2466. {
  2467. /*System.out.println("k = " + k + "\nreduced_nrows = " + reduced_nrows + "");*/
  2468. CUtility._assert(k == reduced_nrows);
  2469. }
  2470. }
  2471. /***************************************************************
  2472. Function: fix_dtrans
  2473. Description: Updates CDTrans table after minimization
  2474. using groups, removing redundant transition table states.
  2475. **************************************************************/
  2476. private void fix_dtrans
  2477. (
  2478. )
  2479. {
  2480. Vector new_vector;
  2481. int i;
  2482. int size;
  2483. Vector dtrans_group;
  2484. CDTrans first;
  2485. int c;
  2486. new_vector = new Vector();
  2487. size = m_spec.m_state_dtrans.length;
  2488. for (i = 0; i < size; ++i)
  2489. {
  2490. if (CDTrans.F != m_spec.m_state_dtrans[i])
  2491. {
  2492. m_spec.m_state_dtrans[i] = m_ingroup[m_spec.m_state_dtrans[i]];
  2493. }
  2494. }
  2495. size = m_group.size();
  2496. for (i = 0; i < size; ++i)
  2497. {
  2498. dtrans_group = (Vector) m_group.elementAt(i);
  2499. first = (CDTrans) dtrans_group.elementAt(0);
  2500. new_vector.addElement(first);
  2501. for (c = 0; c < m_spec.m_dtrans_ncols; ++c)
  2502. {
  2503. if (CDTrans.F != first.m_dtrans[c])
  2504. {
  2505. first.m_dtrans[c] = m_ingroup[first.m_dtrans[c]];
  2506. }
  2507. }
  2508. }
  2509. m_group = null;
  2510. m_spec.m_dtrans_vector = new_vector;
  2511. }
  2512. /***************************************************************
  2513. Function: minimize
  2514. Description: Removes redundant transition table states.
  2515. **************************************************************/
  2516. private void minimize
  2517. (
  2518. )
  2519. {
  2520. Vector dtrans_group;
  2521. Vector new_group;
  2522. int i;
  2523. int j;
  2524. int old_group_count;
  2525. int group_count;
  2526. CDTrans next;
  2527. CDTrans first;
  2528. int goto_first;
  2529. int goto_next;
  2530. int c;
  2531. int group_size;
  2532. boolean added;
  2533. init_groups();
  2534. group_count = m_group.size();
  2535. old_group_count = group_count - 1;
  2536. while (old_group_count != group_count)
  2537. {
  2538. old_group_count = group_count;
  2539. if (CUtility.DEBUG)
  2540. {
  2541. CUtility._assert(m_group.size() == group_count);
  2542. }
  2543. for (i = 0; i < group_count; ++i)
  2544. {
  2545. dtrans_group = (Vector) m_group.elementAt(i);
  2546. group_size = dtrans_group.size();
  2547. if (group_size <= 1)
  2548. {
  2549. continue;
  2550. }
  2551. new_group = new Vector();
  2552. added = false;
  2553. first = (CDTrans) dtrans_group.elementAt(0);
  2554. for (j = 1; j < group_size; ++j)
  2555. {
  2556. next = (CDTrans) dtrans_group.elementAt(j);
  2557. for (c = 0; c < m_spec.m_dtrans_ncols; ++c)
  2558. {
  2559. goto_first = first.m_dtrans[c];
  2560. goto_next = next.m_dtrans[c];
  2561. if (goto_first != goto_next
  2562. && (goto_first == CDTrans.F
  2563. || goto_next == CDTrans.F
  2564. || m_ingroup[goto_next] != m_ingroup[goto_first]))
  2565. {
  2566. if (CUtility.DEBUG)
  2567. {
  2568. CUtility._assert(dtrans_group.elementAt(j) == next);
  2569. }
  2570. dtrans_group.removeElementAt(j);
  2571. --j;
  2572. --group_size;
  2573. new_group.addElement(next);
  2574. if (false == added)
  2575. {
  2576. added = true;
  2577. ++group_count;
  2578. m_group.addElement(new_group);
  2579. }
  2580. m_ingroup[next.m_label] = m_group.size() - 1;
  2581. if (CUtility.DEBUG)
  2582. {
  2583. CUtility._assert(m_group.contains(new_group)
  2584. == true);
  2585. CUtility._assert(m_group.contains(dtrans_group)
  2586. == true);
  2587. CUtility._assert(dtrans_group.contains(first)
  2588. == true);
  2589. CUtility._assert(dtrans_group.contains(next)
  2590. == false);
  2591. CUtility._assert(new_group.contains(first)
  2592. == false);
  2593. CUtility._assert(new_group.contains(next)
  2594. == true);
  2595. CUtility._assert(dtrans_group.size() == group_size);
  2596. CUtility._assert(i == m_ingroup[first.m_label]);
  2597. CUtility._assert((m_group.size() - 1)
  2598. == m_ingroup[next.m_label]);
  2599. }
  2600. break;
  2601. }
  2602. }
  2603. }
  2604. }
  2605. }
  2606. System.out.println(m_group.size() + " states after removal of redundant states.");
  2607. if (m_spec.m_verbose
  2608. && true == CUtility.OLD_DUMP_DEBUG)
  2609. {
  2610. System.out.println();
  2611. System.out.println("States grouped as follows after minimization");
  2612. pgroups();
  2613. }
  2614. fix_dtrans();
  2615. }
  2616. /***************************************************************
  2617. Function: init_groups
  2618. Description:
  2619. **************************************************************/
  2620. private void init_groups
  2621. (
  2622. )
  2623. {
  2624. int i;
  2625. int j;
  2626. int group_count;
  2627. int size;
  2628. CAccept accept;
  2629. CDTrans dtrans;
  2630. Vector dtrans_group;
  2631. CDTrans first;
  2632. boolean group_found;
  2633. m_group = new Vector();
  2634. group_count = 0;
  2635. size = m_spec.m_dtrans_vector.size();
  2636. m_ingroup = new int[size];
  2637. for (i = 0; i < size; ++i)
  2638. {
  2639. group_found = false;
  2640. dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
  2641. if (CUtility.DEBUG)
  2642. {
  2643. CUtility._assert(i == dtrans.m_label);
  2644. CUtility._assert(false == group_found);
  2645. CUtility._assert(group_count == m_group.size());
  2646. }
  2647. for (j = 0; j < group_count; ++j)
  2648. {
  2649. dtrans_group = (Vector) m_group.elementAt(j);
  2650. if (CUtility.DEBUG)
  2651. {
  2652. CUtility._assert(false == group_found);
  2653. CUtility._assert(0 < dtrans_group.size());
  2654. }
  2655. first = (CDTrans) dtrans_group.elementAt(0);
  2656. if (CUtility.SLOW_DEBUG)
  2657. {
  2658. CDTrans check;
  2659. int k;
  2660. int s;
  2661. s = dtrans_group.size();
  2662. CUtility._assert(0 < s);
  2663. for (k = 1; k < s; ++k)
  2664. {
  2665. check = (CDTrans) dtrans_group.elementAt(k);
  2666. CUtility._assert(check.m_accept == first.m_accept);
  2667. }
  2668. }
  2669. if (first.m_accept == dtrans.m_accept)
  2670. {
  2671. dtrans_group.addElement(dtrans);
  2672. m_ingroup[i] = j;
  2673. group_found = true;
  2674. if (CUtility.DEBUG)
  2675. {
  2676. CUtility._assert(j == m_ingroup[dtrans.m_label]);
  2677. }
  2678. break;
  2679. }
  2680. }
  2681. if (false == group_found)
  2682. {
  2683. dtrans_group = new Vector();
  2684. dtrans_group.addElement(dtrans);
  2685. m_ingroup[i] = m_group.size();
  2686. m_group.addElement(dtrans_group);
  2687. ++group_count;
  2688. }
  2689. }
  2690. if (m_spec.m_verbose
  2691. && true == CUtility.OLD_DUMP_DEBUG)
  2692. {
  2693. System.out.println("Initial grouping:");
  2694. pgroups();
  2695. System.out.println();
  2696. }
  2697. }
  2698. /***************************************************************
  2699. Function: pset
  2700. **************************************************************/
  2701. private void pset
  2702. (
  2703. Vector dtrans_group
  2704. )
  2705. {
  2706. int i;
  2707. int size;
  2708. CDTrans dtrans;
  2709. size = dtrans_group.size();
  2710. for (i = 0; i < size; ++i)
  2711. {
  2712. dtrans = (CDTrans) dtrans_group.elementAt(i);
  2713. System.out.print(dtrans.m_label + " ");
  2714. }
  2715. }
  2716. /***************************************************************
  2717. Function: pgroups
  2718. **************************************************************/
  2719. private void pgroups
  2720. (
  2721. )
  2722. {
  2723. int i;
  2724. int dtrans_size;
  2725. int group_size;
  2726. group_size = m_group.size();
  2727. for (i = 0; i < group_size; ++i)
  2728. {
  2729. System.out.print("\tGroup " + i + " {");
  2730. pset((Vector) m_group.elementAt(i));
  2731. System.out.println("}");
  2732. System.out.println();
  2733. }
  2734. System.out.println();
  2735. dtrans_size = m_spec.m_dtrans_vector.size();
  2736. for (i = 0; i < dtrans_size; ++i)
  2737. {
  2738. System.out.println("\tstate " + i
  2739. + " is in group "
  2740. + m_ingroup[i]);
  2741. }
  2742. }
  2743. }
  2744. /***************************************************************
  2745. Class: CNfa2Dfa
  2746. **************************************************************/
  2747. class CNfa2Dfa
  2748. {
  2749. /***************************************************************
  2750. Member Variables
  2751. **************************************************************/
  2752. private CSpec m_spec;
  2753. private int m_unmarked_dfa;
  2754. private CLexGen m_lexGen;
  2755. /***************************************************************
  2756. Constants
  2757. **************************************************************/
  2758. private static final int NOT_IN_DSTATES = -1;
  2759. /***************************************************************
  2760. Function: CNfa2Dfa
  2761. **************************************************************/
  2762. CNfa2Dfa
  2763. (
  2764. )
  2765. {
  2766. reset();
  2767. }
  2768. /***************************************************************
  2769. Function: set
  2770. Description:
  2771. **************************************************************/
  2772. private void set
  2773. (
  2774. CLexGen lexGen,
  2775. CSpec spec
  2776. )
  2777. {
  2778. m_lexGen = lexGen;
  2779. m_spec = spec;
  2780. m_unmarked_dfa = 0;
  2781. }
  2782. /***************************************************************
  2783. Function: reset
  2784. Description:
  2785. **************************************************************/
  2786. private void reset
  2787. (
  2788. )
  2789. {
  2790. m_lexGen = null;
  2791. m_spec = null;
  2792. m_unmarked_dfa = 0;
  2793. }
  2794. /***************************************************************
  2795. Function: make_dfa
  2796. Description: High-level access function to module.
  2797. **************************************************************/
  2798. void make_dfa
  2799. (
  2800. CLexGen lexGen,
  2801. CSpec spec
  2802. )
  2803. {
  2804. int i;
  2805. reset();
  2806. set(lexGen,spec);
  2807. make_dtrans();
  2808. free_nfa_states();
  2809. if (m_spec.m_verbose && true == CUtility.OLD_DUMP_DEBUG)
  2810. {
  2811. System.out.println(m_spec.m_dfa_states.size()
  2812. + " DFA states in original machine.");
  2813. }
  2814. free_dfa_states();
  2815. }
  2816. /***************************************************************
  2817. Function: make_dtrans
  2818. Description: Creates uncompressed CDTrans transition table.
  2819. **************************************************************/
  2820. private void make_dtrans
  2821. (
  2822. )
  2823. /* throws java.lang.CloneNotSupportedException*/
  2824. {
  2825. CDfa next;
  2826. CDfa dfa;
  2827. CBunch bunch;
  2828. int i;
  2829. int nextstate;
  2830. int size;
  2831. CDTrans dtrans;
  2832. CNfa nfa;
  2833. int istate;
  2834. int nstates;
  2835. System.out.print("Working on DFA states.");
  2836. /* Reference passing type and initializations. */
  2837. bunch = new CBunch();
  2838. m_unmarked_dfa = 0;
  2839. /* Allocate mapping array. */
  2840. nstates = m_spec.m_state_rules.length;
  2841. m_spec.m_state_dtrans = new int[nstates];
  2842. for (istate = 0; nstates > istate; ++istate)
  2843. {
  2844. /* CSA bugfix: if we skip all zero size rules, then
  2845. an specification with no rules produces an illegal
  2846. lexer (0 states) instead of a lexer that rejects
  2847. everything (1 nonaccepting state). [27-Jul-1999]
  2848. if (0 == m_spec.m_state_rules[istate].size())
  2849. {
  2850. m_spec.m_state_dtrans[istate] = CDTrans.F;
  2851. continue;
  2852. }
  2853. */
  2854. /* Create start state and initialize fields. */
  2855. bunch.m_nfa_set = (Vector) m_spec.m_state_rules[istate].clone();
  2856. sortStates(bunch.m_nfa_set);
  2857. bunch.m_nfa_bit = new SparseBitSet();
  2858. /* Initialize bit set. */
  2859. size = bunch.m_nfa_set.size();
  2860. for (i = 0; size > i; ++i)
  2861. {
  2862. nfa = (CNfa) bunch.m_nfa_set.elementAt(i);
  2863. bunch.m_nfa_bit.set(nfa.m_label);
  2864. }
  2865. bunch.m_accept = null;
  2866. bunch.m_anchor = CSpec.NONE;
  2867. bunch.m_accept_index = CUtility.INT_MAX;
  2868. e_closure(bunch);
  2869. add_to_dstates(bunch);
  2870. m_spec.m_state_dtrans[istate] = m_spec.m_dtrans_vector.size();
  2871. /* Main loop of CDTrans creation. */
  2872. while (null != (dfa = get_unmarked()))
  2873. {
  2874. System.out.print(".");
  2875. System.out.flush();
  2876. if (CUtility.DEBUG)
  2877. {
  2878. CUtility._assert(false == dfa.m_mark);
  2879. }
  2880. /* Get first unmarked node, then mark it. */
  2881. dfa.m_mark = true;
  2882. /* Allocate new CDTrans, then initialize fields. */
  2883. dtrans = new CDTrans(m_spec.m_dtrans_vector.size(),m_spec);
  2884. dtrans.m_accept = dfa.m_accept;
  2885. dtrans.m_anchor = dfa.m_anchor;
  2886. /* Set CDTrans array for each character transition. */
  2887. for (i = 0; i < m_spec.m_dtrans_ncols; ++i)
  2888. {
  2889. if (CUtility.DEBUG)
  2890. {
  2891. CUtility._assert(0 <= i);
  2892. CUtility._assert(m_spec.m_dtrans_ncols > i);
  2893. }
  2894. /* Create new dfa set by attempting character transition. */
  2895. move(dfa.m_nfa_set,dfa.m_nfa_bit,i,bunch);
  2896. if (null != bunch.m_nfa_set)
  2897. {
  2898. e_closure(bunch);
  2899. }
  2900. if (CUtility.DEBUG)
  2901. {
  2902. CUtility._assert((null == bunch.m_nfa_set
  2903. && null == bunch.m_nfa_bit)
  2904. || (null != bunch.m_nfa_set
  2905. && null != bunch.m_nfa_bit));
  2906. }
  2907. /* Create new state or set state to empty. */
  2908. if (null == bunch.m_nfa_set)
  2909. {
  2910. nextstate = CDTrans.F;
  2911. }
  2912. else
  2913. {
  2914. nextstate = in_dstates(bunch);
  2915. if (NOT_IN_DSTATES == nextstate)
  2916. {
  2917. nextstate = add_to_dstates(bunch);
  2918. }
  2919. }
  2920. if (CUtility.DEBUG)
  2921. {
  2922. CUtility._assert(nextstate < m_spec.m_dfa_states.size());
  2923. }
  2924. dtrans.m_dtrans[i] = nextstate;
  2925. }
  2926. if (CUtility.DEBUG)
  2927. {
  2928. CUtility._assert(m_spec.m_dtrans_vector.size() == dfa.m_label);
  2929. }
  2930. m_spec.m_dtrans_vector.addElement(dtrans);
  2931. }
  2932. }
  2933. System.out.println();
  2934. }
  2935. /***************************************************************
  2936. Function: free_dfa_states
  2937. **************************************************************/
  2938. private void free_dfa_states
  2939. (
  2940. )
  2941. {
  2942. m_spec.m_dfa_states = null;
  2943. m_spec.m_dfa_sets = null;
  2944. }
  2945. /***************************************************************
  2946. Function: free_nfa_states
  2947. **************************************************************/
  2948. private void free_nfa_states
  2949. (
  2950. )
  2951. {
  2952. /* UNDONE: Remove references to nfas from within dfas. */
  2953. /* UNDONE: Don't free CAccepts. */
  2954. m_spec.m_nfa_states = null;
  2955. m_spec.m_nfa_start = null;
  2956. m_spec.m_state_rules = null;
  2957. }
  2958. /***************************************************************
  2959. Function: e_closure
  2960. Description: Alters and returns input set.
  2961. **************************************************************/
  2962. private void e_closure
  2963. (
  2964. CBunch bunch
  2965. )
  2966. {
  2967. Stack nfa_stack;
  2968. int size;
  2969. int i;
  2970. CNfa state;
  2971. /* Debug checks. */
  2972. if (CUtility.DEBUG)
  2973. {
  2974. CUtility._assert(null != bunch);
  2975. CUtility._assert(null != bunch.m_nfa_set);
  2976. CUtility._assert(null != bunch.m_nfa_bit);
  2977. }
  2978. bunch.m_accept = null;
  2979. bunch.m_anchor = CSpec.NONE;
  2980. bunch.m_accept_index = CUtility.INT_MAX;
  2981. /* Create initial stack. */
  2982. nfa_stack = new Stack();
  2983. size = bunch.m_nfa_set.size();
  2984. for (i = 0; i < size; ++i)
  2985. {
  2986. state = (CNfa) bunch.m_nfa_set.elementAt(i);
  2987. if (CUtility.DEBUG)
  2988. {
  2989. CUtility._assert(bunch.m_nfa_bit.get(state.m_label));
  2990. }
  2991. nfa_stack.push(state);
  2992. }
  2993. /* Main loop. */
  2994. while (false == nfa_stack.empty())
  2995. {
  2996. state = (CNfa) nfa_stack.pop();
  2997. if (CUtility.OLD_DUMP_DEBUG)
  2998. {
  2999. if (null != state.m_accept)
  3000. {
  3001. System.out.println("Looking at accepting state " + state.m_label
  3002. + " with <"
  3003. + (new String(state.m_accept.m_action,0,
  3004. state.m_accept.m_action_read))
  3005. + ">");
  3006. }
  3007. }
  3008. if (null != state.m_accept
  3009. && state.m_label < bunch.m_accept_index)
  3010. {
  3011. bunch.m_accept_index = state.m_label;
  3012. bunch.m_accept = state.m_accept;
  3013. bunch.m_anchor = state.m_anchor;
  3014. if (CUtility.OLD_DUMP_DEBUG)
  3015. {
  3016. System.out.println("Found accepting state " + state.m_label
  3017. + " with <"
  3018. + (new String(state.m_accept.m_action,0,
  3019. state.m_accept.m_action_read))
  3020. + ">");
  3021. }
  3022. if (CUtility.DEBUG)
  3023. {
  3024. CUtility._assert(null != bunch.m_accept);
  3025. CUtility._assert(CSpec.NONE == bunch.m_anchor
  3026. || 0 != (bunch.m_anchor & CSpec.END)
  3027. || 0 != (bunch.m_anchor & CSpec.START));
  3028. }
  3029. }
  3030. if (CNfa.EPSILON == state.m_edge)
  3031. {
  3032. if (null != state.m_next)
  3033. {
  3034. if (false == bunch.m_nfa_set.contains(state.m_next))
  3035. {
  3036. if (CUtility.DEBUG)
  3037. {
  3038. CUtility._assert(false == bunch.m_nfa_bit.get(state.m_next.m_label));
  3039. }
  3040. bunch.m_nfa_bit.set(state.m_next.m_label);
  3041. bunch.m_nfa_set.addElement(state.m_next);
  3042. nfa_stack.push(state.m_next);
  3043. }
  3044. }
  3045. if (null != state.m_next2)
  3046. {
  3047. if (false == bunch.m_nfa_set.contains(state.m_next2))
  3048. {
  3049. if (CUtility.DEBUG)
  3050. {
  3051. CUtility._assert(false == bunch.m_nfa_bit.get(state.m_next2.m_label));
  3052. }
  3053. bunch.m_nfa_bit.set(state.m_next2.m_label);
  3054. bunch.m_nfa_set.addElement(state.m_next2);
  3055. nfa_stack.push(state.m_next2);
  3056. }
  3057. }
  3058. }
  3059. }
  3060. if (null != bunch.m_nfa_set)
  3061. {
  3062. sortStates(bunch.m_nfa_set);
  3063. }
  3064. return;
  3065. }
  3066. /***************************************************************
  3067. Function: move
  3068. Description: Returns null if resulting NFA set is empty.
  3069. **************************************************************/
  3070. void move
  3071. (
  3072. Vector nfa_set,
  3073. SparseBitSet nfa_bit,
  3074. int b,
  3075. CBunch bunch
  3076. )
  3077. {
  3078. int size;
  3079. int index;
  3080. CNfa state;
  3081. bunch.m_nfa_set = null;
  3082. bunch.m_nfa_bit = null;
  3083. size = nfa_set.size();
  3084. for (index = 0; index < size; ++index)
  3085. {
  3086. state = (CNfa) nfa_set.elementAt(index);
  3087. if (b == state.m_edge
  3088. || (CNfa.CCL == state.m_edge
  3089. && true == state.m_set.contains(b)))
  3090. {
  3091. if (null == bunch.m_nfa_set)
  3092. {
  3093. if (CUtility.DEBUG)
  3094. {
  3095. CUtility._assert(null == bunch.m_nfa_bit);
  3096. }
  3097. bunch.m_nfa_set = new Vector();
  3098. /*bunch.m_nfa_bit
  3099. = new SparseBitSet(m_spec.m_nfa_states.size());*/
  3100. bunch.m_nfa_bit = new SparseBitSet();
  3101. }
  3102. bunch.m_nfa_set.addElement(state.m_next);
  3103. /*System.out.println("Size of bitset: " + bunch.m_nfa_bit.size());
  3104. System.out.println("Reference index: " + state.m_next.m_label);
  3105. System.out.flush();*/
  3106. bunch.m_nfa_bit.set(state.m_next.m_label);
  3107. }
  3108. }
  3109. if (null != bunch.m_nfa_set)
  3110. {
  3111. if (CUtility.DEBUG)
  3112. {
  3113. CUtility._assert(null != bunch.m_nfa_bit);
  3114. }
  3115. sortStates(bunch.m_nfa_set);
  3116. }
  3117. return;
  3118. }
  3119. /***************************************************************
  3120. Function: sortStates
  3121. **************************************************************/
  3122. private void sortStates
  3123. (
  3124. Vector nfa_set
  3125. )
  3126. {
  3127. CNfa elem;
  3128. int begin;
  3129. int size;
  3130. int index;
  3131. int value;
  3132. int smallest_index;
  3133. int smallest_value;
  3134. CNfa begin_elem;
  3135. size = nfa_set.size();
  3136. for (begin = 0; begin < size; ++begin)
  3137. {
  3138. elem = (CNfa) nfa_set.elementAt(begin);
  3139. smallest_value = elem.m_label;
  3140. smallest_index = begin;
  3141. for (index = begin + 1; index < size; ++index)
  3142. {
  3143. elem = (CNfa) nfa_set.elementAt(index);
  3144. value = elem.m_label;
  3145. if (value < smallest_value)
  3146. {
  3147. smallest_index = index;
  3148. smallest_value = value;
  3149. }
  3150. }
  3151. begin_elem = (CNfa) nfa_set.elementAt(begin);
  3152. elem = (CNfa) nfa_set.elementAt(smallest_index);
  3153. nfa_set.setElementAt(elem,begin);
  3154. nfa_set.setElementAt(begin_elem,smallest_index);
  3155. }
  3156. if (CUtility.OLD_DEBUG)
  3157. {
  3158. System.out.print("NFA vector indices: ");
  3159. for (index = 0; index < size; ++index)
  3160. {
  3161. elem = (CNfa) nfa_set.elementAt(index);
  3162. System.out.print(elem.m_label + " ");
  3163. }
  3164. System.out.println();
  3165. }
  3166. return;
  3167. }
  3168. /***************************************************************
  3169. Function: get_unmarked
  3170. Description: Returns next unmarked DFA state.
  3171. **************************************************************/
  3172. private CDfa get_unmarked
  3173. (
  3174. )
  3175. {
  3176. int size;
  3177. CDfa dfa;
  3178. size = m_spec.m_dfa_states.size();
  3179. while (m_unmarked_dfa < size)
  3180. {
  3181. dfa = (CDfa) m_spec.m_dfa_states.elementAt(m_unmarked_dfa);
  3182. if (false == dfa.m_mark)
  3183. {
  3184. if (CUtility.OLD_DUMP_DEBUG)
  3185. {
  3186. System.out.print("*");
  3187. System.out.flush();
  3188. }
  3189. if (m_spec.m_verbose && true == CUtility.OLD_DUMP_DEBUG)
  3190. {
  3191. System.out.println("---------------");
  3192. System.out.print("working on DFA state "
  3193. + m_unmarked_dfa
  3194. + " = NFA states: ");
  3195. m_lexGen.print_set(dfa.m_nfa_set);
  3196. System.out.println();
  3197. }
  3198. return dfa;
  3199. }
  3200. ++m_unmarked_dfa;
  3201. }
  3202. return null;
  3203. }
  3204. /***************************************************************
  3205. function: add_to_dstates
  3206. Description: Takes as input a CBunch with details of
  3207. a dfa state that needs to be created.
  3208. 1) Allocates a new dfa state and saves it in
  3209. the appropriate CSpec vector.
  3210. 2) Initializes the fields of the dfa state
  3211. with the information in the CBunch.
  3212. 3) Returns index of new dfa.
  3213. **************************************************************/
  3214. private int add_to_dstates
  3215. (
  3216. CBunch bunch
  3217. )
  3218. {
  3219. CDfa dfa;
  3220. if (CUtility.DEBUG)
  3221. {
  3222. CUtility._assert(null != bunch.m_nfa_set);
  3223. CUtility._assert(null != bunch.m_nfa_bit);
  3224. CUtility._assert(null != bunch.m_accept
  3225. || CSpec.NONE == bunch.m_anchor);
  3226. }
  3227. /* Allocate, passing CSpec so dfa label can be set. */
  3228. dfa = CAlloc.newCDfa(m_spec);
  3229. /* Initialize fields, including the mark field. */
  3230. dfa.m_nfa_set = (Vector) bunch.m_nfa_set.clone();
  3231. dfa.m_nfa_bit = (SparseBitSet) bunch.m_nfa_bit.clone();
  3232. dfa.m_accept = bunch.m_accept;
  3233. dfa.m_anchor = bunch.m_anchor;
  3234. dfa.m_mark = false;
  3235. /* Register dfa state using BitSet in CSpec Hashtable. */
  3236. m_spec.m_dfa_sets.put(dfa.m_nfa_bit,dfa);
  3237. /*registerCDfa(dfa);*/
  3238. if (CUtility.OLD_DUMP_DEBUG)
  3239. {
  3240. System.out.print("Registering set : ");
  3241. m_lexGen.print_set(dfa.m_nfa_set);
  3242. System.out.println();
  3243. }
  3244. return dfa.m_label;
  3245. }
  3246. /***************************************************************
  3247. Function: in_dstates
  3248. **************************************************************/
  3249. private int in_dstates
  3250. (
  3251. CBunch bunch
  3252. )
  3253. {
  3254. CDfa dfa;
  3255. if (CUtility.OLD_DEBUG)
  3256. {
  3257. System.out.print("Looking for set : ");
  3258. m_lexGen.print_set(bunch.m_nfa_set);
  3259. }
  3260. dfa = (CDfa) m_spec.m_dfa_sets.get(bunch.m_nfa_bit);
  3261. if (null != dfa)
  3262. {
  3263. if (CUtility.OLD_DUMP_DEBUG)
  3264. {
  3265. System.out.println(" FOUND!");
  3266. }
  3267. return dfa.m_label;
  3268. }
  3269. if (CUtility.OLD_DUMP_DEBUG)
  3270. {
  3271. System.out.println(" NOT FOUND!");
  3272. }
  3273. return NOT_IN_DSTATES;
  3274. }
  3275. }
  3276. /***************************************************************
  3277. Class: CAlloc
  3278. **************************************************************/
  3279. class CAlloc
  3280. {
  3281. /***************************************************************
  3282. Function: newCDfa
  3283. **************************************************************/
  3284. static CDfa newCDfa
  3285. (
  3286. CSpec spec
  3287. )
  3288. {
  3289. CDfa dfa;
  3290. dfa = new CDfa(spec.m_dfa_states.size());
  3291. spec.m_dfa_states.addElement(dfa);
  3292. return dfa;
  3293. }
  3294. /***************************************************************
  3295. Function: newCNfaPair
  3296. Description:
  3297. **************************************************************/
  3298. static CNfaPair newCNfaPair
  3299. (
  3300. )
  3301. {
  3302. CNfaPair pair = new CNfaPair();
  3303. return pair;
  3304. }
  3305. /***************************************************************
  3306. Function: newNLPair
  3307. Description: return a new CNfaPair that matches a new
  3308. line: (\r\n?|[\n\uu2028\uu2029])
  3309. Added by CSA 8-Aug-1999, updated 10-Aug-1999
  3310. **************************************************************/
  3311. static CNfaPair newNLPair(CSpec spec) {
  3312. CNfaPair pair = newCNfaPair();
  3313. pair.m_end=newCNfa(spec); // newline accepting state
  3314. pair.m_start=newCNfa(spec); // new state with two epsilon edges
  3315. pair.m_start.m_next = newCNfa(spec);
  3316. pair.m_start.m_next.m_edge = CNfa.CCL;
  3317. pair.m_start.m_next.m_set = new CSet();
  3318. pair.m_start.m_next.m_set.add('\n');
  3319. if (spec.m_dtrans_ncols-CSpec.NUM_PSEUDO > 2029) {
  3320. pair.m_start.m_next.m_set.add(2028); /*U+2028 is LS, the line separator*/
  3321. pair.m_start.m_next.m_set.add(2029); /*U+2029 is PS, the paragraph sep.*/
  3322. }
  3323. pair.m_start.m_next.m_next = pair.m_end; // accept '\n', U+2028, or U+2029
  3324. pair.m_start.m_next2 = newCNfa(spec);
  3325. pair.m_start.m_next2.m_edge = '\r';
  3326. pair.m_start.m_next2.m_next = newCNfa(spec);
  3327. pair.m_start.m_next2.m_next.m_next = pair.m_end; // accept '\r';
  3328. pair.m_start.m_next2.m_next.m_next2 = newCNfa(spec);
  3329. pair.m_start.m_next2.m_next.m_next2.m_edge = '\n';
  3330. pair.m_start.m_next2.m_next.m_next2.m_next = pair.m_end; // accept '\r\n';
  3331. return pair;
  3332. }
  3333. /***************************************************************
  3334. Function: newCNfa
  3335. Description:
  3336. **************************************************************/
  3337. static CNfa newCNfa
  3338. (
  3339. CSpec spec
  3340. )
  3341. {
  3342. CNfa p;
  3343. /* UNDONE: Buffer this? */
  3344. p = new CNfa();
  3345. /*p.m_label = spec.m_nfa_states.size();*/
  3346. spec.m_nfa_states.addElement(p);
  3347. p.m_edge = CNfa.EPSILON;
  3348. return p;
  3349. }
  3350. }
  3351. /***************************************************************
  3352. Class: Main
  3353. Description: Top-level lexical analyzer generator function.
  3354. **************************************************************/
  3355. public class Main
  3356. {
  3357. /***************************************************************
  3358. Function: main
  3359. **************************************************************/
  3360. public static boolean staticFlag = false;
  3361. public static void printUsage() {
  3362. System.out.println("Usage: com.sun.jlex.internal.Main [-static] <filename>");
  3363. System.exit(1);
  3364. }
  3365. public static void main
  3366. (
  3367. String arg[]
  3368. )
  3369. throws java.io.IOException
  3370. {
  3371. int i;
  3372. CLexGen lg;
  3373. // Parse options starting with '-'
  3374. for (i = 0; i < arg.length && arg[i].charAt(0) == '-'; i++) {
  3375. if (arg[i].equals("-static")) {
  3376. staticFlag = true;
  3377. }
  3378. else {
  3379. printUsage();
  3380. }
  3381. }
  3382. // Enough arguments left ?
  3383. if (arg.length - i < 1) {
  3384. printUsage();
  3385. }
  3386. /* Note: For debuging, it may be helpful to remove the try/catch
  3387. block and permit the Exception to propagate to the top level.
  3388. This gives more information. */
  3389. try
  3390. {
  3391. lg = new CLexGen(arg[i]);
  3392. lg.generate();
  3393. }
  3394. catch (Error e)
  3395. {
  3396. System.out.println(e.getMessage());
  3397. }
  3398. }
  3399. }
  3400. /***************************************************************
  3401. Class: CDTrans
  3402. **************************************************************/
  3403. class CDTrans
  3404. {
  3405. /*************************************************************
  3406. Member Variables
  3407. ***********************************************************/
  3408. int m_dtrans[];
  3409. CAccept m_accept;
  3410. int m_anchor;
  3411. int m_label;
  3412. /*************************************************************
  3413. Constants
  3414. ***********************************************************/
  3415. static final int F = -1;
  3416. /*************************************************************
  3417. Function: CTrans
  3418. ***********************************************************/
  3419. CDTrans
  3420. (
  3421. int label,
  3422. CSpec spec
  3423. )
  3424. {
  3425. m_dtrans = new int[spec.m_dtrans_ncols];
  3426. m_accept = null;
  3427. m_anchor = CSpec.NONE;
  3428. m_label = label;
  3429. }
  3430. }
  3431. /***************************************************************
  3432. Class: CDfa
  3433. **************************************************************/
  3434. class CDfa
  3435. {
  3436. /***************************************************************
  3437. Member Variables
  3438. ***********************************************************/
  3439. int m_group;
  3440. boolean m_mark;
  3441. CAccept m_accept;
  3442. int m_anchor;
  3443. Vector m_nfa_set;
  3444. SparseBitSet m_nfa_bit;
  3445. int m_label;
  3446. /***************************************************************
  3447. Function: CDfa
  3448. **************************************************************/
  3449. CDfa
  3450. (
  3451. int label
  3452. )
  3453. {
  3454. m_group = 0;
  3455. m_mark = false;
  3456. m_accept = null;
  3457. m_anchor = CSpec.NONE;
  3458. m_nfa_set = null;
  3459. m_nfa_bit = null;
  3460. m_label = label;
  3461. }
  3462. }
  3463. /***************************************************************
  3464. Class: CAccept
  3465. **************************************************************/
  3466. class CAccept
  3467. {
  3468. /***************************************************************
  3469. Member Variables
  3470. **************************************************************/
  3471. char m_action[];
  3472. int m_action_read;
  3473. int m_line_number;
  3474. /***************************************************************
  3475. Function: CAccept
  3476. **************************************************************/
  3477. CAccept
  3478. (
  3479. char action[],
  3480. int action_read,
  3481. int line_number
  3482. )
  3483. {
  3484. int elem;
  3485. m_action_read = action_read;
  3486. m_action = new char[m_action_read];
  3487. for (elem = 0; elem < m_action_read; ++elem)
  3488. {
  3489. m_action[elem] = action[elem];
  3490. }
  3491. m_line_number = line_number;
  3492. }
  3493. /***************************************************************
  3494. Function: CAccept
  3495. **************************************************************/
  3496. CAccept
  3497. (
  3498. CAccept accept
  3499. )
  3500. {
  3501. int elem;
  3502. m_action_read = accept.m_action_read;
  3503. m_action = new char[m_action_read];
  3504. for (elem = 0; elem < m_action_read; ++elem)
  3505. {
  3506. m_action[elem] = accept.m_action[elem];
  3507. }
  3508. m_line_number = accept.m_line_number;
  3509. }
  3510. /***************************************************************
  3511. Function: mimic
  3512. **************************************************************/
  3513. void mimic
  3514. (
  3515. CAccept accept
  3516. )
  3517. {
  3518. int elem;
  3519. m_action_read = accept.m_action_read;
  3520. m_action = new char[m_action_read];
  3521. for (elem = 0; elem < m_action_read; ++elem)
  3522. {
  3523. m_action[elem] = accept.m_action[elem];
  3524. }
  3525. }
  3526. }
  3527. /***************************************************************
  3528. Class: CAcceptAnchor
  3529. **************************************************************/
  3530. class CAcceptAnchor
  3531. {
  3532. /***************************************************************
  3533. Member Variables
  3534. **************************************************************/
  3535. CAccept m_accept;
  3536. int m_anchor;
  3537. /***************************************************************
  3538. Function: CAcceptAnchor
  3539. **************************************************************/
  3540. CAcceptAnchor
  3541. (
  3542. )
  3543. {
  3544. m_accept = null;
  3545. m_anchor = CSpec.NONE;
  3546. }
  3547. }
  3548. /***************************************************************
  3549. Class: CNfaPair
  3550. **************************************************************/
  3551. class CNfaPair
  3552. {
  3553. /***************************************************************
  3554. Member Variables
  3555. **************************************************************/
  3556. CNfa m_start;
  3557. CNfa m_end;
  3558. /***************************************************************
  3559. Function: CNfaPair
  3560. **************************************************************/
  3561. CNfaPair
  3562. (
  3563. )
  3564. {
  3565. m_start = null;
  3566. m_end = null;
  3567. }
  3568. }
  3569. /***************************************************************
  3570. Class: CInput
  3571. Description:
  3572. **************************************************************/
  3573. class CInput
  3574. {
  3575. /***************************************************************
  3576. Member Variables
  3577. **************************************************************/
  3578. private java.io.BufferedReader m_input; /* JLex specification file. */
  3579. boolean m_eof_reached; /* Whether EOF has been encountered. */
  3580. boolean m_pushback_line;
  3581. char m_line[]; /* Line buffer. */
  3582. int m_line_read; /* Number of bytes read into line buffer. */
  3583. int m_line_index; /* Current index into line buffer. */
  3584. int m_line_number; /* Current line number. */
  3585. /***************************************************************
  3586. Constants
  3587. **************************************************************/
  3588. static final boolean EOF = true;
  3589. static final boolean NOT_EOF = false;
  3590. /***************************************************************
  3591. Function: CInput
  3592. Description:
  3593. **************************************************************/
  3594. CInput
  3595. (
  3596. java.io.Reader input
  3597. )
  3598. {
  3599. if (CUtility.DEBUG)
  3600. {
  3601. CUtility._assert(null != input);
  3602. }
  3603. /* Initialize input stream. */
  3604. m_input = new java.io.BufferedReader(input);
  3605. /* Initialize buffers and index counters. */
  3606. m_line = null;
  3607. m_line_read = 0;
  3608. m_line_index = 0;
  3609. /* Initialize state variables. */
  3610. m_eof_reached = false;
  3611. m_line_number = 0;
  3612. m_pushback_line = false;
  3613. }
  3614. /***************************************************************
  3615. Function: getLine
  3616. Description: Returns true on EOF, false otherwise.
  3617. Guarantees not to return a blank line, or a line
  3618. of zero length.
  3619. **************************************************************/
  3620. boolean getLine
  3621. (
  3622. )
  3623. throws java.io.IOException
  3624. {
  3625. String lineStr;
  3626. int elem;
  3627. /* Has EOF already been reached? */
  3628. if (m_eof_reached)
  3629. {
  3630. return EOF;
  3631. }
  3632. /* Pushback current line? */
  3633. if (m_pushback_line)
  3634. {
  3635. m_pushback_line = false;
  3636. /* Check for empty line. */
  3637. for (elem = 0; elem < m_line_read; ++elem)
  3638. {
  3639. if (false == CUtility.isspace(m_line[elem]))
  3640. {
  3641. break;
  3642. }
  3643. }
  3644. /* Nonempty? */
  3645. if (elem < m_line_read)
  3646. {
  3647. m_line_index = 0;
  3648. return NOT_EOF;
  3649. }
  3650. }
  3651. while (true)
  3652. {
  3653. if (null == (lineStr = m_input.readLine()))
  3654. {
  3655. m_eof_reached = true;
  3656. m_line_index = 0;
  3657. return EOF;
  3658. }
  3659. m_line = (lineStr + "\n").toCharArray();
  3660. m_line_read=m_line.length;
  3661. ++m_line_number;
  3662. /* Check for empty lines and discard them. */
  3663. elem = 0;
  3664. while (CUtility.isspace(m_line[elem]))
  3665. {
  3666. ++elem;
  3667. if (elem == m_line_read)
  3668. {
  3669. break;
  3670. }
  3671. }
  3672. if (elem < m_line_read)
  3673. {
  3674. break;
  3675. }
  3676. }
  3677. m_line_index = 0;
  3678. return NOT_EOF;
  3679. }
  3680. }
  3681. /********************************************************
  3682. Class: Utility
  3683. *******************************************************/
  3684. class CUtility
  3685. {
  3686. /********************************************************
  3687. Constants
  3688. *******************************************************/
  3689. static final boolean DEBUG = true;
  3690. static final boolean SLOW_DEBUG = true;
  3691. static final boolean DUMP_DEBUG = true;
  3692. /*static final boolean DEBUG = false;
  3693. static final boolean SLOW_DEBUG = false;
  3694. static final boolean DUMP_DEBUG = false;*/
  3695. static final boolean DESCENT_DEBUG = false;
  3696. static final boolean OLD_DEBUG = false;
  3697. static final boolean OLD_DUMP_DEBUG = false;
  3698. static final boolean FOODEBUG = false;
  3699. static final boolean DO_DEBUG = false;
  3700. /********************************************************
  3701. Constants: Integer Bounds
  3702. *******************************************************/
  3703. static final int INT_MAX = 2147483647;
  3704. static final int MAX_SEVEN_BIT = 127;
  3705. static final int MAX_EIGHT_BIT = 255;
  3706. static final int MAX_SIXTEEN_BIT=65535;
  3707. /********************************************************
  3708. Function: enter
  3709. Description: Debugging routine.
  3710. *******************************************************/
  3711. static void enter
  3712. (
  3713. String descent,
  3714. char lexeme,
  3715. int token
  3716. )
  3717. {
  3718. System.out.println("Entering " + descent
  3719. + " [lexeme: " + lexeme
  3720. + "] [token: " + token + "]");
  3721. }
  3722. /********************************************************
  3723. Function: leave
  3724. Description: Debugging routine.
  3725. *******************************************************/
  3726. static void leave
  3727. (
  3728. String descent,
  3729. char lexeme,
  3730. int token
  3731. )
  3732. {
  3733. System.out.println("Leaving " + descent
  3734. + " [lexeme:" + lexeme
  3735. + "] [token:" + token + "]");
  3736. }
  3737. /********************************************************
  3738. Function: assert
  3739. Description: Debugging routine.
  3740. *******************************************************/
  3741. static void _assert
  3742. (
  3743. boolean expr
  3744. )
  3745. {
  3746. if (DEBUG && false == expr)
  3747. {
  3748. System.out.println("Assertion Failed");
  3749. throw new Error("Assertion Failed.");
  3750. }
  3751. }
  3752. /***************************************************************
  3753. Function: doubleSize
  3754. **************************************************************/
  3755. static char[] doubleSize
  3756. (
  3757. char oldBuffer[]
  3758. )
  3759. {
  3760. char newBuffer[] = new char[2 * oldBuffer.length];
  3761. int elem;
  3762. for (elem = 0; elem < oldBuffer.length; ++elem)
  3763. {
  3764. newBuffer[elem] = oldBuffer[elem];
  3765. }
  3766. return newBuffer;
  3767. }
  3768. /***************************************************************
  3769. Function: doubleSize
  3770. **************************************************************/
  3771. static byte[] doubleSize
  3772. (
  3773. byte oldBuffer[]
  3774. )
  3775. {
  3776. byte newBuffer[] = new byte[2 * oldBuffer.length];
  3777. int elem;
  3778. for (elem = 0; elem < oldBuffer.length; ++elem)
  3779. {
  3780. newBuffer[elem] = oldBuffer[elem];
  3781. }
  3782. return newBuffer;
  3783. }
  3784. /********************************************************
  3785. Function: hex2bin
  3786. *******************************************************/
  3787. static char hex2bin
  3788. (
  3789. char c
  3790. )
  3791. {
  3792. if ('0' <= c && '9' >= c)
  3793. {
  3794. return (char) (c - '0');
  3795. }
  3796. else if ('a' <= c && 'f' >= c)
  3797. {
  3798. return (char) (c - 'a' + 10);
  3799. }
  3800. else if ('A' <= c && 'F' >= c)
  3801. {
  3802. return (char) (c - 'A' + 10);
  3803. }
  3804. CError.impos("Bad hexidecimal digit" + c);
  3805. return 0;
  3806. }
  3807. /********************************************************
  3808. Function: ishexdigit
  3809. *******************************************************/
  3810. static boolean ishexdigit
  3811. (
  3812. char c
  3813. )
  3814. {
  3815. if (('0' <= c && '9' >= c)
  3816. || ('a' <= c && 'f' >= c)
  3817. || ('A' <= c && 'F' >= c))
  3818. {
  3819. return true;
  3820. }
  3821. return false;
  3822. }
  3823. /********************************************************
  3824. Function: oct2bin
  3825. *******************************************************/
  3826. static char oct2bin
  3827. (
  3828. char c
  3829. )
  3830. {
  3831. if ('0' <= c && '7' >= c)
  3832. {
  3833. return (char) (c - '0');
  3834. }
  3835. CError.impos("Bad octal digit " + c);
  3836. return 0;
  3837. }
  3838. /********************************************************
  3839. Function: isoctdigit
  3840. *******************************************************/
  3841. static boolean isoctdigit
  3842. (
  3843. char c
  3844. )
  3845. {
  3846. if ('0' <= c && '7' >= c)
  3847. {
  3848. return true;
  3849. }
  3850. return false;
  3851. }
  3852. /********************************************************
  3853. Function: isspace
  3854. *******************************************************/
  3855. static boolean isspace
  3856. (
  3857. char c
  3858. )
  3859. {
  3860. if ('\b' == c
  3861. || '\t' == c
  3862. || '\n' == c
  3863. || '\f' == c
  3864. || '\r' == c
  3865. || ' ' == c)
  3866. {
  3867. return true;
  3868. }
  3869. return false;
  3870. }
  3871. /********************************************************
  3872. Function: isnewline
  3873. *******************************************************/
  3874. static boolean isnewline
  3875. (
  3876. char c
  3877. )
  3878. {
  3879. if ('\n' == c
  3880. || '\r' == c)
  3881. {
  3882. return true;
  3883. }
  3884. return false;
  3885. }
  3886. /********************************************************
  3887. Function: bytencmp
  3888. Description: Compares up to n elements of
  3889. byte array a[] against byte array b[].
  3890. The first byte comparison is made between
  3891. a[a_first] and b[b_first]. Comparisons continue
  3892. until the null terminating byte '\0' is reached
  3893. or until n bytes are compared.
  3894. Return Value: Returns 0 if arrays are the
  3895. same up to and including the null terminating byte
  3896. or up to and including the first n bytes,
  3897. whichever comes first.
  3898. *******************************************************/
  3899. static int bytencmp
  3900. (
  3901. byte a[],
  3902. int a_first,
  3903. byte b[],
  3904. int b_first,
  3905. int n
  3906. )
  3907. {
  3908. int elem;
  3909. for (elem = 0; elem < n; ++elem)
  3910. {
  3911. /*System.out.print((char) a[a_first + elem]);
  3912. System.out.print((char) b[b_first + elem]);*/
  3913. if ('\0' == a[a_first + elem] && '\0' == b[b_first + elem])
  3914. {
  3915. /*System.out.println("return 0");*/
  3916. return 0;
  3917. }
  3918. if (a[a_first + elem] < b[b_first + elem])
  3919. {
  3920. /*System.out.println("return 1");*/
  3921. return 1;
  3922. }
  3923. else if (a[a_first + elem] > b[b_first + elem])
  3924. {
  3925. /*System.out.println("return -1");*/
  3926. return -1;
  3927. }
  3928. }
  3929. /*System.out.println("return 0");*/
  3930. return 0;
  3931. }
  3932. /********************************************************
  3933. Function: charncmp
  3934. *******************************************************/
  3935. static int charncmp
  3936. (
  3937. char a[],
  3938. int a_first,
  3939. char b[],
  3940. int b_first,
  3941. int n
  3942. )
  3943. {
  3944. int elem;
  3945. for (elem = 0; elem < n; ++elem)
  3946. {
  3947. if ('\0' == a[a_first + elem] && '\0' == b[b_first + elem])
  3948. {
  3949. return 0;
  3950. }
  3951. if (a[a_first + elem] < b[b_first + elem])
  3952. {
  3953. return 1;
  3954. }
  3955. else if (a[a_first + elem] > b[b_first + elem])
  3956. {
  3957. return -1;
  3958. }
  3959. }
  3960. return 0;
  3961. }
  3962. }
  3963. /********************************************************
  3964. Class: CError
  3965. *******************************************************/
  3966. class CError
  3967. {
  3968. /********************************************************
  3969. Function: impos
  3970. Description:
  3971. *******************************************************/
  3972. static void impos
  3973. (
  3974. String message
  3975. )
  3976. {
  3977. System.out.println("JLex Error: " + message);
  3978. }
  3979. /********************************************************
  3980. Constants
  3981. Description: Error codes for parse_error().
  3982. *******************************************************/
  3983. static final int E_BADEXPR = 0;
  3984. static final int E_PAREN = 1;
  3985. static final int E_LENGTH = 2;
  3986. static final int E_BRACKET = 3;
  3987. static final int E_BOL = 4;
  3988. static final int E_CLOSE = 5;
  3989. static final int E_NEWLINE = 6;
  3990. static final int E_BADMAC = 7;
  3991. static final int E_NOMAC = 8;
  3992. static final int E_MACDEPTH = 9;
  3993. static final int E_INIT = 10;
  3994. static final int E_EOF = 11;
  3995. static final int E_DIRECT = 12;
  3996. static final int E_INTERNAL = 13;
  3997. static final int E_STATE = 14;
  3998. static final int E_MACDEF = 15;
  3999. static final int E_SYNTAX = 16;
  4000. static final int E_BRACE = 17;
  4001. static final int E_DASH = 18;
  4002. static final int E_ZERO = 19;
  4003. static final int E_BADCTRL = 20;
  4004. /********************************************************
  4005. Constants
  4006. Description: String messages for parse_error();
  4007. *******************************************************/
  4008. static final String errmsg[] =
  4009. {
  4010. "Malformed regular expression.",
  4011. "Missing close parenthesis.",
  4012. "Too many regular expressions or expression too long.",
  4013. "Missing [ in character class.",
  4014. "^ must be at start of expression or after [.",
  4015. "+ ? or * must follow an expression or subexpression.",
  4016. "Newline in quoted string.",
  4017. "Missing } in macro expansion.",
  4018. "Macro does not exist.",
  4019. "Macro expansions nested too deeply.",
  4020. "JLex has not been successfully initialized.",
  4021. "Unexpected end-of-file found.",
  4022. "Undefined or badly-formed JLex directive.",
  4023. "Internal JLex error.",
  4024. "Unitialized state name.",
  4025. "Badly formed macro definition.",
  4026. "Syntax error.",
  4027. "Missing brace at start of lexical action.",
  4028. "Special character dash - in character class [...] must\n"
  4029. + "\tbe preceded by start-of-range character.",
  4030. "Zero-length regular expression.",
  4031. "Illegal \\^C-style escape sequence (character following caret must\n"
  4032. + "\tbe alphabetic).",
  4033. };
  4034. /********************************************************
  4035. Function: parse_error
  4036. Description:
  4037. *******************************************************/
  4038. static void parse_error
  4039. (
  4040. int error_code,
  4041. int line_number
  4042. )
  4043. {
  4044. System.out.println("Error: Parse error at line "
  4045. + line_number + ".");
  4046. System.out.println("Description: " + errmsg[error_code]);
  4047. throw new Error("Parse error.");
  4048. }
  4049. }
  4050. /********************************************************
  4051. Class: CSet
  4052. *******************************************************/
  4053. class CSet
  4054. {
  4055. /********************************************************
  4056. Member Variables
  4057. *******************************************************/
  4058. private SparseBitSet m_set;
  4059. private boolean m_complement;
  4060. /********************************************************
  4061. Function: CSet
  4062. *******************************************************/
  4063. CSet
  4064. (
  4065. )
  4066. {
  4067. m_set = new SparseBitSet();
  4068. m_complement = false;
  4069. }
  4070. /********************************************************
  4071. Function: complement
  4072. *******************************************************/
  4073. void complement
  4074. (
  4075. )
  4076. {
  4077. m_complement = true;
  4078. }
  4079. /********************************************************
  4080. Function: add
  4081. *******************************************************/
  4082. void add
  4083. (
  4084. int i
  4085. )
  4086. {
  4087. m_set.set(i);
  4088. }
  4089. /********************************************************
  4090. Function: addncase
  4091. *******************************************************/
  4092. void addncase // add, ignoring case.
  4093. (
  4094. char c
  4095. )
  4096. {
  4097. /* Do this in a Unicode-friendly way. */
  4098. /* (note that duplicate adds have no effect) */
  4099. add(c);
  4100. add(Character.toLowerCase(c));
  4101. add(Character.toTitleCase(c));
  4102. add(Character.toUpperCase(c));
  4103. }
  4104. /********************************************************
  4105. Function: contains
  4106. *******************************************************/
  4107. boolean contains
  4108. (
  4109. int i
  4110. )
  4111. {
  4112. boolean result;
  4113. result = m_set.get(i);
  4114. if (m_complement)
  4115. {
  4116. return (false == result);
  4117. }
  4118. return result;
  4119. }
  4120. /********************************************************
  4121. Function: mimic
  4122. *******************************************************/
  4123. void mimic
  4124. (
  4125. CSet set
  4126. )
  4127. {
  4128. m_complement = set.m_complement;
  4129. m_set = (SparseBitSet) set.m_set.clone();
  4130. }
  4131. /** Map set using character classes [CSA] */
  4132. void map(CSet set, int[] mapping) {
  4133. m_complement = set.m_complement;
  4134. m_set.clearAll();
  4135. for (Enumeration e=set.m_set.elements(); e.hasMoreElements(); ) {
  4136. int old_value =((Integer)e.nextElement()).intValue();
  4137. if (old_value<mapping.length) // skip unmapped characters
  4138. m_set.set(mapping[old_value]);
  4139. }
  4140. }
  4141. }
  4142. /********************************************************
  4143. Class: CNfa
  4144. *******************************************************/
  4145. class CNfa
  4146. {
  4147. /********************************************************
  4148. Member Variables
  4149. *******************************************************/
  4150. int m_edge; /* Label for edge type:
  4151. character code,
  4152. CCL (character class),
  4153. [STATE,
  4154. SCL (state class),]
  4155. EMPTY,
  4156. EPSILON. */
  4157. CSet m_set; /* Set to store character classes. */
  4158. CNfa m_next; /* Next state (or null if none). */
  4159. CNfa m_next2; /* Another state with type == EPSILON
  4160. and null if not used.
  4161. The NFA construction should result in two
  4162. outgoing edges only if both are EPSILON edges. */
  4163. CAccept m_accept; /* Set to null if nonaccepting state. */
  4164. int m_anchor; /* Says if and where pattern is anchored. */
  4165. int m_label;
  4166. SparseBitSet m_states;
  4167. /********************************************************
  4168. Constants
  4169. *******************************************************/
  4170. static final int NO_LABEL = -1;
  4171. /********************************************************
  4172. Constants: Edge Types
  4173. Note: Edge transitions on one specific character
  4174. are labelled with the character Ascii (Unicode)
  4175. codes. So none of the constants below should
  4176. overlap with the natural character codes.
  4177. *******************************************************/
  4178. static final int CCL = -1;
  4179. static final int EMPTY = -2;
  4180. static final int EPSILON = -3;
  4181. /********************************************************
  4182. Function: CNfa
  4183. *******************************************************/
  4184. CNfa
  4185. (
  4186. )
  4187. {
  4188. m_edge = EMPTY;
  4189. m_set = null;
  4190. m_next = null;
  4191. m_next2 = null;
  4192. m_accept = null;
  4193. m_anchor = CSpec.NONE;
  4194. m_label = NO_LABEL;
  4195. m_states = null;
  4196. }
  4197. /********************************************************
  4198. Function: mimic
  4199. Description: Converts this NFA state into a copy of
  4200. the input one.
  4201. *******************************************************/
  4202. void mimic
  4203. (
  4204. CNfa nfa
  4205. )
  4206. {
  4207. m_edge = nfa.m_edge;
  4208. if (null != nfa.m_set)
  4209. {
  4210. if (null == m_set)
  4211. {
  4212. m_set = new CSet();
  4213. }
  4214. m_set.mimic(nfa.m_set);
  4215. }
  4216. else
  4217. {
  4218. m_set = null;
  4219. }
  4220. m_next = nfa.m_next;
  4221. m_next2 = nfa.m_next2;
  4222. m_accept = nfa.m_accept;
  4223. m_anchor = nfa.m_anchor;
  4224. if (null != nfa.m_states)
  4225. {
  4226. m_states = (SparseBitSet) nfa.m_states.clone();
  4227. }
  4228. else
  4229. {
  4230. m_states = null;
  4231. }
  4232. }
  4233. }
  4234. /***************************************************************
  4235. Class: CLexGen
  4236. **************************************************************/
  4237. class CLexGen
  4238. {
  4239. /***************************************************************
  4240. Member Variables
  4241. **************************************************************/
  4242. private java.io.Reader m_instream; /* JLex specification file. */
  4243. private java.io.PrintWriter m_outstream; /* Lexical analyzer source file. */
  4244. private CInput m_input; /* Input buffer class. */
  4245. private Hashtable m_tokens; /* Hashtable that maps characters to their
  4246. corresponding lexical code for
  4247. the internal lexical analyzer. */
  4248. private CSpec m_spec; /* Spec class holds information
  4249. about the generated lexer. */
  4250. private boolean m_init_flag; /* Flag set to true only upon
  4251. successful initialization. */
  4252. private CMakeNfa m_makeNfa; /* NFA machine generator module. */
  4253. private CNfa2Dfa m_nfa2dfa; /* NFA to DFA machine (transition table)
  4254. conversion module. */
  4255. private CMinimize m_minimize; /* Transition table compressor. */
  4256. private CSimplifyNfa m_simplifyNfa; /* NFA simplifier using char classes */
  4257. private CEmit m_emit; /* Output module that emits source code
  4258. into the generated lexer file. */
  4259. /********************************************************
  4260. Constants
  4261. *******************************************************/
  4262. private static final boolean ERROR = false;
  4263. private static final boolean NOT_ERROR = true;
  4264. private static final int BUFFER_SIZE = 1024;
  4265. /********************************************************
  4266. Constants: Token Types
  4267. *******************************************************/
  4268. static final int EOS = 1;
  4269. static final int ANY = 2;
  4270. static final int AT_BOL = 3;
  4271. static final int AT_EOL = 4;
  4272. static final int CCL_END = 5;
  4273. static final int CCL_START = 6;
  4274. static final int CLOSE_CURLY = 7;
  4275. static final int CLOSE_PAREN = 8;
  4276. static final int CLOSURE = 9;
  4277. static final int DASH = 10;
  4278. static final int END_OF_INPUT = 11;
  4279. static final int L = 12;
  4280. static final int OPEN_CURLY = 13;
  4281. static final int OPEN_PAREN = 14;
  4282. static final int OPTIONAL = 15;
  4283. static final int OR = 16;
  4284. static final int PLUS_CLOSE = 17;
  4285. /***************************************************************
  4286. Function: CLexGen
  4287. **************************************************************/
  4288. CLexGen
  4289. (
  4290. String filename
  4291. )
  4292. throws java.io.FileNotFoundException, java.io.IOException
  4293. {
  4294. /* Successful initialization flag. */
  4295. m_init_flag = false;
  4296. /* Open input stream. */
  4297. m_instream = new java.io.FileReader(filename);
  4298. if (null == m_instream)
  4299. {
  4300. System.out.println("Error: Unable to open input file "
  4301. + filename + ".");
  4302. return;
  4303. }
  4304. /* Open output stream. */
  4305. m_outstream
  4306. = new java.io.PrintWriter(new java.io.BufferedWriter(
  4307. new java.io.FileWriter(filename + ".java")));
  4308. if (null == m_outstream)
  4309. {
  4310. System.out.println("Error: Unable to open output file "
  4311. + filename + ".java.");
  4312. return;
  4313. }
  4314. /* Create input buffer class. */
  4315. m_input = new CInput(m_instream);
  4316. /* Initialize character hash table. */
  4317. m_tokens = new Hashtable();
  4318. m_tokens.put(new Character('$'),new Integer(AT_EOL));
  4319. m_tokens.put(new Character('('),new Integer(OPEN_PAREN));
  4320. m_tokens.put(new Character(')'),new Integer(CLOSE_PAREN));
  4321. m_tokens.put(new Character('*'),new Integer(CLOSURE));
  4322. m_tokens.put(new Character('+'),new Integer(PLUS_CLOSE));
  4323. m_tokens.put(new Character('-'),new Integer(DASH));
  4324. m_tokens.put(new Character('.'),new Integer(ANY));
  4325. m_tokens.put(new Character('?'),new Integer(OPTIONAL));
  4326. m_tokens.put(new Character('['),new Integer(CCL_START));
  4327. m_tokens.put(new Character(']'),new Integer(CCL_END));
  4328. m_tokens.put(new Character('^'),new Integer(AT_BOL));
  4329. m_tokens.put(new Character('{'),new Integer(OPEN_CURLY));
  4330. m_tokens.put(new Character('|'),new Integer(OR));
  4331. m_tokens.put(new Character('}'),new Integer(CLOSE_CURLY));
  4332. /* Initialize spec structure. */
  4333. m_spec = new CSpec(this);
  4334. /* Nfa to dfa converter. */
  4335. m_nfa2dfa = new CNfa2Dfa();
  4336. m_minimize = new CMinimize();
  4337. m_makeNfa = new CMakeNfa();
  4338. m_simplifyNfa = new CSimplifyNfa();
  4339. m_emit = new CEmit();
  4340. /* Successful initialization flag. */
  4341. m_init_flag = true;
  4342. }
  4343. /***************************************************************
  4344. Function: generate
  4345. Description:
  4346. **************************************************************/
  4347. void generate
  4348. (
  4349. )
  4350. throws java.io.IOException, java.io.FileNotFoundException
  4351. {
  4352. if (false == m_init_flag)
  4353. {
  4354. CError.parse_error(CError.E_INIT,0);
  4355. }
  4356. if (CUtility.DEBUG)
  4357. {
  4358. CUtility._assert(null != this);
  4359. CUtility._assert(null != m_outstream);
  4360. CUtility._assert(null != m_input);
  4361. CUtility._assert(null != m_tokens);
  4362. CUtility._assert(null != m_spec);
  4363. CUtility._assert(m_init_flag);
  4364. }
  4365. /*m_emit.emit_imports(m_spec,m_outstream);*/
  4366. if (m_spec.m_verbose)
  4367. {
  4368. System.out.println("Processing first section -- user code.");
  4369. }
  4370. userCode();
  4371. if (m_input.m_eof_reached)
  4372. {
  4373. CError.parse_error(CError.E_EOF,m_input.m_line_number);
  4374. }
  4375. if (m_spec.m_verbose)
  4376. {
  4377. System.out.println("Processing second section -- "
  4378. + "JLex declarations.");
  4379. }
  4380. userDeclare();
  4381. if (m_input.m_eof_reached)
  4382. {
  4383. CError.parse_error(CError.E_EOF,m_input.m_line_number);
  4384. }
  4385. if (m_spec.m_verbose)
  4386. {
  4387. System.out.println("Processing third section -- lexical rules.");
  4388. }
  4389. userRules();
  4390. if (CUtility.DO_DEBUG)
  4391. {
  4392. print_header();
  4393. }
  4394. if (m_spec.m_verbose)
  4395. {
  4396. System.out.println("Outputting lexical analyzer code.");
  4397. }
  4398. m_emit.emit(m_spec,m_outstream);
  4399. if (m_spec.m_verbose && true == CUtility.OLD_DUMP_DEBUG)
  4400. {
  4401. details();
  4402. }
  4403. m_outstream.close();
  4404. }
  4405. /***************************************************************
  4406. Function: userCode
  4407. Description: Process first section of specification,
  4408. echoing it into output file.
  4409. **************************************************************/
  4410. private void userCode
  4411. (
  4412. )
  4413. throws java.io.IOException
  4414. {
  4415. int count = 0;
  4416. if (false == m_init_flag)
  4417. {
  4418. CError.parse_error(CError.E_INIT,0);
  4419. }
  4420. if (CUtility.DEBUG)
  4421. {
  4422. CUtility._assert(null != this);
  4423. CUtility._assert(null != m_outstream);
  4424. CUtility._assert(null != m_input);
  4425. CUtility._assert(null != m_tokens);
  4426. CUtility._assert(null != m_spec);
  4427. }
  4428. if (m_input.m_eof_reached)
  4429. {
  4430. CError.parse_error(CError.E_EOF,0);
  4431. }
  4432. while (true)
  4433. {
  4434. if (m_input.getLine())
  4435. {
  4436. /* Eof reached. */
  4437. CError.parse_error(CError.E_EOF,0);
  4438. }
  4439. if (2 <= m_input.m_line_read
  4440. && '%' == m_input.m_line[0]
  4441. && '%' == m_input.m_line[1])
  4442. {
  4443. /* Discard remainder of line. */
  4444. m_input.m_line_index = m_input.m_line_read;
  4445. return;
  4446. }
  4447. m_outstream.print(new String(m_input.m_line,0,
  4448. m_input.m_line_read));
  4449. }
  4450. }
  4451. /***************************************************************
  4452. Function: getName
  4453. **************************************************************/
  4454. private char[] getName
  4455. (
  4456. )
  4457. {
  4458. char buffer[];
  4459. int elem;
  4460. /* Skip white space. */
  4461. while (m_input.m_line_index < m_input.m_line_read
  4462. && true == CUtility.isspace(m_input.m_line[m_input.m_line_index]))
  4463. {
  4464. ++m_input.m_line_index;
  4465. }
  4466. /* No name? */
  4467. if (m_input.m_line_index >= m_input.m_line_read)
  4468. {
  4469. CError.parse_error(CError.E_DIRECT,0);
  4470. }
  4471. /* Determine length. */
  4472. elem = m_input.m_line_index;
  4473. while (elem < m_input.m_line_read
  4474. && false == CUtility.isnewline(m_input.m_line[elem]))
  4475. {
  4476. ++elem;
  4477. }
  4478. /* Allocate non-terminated buffer of exact length. */
  4479. buffer = new char[elem - m_input.m_line_index];
  4480. /* Copy. */
  4481. elem = 0;
  4482. while (m_input.m_line_index < m_input.m_line_read
  4483. && false == CUtility.isnewline(m_input.m_line[m_input.m_line_index]))
  4484. {
  4485. buffer[elem] = m_input.m_line[m_input.m_line_index];
  4486. ++elem;
  4487. ++m_input.m_line_index;
  4488. }
  4489. return buffer;
  4490. }
  4491. private final int CLASS_CODE = 0;
  4492. private final int INIT_CODE = 1;
  4493. private final int EOF_CODE = 2;
  4494. private final int INIT_THROW_CODE = 3;
  4495. private final int YYLEX_THROW_CODE = 4;
  4496. private final int EOF_THROW_CODE = 5;
  4497. private final int EOF_VALUE_CODE = 6;
  4498. /***************************************************************
  4499. Function: packCode
  4500. Description:
  4501. **************************************************************/
  4502. private char[] packCode
  4503. (
  4504. char start_dir[],
  4505. char end_dir[],
  4506. char prev_code[],
  4507. int prev_read,
  4508. int specified
  4509. )
  4510. throws java.io.IOException
  4511. {
  4512. if (CUtility.DEBUG)
  4513. {
  4514. CUtility._assert(INIT_CODE == specified
  4515. || CLASS_CODE == specified
  4516. || EOF_CODE == specified
  4517. || EOF_VALUE_CODE == specified
  4518. || INIT_THROW_CODE == specified
  4519. || YYLEX_THROW_CODE == specified
  4520. || EOF_THROW_CODE == specified);
  4521. }
  4522. if (0 != CUtility.charncmp(m_input.m_line,
  4523. 0,
  4524. start_dir,
  4525. 0,
  4526. start_dir.length - 1))
  4527. {
  4528. CError.parse_error(CError.E_INTERNAL,0);
  4529. }
  4530. if (null == prev_code)
  4531. {
  4532. prev_code = new char[BUFFER_SIZE];
  4533. prev_read = 0;
  4534. }
  4535. if (prev_read >= prev_code.length)
  4536. {
  4537. prev_code = CUtility.doubleSize(prev_code);
  4538. }
  4539. m_input.m_line_index = start_dir.length - 1;
  4540. while (true)
  4541. {
  4542. while (m_input.m_line_index >= m_input.m_line_read)
  4543. {
  4544. if (m_input.getLine())
  4545. {
  4546. CError.parse_error(CError.E_EOF,m_input.m_line_number);
  4547. }
  4548. if (0 == CUtility.charncmp(m_input.m_line,
  4549. 0,
  4550. end_dir,
  4551. 0,
  4552. end_dir.length - 1))
  4553. {
  4554. m_input.m_line_index = end_dir.length - 1;
  4555. switch (specified)
  4556. {
  4557. case CLASS_CODE:
  4558. m_spec.m_class_read = prev_read;
  4559. break;
  4560. case INIT_CODE:
  4561. m_spec.m_init_read = prev_read;
  4562. break;
  4563. case EOF_CODE:
  4564. m_spec.m_eof_read = prev_read;
  4565. break;
  4566. case EOF_VALUE_CODE:
  4567. m_spec.m_eof_value_read = prev_read;
  4568. break;
  4569. case INIT_THROW_CODE:
  4570. m_spec.m_init_throw_read = prev_read;
  4571. break;
  4572. case YYLEX_THROW_CODE:
  4573. m_spec.m_yylex_throw_read = prev_read;
  4574. break;
  4575. case EOF_THROW_CODE:
  4576. m_spec.m_eof_throw_read = prev_read;
  4577. break;
  4578. default:
  4579. CError.parse_error(CError.E_INTERNAL,m_input.m_line_number);
  4580. break;
  4581. }
  4582. return prev_code;
  4583. }
  4584. }
  4585. while (m_input.m_line_index < m_input.m_line_read)
  4586. {
  4587. prev_code[prev_read] = m_input.m_line[m_input.m_line_index];
  4588. ++prev_read;
  4589. ++m_input.m_line_index;
  4590. if (prev_read >= prev_code.length)
  4591. {
  4592. prev_code = CUtility.doubleSize(prev_code);
  4593. }
  4594. }
  4595. }
  4596. }
  4597. /***************************************************************
  4598. Member Variables: JLex directives.
  4599. **************************************************************/
  4600. private char m_state_dir[] = {
  4601. '%', 's', 't',
  4602. 'a', 't', 'e',
  4603. '\0'
  4604. };
  4605. private char m_char_dir[] = {
  4606. '%', 'c', 'h',
  4607. 'a', 'r',
  4608. '\0'
  4609. };
  4610. private char m_line_dir[] = {
  4611. '%', 'l', 'i',
  4612. 'n', 'e',
  4613. '\0'
  4614. };
  4615. private char m_cup_dir[] = {
  4616. '%', 'c', 'u',
  4617. 'p',
  4618. '\0'
  4619. };
  4620. private char m_class_dir[] = {
  4621. '%', 'c', 'l',
  4622. 'a', 's', 's',
  4623. '\0'
  4624. };
  4625. private char m_implements_dir[] = {
  4626. '%', 'i', 'm', 'p', 'l', 'e', 'm', 'e', 'n', 't', 's',
  4627. '\0'
  4628. };
  4629. private char m_function_dir[] = {
  4630. '%', 'f', 'u',
  4631. 'n', 'c', 't',
  4632. 'i', 'o', 'n',
  4633. '\0'
  4634. };
  4635. private char m_type_dir[] = {
  4636. '%', 't', 'y',
  4637. 'p', 'e',
  4638. '\0'
  4639. };
  4640. private char m_integer_dir[] = {
  4641. '%', 'i', 'n',
  4642. 't', 'e', 'g',
  4643. 'e', 'r',
  4644. '\0'
  4645. };
  4646. private char m_intwrap_dir[] = {
  4647. '%', 'i', 'n',
  4648. 't', 'w', 'r',
  4649. 'a', 'p',
  4650. '\0'
  4651. };
  4652. private char m_full_dir[] = {
  4653. '%', 'f', 'u',
  4654. 'l', 'l',
  4655. '\0'
  4656. };
  4657. private char m_unicode_dir[] = {
  4658. '%', 'u', 'n',
  4659. 'i', 'c', 'o',
  4660. 'd', 'e',
  4661. '\0'
  4662. };
  4663. private char m_ignorecase_dir[] = {
  4664. '%', 'i', 'g',
  4665. 'n', 'o', 'r',
  4666. 'e', 'c', 'a',
  4667. 's', 'e',
  4668. '\0'
  4669. };
  4670. private char m_notunix_dir[] = {
  4671. '%', 'n', 'o',
  4672. 't', 'u', 'n',
  4673. 'i', 'x',
  4674. '\0'
  4675. };
  4676. private char m_init_code_dir[] = {
  4677. '%', 'i', 'n',
  4678. 'i', 't', '{',
  4679. '\0'
  4680. };
  4681. private char m_init_code_end_dir[] = {
  4682. '%', 'i', 'n',
  4683. 'i', 't', '}',
  4684. '\0'
  4685. };
  4686. private char m_init_throw_code_dir[] = {
  4687. '%', 'i', 'n',
  4688. 'i', 't', 't',
  4689. 'h', 'r', 'o',
  4690. 'w', '{',
  4691. '\0'
  4692. };
  4693. private char m_init_throw_code_end_dir[] = {
  4694. '%', 'i', 'n',
  4695. 'i', 't', 't',
  4696. 'h', 'r', 'o',
  4697. 'w', '}',
  4698. '\0'
  4699. };
  4700. private char m_yylex_throw_code_dir[] = {
  4701. '%', 'y', 'y', 'l',
  4702. 'e', 'x', 't',
  4703. 'h', 'r', 'o',
  4704. 'w', '{',
  4705. '\0'
  4706. };
  4707. private char m_yylex_throw_code_end_dir[] = {
  4708. '%', 'y', 'y', 'l',
  4709. 'e', 'x', 't',
  4710. 'h', 'r', 'o',
  4711. 'w', '}',
  4712. '\0'
  4713. };
  4714. private char m_eof_code_dir[] = {
  4715. '%', 'e', 'o',
  4716. 'f', '{',
  4717. '\0'
  4718. };
  4719. private char m_eof_code_end_dir[] = {
  4720. '%', 'e', 'o',
  4721. 'f', '}',
  4722. '\0'
  4723. };
  4724. private char m_eof_value_code_dir[] = {
  4725. '%', 'e', 'o',
  4726. 'f', 'v', 'a',
  4727. 'l', '{',
  4728. '\0'
  4729. };
  4730. private char m_eof_value_code_end_dir[] = {
  4731. '%', 'e', 'o',
  4732. 'f', 'v', 'a',
  4733. 'l', '}',
  4734. '\0'
  4735. };
  4736. private char m_eof_throw_code_dir[] = {
  4737. '%', 'e', 'o',
  4738. 'f', 't', 'h',
  4739. 'r', 'o', 'w',
  4740. '{',
  4741. '\0'
  4742. };
  4743. private char m_eof_throw_code_end_dir[] = {
  4744. '%', 'e', 'o',
  4745. 'f', 't', 'h',
  4746. 'r', 'o', 'w',
  4747. '}',
  4748. '\0'
  4749. };
  4750. private char m_class_code_dir[] = {
  4751. '%', '{',
  4752. '\0'
  4753. };
  4754. private char m_class_code_end_dir[] = {
  4755. '%', '}',
  4756. '\0'
  4757. };
  4758. private char m_yyeof_dir[] = {
  4759. '%', 'y', 'y',
  4760. 'e', 'o', 'f',
  4761. '\0'
  4762. };
  4763. private char m_public_dir[] = {
  4764. '%', 'p', 'u',
  4765. 'b', 'l', 'i',
  4766. 'c', '\0'
  4767. };
  4768. /***************************************************************
  4769. Function: userDeclare
  4770. Description:
  4771. **************************************************************/
  4772. private void userDeclare
  4773. (
  4774. )
  4775. throws java.io.IOException
  4776. {
  4777. int elem;
  4778. if (CUtility.DEBUG)
  4779. {
  4780. CUtility._assert(null != this);
  4781. CUtility._assert(null != m_outstream);
  4782. CUtility._assert(null != m_input);
  4783. CUtility._assert(null != m_tokens);
  4784. CUtility._assert(null != m_spec);
  4785. }
  4786. if (m_input.m_eof_reached)
  4787. {
  4788. /* End-of-file. */
  4789. CError.parse_error(CError.E_EOF,
  4790. m_input.m_line_number);
  4791. }
  4792. while (false == m_input.getLine())
  4793. {
  4794. /* Look for double percent. */
  4795. if (2 <= m_input.m_line_read
  4796. && '%' == m_input.m_line[0]
  4797. && '%' == m_input.m_line[1])
  4798. {
  4799. /* Mess around with line. */
  4800. m_input.m_line_read -= 2;
  4801. System.arraycopy(m_input.m_line, 2,
  4802. m_input.m_line, 0, m_input.m_line_read);
  4803. m_input.m_pushback_line = true;
  4804. /* Check for and discard empty line. */
  4805. if (0 == m_input.m_line_read
  4806. || '\n' == m_input.m_line[0])
  4807. {
  4808. m_input.m_pushback_line = false;
  4809. }
  4810. return;
  4811. }
  4812. if (0 == m_input.m_line_read)
  4813. {
  4814. continue;
  4815. }
  4816. if ('%' == m_input.m_line[0])
  4817. {
  4818. /* Special lex declarations. */
  4819. if (1 >= m_input.m_line_read)
  4820. {
  4821. CError.parse_error(CError.E_DIRECT,
  4822. m_input.m_line_number);
  4823. continue;
  4824. }
  4825. switch (m_input.m_line[1])
  4826. {
  4827. case '{':
  4828. if (0 == CUtility.charncmp(m_input.m_line,
  4829. 0,
  4830. m_class_code_dir,
  4831. 0,
  4832. m_class_code_dir.length - 1))
  4833. {
  4834. m_spec.m_class_code = packCode(m_class_code_dir,
  4835. m_class_code_end_dir,
  4836. m_spec.m_class_code,
  4837. m_spec.m_class_read,
  4838. CLASS_CODE);
  4839. break;
  4840. }
  4841. /* Bad directive. */
  4842. CError.parse_error(CError.E_DIRECT,
  4843. m_input.m_line_number);
  4844. break;
  4845. case 'c':
  4846. if (0 == CUtility.charncmp(m_input.m_line,
  4847. 0,
  4848. m_char_dir,
  4849. 0,
  4850. m_char_dir.length - 1))
  4851. {
  4852. /* Set line counting to ON. */
  4853. m_input.m_line_index = m_char_dir.length;
  4854. m_spec.m_count_chars = true;
  4855. break;
  4856. }
  4857. else if (0 == CUtility.charncmp(m_input.m_line,
  4858. 0,
  4859. m_class_dir,
  4860. 0,
  4861. m_class_dir.length - 1))
  4862. {
  4863. m_input.m_line_index = m_class_dir.length;
  4864. m_spec.m_class_name = getName();
  4865. break;
  4866. }
  4867. else if (0 == CUtility.charncmp(m_input.m_line,
  4868. 0,
  4869. m_cup_dir,
  4870. 0,
  4871. m_cup_dir.length - 1))
  4872. {
  4873. /* Set Java CUP compatibility to ON. */
  4874. m_input.m_line_index = m_cup_dir.length;
  4875. m_spec.m_cup_compatible = true;
  4876. // this is what %cup does: [CSA, 27-Jul-1999]
  4877. m_spec.m_implements_name =
  4878. "java_cup.runtime.Scanner".toCharArray();
  4879. m_spec.m_function_name =
  4880. "next_token".toCharArray();
  4881. m_spec.m_type_name =
  4882. "java_cup.runtime.Symbol".toCharArray();
  4883. break;
  4884. }
  4885. /* Bad directive. */
  4886. CError.parse_error(CError.E_DIRECT,
  4887. m_input.m_line_number);
  4888. break;
  4889. case 'e':
  4890. if (0 == CUtility.charncmp(m_input.m_line,
  4891. 0,
  4892. m_eof_code_dir,
  4893. 0,
  4894. m_eof_code_dir.length - 1))
  4895. {
  4896. m_spec.m_eof_code = packCode(m_eof_code_dir,
  4897. m_eof_code_end_dir,
  4898. m_spec.m_eof_code,
  4899. m_spec.m_eof_read,
  4900. EOF_CODE);
  4901. break;
  4902. }
  4903. else if (0 == CUtility.charncmp(m_input.m_line,
  4904. 0,
  4905. m_eof_value_code_dir,
  4906. 0,
  4907. m_eof_value_code_dir.length - 1))
  4908. {
  4909. m_spec.m_eof_value_code = packCode(m_eof_value_code_dir,
  4910. m_eof_value_code_end_dir,
  4911. m_spec.m_eof_value_code,
  4912. m_spec.m_eof_value_read,
  4913. EOF_VALUE_CODE);
  4914. break;
  4915. }
  4916. else if (0 == CUtility.charncmp(m_input.m_line,
  4917. 0,
  4918. m_eof_throw_code_dir,
  4919. 0,
  4920. m_eof_throw_code_dir.length - 1))
  4921. {
  4922. m_spec.m_eof_throw_code = packCode(m_eof_throw_code_dir,
  4923. m_eof_throw_code_end_dir,
  4924. m_spec.m_eof_throw_code,
  4925. m_spec.m_eof_throw_read,
  4926. EOF_THROW_CODE);
  4927. break;
  4928. }
  4929. /* Bad directive. */
  4930. CError.parse_error(CError.E_DIRECT,
  4931. m_input.m_line_number);
  4932. break;
  4933. case 'f':
  4934. if (0 == CUtility.charncmp(m_input.m_line,
  4935. 0,
  4936. m_function_dir,
  4937. 0,
  4938. m_function_dir.length - 1))
  4939. {
  4940. /* Set line counting to ON. */
  4941. m_input.m_line_index = m_function_dir.length;
  4942. m_spec.m_function_name = getName();
  4943. break;
  4944. }
  4945. else if (0 == CUtility.charncmp(m_input.m_line,
  4946. 0,
  4947. m_full_dir,
  4948. 0,
  4949. m_full_dir.length - 1))
  4950. {
  4951. m_input.m_line_index = m_full_dir.length;
  4952. m_spec.m_dtrans_ncols = CUtility.MAX_EIGHT_BIT + 1;
  4953. break;
  4954. }
  4955. /* Bad directive. */
  4956. CError.parse_error(CError.E_DIRECT,
  4957. m_input.m_line_number);
  4958. break;
  4959. case 'i':
  4960. if (0 == CUtility.charncmp(m_input.m_line,
  4961. 0,
  4962. m_integer_dir,
  4963. 0,
  4964. m_integer_dir.length - 1))
  4965. {
  4966. /* Set line counting to ON. */
  4967. m_input.m_line_index = m_integer_dir.length;
  4968. m_spec.m_integer_type = true;
  4969. break;
  4970. }
  4971. else if (0 == CUtility.charncmp(m_input.m_line,
  4972. 0,
  4973. m_intwrap_dir,
  4974. 0,
  4975. m_intwrap_dir.length - 1))
  4976. {
  4977. /* Set line counting to ON. */
  4978. m_input.m_line_index = m_integer_dir.length;
  4979. m_spec.m_intwrap_type = true;
  4980. break;
  4981. }
  4982. else if (0 == CUtility.charncmp(m_input.m_line,
  4983. 0,
  4984. m_init_code_dir,
  4985. 0,
  4986. m_init_code_dir.length - 1))
  4987. {
  4988. m_spec.m_init_code = packCode(m_init_code_dir,
  4989. m_init_code_end_dir,
  4990. m_spec.m_init_code,
  4991. m_spec.m_init_read,
  4992. INIT_CODE);
  4993. break;
  4994. }
  4995. else if (0 == CUtility.charncmp(m_input.m_line,
  4996. 0,
  4997. m_init_throw_code_dir,
  4998. 0,
  4999. m_init_throw_code_dir.length - 1))
  5000. {
  5001. m_spec.m_init_throw_code = packCode(m_init_throw_code_dir,
  5002. m_init_throw_code_end_dir,
  5003. m_spec.m_init_throw_code,
  5004. m_spec.m_init_throw_read,
  5005. INIT_THROW_CODE);
  5006. break;
  5007. }
  5008. else if (0 == CUtility.charncmp(m_input.m_line,
  5009. 0,
  5010. m_implements_dir,
  5011. 0,
  5012. m_implements_dir.length - 1))
  5013. {
  5014. m_input.m_line_index = m_implements_dir.length;
  5015. m_spec.m_implements_name = getName();
  5016. break;
  5017. }
  5018. else if (0 == CUtility.charncmp(m_input.m_line,
  5019. 0,
  5020. m_ignorecase_dir,
  5021. 0,
  5022. m_ignorecase_dir.length-1))
  5023. {
  5024. /* Set m_ignorecase to ON. */
  5025. m_input.m_line_index = m_ignorecase_dir.length;
  5026. m_spec.m_ignorecase = true;
  5027. break;
  5028. }
  5029. /* Bad directive. */
  5030. CError.parse_error(CError.E_DIRECT,
  5031. m_input.m_line_number);
  5032. break;
  5033. case 'l':
  5034. if (0 == CUtility.charncmp(m_input.m_line,
  5035. 0,
  5036. m_line_dir,
  5037. 0,
  5038. m_line_dir.length - 1))
  5039. {
  5040. /* Set line counting to ON. */
  5041. m_input.m_line_index = m_line_dir.length;
  5042. m_spec.m_count_lines = true;
  5043. break;
  5044. }
  5045. /* Bad directive. */
  5046. CError.parse_error(CError.E_DIRECT,
  5047. m_input.m_line_number);
  5048. break;
  5049. case 'n':
  5050. if (0 == CUtility.charncmp(m_input.m_line,
  5051. 0,
  5052. m_notunix_dir,
  5053. 0,
  5054. m_notunix_dir.length - 1))
  5055. {
  5056. /* Set line counting to ON. */
  5057. m_input.m_line_index = m_notunix_dir.length;
  5058. m_spec.m_unix = false;
  5059. break;
  5060. }
  5061. /* Bad directive. */
  5062. CError.parse_error(CError.E_DIRECT,
  5063. m_input.m_line_number);
  5064. break;
  5065. case 'p':
  5066. if (0 == CUtility.charncmp(m_input.m_line,
  5067. 0,
  5068. m_public_dir,
  5069. 0,
  5070. m_public_dir.length - 1))
  5071. {
  5072. /* Set public flag. */
  5073. m_input.m_line_index = m_public_dir.length;
  5074. m_spec.m_public = true;
  5075. break;
  5076. }
  5077. /* Bad directive. */
  5078. CError.parse_error(CError.E_DIRECT,
  5079. m_input.m_line_number);
  5080. break;
  5081. case 's':
  5082. if (0 == CUtility.charncmp(m_input.m_line,
  5083. 0,
  5084. m_state_dir,
  5085. 0,
  5086. m_state_dir.length - 1))
  5087. {
  5088. /* Recognize state list. */
  5089. m_input.m_line_index = m_state_dir.length;
  5090. saveStates();
  5091. break;
  5092. }
  5093. /* Undefined directive. */
  5094. CError.parse_error(CError.E_DIRECT,
  5095. m_input.m_line_number);
  5096. break;
  5097. case 't':
  5098. if (0 == CUtility.charncmp(m_input.m_line,
  5099. 0,
  5100. m_type_dir,
  5101. 0,
  5102. m_type_dir.length - 1))
  5103. {
  5104. /* Set Java CUP compatibility to ON. */
  5105. m_input.m_line_index = m_type_dir.length;
  5106. m_spec.m_type_name = getName();
  5107. break;
  5108. }
  5109. /* Undefined directive. */
  5110. CError.parse_error(CError.E_DIRECT,
  5111. m_input.m_line_number);
  5112. break;
  5113. case 'u':
  5114. if (0 == CUtility.charncmp(m_input.m_line,
  5115. 0,
  5116. m_unicode_dir,
  5117. 0,
  5118. m_unicode_dir.length - 1))
  5119. {
  5120. m_input.m_line_index = m_unicode_dir.length;
  5121. m_spec.m_dtrans_ncols= CUtility.MAX_SIXTEEN_BIT + 1;
  5122. break;
  5123. }
  5124. /* Bad directive. */
  5125. CError.parse_error(CError.E_DIRECT,
  5126. m_input.m_line_number);
  5127. break;
  5128. case 'y':
  5129. if (0 == CUtility.charncmp(m_input.m_line,
  5130. 0,
  5131. m_yyeof_dir,
  5132. 0,
  5133. m_yyeof_dir.length - 1))
  5134. {
  5135. m_input.m_line_index = m_yyeof_dir.length;
  5136. m_spec.m_yyeof = true;
  5137. break;
  5138. } else if (0 == CUtility.charncmp(m_input.m_line,
  5139. 0,
  5140. m_yylex_throw_code_dir,
  5141. 0,
  5142. m_yylex_throw_code_dir.length - 1))
  5143. {
  5144. m_spec.m_yylex_throw_code = packCode(m_yylex_throw_code_dir,
  5145. m_yylex_throw_code_end_dir,
  5146. m_spec.m_yylex_throw_code,
  5147. m_spec.m_yylex_throw_read,
  5148. YYLEX_THROW_CODE);
  5149. break;
  5150. }
  5151. /* Bad directive. */
  5152. CError.parse_error(CError.E_DIRECT,
  5153. m_input.m_line_number);
  5154. break;
  5155. default:
  5156. /* Undefined directive. */
  5157. CError.parse_error(CError.E_DIRECT,
  5158. m_input.m_line_number);
  5159. break;
  5160. }
  5161. }
  5162. else
  5163. {
  5164. /* Regular expression macro. */
  5165. m_input.m_line_index = 0;
  5166. saveMacro();
  5167. }
  5168. if (CUtility.OLD_DEBUG)
  5169. {
  5170. System.out.println("Line number "
  5171. + m_input.m_line_number + ":");
  5172. System.out.print(new String(m_input.m_line,
  5173. 0,m_input.m_line_read));
  5174. }
  5175. }
  5176. }
  5177. /***************************************************************
  5178. Function: userRules
  5179. Description: Processes third section of JLex
  5180. specification and creates minimized transition table.
  5181. **************************************************************/
  5182. private void userRules
  5183. (
  5184. )
  5185. throws java.io.IOException
  5186. {
  5187. int code;
  5188. if (false == m_init_flag)
  5189. {
  5190. CError.parse_error(CError.E_INIT,0);
  5191. }
  5192. if (CUtility.DEBUG)
  5193. {
  5194. CUtility._assert(null != this);
  5195. CUtility._assert(null != m_outstream);
  5196. CUtility._assert(null != m_input);
  5197. CUtility._assert(null != m_tokens);
  5198. CUtility._assert(null != m_spec);
  5199. }
  5200. /* UNDONE: Need to handle states preceding rules. */
  5201. if (m_spec.m_verbose)
  5202. {
  5203. System.out.println("Creating NFA machine representation.");
  5204. }
  5205. m_makeNfa.allocate_BOL_EOF(m_spec);
  5206. m_makeNfa.thompson(this,m_spec,m_input);
  5207. m_simplifyNfa.simplify(m_spec);
  5208. /*print_nfa();*/
  5209. if (CUtility.DEBUG)
  5210. {
  5211. CUtility._assert(END_OF_INPUT == m_spec.m_current_token);
  5212. }
  5213. if (m_spec.m_verbose)
  5214. {
  5215. System.out.println("Creating DFA transition table.");
  5216. }
  5217. m_nfa2dfa.make_dfa(this,m_spec);
  5218. if (CUtility.FOODEBUG) {
  5219. print_header();
  5220. }
  5221. if (m_spec.m_verbose)
  5222. {
  5223. System.out.println("Minimizing DFA transition table.");
  5224. }
  5225. m_minimize.min_dfa(m_spec);
  5226. }
  5227. /***************************************************************
  5228. Function: printccl
  5229. Description: Debugging routine that outputs readable form
  5230. of character class.
  5231. **************************************************************/
  5232. private void printccl
  5233. (
  5234. CSet set
  5235. )
  5236. {
  5237. int i;
  5238. System.out.print(" [");
  5239. for (i = 0; i < m_spec.m_dtrans_ncols; ++i)
  5240. {
  5241. if (set.contains(i))
  5242. {
  5243. System.out.print(interp_int(i));
  5244. }
  5245. }
  5246. System.out.print(']');
  5247. }
  5248. /***************************************************************
  5249. Function: plab
  5250. Description:
  5251. **************************************************************/
  5252. private String plab
  5253. (
  5254. CNfa state
  5255. )
  5256. {
  5257. int index;
  5258. if (null == state)
  5259. {
  5260. return (new String("--"));
  5261. }
  5262. index = m_spec.m_nfa_states.indexOf(state);
  5263. return ((new Integer(index)).toString());
  5264. }
  5265. /***************************************************************
  5266. Function: interp_int
  5267. Description:
  5268. **************************************************************/
  5269. private String interp_int
  5270. (
  5271. int i
  5272. )
  5273. {
  5274. switch (i)
  5275. {
  5276. case (int) '\b':
  5277. return (new String("\\b"));
  5278. case (int) '\t':
  5279. return (new String("\\t"));
  5280. case (int) '\n':
  5281. return (new String("\\n"));
  5282. case (int) '\f':
  5283. return (new String("\\f"));
  5284. case (int) '\r':
  5285. return (new String("\\r"));
  5286. case (int) ' ':
  5287. return (new String("\\ "));
  5288. default:
  5289. return ((new Character((char) i)).toString());
  5290. }
  5291. }
  5292. /***************************************************************
  5293. Function: print_nfa
  5294. Description:
  5295. **************************************************************/
  5296. void print_nfa
  5297. (
  5298. )
  5299. {
  5300. int elem;
  5301. CNfa nfa;
  5302. int size;
  5303. Enumeration states;
  5304. Integer index;
  5305. int i;
  5306. int j;
  5307. int vsize;
  5308. String state;
  5309. System.out.println("--------------------- NFA -----------------------");
  5310. size = m_spec.m_nfa_states.size();
  5311. for (elem = 0; elem < size; ++elem)
  5312. {
  5313. nfa = (CNfa) m_spec.m_nfa_states.elementAt(elem);
  5314. System.out.print("Nfa state " + plab(nfa) + ": ");
  5315. if (null == nfa.m_next)
  5316. {
  5317. System.out.print("(TERMINAL)");
  5318. }
  5319. else
  5320. {
  5321. System.out.print("--> " + plab(nfa.m_next));
  5322. System.out.print("--> " + plab(nfa.m_next2));
  5323. switch (nfa.m_edge)
  5324. {
  5325. case CNfa.CCL:
  5326. printccl(nfa.m_set);
  5327. break;
  5328. case CNfa.EPSILON:
  5329. System.out.print(" EPSILON ");
  5330. break;
  5331. default:
  5332. System.out.print(" " + interp_int(nfa.m_edge));
  5333. break;
  5334. }
  5335. }
  5336. if (0 == elem)
  5337. {
  5338. System.out.print(" (START STATE)");
  5339. }
  5340. if (null != nfa.m_accept)
  5341. {
  5342. System.out.print(" accepting "
  5343. + ((0 != (nfa.m_anchor & CSpec.START)) ? "^" : "")
  5344. + "<"
  5345. + (new String(nfa.m_accept.m_action,0,
  5346. nfa.m_accept.m_action_read))
  5347. + ">"
  5348. + ((0 != (nfa.m_anchor & CSpec.END)) ? "$" : ""));
  5349. }
  5350. System.out.println();
  5351. }
  5352. states = m_spec.m_states.keys();
  5353. while (states.hasMoreElements())
  5354. {
  5355. state = (String) states.nextElement();
  5356. index = (Integer) m_spec.m_states.get(state);
  5357. if (CUtility.DEBUG)
  5358. {
  5359. CUtility._assert(null != state);
  5360. CUtility._assert(null != index);
  5361. }
  5362. System.out.println("State \"" + state
  5363. + "\" has identifying index "
  5364. + index.toString() + ".");
  5365. System.out.print("\tStart states of matching rules: ");
  5366. i = index.intValue();
  5367. vsize = m_spec.m_state_rules[i].size();
  5368. for (j = 0; j < vsize; ++j)
  5369. {
  5370. nfa = (CNfa) m_spec.m_state_rules[i].elementAt(j);
  5371. System.out.print(m_spec.m_nfa_states.indexOf(nfa) + " ");
  5372. }
  5373. System.out.println();
  5374. }
  5375. System.out.println("-------------------- NFA ----------------------");
  5376. }
  5377. /***************************************************************
  5378. Function: getStates
  5379. Description: Parses the state area of a rule,
  5380. from the beginning of a line.
  5381. < state1, state2 ... > regular_expression { action }
  5382. Returns null on only EOF. Returns all_states,
  5383. initialied properly to correspond to all states,
  5384. if no states are found.
  5385. Special Notes: This function treats commas as optional
  5386. and permits states to be spread over multiple lines.
  5387. **************************************************************/
  5388. private SparseBitSet all_states = null;
  5389. SparseBitSet getStates
  5390. (
  5391. )
  5392. throws java.io.IOException
  5393. {
  5394. int start_state;
  5395. int count_state;
  5396. SparseBitSet states;
  5397. String name;
  5398. Integer index;
  5399. int i;
  5400. int size;
  5401. if (CUtility.DEBUG)
  5402. {
  5403. CUtility._assert(null != this);
  5404. CUtility._assert(null != m_outstream);
  5405. CUtility._assert(null != m_input);
  5406. CUtility._assert(null != m_tokens);
  5407. CUtility._assert(null != m_spec);
  5408. }
  5409. states = null;
  5410. /* Skip white space. */
  5411. while (CUtility.isspace(m_input.m_line[m_input.m_line_index]))
  5412. {
  5413. ++m_input.m_line_index;
  5414. while (m_input.m_line_index >= m_input.m_line_read)
  5415. {
  5416. /* Must just be an empty line. */
  5417. if (m_input.getLine())
  5418. {
  5419. /* EOF found. */
  5420. return null;
  5421. }
  5422. }
  5423. }
  5424. /* Look for states. */
  5425. if ('<' == m_input.m_line[m_input.m_line_index])
  5426. {
  5427. ++m_input.m_line_index;
  5428. states = new SparseBitSet();
  5429. /* Parse states. */
  5430. while (true)
  5431. {
  5432. /* We may have reached the end of the line. */
  5433. while (m_input.m_line_index >= m_input.m_line_read)
  5434. {
  5435. if (m_input.getLine())
  5436. {
  5437. /* EOF found. */
  5438. CError.parse_error(CError.E_EOF,m_input.m_line_number);
  5439. return states;
  5440. }
  5441. }
  5442. while (true)
  5443. {
  5444. /* Skip white space. */
  5445. while (CUtility.isspace(m_input.m_line[m_input.m_line_index]))
  5446. {
  5447. ++m_input.m_line_index;
  5448. while (m_input.m_line_index >= m_input.m_line_read)
  5449. {
  5450. if (m_input.getLine())
  5451. {
  5452. /* EOF found. */
  5453. CError.parse_error(CError.E_EOF,m_input.m_line_number);
  5454. return states;
  5455. }
  5456. }
  5457. }
  5458. if (',' != m_input.m_line[m_input.m_line_index])
  5459. {
  5460. break;
  5461. }
  5462. ++m_input.m_line_index;
  5463. }
  5464. if ('>' == m_input.m_line[m_input.m_line_index])
  5465. {
  5466. ++m_input.m_line_index;
  5467. if (m_input.m_line_index < m_input.m_line_read)
  5468. {
  5469. m_advance_stop = true;
  5470. }
  5471. return states;
  5472. }
  5473. /* Read in state name. */
  5474. start_state = m_input.m_line_index;
  5475. while (false == CUtility.isspace(m_input.m_line[m_input.m_line_index])
  5476. && ',' != m_input.m_line[m_input.m_line_index]
  5477. && '>' != m_input.m_line[m_input.m_line_index])
  5478. {
  5479. ++m_input.m_line_index;
  5480. if (m_input.m_line_index >= m_input.m_line_read)
  5481. {
  5482. /* End of line means end of state name. */
  5483. break;
  5484. }
  5485. }
  5486. count_state = m_input.m_line_index - start_state;
  5487. /* Save name after checking definition. */
  5488. name = new String(m_input.m_line,
  5489. start_state,
  5490. count_state);
  5491. index = (Integer) m_spec.m_states.get(name);
  5492. if (null == index)
  5493. {
  5494. /* Uninitialized state. */
  5495. System.out.println("Uninitialized State Name: " + name);
  5496. CError.parse_error(CError.E_STATE,m_input.m_line_number);
  5497. }
  5498. states.set(index.intValue());
  5499. }
  5500. }
  5501. if (null == all_states)
  5502. {
  5503. all_states = new SparseBitSet();
  5504. size = m_spec.m_states.size();
  5505. for (i = 0; i < size; ++i)
  5506. {
  5507. all_states.set(i);
  5508. }
  5509. }
  5510. if (m_input.m_line_index < m_input.m_line_read)
  5511. {
  5512. m_advance_stop = true;
  5513. }
  5514. return all_states;
  5515. }
  5516. /********************************************************
  5517. Function: expandMacro
  5518. Description: Returns false on error, true otherwise.
  5519. *******************************************************/
  5520. private boolean expandMacro
  5521. (
  5522. )
  5523. {
  5524. int elem;
  5525. int start_macro;
  5526. int end_macro;
  5527. int start_name;
  5528. int count_name;
  5529. String def;
  5530. int def_elem;
  5531. String name;
  5532. char replace[];
  5533. int rep_elem;
  5534. if (CUtility.DEBUG)
  5535. {
  5536. CUtility._assert(null != this);
  5537. CUtility._assert(null != m_outstream);
  5538. CUtility._assert(null != m_input);
  5539. CUtility._assert(null != m_tokens);
  5540. CUtility._assert(null != m_spec);
  5541. }
  5542. /* Check for macro. */
  5543. if ('{' != m_input.m_line[m_input.m_line_index])
  5544. {
  5545. CError.parse_error(CError.E_INTERNAL,m_input.m_line_number);
  5546. return ERROR;
  5547. }
  5548. start_macro = m_input.m_line_index;
  5549. elem = m_input.m_line_index + 1;
  5550. if (elem >= m_input.m_line_read)
  5551. {
  5552. CError.impos("Unfinished macro name");
  5553. return ERROR;
  5554. }
  5555. /* Get macro name. */
  5556. start_name = elem;
  5557. while ('}' != m_input.m_line[elem])
  5558. {
  5559. ++elem;
  5560. if (elem >= m_input.m_line_read)
  5561. {
  5562. CError.impos("Unfinished macro name at line "
  5563. + m_input.m_line_number);
  5564. return ERROR;
  5565. }
  5566. }
  5567. count_name = elem - start_name;
  5568. end_macro = elem;
  5569. /* Check macro name. */
  5570. if (0 == count_name)
  5571. {
  5572. CError.impos("Nonexistent macro name");
  5573. return ERROR;
  5574. }
  5575. /* Debug checks. */
  5576. if (CUtility.DEBUG)
  5577. {
  5578. CUtility._assert(0 < count_name);
  5579. }
  5580. /* Retrieve macro definition. */
  5581. name = new String(m_input.m_line,start_name,count_name);
  5582. def = (String) m_spec.m_macros.get(name);
  5583. if (null == def)
  5584. {
  5585. /*CError.impos("Undefined macro \"" + name + "\".");*/
  5586. System.out.println("Error: Undefined macro \"" + name + "\".");
  5587. CError.parse_error(CError.E_NOMAC, m_input.m_line_number);
  5588. return ERROR;
  5589. }
  5590. if (CUtility.OLD_DUMP_DEBUG)
  5591. {
  5592. System.out.println("expanded escape: " + def);
  5593. }
  5594. /* Replace macro in new buffer,
  5595. beginning by copying first part of line buffer. */
  5596. replace = new char[m_input.m_line.length];
  5597. for (rep_elem = 0; rep_elem < start_macro; ++rep_elem)
  5598. {
  5599. replace[rep_elem] = m_input.m_line[rep_elem];
  5600. if (CUtility.DEBUG)
  5601. {
  5602. CUtility._assert(rep_elem < replace.length);
  5603. }
  5604. }
  5605. /* Copy macro definition. */
  5606. if (rep_elem >= replace.length)
  5607. {
  5608. replace = CUtility.doubleSize(replace);
  5609. }
  5610. for (def_elem = 0; def_elem < def.length(); ++def_elem)
  5611. {
  5612. replace[rep_elem] = def.charAt(def_elem);
  5613. ++rep_elem;
  5614. if (rep_elem >= replace.length)
  5615. {
  5616. replace = CUtility.doubleSize(replace);
  5617. }
  5618. }
  5619. /* Copy last part of line. */
  5620. if (rep_elem >= replace.length)
  5621. {
  5622. replace = CUtility.doubleSize(replace);
  5623. }
  5624. for (elem = end_macro + 1; elem < m_input.m_line_read; ++elem)
  5625. {
  5626. replace[rep_elem] = m_input.m_line[elem];
  5627. ++rep_elem;
  5628. if (rep_elem >= replace.length)
  5629. {
  5630. replace = CUtility.doubleSize(replace);
  5631. }
  5632. }
  5633. /* Replace buffer. */
  5634. m_input.m_line = replace;
  5635. m_input.m_line_read = rep_elem;
  5636. if (CUtility.OLD_DEBUG)
  5637. {
  5638. System.out.println(new String(m_input.m_line,0,m_input.m_line_read));
  5639. }
  5640. return NOT_ERROR;
  5641. }
  5642. /***************************************************************
  5643. Function: saveMacro
  5644. Description: Saves macro definition of form:
  5645. macro_name = macro_definition
  5646. **************************************************************/
  5647. private void saveMacro
  5648. (
  5649. )
  5650. {
  5651. int elem;
  5652. int start_name;
  5653. int count_name;
  5654. int start_def;
  5655. int count_def;
  5656. boolean saw_escape;
  5657. boolean in_quote;
  5658. boolean in_ccl;
  5659. if (CUtility.DEBUG)
  5660. {
  5661. CUtility._assert(null != this);
  5662. CUtility._assert(null != m_outstream);
  5663. CUtility._assert(null != m_input);
  5664. CUtility._assert(null != m_tokens);
  5665. CUtility._assert(null != m_spec);
  5666. }
  5667. /* Macro declarations are of the following form:
  5668. macro_name macro_definition */
  5669. elem = 0;
  5670. /* Skip white space preceding macro name. */
  5671. while (CUtility.isspace(m_input.m_line[elem]))
  5672. {
  5673. ++elem;
  5674. if (elem >= m_input.m_line_read)
  5675. {
  5676. /* End of line has been reached,
  5677. and line was found to be empty. */
  5678. return;
  5679. }
  5680. }
  5681. /* Read macro name. */
  5682. start_name = elem;
  5683. while (false == CUtility.isspace(m_input.m_line[elem])
  5684. && '=' != m_input.m_line[elem])
  5685. {
  5686. ++elem;
  5687. if (elem >= m_input.m_line_read)
  5688. {
  5689. /* Macro name but no associated definition. */
  5690. CError.parse_error(CError.E_MACDEF,m_input.m_line_number);
  5691. }
  5692. }
  5693. count_name = elem - start_name;
  5694. /* Check macro name. */
  5695. if (0 == count_name)
  5696. {
  5697. /* Nonexistent macro name. */
  5698. CError.parse_error(CError.E_MACDEF,m_input.m_line_number);
  5699. }
  5700. /* Skip white space between name and definition. */
  5701. while (CUtility.isspace(m_input.m_line[elem]))
  5702. {
  5703. ++elem;
  5704. if (elem >= m_input.m_line_read)
  5705. {
  5706. /* Macro name but no associated definition. */
  5707. CError.parse_error(CError.E_MACDEF,m_input.m_line_number);
  5708. }
  5709. }
  5710. if ('=' == m_input.m_line[elem])
  5711. {
  5712. ++elem;
  5713. if (elem >= m_input.m_line_read)
  5714. {
  5715. /* Macro name but no associated definition. */
  5716. CError.parse_error(CError.E_MACDEF,m_input.m_line_number);
  5717. }
  5718. }
  5719. /* Skip white space between name and definition. */
  5720. while (CUtility.isspace(m_input.m_line[elem]))
  5721. {
  5722. ++elem;
  5723. if (elem >= m_input.m_line_read)
  5724. {
  5725. /* Macro name but no associated definition. */
  5726. CError.parse_error(CError.E_MACDEF,m_input.m_line_number);
  5727. }
  5728. }
  5729. /* Read macro definition. */
  5730. start_def = elem;
  5731. in_quote = false;
  5732. in_ccl = false;
  5733. saw_escape = false;
  5734. while (false == CUtility.isspace(m_input.m_line[elem])
  5735. || true == in_quote
  5736. || true == in_ccl
  5737. || true == saw_escape)
  5738. {
  5739. if ('\"' == m_input.m_line[elem] && false == saw_escape)
  5740. {
  5741. in_quote = !in_quote;
  5742. }
  5743. if ('\\' == m_input.m_line[elem] && false == saw_escape)
  5744. {
  5745. saw_escape = true;
  5746. }
  5747. else
  5748. {
  5749. saw_escape = false;
  5750. }
  5751. if (false == saw_escape && false == in_quote) { // CSA, 24-jul-99
  5752. if ('[' == m_input.m_line[elem] && false == in_ccl)
  5753. in_ccl = true;
  5754. if (']' == m_input.m_line[elem] && true == in_ccl)
  5755. in_ccl = false;
  5756. }
  5757. ++elem;
  5758. if (elem >= m_input.m_line_read)
  5759. {
  5760. /* End of line. */
  5761. break;
  5762. }
  5763. }
  5764. count_def = elem - start_def;
  5765. /* Check macro definition. */
  5766. if (0 == count_def)
  5767. {
  5768. /* Nonexistent macro name. */
  5769. CError.parse_error(CError.E_MACDEF,m_input.m_line_number);
  5770. }
  5771. /* Debug checks. */
  5772. if (CUtility.DEBUG)
  5773. {
  5774. CUtility._assert(0 < count_def);
  5775. CUtility._assert(0 < count_name);
  5776. CUtility._assert(null != m_spec.m_macros);
  5777. }
  5778. if (CUtility.OLD_DEBUG)
  5779. {
  5780. System.out.println("macro name \""
  5781. + new String(m_input.m_line,start_name,count_name)
  5782. + "\".");
  5783. System.out.println("macro definition \""
  5784. + new String(m_input.m_line,start_def,count_def)
  5785. + "\".");
  5786. }
  5787. /* Add macro name and definition to table. */
  5788. m_spec.m_macros.put(new String(m_input.m_line,start_name,count_name),
  5789. new String(m_input.m_line,start_def,count_def));
  5790. }
  5791. /***************************************************************
  5792. Function: saveStates
  5793. Description: Takes state declaration and makes entries
  5794. for them in state hashtable in CSpec structure.
  5795. State declaration should be of the form:
  5796. %state name0[, name1, name2 ...]
  5797. (But commas are actually optional as long as there is
  5798. white space in between them.)
  5799. **************************************************************/
  5800. private void saveStates
  5801. (
  5802. )
  5803. {
  5804. int start_state;
  5805. int count_state;
  5806. if (CUtility.DEBUG)
  5807. {
  5808. CUtility._assert(null != this);
  5809. CUtility._assert(null != m_outstream);
  5810. CUtility._assert(null != m_input);
  5811. CUtility._assert(null != m_tokens);
  5812. CUtility._assert(null != m_spec);
  5813. }
  5814. /* EOF found? */
  5815. if (m_input.m_eof_reached)
  5816. {
  5817. return;
  5818. }
  5819. /* Debug checks. */
  5820. if (CUtility.DEBUG)
  5821. {
  5822. CUtility._assert('%' == m_input.m_line[0]);
  5823. CUtility._assert('s' == m_input.m_line[1]);
  5824. CUtility._assert(m_input.m_line_index <= m_input.m_line_read);
  5825. CUtility._assert(0 <= m_input.m_line_index);
  5826. CUtility._assert(0 <= m_input.m_line_read);
  5827. }
  5828. /* Blank line? No states? */
  5829. if (m_input.m_line_index >= m_input.m_line_read)
  5830. {
  5831. return;
  5832. }
  5833. while (m_input.m_line_index < m_input.m_line_read)
  5834. {
  5835. if (CUtility.OLD_DEBUG)
  5836. {
  5837. System.out.println("line read " + m_input.m_line_read
  5838. + "\tline index = " + m_input.m_line_index);
  5839. }
  5840. /* Skip white space. */
  5841. while (CUtility.isspace(m_input.m_line[m_input.m_line_index]))
  5842. {
  5843. ++m_input.m_line_index;
  5844. if (m_input.m_line_index >= m_input.m_line_read)
  5845. {
  5846. /* No more states to be found. */
  5847. return;
  5848. }
  5849. }
  5850. /* Look for state name. */
  5851. start_state = m_input.m_line_index;
  5852. while (false == CUtility.isspace(m_input.m_line[m_input.m_line_index])
  5853. && ',' != m_input.m_line[m_input.m_line_index])
  5854. {
  5855. ++m_input.m_line_index;
  5856. if (m_input.m_line_index >= m_input.m_line_read)
  5857. {
  5858. /* End of line and end of state name. */
  5859. break;
  5860. }
  5861. }
  5862. count_state = m_input.m_line_index - start_state;
  5863. if (CUtility.OLD_DEBUG)
  5864. {
  5865. System.out.println("State name \""
  5866. + new String(m_input.m_line,start_state,count_state)
  5867. + "\".");
  5868. System.out.println("Integer index \""
  5869. + m_spec.m_states.size()
  5870. + "\".");
  5871. }
  5872. /* Enter new state name, along with unique index. */
  5873. m_spec.m_states.put(new String(m_input.m_line,start_state,count_state),
  5874. new Integer(m_spec.m_states.size()));
  5875. /* Skip comma. */
  5876. if (',' == m_input.m_line[m_input.m_line_index])
  5877. {
  5878. ++m_input.m_line_index;
  5879. if (m_input.m_line_index >= m_input.m_line_read)
  5880. {
  5881. /* End of line. */
  5882. return;
  5883. }
  5884. }
  5885. }
  5886. }
  5887. /********************************************************
  5888. Function: expandEscape
  5889. Description: Takes escape sequence and returns
  5890. corresponding character code.
  5891. *******************************************************/
  5892. private char expandEscape
  5893. (
  5894. )
  5895. {
  5896. char r;
  5897. /* Debug checks. */
  5898. if (CUtility.DEBUG)
  5899. {
  5900. CUtility._assert(m_input.m_line_index < m_input.m_line_read);
  5901. CUtility._assert(0 < m_input.m_line_read);
  5902. CUtility._assert(0 <= m_input.m_line_index);
  5903. }
  5904. if ('\\' != m_input.m_line[m_input.m_line_index])
  5905. {
  5906. ++m_input.m_line_index;
  5907. return m_input.m_line[m_input.m_line_index - 1];
  5908. }
  5909. else
  5910. {
  5911. boolean unicode_escape = false;
  5912. ++m_input.m_line_index;
  5913. switch (m_input.m_line[m_input.m_line_index])
  5914. {
  5915. case 'b':
  5916. ++m_input.m_line_index;
  5917. return '\b';
  5918. case 't':
  5919. ++m_input.m_line_index;
  5920. return '\t';
  5921. case 'n':
  5922. ++m_input.m_line_index;
  5923. return '\n';
  5924. case 'f':
  5925. ++m_input.m_line_index;
  5926. return '\f';
  5927. case 'r':
  5928. ++m_input.m_line_index;
  5929. return '\r';
  5930. case '^':
  5931. ++m_input.m_line_index;
  5932. r=Character.toUpperCase(m_input.m_line[m_input.m_line_index]);
  5933. if (r<'@' || r>'Z') // non-fatal
  5934. CError.parse_error(CError.E_BADCTRL,m_input.m_line_number);
  5935. r = (char) (r - '@');
  5936. ++m_input.m_line_index;
  5937. return r;
  5938. case 'u':
  5939. unicode_escape = true;
  5940. case 'x':
  5941. ++m_input.m_line_index;
  5942. r = 0;
  5943. for (int i=0; i<(unicode_escape?4:2); i++)
  5944. if (CUtility.ishexdigit(m_input.m_line[m_input.m_line_index]))
  5945. {
  5946. r = (char) (r << 4);
  5947. r = (char) (r | CUtility.hex2bin(m_input.m_line[m_input.m_line_index]));
  5948. ++m_input.m_line_index;
  5949. }
  5950. else break;
  5951. return r;
  5952. default:
  5953. if (false == CUtility.isoctdigit(m_input.m_line[m_input.m_line_index]))
  5954. {
  5955. r = m_input.m_line[m_input.m_line_index];
  5956. ++m_input.m_line_index;
  5957. }
  5958. else
  5959. {
  5960. r = 0;
  5961. for (int i=0; i<3; i++)
  5962. if (CUtility.isoctdigit(m_input.m_line[m_input.m_line_index]))
  5963. {
  5964. r = (char) (r << 3);
  5965. r = (char) (r | CUtility.oct2bin(m_input.m_line[m_input.m_line_index]));
  5966. ++m_input.m_line_index;
  5967. }
  5968. else break;
  5969. }
  5970. return r;
  5971. }
  5972. }
  5973. }
  5974. /********************************************************
  5975. Function: packAccept
  5976. Description: Packages and returns CAccept
  5977. for action next in input stream.
  5978. *******************************************************/
  5979. CAccept packAccept
  5980. (
  5981. )
  5982. throws java.io.IOException
  5983. {
  5984. CAccept accept;
  5985. char action[];
  5986. int action_index;
  5987. int brackets;
  5988. boolean insinglequotes;
  5989. boolean indoublequotes;
  5990. boolean instarcomment;
  5991. boolean inslashcomment;
  5992. boolean escaped;
  5993. boolean slashed;
  5994. action = new char[BUFFER_SIZE];
  5995. action_index = 0;
  5996. if (CUtility.DEBUG)
  5997. {
  5998. CUtility._assert(null != this);
  5999. CUtility._assert(null != m_outstream);
  6000. CUtility._assert(null != m_input);
  6001. CUtility._assert(null != m_tokens);
  6002. CUtility._assert(null != m_spec);
  6003. }
  6004. /* Get a new line, if needed. */
  6005. while (m_input.m_line_index >= m_input.m_line_read)
  6006. {
  6007. if (m_input.getLine())
  6008. {
  6009. CError.parse_error(CError.E_EOF,m_input.m_line_number);
  6010. return null;
  6011. }
  6012. }
  6013. /* Look for beginning of action. */
  6014. while (CUtility.isspace(m_input.m_line[m_input.m_line_index]))
  6015. {
  6016. ++m_input.m_line_index;
  6017. /* Get a new line, if needed. */
  6018. while (m_input.m_line_index >= m_input.m_line_read)
  6019. {
  6020. if (m_input.getLine())
  6021. {
  6022. CError.parse_error(CError.E_EOF,m_input.m_line_number);
  6023. return null;
  6024. }
  6025. }
  6026. }
  6027. /* Look for brackets. */
  6028. if ('{' != m_input.m_line[m_input.m_line_index])
  6029. {
  6030. CError.parse_error(CError.E_BRACE,m_input.m_line_number);
  6031. }
  6032. /* Copy new line into action buffer. */
  6033. brackets = 0;
  6034. insinglequotes = indoublequotes = inslashcomment = instarcomment =
  6035. escaped = slashed = false;
  6036. while (true)
  6037. {
  6038. action[action_index] = m_input.m_line[m_input.m_line_index];
  6039. /* Look for quotes. */
  6040. if ((insinglequotes || indoublequotes) && escaped)
  6041. escaped=false; // only protects one char, but this is enough.
  6042. else if ((insinglequotes || indoublequotes) &&
  6043. '\\' == m_input.m_line[m_input.m_line_index])
  6044. escaped=true;
  6045. else if (!(insinglequotes || inslashcomment || instarcomment) &&
  6046. '\"' == m_input.m_line[m_input.m_line_index])
  6047. indoublequotes=!indoublequotes; // unescaped double quote.
  6048. else if (!(indoublequotes || inslashcomment || instarcomment) &&
  6049. '\'' == m_input.m_line[m_input.m_line_index])
  6050. insinglequotes=!insinglequotes; // unescaped single quote.
  6051. /* Look for comments. */
  6052. if (instarcomment) { // inside "/*" comment; look for "*/"
  6053. if (slashed && '/' == m_input.m_line[m_input.m_line_index])
  6054. instarcomment = slashed = false;
  6055. else // note that inside a star comment, slashed means starred
  6056. slashed = ('*' == m_input.m_line[m_input.m_line_index]);
  6057. } else if (!inslashcomment && !insinglequotes && !indoublequotes) {
  6058. // not in comment, look for /* or //
  6059. inslashcomment =
  6060. (slashed && '/' == m_input.m_line[m_input.m_line_index]);
  6061. instarcomment =
  6062. (slashed && '*' == m_input.m_line[m_input.m_line_index]);
  6063. slashed = ('/' == m_input.m_line[m_input.m_line_index]);
  6064. }
  6065. /* Look for brackets. */
  6066. if (!insinglequotes && !indoublequotes &&
  6067. !instarcomment && !inslashcomment) {
  6068. if ('{' == m_input.m_line[m_input.m_line_index])
  6069. {
  6070. ++brackets;
  6071. }
  6072. else if ('}' == m_input.m_line[m_input.m_line_index])
  6073. {
  6074. --brackets;
  6075. if (0 == brackets)
  6076. {
  6077. ++action_index;
  6078. ++m_input.m_line_index;
  6079. break;
  6080. }
  6081. }
  6082. }
  6083. ++action_index;
  6084. /* Double the buffer size, if needed. */
  6085. if (action_index >= action.length)
  6086. {
  6087. action = CUtility.doubleSize(action);
  6088. }
  6089. ++m_input.m_line_index;
  6090. /* Get a new line, if needed. */
  6091. while (m_input.m_line_index >= m_input.m_line_read)
  6092. {
  6093. inslashcomment = slashed = false;
  6094. if (insinglequotes || indoublequotes) { // non-fatal
  6095. CError.parse_error(CError.E_NEWLINE,m_input.m_line_number);
  6096. insinglequotes = indoublequotes = false;
  6097. }
  6098. if (m_input.getLine())
  6099. {
  6100. CError.parse_error(CError.E_SYNTAX,m_input.m_line_number);
  6101. return null;
  6102. }
  6103. }
  6104. }
  6105. accept = new CAccept(action,action_index,m_input.m_line_number);
  6106. if (CUtility.DEBUG)
  6107. {
  6108. CUtility._assert(null != accept);
  6109. }
  6110. if (CUtility.DESCENT_DEBUG)
  6111. {
  6112. System.out.print("Accepting action:");
  6113. System.out.println(new String(accept.m_action,0,accept.m_action_read));
  6114. }
  6115. return accept;
  6116. }
  6117. /********************************************************
  6118. Function: advance
  6119. Description: Returns code for next token.
  6120. *******************************************************/
  6121. private boolean m_advance_stop = false;
  6122. int advance
  6123. (
  6124. )
  6125. throws java.io.IOException
  6126. {
  6127. boolean saw_escape = false;
  6128. Integer code;
  6129. /*if (m_input.m_line_index > m_input.m_line_read) {
  6130. System.out.println("m_input.m_line_index = " + m_input.m_line_index);
  6131. System.out.println("m_input.m_line_read = " + m_input.m_line_read);
  6132. CUtility._assert(m_input.m_line_index <= m_input.m_line_read);
  6133. }*/
  6134. if (m_input.m_eof_reached)
  6135. {
  6136. /* EOF has already been reached,
  6137. so return appropriate code. */
  6138. m_spec.m_current_token = END_OF_INPUT;
  6139. m_spec.m_lexeme = '\0';
  6140. return m_spec.m_current_token;
  6141. }
  6142. /* End of previous regular expression?
  6143. Refill line buffer? */
  6144. if (EOS == m_spec.m_current_token
  6145. /* ADDED */
  6146. || m_input.m_line_index >= m_input.m_line_read)
  6147. /* ADDED */
  6148. {
  6149. if (m_spec.m_in_quote)
  6150. {
  6151. CError.parse_error(CError.E_SYNTAX,m_input.m_line_number);
  6152. }
  6153. while (true)
  6154. {
  6155. if (false == m_advance_stop
  6156. || m_input.m_line_index >= m_input.m_line_read)
  6157. {
  6158. if (m_input.getLine())
  6159. {
  6160. /* EOF has already been reached,
  6161. so return appropriate code. */
  6162. m_spec.m_current_token = END_OF_INPUT;
  6163. m_spec.m_lexeme = '\0';
  6164. return m_spec.m_current_token;
  6165. }
  6166. m_input.m_line_index = 0;
  6167. }
  6168. else
  6169. {
  6170. m_advance_stop = false;
  6171. }
  6172. while (m_input.m_line_index < m_input.m_line_read
  6173. && true == CUtility.isspace(m_input.m_line[m_input.m_line_index]))
  6174. {
  6175. ++m_input.m_line_index;
  6176. }
  6177. if (m_input.m_line_index < m_input.m_line_read)
  6178. {
  6179. break;
  6180. }
  6181. }
  6182. }
  6183. if (CUtility.DEBUG) {
  6184. CUtility._assert(m_input.m_line_index <= m_input.m_line_read);
  6185. }
  6186. while (true)
  6187. {
  6188. if (false == m_spec.m_in_quote
  6189. && '{' == m_input.m_line[m_input.m_line_index])
  6190. {
  6191. if (false == expandMacro())
  6192. {
  6193. break;
  6194. }
  6195. if (m_input.m_line_index >= m_input.m_line_read)
  6196. {
  6197. m_spec.m_current_token = EOS;
  6198. m_spec.m_lexeme = '\0';
  6199. return m_spec.m_current_token;
  6200. }
  6201. }
  6202. else if ('\"' == m_input.m_line[m_input.m_line_index])
  6203. {
  6204. m_spec.m_in_quote = !m_spec.m_in_quote;
  6205. ++m_input.m_line_index;
  6206. if (m_input.m_line_index >= m_input.m_line_read)
  6207. {
  6208. m_spec.m_current_token = EOS;
  6209. m_spec.m_lexeme = '\0';
  6210. return m_spec.m_current_token;
  6211. }
  6212. }
  6213. else
  6214. {
  6215. break;
  6216. }
  6217. }
  6218. if (m_input.m_line_index > m_input.m_line_read) {
  6219. System.out.println("m_input.m_line_index = " + m_input.m_line_index);
  6220. System.out.println("m_input.m_line_read = " + m_input.m_line_read);
  6221. CUtility._assert(m_input.m_line_index <= m_input.m_line_read);
  6222. }
  6223. /* Look for backslash, and corresponding
  6224. escape sequence. */
  6225. if ('\\' == m_input.m_line[m_input.m_line_index])
  6226. {
  6227. saw_escape = true;
  6228. }
  6229. else
  6230. {
  6231. saw_escape = false;
  6232. }
  6233. if (false == m_spec.m_in_quote)
  6234. {
  6235. if (false == m_spec.m_in_ccl &&
  6236. CUtility.isspace(m_input.m_line[m_input.m_line_index]))
  6237. {
  6238. /* White space means the end of
  6239. the current regular expression. */
  6240. m_spec.m_current_token = EOS;
  6241. m_spec.m_lexeme = '\0';
  6242. return m_spec.m_current_token;
  6243. }
  6244. /* Process escape sequence, if needed. */
  6245. if (saw_escape)
  6246. {
  6247. m_spec.m_lexeme = expandEscape();
  6248. }
  6249. else
  6250. {
  6251. m_spec.m_lexeme = m_input.m_line[m_input.m_line_index];
  6252. ++m_input.m_line_index;
  6253. }
  6254. }
  6255. else
  6256. {
  6257. if (saw_escape
  6258. && (m_input.m_line_index + 1) < m_input.m_line_read
  6259. && '\"' == m_input.m_line[m_input.m_line_index + 1])
  6260. {
  6261. m_spec.m_lexeme = '\"';
  6262. m_input.m_line_index = m_input.m_line_index + 2;
  6263. }
  6264. else
  6265. {
  6266. m_spec.m_lexeme = m_input.m_line[m_input.m_line_index];
  6267. ++m_input.m_line_index;
  6268. }
  6269. }
  6270. code = (Integer) m_tokens.get(new Character(m_spec.m_lexeme));
  6271. if (m_spec.m_in_quote || true == saw_escape)
  6272. {
  6273. m_spec.m_current_token = L;
  6274. }
  6275. else
  6276. {
  6277. if (null == code)
  6278. {
  6279. m_spec.m_current_token = L;
  6280. }
  6281. else
  6282. {
  6283. m_spec.m_current_token = code.intValue();
  6284. }
  6285. }
  6286. if (CCL_START == m_spec.m_current_token) m_spec.m_in_ccl = true;
  6287. if (CCL_END == m_spec.m_current_token) m_spec.m_in_ccl = false;
  6288. if (CUtility.FOODEBUG)
  6289. {
  6290. System.out.println("Lexeme: " + m_spec.m_lexeme
  6291. + "\tToken: " + m_spec.m_current_token
  6292. + "\tIndex: " + m_input.m_line_index);
  6293. }
  6294. return m_spec.m_current_token;
  6295. }
  6296. /***************************************************************
  6297. Function: details
  6298. Description: High level debugging routine.
  6299. **************************************************************/
  6300. private void details
  6301. (
  6302. )
  6303. {
  6304. Enumeration names;
  6305. String name;
  6306. String def;
  6307. Enumeration states;
  6308. String state;
  6309. Integer index;
  6310. int elem;
  6311. int size;
  6312. System.out.println();
  6313. System.out.println("\t** Macros **");
  6314. names = m_spec.m_macros.keys();
  6315. while (names.hasMoreElements())
  6316. {
  6317. name = (String) names.nextElement();
  6318. def = (String) m_spec.m_macros.get(name);
  6319. if (CUtility.DEBUG)
  6320. {
  6321. CUtility._assert(null != name);
  6322. CUtility._assert(null != def);
  6323. }
  6324. System.out.println("Macro name \"" + name
  6325. + "\" has definition \""
  6326. + def + "\".");
  6327. }
  6328. System.out.println();
  6329. System.out.println("\t** States **");
  6330. states = m_spec.m_states.keys();
  6331. while (states.hasMoreElements())
  6332. {
  6333. state = (String) states.nextElement();
  6334. index = (Integer) m_spec.m_states.get(state);
  6335. if (CUtility.DEBUG)
  6336. {
  6337. CUtility._assert(null != state);
  6338. CUtility._assert(null != index);
  6339. }
  6340. System.out.println("State \"" + state
  6341. + "\" has identifying index "
  6342. + index.toString() + ".");
  6343. }
  6344. System.out.println();
  6345. System.out.println("\t** Character Counting **");
  6346. if (false == m_spec.m_count_chars)
  6347. {
  6348. System.out.println("Character counting is off.");
  6349. }
  6350. else
  6351. {
  6352. if (CUtility.DEBUG)
  6353. {
  6354. CUtility._assert(m_spec.m_count_lines);
  6355. }
  6356. System.out.println("Character counting is on.");
  6357. }
  6358. System.out.println();
  6359. System.out.println("\t** Line Counting **");
  6360. if (false == m_spec.m_count_lines)
  6361. {
  6362. System.out.println("Line counting is off.");
  6363. }
  6364. else
  6365. {
  6366. if (CUtility.DEBUG)
  6367. {
  6368. CUtility._assert(m_spec.m_count_lines);
  6369. }
  6370. System.out.println("Line counting is on.");
  6371. }
  6372. System.out.println();
  6373. System.out.println("\t** Operating System Specificity **");
  6374. if (false == m_spec.m_unix)
  6375. {
  6376. System.out.println("Not generating UNIX-specific code.");
  6377. System.out.println("(This means that \"\\r\\n\" is a "
  6378. + "newline, rather than \"\\n\".)");
  6379. }
  6380. else
  6381. {
  6382. System.out.println("Generating UNIX-specific code.");
  6383. System.out.println("(This means that \"\\n\" is a "
  6384. + "newline, rather than \"\\r\\n\".)");
  6385. }
  6386. System.out.println();
  6387. System.out.println("\t** Java CUP Compatibility **");
  6388. if (false == m_spec.m_cup_compatible)
  6389. {
  6390. System.out.println("Generating CUP compatible code.");
  6391. System.out.println("(Scanner implements "
  6392. + "java_cup.runtime.Scanner.)");
  6393. }
  6394. else
  6395. {
  6396. System.out.println("Not generating CUP compatible code.");
  6397. }
  6398. if (CUtility.FOODEBUG) {
  6399. if (null != m_spec.m_nfa_states && null != m_spec.m_nfa_start)
  6400. {
  6401. System.out.println();
  6402. System.out.println("\t** NFA machine **");
  6403. print_nfa();
  6404. }
  6405. }
  6406. if (null != m_spec.m_dtrans_vector)
  6407. {
  6408. System.out.println();
  6409. System.out.println("\t** DFA transition table **");
  6410. /*print_header();*/
  6411. }
  6412. /*if (null != m_spec.m_accept_vector && null != m_spec.m_anchor_array)
  6413. {
  6414. System.out.println();
  6415. System.out.println("\t** Accept States and Anchor Vector **");
  6416. print_accept();
  6417. }*/
  6418. }
  6419. /***************************************************************
  6420. function: print_set
  6421. **************************************************************/
  6422. void print_set
  6423. (
  6424. Vector nfa_set
  6425. )
  6426. {
  6427. int size;
  6428. int elem;
  6429. CNfa nfa;
  6430. size = nfa_set.size();
  6431. if (0 == size)
  6432. {
  6433. System.out.print("empty ");
  6434. }
  6435. for (elem = 0; elem < size; ++elem)
  6436. {
  6437. nfa = (CNfa) nfa_set.elementAt(elem);
  6438. /*System.out.print(m_spec.m_nfa_states.indexOf(nfa) + " ");*/
  6439. System.out.print(nfa.m_label + " ");
  6440. }
  6441. }
  6442. /***************************************************************
  6443. Function: print_header
  6444. **************************************************************/
  6445. private void print_header
  6446. (
  6447. )
  6448. {
  6449. Enumeration states;
  6450. int i;
  6451. int j;
  6452. int chars_printed=0;
  6453. CDTrans dtrans;
  6454. int last_transition;
  6455. String str;
  6456. CAccept accept;
  6457. String state;
  6458. Integer index;
  6459. System.out.println("/*---------------------- DFA -----------------------");
  6460. states = m_spec.m_states.keys();
  6461. while (states.hasMoreElements())
  6462. {
  6463. state = (String) states.nextElement();
  6464. index = (Integer) m_spec.m_states.get(state);
  6465. if (CUtility.DEBUG)
  6466. {
  6467. CUtility._assert(null != state);
  6468. CUtility._assert(null != index);
  6469. }
  6470. System.out.println("State \"" + state
  6471. + "\" has identifying index "
  6472. + index.toString() + ".");
  6473. i = index.intValue();
  6474. if (CDTrans.F != m_spec.m_state_dtrans[i])
  6475. {
  6476. System.out.println("\tStart index in transition table: "
  6477. + m_spec.m_state_dtrans[i]);
  6478. }
  6479. else
  6480. {
  6481. System.out.println("\tNo associated transition states.");
  6482. }
  6483. }
  6484. for (i = 0; i < m_spec.m_dtrans_vector.size(); ++i)
  6485. {
  6486. dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
  6487. if (null == m_spec.m_accept_vector && null == m_spec.m_anchor_array)
  6488. {
  6489. if (null == dtrans.m_accept)
  6490. {
  6491. System.out.print(" * State " + i + " [nonaccepting]");
  6492. }
  6493. else
  6494. {
  6495. System.out.print(" * State " + i
  6496. + " [accepting, line "
  6497. + dtrans.m_accept.m_line_number
  6498. + " <"
  6499. + (new String(dtrans.m_accept.m_action,0,
  6500. dtrans.m_accept.m_action_read))
  6501. + ">]");
  6502. if (CSpec.NONE != dtrans.m_anchor)
  6503. {
  6504. System.out.print(" Anchor: "
  6505. + ((0 != (dtrans.m_anchor & CSpec.START))
  6506. ? "start " : "")
  6507. + ((0 != (dtrans.m_anchor & CSpec.END))
  6508. ? "end " : ""));
  6509. }
  6510. }
  6511. }
  6512. else
  6513. {
  6514. accept = (CAccept) m_spec.m_accept_vector.elementAt(i);
  6515. if (null == accept)
  6516. {
  6517. System.out.print(" * State " + i + " [nonaccepting]");
  6518. }
  6519. else
  6520. {
  6521. System.out.print(" * State " + i
  6522. + " [accepting, line "
  6523. + accept.m_line_number
  6524. + " <"
  6525. + (new String(accept.m_action,0,
  6526. accept.m_action_read))
  6527. + ">]");
  6528. if (CSpec.NONE != m_spec.m_anchor_array[i])
  6529. {
  6530. System.out.print(" Anchor: "
  6531. + ((0 != (m_spec.m_anchor_array[i] & CSpec.START))
  6532. ? "start " : "")
  6533. + ((0 != (m_spec.m_anchor_array[i] & CSpec.END))
  6534. ? "end " : ""));
  6535. }
  6536. }
  6537. }
  6538. last_transition = -1;
  6539. for (j = 0; j < m_spec.m_dtrans_ncols; ++j)
  6540. {
  6541. if (CDTrans.F != dtrans.m_dtrans[j])
  6542. {
  6543. if (last_transition != dtrans.m_dtrans[j])
  6544. {
  6545. System.out.println();
  6546. System.out.print(" * goto " + dtrans.m_dtrans[j]
  6547. + " on ");
  6548. chars_printed = 0;
  6549. }
  6550. str = interp_int((int) j);
  6551. System.out.print(str);
  6552. chars_printed = chars_printed + str.length();
  6553. if (56 < chars_printed)
  6554. {
  6555. System.out.println();
  6556. System.out.print(" * ");
  6557. chars_printed = 0;
  6558. }
  6559. last_transition = dtrans.m_dtrans[j];
  6560. }
  6561. }
  6562. System.out.println();
  6563. }
  6564. System.out.println(" */");
  6565. System.out.println();
  6566. }
  6567. }
  6568. /*
  6569. * SparseBitSet 25-Jul-1999.
  6570. * C. Scott Ananian <cananian@alumni.princeton.edu>
  6571. *
  6572. * Re-implementation of the standard java.util.BitSet to support sparse
  6573. * sets, which we need to efficiently support unicode character classes.
  6574. */
  6575. /**
  6576. * A set of bits. The set automatically grows as more bits are
  6577. * needed.
  6578. *
  6579. * @version 1.00, 25 Jul 1999
  6580. * @author C. Scott Ananian
  6581. */
  6582. final class SparseBitSet implements Cloneable {
  6583. /** Sorted array of bit-block offsets. */
  6584. int offs[];
  6585. /** Array of bit-blocks; each holding BITS bits. */
  6586. long bits[];
  6587. /** Number of blocks currently in use. */
  6588. int size;
  6589. /** log base 2 of BITS, for the identity: x/BITS == x >> LG_BITS */
  6590. static final private int LG_BITS = 6;
  6591. /** Number of bits in a block. */
  6592. static final private int BITS = 1<<LG_BITS;
  6593. /** BITS-1, using the identity: x % BITS == x & (BITS-1) */
  6594. static final private int BITS_M1 = BITS-1;
  6595. /**
  6596. * Creates an empty set.
  6597. */
  6598. public SparseBitSet() {
  6599. bits = new long[4];
  6600. offs = new int [4];
  6601. size = 0;
  6602. }
  6603. /**
  6604. * Creates an empty set with the specified size.
  6605. * @param nbits the size of the set
  6606. */
  6607. public SparseBitSet(int nbits) {
  6608. this();
  6609. }
  6610. /**
  6611. * Creates an empty set with the same size as the given set.
  6612. */
  6613. public SparseBitSet(SparseBitSet set) {
  6614. bits = new long[set.size];
  6615. offs = new int [set.size];
  6616. size = 0;
  6617. }
  6618. private void new_block(int bnum) {
  6619. new_block(bsearch(bnum), bnum);
  6620. }
  6621. private void new_block(int idx, int bnum) {
  6622. if (size==bits.length) { // resize
  6623. long[] nbits = new long[size*3];
  6624. int [] noffs = new int [size*3];
  6625. System.arraycopy(bits, 0, nbits, 0, size);
  6626. System.arraycopy(offs, 0, noffs, 0, size);
  6627. bits = nbits;
  6628. offs = noffs;
  6629. }
  6630. CUtility._assert(size<bits.length);
  6631. insert_block(idx, bnum);
  6632. }
  6633. private void insert_block(int idx, int bnum) {
  6634. CUtility._assert(idx<=size);
  6635. CUtility._assert(idx==size || offs[idx]!=bnum);
  6636. System.arraycopy(bits, idx, bits, idx+1, size-idx);
  6637. System.arraycopy(offs, idx, offs, idx+1, size-idx);
  6638. offs[idx]=bnum;
  6639. bits[idx]=0; //clear them bits.
  6640. size++;
  6641. }
  6642. private int bsearch(int bnum) {
  6643. int l=0, r=size; // search interval is [l, r)
  6644. while (l<r) {
  6645. int p = (l+r)/2;
  6646. if (bnum<offs[p]) r=p;
  6647. else if (bnum>offs[p]) l=p+1;
  6648. else return p;
  6649. }
  6650. CUtility._assert(l==r);
  6651. return l; // index at which the bnum *should* be, if it's not.
  6652. }
  6653. /**
  6654. * Sets a bit.
  6655. * @param bit the bit to be set
  6656. */
  6657. public void set(int bit) {
  6658. int bnum = bit >> LG_BITS;
  6659. int idx = bsearch(bnum);
  6660. if (idx >= size || offs[idx]!=bnum)
  6661. new_block(idx, bnum);
  6662. bits[idx] |= (1L << (bit & BITS_M1) );
  6663. }
  6664. /**
  6665. * Clears a bit.
  6666. * @param bit the bit to be cleared
  6667. */
  6668. public void clear(int bit) {
  6669. int bnum = bit >> LG_BITS;
  6670. int idx = bsearch(bnum);
  6671. if (idx >= size || offs[idx]!=bnum)
  6672. new_block(idx, bnum);
  6673. bits[idx] &= ~(1L << (bit & BITS_M1) );
  6674. }
  6675. /**
  6676. * Clears all bits.
  6677. */
  6678. public void clearAll() {
  6679. size = 0;
  6680. }
  6681. /**
  6682. * Gets a bit.
  6683. * @param bit the bit to be gotten
  6684. */
  6685. public boolean get(int bit) {
  6686. int bnum = bit >> LG_BITS;
  6687. int idx = bsearch(bnum);
  6688. if (idx >= size || offs[idx]!=bnum)
  6689. return false;
  6690. return 0 != ( bits[idx] & (1L << (bit & BITS_M1) ) );
  6691. }
  6692. /**
  6693. * Logically ANDs this bit set with the specified set of bits.
  6694. * @param set the bit set to be ANDed with
  6695. */
  6696. public void and(SparseBitSet set) {
  6697. binop(this, set, AND);
  6698. }
  6699. /**
  6700. * Logically ORs this bit set with the specified set of bits.
  6701. * @param set the bit set to be ORed with
  6702. */
  6703. public void or(SparseBitSet set) {
  6704. binop(this, set, OR);
  6705. }
  6706. /**
  6707. * Logically XORs this bit set with the specified set of bits.
  6708. * @param set the bit set to be XORed with
  6709. */
  6710. public void xor(SparseBitSet set) {
  6711. binop(this, set, XOR);
  6712. }
  6713. // BINARY OPERATION MACHINERY
  6714. private static interface BinOp {
  6715. public long op(long a, long b);
  6716. }
  6717. private static final BinOp AND = new BinOp() {
  6718. public final long op(long a, long b) { return a & b; }
  6719. };
  6720. private static final BinOp OR = new BinOp() {
  6721. public final long op(long a, long b) { return a | b; }
  6722. };
  6723. private static final BinOp XOR = new BinOp() {
  6724. public final long op(long a, long b) { return a ^ b; }
  6725. };
  6726. private static final void binop(SparseBitSet a, SparseBitSet b, BinOp op) {
  6727. int nsize = a.size + b.size;
  6728. long[] nbits;
  6729. int [] noffs;
  6730. int a_zero, a_size;
  6731. // be very clever and avoid allocating more memory if we can.
  6732. if (a.bits.length < nsize) { // oh well, have to make working space.
  6733. nbits = new long[nsize];
  6734. noffs = new int [nsize];
  6735. a_zero = 0; a_size = a.size;
  6736. } else { // reduce, reuse, recycle!
  6737. nbits = a.bits;
  6738. noffs = a.offs;
  6739. a_zero = a.bits.length - a.size; a_size = a.bits.length;
  6740. System.arraycopy(a.bits, 0, a.bits, a_zero, a.size);
  6741. System.arraycopy(a.offs, 0, a.offs, a_zero, a.size);
  6742. }
  6743. // ok, crunch through and binop those sets!
  6744. nsize = 0;
  6745. for (int i=a_zero, j=0; i<a_size || j<b.size; ) {
  6746. long nb; int no;
  6747. if (i<a_size && (j>=b.size || a.offs[i] < b.offs[j])) {
  6748. nb = op.op(a.bits[i], 0);
  6749. no = a.offs[i];
  6750. i++;
  6751. } else if (j<b.size && (i>=a_size || a.offs[i] > b.offs[j])) {
  6752. nb = op.op(0, b.bits[j]);
  6753. no = b.offs[j];
  6754. j++;
  6755. } else { // equal keys; merge.
  6756. nb = op.op(a.bits[i], b.bits[j]);
  6757. no = a.offs[i];
  6758. i++; j++;
  6759. }
  6760. if (nb!=0) {
  6761. nbits[nsize] = nb;
  6762. noffs[nsize] = no;
  6763. nsize++;
  6764. }
  6765. }
  6766. a.bits = nbits;
  6767. a.offs = noffs;
  6768. a.size = nsize;
  6769. }
  6770. /**
  6771. * Gets the hashcode.
  6772. */
  6773. public int hashCode() {
  6774. long h = 1234;
  6775. for (int i=0; i<size; i++)
  6776. h ^= bits[i] * offs[i];
  6777. return (int)((h >> 32) ^ h);
  6778. }
  6779. /**
  6780. * Calculates and returns the set's size
  6781. */
  6782. public int size() {
  6783. return (size==0)?0:((1+offs[size-1]) << LG_BITS);
  6784. }
  6785. /**
  6786. * Compares this object against the specified object.
  6787. * @param obj the object to commpare with
  6788. * @return true if the objects are the same; false otherwise.
  6789. */
  6790. public boolean equals(Object obj) {
  6791. if ((obj != null) && (obj instanceof SparseBitSet))
  6792. return equals(this, (SparseBitSet)obj);
  6793. return false;
  6794. }
  6795. /**
  6796. * Compares two SparseBitSets for equality.
  6797. * @return true if the objects are the same; false otherwise.
  6798. */
  6799. public static boolean equals(SparseBitSet a, SparseBitSet b) {
  6800. for (int i=0, j=0; i<a.size || j<b.size; ) {
  6801. if (i<a.size && (j>=b.size || a.offs[i] < b.offs[j])) {
  6802. if (a.bits[i++]!=0) return false;
  6803. } else if (j<b.size && (i>=a.size || a.offs[i] > b.offs[j])) {
  6804. if (b.bits[j++]!=0) return false;
  6805. } else { // equal keys
  6806. if (a.bits[i++]!=b.bits[j++]) return false;
  6807. }
  6808. }
  6809. return true;
  6810. }
  6811. /**
  6812. * Clones the SparseBitSet.
  6813. */
  6814. public Object clone() {
  6815. try {
  6816. SparseBitSet set = (SparseBitSet)super.clone();
  6817. set.bits = (long[]) bits.clone();
  6818. set.offs = (int []) offs.clone();
  6819. return set;
  6820. } catch (CloneNotSupportedException e) {
  6821. // this shouldn't happen, since we are Cloneable
  6822. throw new InternalError();
  6823. }
  6824. }
  6825. /**
  6826. * Return an <code>Enumeration</code> of <code>Integer</code>s
  6827. * which represent set bit indices in this SparseBitSet.
  6828. */
  6829. public Enumeration elements() {
  6830. return new Enumeration() {
  6831. int idx=-1, bit=BITS;
  6832. { advance(); }
  6833. public boolean hasMoreElements() {
  6834. return (idx<size);
  6835. }
  6836. public Object nextElement() {
  6837. int r = bit + (offs[idx] << LG_BITS);
  6838. advance();
  6839. return new Integer(r);
  6840. }
  6841. private void advance() {
  6842. while (idx<size) {
  6843. while (++bit<BITS)
  6844. if (0!=(bits[idx] & (1L<<bit)))
  6845. return;
  6846. idx++; bit=-1;
  6847. }
  6848. }
  6849. };
  6850. }
  6851. /**
  6852. * Converts the SparseBitSet to a String.
  6853. */
  6854. public String toString() {
  6855. StringBuffer sb = new StringBuffer();
  6856. sb.append('{');
  6857. for (Enumeration e=elements(); e.hasMoreElements(); ) {
  6858. if (sb.length() > 1) sb.append(", ");
  6859. sb.append(e.nextElement());
  6860. }
  6861. sb.append('}');
  6862. return sb.toString();
  6863. }
  6864. /** Check validity. */
  6865. private boolean isValid() {
  6866. if (bits.length!=offs.length) return false;
  6867. if (size>bits.length) return false;
  6868. if (size!=0 && 0<=offs[0]) return false;
  6869. for (int i=1; i<size; i++)
  6870. if (offs[i] < offs[i-1])
  6871. return false;
  6872. return true;
  6873. }
  6874. /** Self-test. */
  6875. public static void main(String[] args) {
  6876. final int ITER = 500;
  6877. final int RANGE= 65536;
  6878. SparseBitSet a = new SparseBitSet();
  6879. CUtility._assert(!a.get(0) && !a.get(1));
  6880. CUtility._assert(!a.get(123329));
  6881. a.set(0); CUtility._assert(a.get(0) && !a.get(1));
  6882. a.set(1); CUtility._assert(a.get(0) && a.get(1));
  6883. a.clearAll();
  6884. CUtility._assert(!a.get(0) && !a.get(1));
  6885. java.util.Random r = new java.util.Random();
  6886. java.util.Vector v = new java.util.Vector();
  6887. for (int n=0; n<ITER; n++) {
  6888. int rr = ((r.nextInt()>>>1) % RANGE) << 1;
  6889. a.set(rr); v.addElement(new Integer(rr));
  6890. // check that all the numbers are there.
  6891. CUtility._assert(a.get(rr) && !a.get(rr+1) && !a.get(rr-1));
  6892. for (int i=0; i<v.size(); i++)
  6893. CUtility._assert(a.get(((Integer)v.elementAt(i)).intValue()));
  6894. }
  6895. SparseBitSet b = (SparseBitSet) a.clone();
  6896. CUtility._assert(a.equals(b) && b.equals(a));
  6897. for (int n=0; n<ITER2; n++) {
  6898. int rr = (r.nextInt()>>>1) % v.size();
  6899. int m = ((Integer)v.elementAt(rr)).intValue();
  6900. b.clear(m); v.removeElementAt(rr);
  6901. // check that numbers are removed properly.
  6902. CUtility._assert(!b.get(m));
  6903. }
  6904. CUtility._assert(!a.equals(b));
  6905. SparseBitSet c = (SparseBitSet) a.clone();
  6906. SparseBitSet d = (SparseBitSet) a.clone();
  6907. c.and(a);
  6908. CUtility._assert(c.equals(a) && a.equals(c));
  6909. c.xor(a);
  6910. CUtility._assert(!c.equals(a) && c.size()==0);
  6911. d.or(b);
  6912. CUtility._assert(d.equals(a) && !b.equals(d));
  6913. d.and(b);
  6914. CUtility._assert(!d.equals(a) && b.equals(d));
  6915. d.xor(a);
  6916. CUtility._assert(!d.equals(a) && !b.equals(d));
  6917. c.or(d); c.or(b);
  6918. CUtility._assert(c.equals(a) && a.equals(c));
  6919. c = (SparseBitSet) d.clone();
  6920. c.and(b);
  6921. CUtility._assert(c.size()==0);
  6922. System.out.println("Success.");
  6923. }
  6924. }
  6925. /************************************************************************
  6926. JLEX COPYRIGHT NOTICE, LICENSE AND DISCLAIMER.
  6927. Copyright 1996 by Elliot Joel Berk
  6928. Permission to use, copy, modify, and distribute this software and its
  6929. documentation for any purpose and without fee is hereby granted,
  6930. provided that the above copyright notice appear in all copies and that
  6931. both the copyright notice and this permission notice and warranty
  6932. disclaimer appear in supporting documentation, and that the name of
  6933. Elliot Joel Berk not be used in advertising or publicity pertaining
  6934. to distribution of the software without specific, written prior permission.
  6935. Elliot Joel Berk disclaims all warranties with regard to this software,
  6936. including all implied warranties of merchantability and fitness. In no event
  6937. shall Elliot Joel Berk be liable for any special, indirect or consequential
  6938. damages or any damages whatsoever resulting from loss of use, data or
  6939. profits, whether in an action of contract, negligence or other
  6940. tortious action, arising out of or in connection with the use or
  6941. performance of this software.
  6942. ***********************************************************************/
  6943. // set emacs indentation
  6944. // Local Variables:
  6945. // c-basic-offset:2
  6946. // End: