CpS 450 Language Translation Systems

Phase 1: Lexical Analysis

Your Submission Repository

Requirements

Write a scanner that identifies all the tokens of a Dream program (see Dream Lexical Components). It must take the names of one or more Dream source files from the command line and report any lexical errors. Each lexical error falls into one of three categories: unrecognized character, unterminated string, or illegal string.

  • An unrecognized character is any character of the input file that is not a part of one of the lexical components listed below.

  • An unterminated string is a string whose closing quote is not found by the end of the current input line.

  • An illegal string is a string that contains one or more illegal escape sequences.
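
To make the two string error categories concrete, here is a minimal hand-written sketch in Java. It is not the ANTLR approach used in the starter project, and the class name, method name, and especially the set of legal escape characters are assumptions made for illustration only; the Dream Lexical Components handout is the authority on which escapes are legal. The sketch reports "unterminated" when both problems occur in the same string, which is also an arbitrary choice.

```java
/** Illustration only: classifies one string literal on a single input line. */
class StringErrorDemo {
    enum Result { OK, UNTERMINATED, ILLEGAL }

    // ASSUMPTION: hypothetical set of legal escape characters (t, n, f, r, quote, backslash).
    // Check the Dream Lexical Components handout for the real list.
    static final String LEGAL_ESCAPES = "tnfr\"\\";

    /** Classifies the literal whose opening quote is at line.charAt(start). */
    static Result classify(String line, int start) {
        boolean illegalEscapeSeen = false;
        for (int i = start + 1; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Closing quote found before the end of the line.
                return illegalEscapeSeen ? Result.ILLEGAL : Result.OK;
            }
            if (c == '\\') {
                i++; // move to the escaped character
                if (i >= line.length()) {
                    break; // backslash was the last character on the line
                }
                if (LEGAL_ESCAPES.indexOf(line.charAt(i)) < 0) {
                    illegalEscapeSeen = true;
                }
            }
        }
        // Reached end of line without a closing quote.
        return Result.UNTERMINATED;
    }

    public static void main(String[] args) {
        System.out.println(classify("\"hello\"", 0));   // OK
        System.out.println(classify("\"hello", 0));     // UNTERMINATED
        System.out.println(classify("\"a\\qb\"", 0));   // ILLEGAL (\q is not in the assumed set)
    }
}
```

In an ANTLR grammar the same distinctions would more likely be expressed as separate lexer rules, so treat this only as a way to check your understanding of the categories.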

You may use any language or technology you wish that is available (or can be installed, with instructor approval) on csunix. Your scanner must build and run on csunix.

Usage Specifications

Your program will be invoked from the command line as follows:

build/install/dream/bin/dream -ds <filename>

where <filename> is the name of a Dream source file to be scanned for tokens, and -ds is an optional debugging output switch:

  • When the -ds command line option is present, produce a list of tokens, one per line, to standard output. Format the output as shown in the sample run in the Dream Lexical Components handout.

  • When the -ds option is not present, your scanner should not display tokens, but it should still display lexical errors.

Unit Test

You must modify the provided JUnit test LexerTest.java to test your Lexer. A reasonably comprehensive test is expected: your test should check for each keyword and each type of token.

If you use a technology other than Java, use an appropriate unit test framework.

Design / Style Requirements

  • Use good program design and coding style. Keep the main() method small; move token processing into a separate method.
  • Method header comments and file header comments are expected on all hand-coded files (including the ones provided to you).
  • Separate command-line argument processing into its own method; create an Options class that holds the results of the command-line argument processing (create instance variables in Options for argument flags and the filename). More argument processing will be required in future phases.
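
One possible shape for the Options class is sketched below. The field names, the parse method, the usage message, and the choice to reject unknown switches are all assumptions for illustration, not part of the provided project. Because the requirements mention one or more source files, this sketch stores a list of filenames rather than a single one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Holds the results of command-line argument processing (hypothetical layout). */
final class Options {
    /** True when the -ds switch was present. */
    final boolean debugScanner;
    /** Source file names, in the order they were given. */
    final List<String> filenames;

    private Options(boolean debugScanner, List<String> filenames) {
        this.debugScanner = debugScanner;
        this.filenames = filenames;
    }

    /**
     * Parses the raw argument array.
     * Throws IllegalArgumentException for an unknown switch or a missing filename.
     */
    static Options parse(String[] args) {
        boolean ds = false;
        List<String> files = new ArrayList<>();
        for (String arg : args) {
            if ("-ds".equals(arg)) {          // compare contents, not references
                ds = true;
            } else if (arg.startsWith("-")) {
                throw new IllegalArgumentException("unknown option: " + arg);
            } else {
                files.add(arg);
            }
        }
        if (files.isEmpty()) {
            throw new IllegalArgumentException("usage: dream [-ds] <filename>");
        }
        return new Options(ds, Collections.unmodifiableList(files));
    }
}

class OptionsDemo {
    public static void main(String[] args) {
        Options opts = Options.parse(new String[] {"-ds", "math.txt"});
        System.out.println(opts.debugScanner + " " + opts.filenames); // true [math.txt]
    }
}
```

Keeping parsing in one static method leaves main() free to do little more than build an Options object and hand it to the token-processing method, and gives later phases a single place to add switches.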

Getting Started

Here’s a quick roadmap to help you get started on Phase 1:

  1. Set up the software. If you wish to use a language other than Java, ANTLR can generate scanners and parsers in other languages (Swift, Go, Python), or you can use a different compiler generation tool. Check with the instructor to get clearance for your desired technology, then port the provided Java project to your desired implementation language. If you choose a different technology, be aware that you will still be tested on the aspects of the Java technology presented in class.

  2. Use the “Your Submission Repository” link at the top of these instructions to create a submission repository for your Phase 1 submission. Clone the repository to your computer. It has a copy of the lexer example in a folder named dream, preconfigured to work with VSCode. Open the dream folder in VSCode (don’t open the root repository folder, or you won’t get the right VSCode settings).

  3. Rename Arithmetic.g4 to Dream.g4. Change the first line to read grammar Dream. Edit the Main class to instantiate DreamLexer. Then compile and run the project with math.txt as the argument and verify that it still works after those changes.

  4. Modify the provided Main class to implement the -ds option, following the Design / Style Requirements specified above. Implementation tip: Strings in Java must be compared by invoking a method of the String class (such as equals()), not by using the == operator.

  5. Review the ANTLR Lexer Rules specification.

  6. Edit Dream.g4 to define all of the Dream tokens. I suggest doing a few at a time, saving, and generating the scanner / testing. Find sample test input files in the class files in tests/phase1.
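
The string comparison tip in step 4 can be illustrated with a short example. The class and method names here are hypothetical. The point is that == compares object references, while equals() compares the characters, and command-line arguments are created at run time, so they are generally not the same object as a literal in your source code.

```java
class StringCompareDemo {
    /** Returns true when arg is the -ds switch; safe even if arg is null. */
    static boolean isDebugSwitch(String arg) {
        return "-ds".equals(arg);
    }

    public static void main(String[] args) {
        // new String(...) builds a distinct object, much like an argument read at run time.
        String built = new String("-ds");

        System.out.println(built == "-ds");        // false: different objects
        System.out.println(built.equals("-ds"));   // true: same characters
        System.out.println(isDebugSwitch(null));   // false, and no NullPointerException
    }
}
```

Putting the literal on the left of equals(), as in isDebugSwitch, is a common idiom because it avoids a NullPointerException when the argument is null.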

Bonus (+10%)

Create an extension for Visual Studio Code that implements syntax coloring for Dream source code files.

Implementing a syntax coloring extension for VSCode is not hard to do, and is a fun way of applying compiler technology. These documents should help you get started:

Create a folder inside the root of your submission repository named bonus that holds your extension. Also, make a short screencast of your extension in action, and include a link to it in your report.

Submission

  1. Create a README.md in the dream folder with a brief report in Markdown (don’t edit the root-level README.md, which gets replaced with the results of submission checks when you submit). Your report should indicate the number of hours you spent on this phase, and list any known bugs. Also, include an academic integrity statement indicating what help you received, if any.

  2. Push your code to the submission repository. Check the top-level README.md about a minute after you push to see the results of the automated checks. The test script runs your lexer using the sample input at the end of the Dream Lexical Components sheet. You should see output that matches the sample output on the Dream Lexical Components sheet. Note that your code will be tested with more comprehensive tests.