
Cie AS Computing

Dear A2 students,

It is indeed a great privilege for me to welcome all of you to this A2 Computing class, which started in June 2012. I hope you have all had a great experience in your AS Level Computing, where you learned programming and theory in your own way. From my side, I have always seen you as students with good knowledge of programming. However, when it comes to theory, I am sorry to say that you have not met my requirements; most of the time your answers have deviated from the examiner's requirements, and as such you have lost marks heavily in theory.

Please do not carry the same experience and attitude into your A2 Computing course, or you may well end up getting nowhere. So, right from the beginning, cultivate the habit of reading more deeply into the subject; that reading alone will prompt you to bring dozens of questions to classroom discussions. I do not want the classroom to be a passive teaching place; I want it to be an active learning place, a flipped classroom, where the students' experiences are discussed in class and the problems are solved. This alone can get you an A grade. For this, your reading habit needs to continue on a daily basis until the end of the A2 course. I hope all of you will take this A2 course very seriously and fill your intellectual void with my words of wisdom, taken from the CIE oven and served to you hot.

All the best,

Keep moving,

Dr. Ravi


Replies to This Discussion

PowerPoint shown in class just now


A token is a generic description of the type of symbol. For example, if you had a constant called 'months_in_year', this would be replaced by token CONSTANT. If you had variables called 'speed' and 'distance' then these would be replaced with the token VARIABLE. If you had a keyword 'IF', this would get replaced with the selection token for 'IF'. If you had a maths operator such as '>', this would get replaced with the token OPERATOR. By replacing symbols, variables, keywords and so on with generic tokens, you can turn a program into a set of 'patterns' of instructions. It is then a relatively easy job for the compiler in the syntax stage to check each pattern against the allowable ones. Tokens replace:

- A keyword like IF, FOR, PRINT, THEN and so on.
- A symbol that has a fixed meaning, such as + or *.
- Numeric constants.
- User-defined variable names.

For example, if you had the following line in a program: IF x > 10 THEN, it might be changed into this pattern of tokens:

                       SELECTION/IF-VARIABLE-OPERATOR-CONSTANT-SELECTION/THEN

Notice that the spaces have been removed and generic descriptions (tokens) have replaced keywords, variables, symbols and numeric constants.
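
As a rough sketch of how this token replacement might be done, here is a small Python example. The keyword and operator lists are assumptions invented for the illustration, not any particular compiler's token set:

Code:
# Sketch: turn "IF x > 10 THEN" into a pattern of generic tokens.
# The keyword and operator sets below are illustrative assumptions only.
KEYWORDS  = {"IF": "SELECTION/IF", "THEN": "SELECTION/THEN", "FOR": "ITERATION/FOR"}
OPERATORS = {">", "<", "=", "+", "-", "*", "/"}

def tokenise(line):
    tokens = []
    for word in line.split():
        if word in KEYWORDS:
            tokens.append(KEYWORDS[word])
        elif word in OPERATORS:
            tokens.append("OPERATOR")
        elif word.isdigit():
            tokens.append("CONSTANT")
        else:
            tokens.append("VARIABLE")
    return "-".join(tokens)

print(tokenise("IF x > 10 THEN"))
# prints: SELECTION/IF-VARIABLE-OPERATOR-CONSTANT-SELECTION/THEN

(This sketch relies on the symbols being separated by spaces; a real lexer reads character by character, as discussed further down the thread.)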

Assembly language is the most basic programming language available for any processor.

A program, written in assembly language, looks like this:

MOV AX, 47104
MOV DS, AX
MOV [3998], 36
INT 32
The program still isn't quite clear, but it is much easier to understand than it was before. The first instruction, MOV AX, 47104, tells the computer to copy the number 47104 into the location AX. The next instruction, MOV DS, AX, tells the computer to copy the number in AX into the location DS. The next instruction, MOV [3998], 36 tells the computer to put the number 36 into memory location 3998. Finally, INT 32 exits the program by returning to the operating system.

 

Abstract syntax tree

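Since the tree structure is easiest to see with a concrete case, here is a rough sketch of an abstract syntax tree for the statement sum = 3 + 2, built by hand in Python. The tuple-based node layout is an assumption made purely for this illustration:

Code:
# A hand-built abstract syntax tree for the statement:  sum = 3 + 2;
# Each node is a (node_type, children...) tuple -- an illustrative layout only.
ast = ("assign",
          ("variable", "sum"),
          ("add",
              ("constant", 3),
              ("constant", 2)))

def show(node, depth=0):
    """Print the tree with indentation so the nesting is visible."""
    kind, *rest = node
    if rest and not isinstance(rest[0], tuple):
        print("  " * depth + kind + ":", rest[0])
    else:
        print("  " * depth + kind)
        for child in rest:
            show(child, depth + 1)

show(ast)
# assign
#   variable: sum
#   add
#     constant: 3
#     constant: 2

Note that the tree keeps only the structure of the statement; the '=' and ';' symbols from the source text no longer appear as separate items.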

Symbol Table

 

            A symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and sometimes its location.

 

Uses

- An object file will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will use these symbol tables to resolve any unresolved references.

- A symbol table may only exist during the translation process, or it may be embedded in the output of that process for later exploitation, for example during an interactive debugging session, or as a resource for formatting a diagnostic report during or after execution of a program.

- While reverse engineering an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program.

- When accessing variables and allocating memory dynamically, a compiler has to perform a great deal of work, and the extended stack model therefore requires the symbol table.
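
As a very simplified picture of the data structure itself, here is a Python dictionary keyed by identifier name. The fields stored (type, scope level, address) follow the definition above, but their exact form here is an assumption made for the sake of the example:

Code:
# Sketch of a symbol table: identifier name -> information about its declaration.
symbol_table = {}

def declare(name, var_type, scope_level, address):
    """Record an identifier the first time it is declared."""
    if name in symbol_table:
        raise NameError(f"'{name}' already declared")
    symbol_table[name] = {"type": var_type, "scope": scope_level, "address": address}

def lookup(name):
    """Return the stored information for an identifier, or report an error."""
    try:
        return symbol_table[name]
    except KeyError:
        raise NameError(f"'{name}' used before declaration") from None

declare("months_in_year", "INTEGER", 0, 8192)
declare("speed",          "REAL",    1, 8196)
print(lookup("speed"))    # {'type': 'REAL', 'scope': 1, 'address': 8196}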

 

 

 

Example

 

The symbol table of a small program is listed below. The table itself was generated using the GNU binutils nm utility. There is one data symbol (denoted by the "D" type) and many functions (self-defined as well as from the standard library). The first column gives the address where the symbol is located in memory, the second is the symbol type, and the third is the name of the symbol. Suitable options were passed to nm so that the table is sorted by address.

 

Example table

Address    Type   Name
00000020   a      T_BIT
00000040   a      F_BIT
00000080   a      I_BIT
20000004   t      irqvec
20000008   t      fiqvec
2000000c   t      InitReset
20000018   T      _main
20000024   t      End
20000030   T      AT91F_US3_CfgPIO_useB
2000005c   t      AT91F_PIO_CfgPeriph
200000b0   T      main
20000120   T      AT91F_DBGU_Printk
20000190   t      AT91F_US_TxReady
200001c0   t      AT91F_US_PutChar
200001f8   T      AT91F_SpuriousHandler
20000214   T      AT91F_DataAbort
20000230   T      AT91F_FetchAbort
2000024c   T      AT91F_Undef
20000268   T      AT91F_UndefHandler
20000284   T      AT91F_LowLevelInit
200002e0   t      AT91F_DBGU_CfgPIO
2000030c   t      AT91F_PIO_CfgPeriph
20000360   t      AT91F_US_Configure
200003dc   t      AT91F_US_SetBaudrate
2000041c   t      AT91F_US_Baudrate
200004ec   t      AT91F_US_SetTimeguard
2000051c   t      AT91F_PDC_Open
2000059c   t      AT91F_PDC_DisableRx
200005c8   t      AT91F_PDC_DisableTx
200005f4   t      AT91F_PDC_SetNextTx
20000638   t      AT91F_PDC_SetNextRx
2000067c   t      AT91F_PDC_SetTx
200006c0   t      AT91F_PDC_SetRx
20000704   t      AT91F_PDC_EnableRx
20000730   t      AT91F_PDC_EnableTx
2000075c   t      AT91F_US_EnableTx
20000788   T      __aeabi_uidiv
20000788   T      __udivsi3
20000884   T      __aeabi_uidivmod
2000089c   T      __aeabi_idiv0
2000089c   T      __aeabi_ldiv0
2000089c   T      __div0
200009a0   D      _data
200009a0   A      _etext
200009a0   D      holaamigosh
200009a4   A      __bss_end__
200009a4   A      __bss_start
200009a4   A      __bss_start__
200009a4   A      _edata
200009a4   A      _end

 

SEMANTIC ANALYSIS - A phase of natural language processing, following parsing, that involves extraction of context-independent aspects of a sentence's meaning, including the semantic roles of entities mentioned in the sentence, and quantification information, such as cardinality, iteration, and dependency.

 

Parsing is a very important part of many computer science disciplines. For example, compilers must parse source code to be able to translate it into object code. Likewise, any application that processes complex commands must be able to parse the commands. This includes virtually all end-user applications.

Parsing is often divided into lexical analysis and semantic parsing. Lexical analysis concentrates on dividing strings into components, called tokens, based on punctuation and other keys. Semantic parsing then attempts to determine the meaning of the string.

 

Lexemes (tokens)

The specification of a programming language often includes a set of rules which defines the lexer (lexical analyser). These rules usually consist of regular expressions, and they define the set of possible character sequences that are used to form individual tokens or lexemes.
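
As a sketch of what such rules can look like, here is a small set of regular-expression token definitions in Python. The token names and patterns are assumptions chosen for illustration rather than the specification of any real language:

Code:
import re

# Illustrative token rules: each entry is a (token name, regular expression) pair.
TOKEN_RULES = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"\s+"),          # whitespace is matched but thrown away
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))

def lex(text):
    """Yield (token name, lexeme) pairs for the given source text."""
    for match in MASTER.finditer(text):
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())

print(list(lex("sum = 3 + 2;")))
# [('IDENTIFIER', 'sum'), ('OPERATOR', '='), ('NUMBER', '3'),
#  ('OPERATOR', '+'), ('NUMBER', '2'), ('SEMICOLON', ';')]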

Token

A token is a string of characters, categorized according to the rules as a symbol (e.g., IDENTIFIER, NUMBER, COMMA). The process of forming tokens from an input stream of characters is called tokenization, and the lexer categorizes them according to a symbol type. A token can look like anything that is useful for processing an input text stream or text file.
A lexical analyzer generally does nothing with combinations of tokens, a task left for a parser. For example, a typical lexical analyzer recognizes parentheses as tokens, but does nothing to ensure that each "(" is matched with a ")".

Consider this expression in the C programming language:

sum=3+2;

Tokenized in the following table:

Lexeme              Token type
sum                 Identifier
=                   Assignment operator
3                   Integer literal
+                   Addition operator
2                   Integer literal
;                   End of statement


After syntax and semantic analysis, some compilers generate a full and clear intermediate representation of the source program. We can think of this Intermediate Representation (IR) as a program for an abstract machine, produced by the "Intermediate Code Generator" (ICG). This IR should have two important properties: it should be easy to produce, and it should be easy to translate into the target program. This takes place after the "front end" of the compiler design.

(http://nptel.iitm.ac.in/courses/Webcourse-contents/IIT-KANPUR/compi...)
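
To make the idea of the Intermediate Code Generator a little more concrete, here is a rough Python sketch that flattens a tiny abstract syntax tree into three-address-code style IR. The AST node layout and the ':=' instruction format are assumptions made only for this illustration:

Code:
import itertools

# Sketch: flatten a tiny AST into three-address-code style IR instructions.
_temps = itertools.count(1)          # names t1, t2, ... for temporary values

def gen_ir(node, code):
    """Emit IR lines into `code`; return the place holding the node's value."""
    kind = node[0]
    if kind in ("constant", "variable"):
        return str(node[1])
    if kind == "add":
        left, right = gen_ir(node[1], code), gen_ir(node[2], code)
        target = f"t{next(_temps)}"
        code.append(f"{target} := {left} + {right}")
        return target
    if kind == "assign":
        code.append(f"{node[1][1]} := {gen_ir(node[2], code)}")
        return node[1][1]

ir = []
gen_ir(("assign", ("variable", "cx"),
                  ("add", ("variable", "cy"), ("constant", 324))), ir)
print("\n".join(ir))
# t1 := cy + 324
# cx := t1

An IR of this shape satisfies both properties mentioned above: it falls out of the tree almost directly, and each line maps onto a small, fixed number of machine instructions.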

Characteristics of Assembly Languages

 


Although assembly languages are processor-dependent, they do have some common characteristics. All assembly languages use mnemonics to specify the processor instructions and certain supporting operations required of the assembler itself.

All assembly languages include a set of directives, which are used by the programmer to issue commands to the assembler itself. These are called pseudo-instructions, since they resemble instructions in format although they are not instructions. They are also known as assembler directives, since they are commands or directives issued to the assembler. For example, the origin directive, whose mnemonic is (universally) ORG, may be used to specify the origin or starting address of a section of the program.

All assemblers provide for the use of symbolic labels, which may be affixed to constants and memory addresses. These labels may be used as substitutes for the numbers with which they are identified. The use of labels rather than absolute numbers is a powerful programming tool.

A branch target address, too, is easier to implement in the program when the programmer need not be concerned with its actual value or its distance but may refer to it by name.

Constants may be specified in one of several different bases, with decimal, hexadecimal, octal and binary being common options.

The set of rules regarding the proper form for statements in the assembly language is called the syntax of the assembler. The syntax for most assemblers divides each statement into four fields: from left to right, the label, mnemonic or operation, operand, and comment fields.

An assembler scans the source program twice while translating it into machine code. During the first pass it counts bytes and locates all of the label definitions, which it places into a table along with the corresponding numerical values. During the second pass it uses this label table or symbol table to generate the machine code. This feature is particularly important in branches, where the target address may be defined by a label on a later instruction.
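
Here is a toy illustration of the two-pass idea in Python, using a made-up instruction set. The mnemonics, and the assumption that every instruction occupies exactly one word, are simplifications made purely for the example:

Code:
# Toy two-pass assembler sketch for a made-up instruction set.
# Pass 1 counts words and records label addresses in a symbol table;
# pass 2 replaces label operands with the recorded addresses.
source = [
    "        LOAD  count",
    "        JZ    DONE",    # forward reference: DONE is only defined later
    "        SUB   one",
    "DONE:   HALT",
]

def assemble(lines):
    symbol_table, address, stripped = {}, 0, []
    for line in lines:                       # ---- pass 1: collect labels ----
        code = line.strip()
        if ":" in code:
            label, code = code.split(":", 1)
            symbol_table[label.strip()] = address
            code = code.strip()
        if code:
            stripped.append(code)
            address += 1                     # assume one word per instruction
    output = []
    for addr, code in enumerate(stripped):   # ---- pass 2: resolve labels ----
        output.append((addr, [symbol_table.get(part, part) for part in code.split()]))
    return symbol_table, output

table, program = assemble(source)
print(table)     # {'DONE': 3}
print(program)   # [(0, ['LOAD', 'count']), (1, ['JZ', 3]), (2, ['SUB', 'one']), (3, ['HALT'])]

Because the symbol table is completed in pass 1, the forward branch JZ DONE can be resolved in pass 2 even though DONE is only defined on a later line.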

Source File: This is the program that is read by the compiler or interpreter. This is the text that needs to be compiled or interpreted.

Scanner: This is the first module in a compiler or interpreter. Its job is to read the source file one character at a time. It can also keep track of which line number and character are currently being read. A typical scanner can be instructed to move backwards and forwards through the source file. Why do we need to move backwards? We will see why in just a little bit when we examine the lexer. For now, assume that each time the scanner is called, it returns the next character in the file.

Lexer: This module serves to break up the source file into chunks (called tokens). It calls the scanner to get characters one at a time and organizes them into tokens and token types. For instance, if the source file read something like this:

Code:
cx = cy + 324; print "value of cx is ", cx;


a lexer would perhaps break it like this:

Code:
cx                 --> Identifier (variable)
=                  --> Symbol (assignment operator)
cy                 --> Identifier (variable)
+                  --> Symbol (addition operator)
324                --> Numeric constant (integer)
;                  --> Symbol (end of statement)
print              --> Identifier (keyword)
"value of cx is "  --> String constant
,                  --> Symbol (string concatenation operator)
cx                 --> Identifier (variable)
;                  --> Symbol (end of statement)


Thus, the lexer calls the scanner to pass it one character at a time, groups the characters together and identifies them as tokens for the language parser (which is the next stage). It also identifies the type of token (variable vs. keyword, assignment operator vs. addition operator vs. string concatenation operator, etc.). Occasionally, the lexer has to tell the scanner to back up, though. Consider a language that has operators that may be more than one character long (! vs. !=, < vs. <=, + vs. ++, etc.). Assume that the lexer has requested a character from the scanner and it has returned '<'. The lexer needs to determine whether the operator is a < or a <=. So it requests the scanner for another character. If the next character is a '=', it changes the token to '<=' and passes it to the parser. If not, it tells the scanner to back up one character and hold it in the buffer, while it passes the '<' to the parser.
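
Here is a small Python sketch of that scanner/lexer hand-off, showing the one-character back-up used to decide between '<' and '<='. The class name and token labels are invented for the example:

Code:
# Sketch: a scanner that can back up one character, and a lexer that uses it
# to decide between '<' and '<='.
class Scanner:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def next_char(self):
        if self.pos >= len(self.text):
            return ""                      # end of input
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def back_up(self):
        self.pos -= 1                      # hold the character for the next call

def next_token(scanner):
    ch = scanner.next_char()
    if ch == "<":
        follower = scanner.next_char()
        if follower == "=":
            return ("OPERATOR", "<=")
        if follower:
            scanner.back_up()              # not '=', so give the character back
        return ("OPERATOR", "<")
    return ("CHAR", ch)

s = Scanner("<=a<b")
print(next_token(s))   # ('OPERATOR', '<=')
print(next_token(s))   # ('CHAR', 'a')
print(next_token(s))   # ('OPERATOR', '<')
print(next_token(s))   # ('CHAR', 'b')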

Parser: This is the part of the compiler that really understands the syntax of the language. It calls the lexer to get tokens and processes the tokens per the syntax of the language. For instance, taking the example from the lexer above, the hypothetical interaction between the lexer and parser could go like this:

Code:
Parser: Give me the next token
Lexer:  Next token is "cx" which is a variable.
Parser: Ok, I have "cx" as a declared integer variable. Give me next token
Lexer:  Next token is "=", the assignment operator.
Parser: Ok, the program wants me to assign something to "cx". Next token
    Lexer:  The next token is "cy" which is a variable.
    Parser: Ok, I know "cy" is an integer variable. Next token please
    Lexer:  The next token is '+', which is an addition operator.
    Parser: Ok, so I need to add something to the value in "cy". Next token please.
        Lexer:  The next token is "324", which is an integer.
        Parser: Ok, both "cy" and "324" are integers, so I can add them. Next token please:
        Lexer:  The next token is ";" which is end of statement.
    Parser: Ok, I will evaluate "cy + 324" and get the answer
Parser: I'll take the answer from "cy + 324" and assign it to "cx"


In the above, the indenting shows a subprocess that the parser enters, to evaluate "cy + 324". This gives you a decent idea about how the parser operates. Also note that the parser is checking types and syntax rules (for instance, it checked whether cy and 324 were both integer types before adding them). If the parser gets a token that it was not expecting, it will stop processing and complain to the user about an error. The Scanner holds the current line number and character, so the Parser can inform the user approximately where the error occurred.
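
The same interaction can be written as a small recursive-descent style routine. The sketch below handles only statements of the form variable = operand + operand ... ; and evaluates them directly, roughly what an interpreter's parser would do; the token list and variable store are assumptions made for the example:

Code:
# Sketch: parse and evaluate a statement of the form  <var> = <operand> (+ <operand>)* ;
variables = {"cy": 6}                      # assume cy was declared and set earlier

def operand(tok):
    """An operand is either an integer constant or an already-declared variable."""
    return int(tok) if tok.isdigit() else variables[tok]

def parse_statement(token_list):
    tokens = iter(token_list)
    name = next(tokens)                    # the variable being assigned to
    assert next(tokens) == "=", "expected '=' after the variable name"
    total = operand(next(tokens))
    for tok in tokens:
        if tok == ";":                     # end of statement reached
            break
        assert tok == "+", f"unexpected token {tok!r}"
        total += operand(next(tokens))     # the sub-process: evaluate the right-hand side
    variables[name] = total

parse_statement(["cx", "=", "cy", "+", "324", ";"])
print(variables["cx"])   # 330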

Interpreter/Code Generator: This is the part that actually takes the action that is specified by a program statement. In some cases, this is actually part of the parser (especially for interpreters) and the parser interprets and takes action directly. In other cases, the parser converts the statements into byte-code (intermediate language). In case of a compiler, it then hands them to the Code Generator to convert into machine code instructions. If you want a compiler for a different CPU or architecture, all you have to do is put a new code generator unit to translate the byte code into machine code for the new CPU.
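
To picture that split between the front end and the code generator, here is a sketch in which the statement cx = cy + 324 has already been turned into a made-up stack byte-code, and a small table maps each byte-code operation onto an equally made-up machine mnemonic. Retargeting the compiler would then amount to supplying a different table:

Code:
# Sketch: byte-code for  cx = cy + 324  plus a tiny, swappable back end.
# Every instruction name here is invented for the illustration.
bytecode = [
    ("PUSH_VAR",   "cy"),
    ("PUSH_CONST", 324),
    ("ADD",        None),
    ("STORE_VAR",  "cx"),
]

# Hypothetical back end for one CPU: byte-code op -> machine mnemonic template.
STACK_CPU_BACKEND = {
    "PUSH_VAR":   "LOAD   {0}",
    "PUSH_CONST": "LOADI  {0}",
    "ADD":        "ADDTOP",
    "STORE_VAR":  "STORE  {0}",
}

def generate(code, backend):
    """Translate each byte-code instruction using the chosen back-end table."""
    return [backend[op].format(arg) for op, arg in code]

print("\n".join(generate(bytecode, STACK_CPU_BACKEND)))
# LOAD   cy
# LOADI  324
# ADDTOP
# STORE  cx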
