What will we cover? |
---|
In any creative activity we need three basic ingredients: tools, materials and techniques. For example when I paint the tools are my brushes, pencils and palettes. The techniques are things like ‘washes’, wet on wet, blending, spraying etc. Finally the materials are the paints, paper and water. Similarly when I program, my tools are the programming languages, operating systems and hardware. The techniques are the programming constructs that we discussed in the previous section and the material is the data that I manipulate. In this chapter we look at the materials of programming.
This is quite a long section and by its nature you might find it a bit dry. The good news is that you don’t need to read it all at once. The chapter starts off by looking at the most basic data types available, then moves on to how we handle collections of items and finally looks at some more advanced material. It should be possible to drop out of the chapter after the collections material, cover a couple of the following chapters and then come back to this one as we start to use the more advanced bits.
Data is one of those terms that everyone uses but few really understand.
My dictionary defines it as:
"facts or figures from which conclusions can be inferred; information"
That's not too much help but at least gives a starting point. Let’s see if we can clarify things by looking at how data is used in programming terms. Data is the “stuff”, the raw information, that your program manipulates. Without data a program cannot perform any useful function. Programs manipulate data in many ways, often depending on the type of the data. Each data type also has a number of operations - things that you can do to it. For example we’ve seen that we can add numbers together. Addition is an operation on the number type of data. Data comes in many types and we’ll look at each of the most common types and the operations available for that type.
Data is stored in the memory of your computer. You can liken this to the big wall full of boxes used in mail rooms to sort the mail. You can put a letter in any box but unless the boxes are labelled with the destination address it’s pretty meaningless. Variables are the labels on the boxes in your computer's memory.
Knowing what data looks like is fine so far as it goes but to manipulate it we need to be able to access it and that’s what variables are used for. In programming terms we can create instances of data types and assign them to variables. A variable is a reference to a specific area somewhere in the computer's memory. These areas hold the data. In some computer languages a variable must match the type of data that it points to. Any attempt to assign the wrong type of data to such a variable will cause an error. Some programmers prefer this type of system, known as static typing, because it can prevent some subtle bugs which are hard to detect.
In Python a variable takes the type of the data assigned to it. It will keep that type and you will be warned if you try to mix data in strange ways - like trying to add a string to a number. (Recall the example error message? It was an example of just that kind of error.) We can change the type of data that a variable points to by reassigning the variable.
>>> q = 7          # q is now a number
>>> print q
7
>>> q = "Seven"    # reassign q to a string
>>> print q
Seven
Note that q was set to point to the number 7 initially. It maintained that value until we made it point at the character string "Seven". Thus, Python variables maintain the type of whatever they point to, but we can change what they point to simply by reassigning the variable. At that point the original data is 'lost' and Python will erase it from memory (unless another variable points at it too). This is known as garbage collection.
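The same idea can be checked directly. This is a minimal sketch in Python 3 syntax (where print is a function, unlike the sessions shown in this chapter). It uses the built-in type() function to report what q currently points to, and the names first_type, second_type and result are only there for the illustration.

```python
q = 7                      # q points to a number
first_type = type(q)       # <class 'int'>

q = "Seven"                # reassign q to a string
second_type = type(q)      # <class 'str'>

print(first_type, second_type)

# Mixing data in strange ways is reported as an error
try:
    result = "Seven" + 7
except TypeError as err:
    result = None
    print("Python complained:", err)
```

The variable q never had a fixed type of its own: it simply took on the type of whatever was last assigned to it.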
Garbage collection can be likened to the mailroom clerk who comes round once in a while and removes any packets that are in boxes with no labels. If he can't find an owner or address on the packets he throws them in the garbage. Let’s take a look at some examples of data types and see how all of this fits together.
Primitive data types are so called because they are the most basic types of data we can manipulate. More complex data types are really combinations of the primitive types. These are the building blocks upon which all the other types are built, the very foundation of computing. They include letters, numbers and something called a boolean type.
We've already seen character strings. They are literally any string or sequence of characters that can be printed on your screen. (In fact there can even be non-printable control characters too.)
In Python, strings can be represented in several ways:
With single quotes:
'Here is a string'
With double quotes:
"Here is a very similar string"
With triple double quotes:
""" Here is a very long string that can
if we wish span several lines and Python
will preserve the lines as we type them..."""
One special use of the latter form is to build in documentation for Python functions that we create ourselves - we'll see this later.
You can access the individual characters in a string by treating it as an array of characters (see arrays below). There are also usually some operations provided by the programming language to help you manipulate strings - find a sub string, join two strings, copy one to another etc.
It is worth pointing out that some languages have a separate type for characters themselves, that is for a single character. In this case strings are literally just collections of these character values. Python by contrast just uses a string of length 1 to store an individual character, no special syntax is required.
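A short sketch of these string operations, in Python 3 syntax rather than the older print statement used elsewhere in this chapter. The sample text and variable names are made up for the illustration.

```python
s = "Hello there"

first = s[0]                 # index a string like a list: 'H'
print(first, len(first))     # a single character is just a string of length 1

position = s.find("there")   # find a sub string: gives its starting index
joined = s + "!"             # join two strings
print(position, joined)      # 6 Hello there!
```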
There are a number of operations that can be performed on strings. Some of these are built in to Python but many others are provided by modules that you must import (as we did with sys in the Simple Sequences section).
Operator | Description |
---|---|
S1 + S2 | Concatenation of S1 and S2 |
S1 * N | N repetitions of S1 |
We can see these in action in the following examples:
>>> print 'Again and ' + 'again'           # string concatenation
Again and again
>>> print 'Repeat ' * 3                    # string repetition
Repeat Repeat Repeat
>>> print 'Again ' + ('and again ' * 3)    # combine '+' and '*'
Again and again and again and again
We can also assign character strings to variables:
>>> s1 = 'Again '
>>> s2 = 'and again '
>>> print s1 + (s2 * 3)
Again and again and again and again
Notice that the last two examples produced the same output.
In BASIC a string variable is marked by ending its name with '$':
DIM MyString$
MyString$ = "Hello there!"
PRINT MyString$
Tcl uses strings internally for everything. From the user's point of view however this is not usually obvious. When explicitly dealing with a string you surround it in double quotes. To assign a value to a variable in Tcl use the set command and to read a string variable (or indeed any variable in Tcl) put a '$' in front of the name, like so:
% set Mystring "Hello world"
% puts $Mystring
Note: in both Tcl and BASIC only double quotes can be used for strings.
Integers are whole numbers from a large negative value through to a large positive value. That’s an important point to remember. Normally we don’t think of numbers being restricted in size but on a computer there are upper and lower limits. The size of this upper limit is known as MAXINT and depends on the number of bits used on your computer to represent a number. On most current computers it's 32 bits so MAXINT is around 2 billion.
Numbers with positive and negative values are known as signed integers. You can also get unsigned integers which are restricted to positive numbers, including zero. This means there is a bigger maximum number available of around 2 * MAXINT or 4 billion on a 32 bit computer since we can use the space previously used for representing negative numbers to represent more positive numbers.
Because integers are restricted in size to MAXINT, adding two integers together where the total is greater than MAXINT causes the total to be wrong. On some systems/languages the wrong value is just returned as is (usually with some kind of secret flag raised that you can test if you think it might have been set). Normally an error condition is raised and either your program can handle the error or the program will exit. Early versions of Python adopted this latter approach (more recent versions avoid the problem by switching automatically to integers of unlimited size) while Tcl adopts the former. BASIC throws an error but provides no way to catch it (at least I don't know how!).
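To make the arithmetic concrete, here is a small sketch that assumes a 32 bit signed integer. Current versions of Python let integers grow as large as memory allows, so the helper wrap_to_signed() (a name invented for this illustration) only imitates what a fixed size integer does when a total passes MAXINT and the wrong value is simply returned.

```python
BITS = 32                               # assumed word size

MAXINT = 2 ** (BITS - 1) - 1            # largest signed value
MAX_UNSIGNED = 2 ** BITS - 1            # largest unsigned value

def wrap_to_signed(value, bits=BITS):
    """Imitate storing value in a fixed size signed integer."""
    value &= (1 << bits) - 1            # keep only the lowest 'bits' bits
    if value >= 1 << (bits - 1):        # top bit set means a negative number
        value -= 1 << bits
    return value

print(MAXINT)                           # 2147483647
print(MAX_UNSIGNED)                     # 4294967295
print(wrap_to_signed(MAXINT + 1))       # -2147483648, clearly the wrong total
```

Adding just 1 to MAXINT flips the result to the largest negative value, which is why a silent overflow can be such a nasty bug.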
We've already seen most of the arithmetic operators that you need in the 'Simple Sequences' section, however to recap:
Operator Example | Description |
---|---|
M + N | Addition of M and N |
M - N | Subtraction of N from M |
M * N | Multiplication of M and N |
M / N | Division, either integer or floating point result depending on the types of M and N. If either M or N is a real number (see below) the result will be real. |
M % N | Modulo: find the remainder of M divided by N |
M**N | Exponentiation: M to the power N |
We haven’t seen the last one before so let’s look at an example of creating some integer variables and using the exponentiation operator:
>>> i1 = 2       # create an integer and assign it to i1
>>> i2 = 4
>>> i3 = 2**4    # assign the result of 2 to the power 4 to i3
>>> print i3
16
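The division and modulo rows of the table are the ones most likely to surprise, so here is a short sketch of them in Python 3 syntax. One assumption to be aware of: in Python 3 the / operator always gives a real result and integer division is written //, which differs from the older behaviour the table describes.

```python
m, n = 7, 2

print(m // n)        # integer division: 3
print(m % n)         # modulo, the remainder: 1
print(m / 2.0)       # a real operand gives a real result: 3.5
print(m ** n)        # exponentiation: 49

# The quotient and remainder together rebuild the original number
quotient, remainder = divmod(m, n)
print(quotient * n + remainder)   # 7
```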
BASIC has some extra rules around integers. To declare an integer variable in BASIC you can either use a plain unadorned name or you can signal to BASIC that it is an integer we wish to store (this will be slightly more efficient). We do this by ending the name with '%':
FOO = 8     ' FOO can hold any kind of number
BAR% = 9    ' BAR can only hold integers
One final gotcha with integer variables in BASIC:
i% = 7
PRINT 2 * i%
i% = 4.5
PRINT 2 * i%
Notice that the assignment of 4.5 to i% seemed to work but only the integer part was actually assigned. This is reminiscent of the way Python dealt with division of integers. All programming languages have their own little idiosyncrasies like this!
As mentioned earlier Tcl stores everything internally as strings, however this doesn't really make any difference to the user because Tcl converts the values into numbers and back again under the covers, as it were. Thus all the restrictions on number sizes still apply.
Using numbers in Tcl is slightly more complex than in most languages since to do any calculations you have to signal to the interpreter that a calculation is needed. You do that with the expr command:
% puts [expr 6 + 5]
11
Tcl sees the square brackets and evaluates that part first, as if it had been typed at the command line. In doing so it sees the expr command and does the calculation. The result is then put to the screen. If you try to print the sum as a plain string, without expr, Tcl will just print out "6 + 5":
% puts "6 + 5"
6 + 5
Real numbers (also called floating point numbers) are fractions. They can represent very large numbers, much bigger than MAXINT, but with less precision. That is to say that two real numbers which should be identical may not seem to be when compared by the computer. This is because the computer only approximates some of the lowest details. Thus 4.0 could be represented by the computer as 3.9999999.... or 4.000000....01. These approximations are close enough for most purposes but occasionally they become important! If you get a funny result when using real numbers, bear this in mind.
Floating point numbers have the same operations as integers with the addition of the capability to truncate the number to an integer value.
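A well known case of the approximation problem, sketched in Python 3 syntax. The math.isclose() function from the standard library compares two real numbers within a small tolerance, and int() truncates a real number to an integer.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)                 # False: the values differ in the last digits
print(math.isclose(total, 0.3))     # True: compare with a tolerance instead

print(int(3.7))                     # truncate to an integer: 3
print(int(-3.7))                    # truncation just drops the fraction: -3
```

When a comparison of real numbers gives a funny result, testing for "close enough" rather than "exactly equal" is usually the safer approach.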
If you have a scientific or mathematical background you may be wondering about complex numbers? If you aren't you may not even have heard of complex numbers! Anyhow some programming languages, including Python, provide builtin support for the complex type while others provide a library of functions which can operate on complex numbers. And before you ask, the same applies to matrices too.
In Python a complex number is represented as:
(real+imaginaryj)
Thus a simple complex number addition looks like:
>>> M = (2+4j)
>>> N = (7+6j)
>>> print M + N
(9+10j)
Most of the integer operations also apply to complex numbers.
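A few more complex number operations, sketched in Python 3 syntax using the same M and N as above. The real and imag attributes and the abs() function are standard parts of Python's complex type.

```python
M = 2 + 4j
N = 7 + 6j

product = M * N            # (2*7 - 4*6) + (2*6 + 4*7)j
print(product)             # (-10+40j)
print(M.real, M.imag)      # the two parts are held as real numbers: 2.0 4.0
print(abs(3 + 4j))         # the magnitude of the number: 5.0
```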
The boolean type has only 2 values - either true or false. Some languages support boolean values directly, others use a convention whereby some numeric value (often 0) represents false and another (often 1 or -1) represents true.
Boolean values are sometimes known as "truth values" because they are used to test whether something is true or not. For example if you write a program to backup all the files in a directory you might backup each file then ask the operating system for the name of the next file. If there are no more files to save it will return an empty string. You can then test to see if the name is an empty string and store the result as a boolean value (true if it is empty). You'll see how we would use that result later on in the course.
Operator Example | Description | Effect |
---|---|---|
A and B | AND | True if A,B are both True, False otherwise. |
A or B | OR | True if either or both of A,B are true. False if both A and B are false |
A == B | Equality | True if A is equal to B |
A != B or A <> B | Inequality | True if A is NOT equal to B. |
not B | Negation | True if B is not True |
Note: the last one operates on a single value, the others all compare two values.
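The backup example and the operators in the table can be sketched as follows, in Python 3 syntax. The variable names are made up, and an empty string stands in for what the operating system would return when there are no more files.

```python
next_name = ""                          # pretend there are no more files
no_more_files = (next_name == "")       # store the test result as a boolean
print(no_more_files)                    # True

A, B = True, False
print(A and B)      # False: both must be true
print(A or B)       # True: at least one is true
print(not B)        # True: negation of a single value
print(A != B)       # True: the two values differ
```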
Computer science has built a whole discipline around studying collections and their various behaviours. Sometimes collections are called containers. In this section we will look first of all at the collections supported in Python then we’ll conclude with a brief summary of some other collection types you might come across in other languages.
A list is a sequence of items. What makes it different from an array is that it can keep on growing - you just add another item. But it's not usually indexed so you have to find the item you need by stepping through the list from front to back checking each item to see if it's the item you want. Both Python and Tcl have lists built into the language. In BASIC it's harder and we have to do some tricky programming to simulate them. BASIC programmers usually just create very big arrays instead. Python also allows you to index its lists. As we will see this is a very useful feature.
Python provides many operations on collections. Nearly all of them apply to Lists and a subset apply to other collection types, including strings which are just a special type of list of characters. To create and access a list in Python we use square brackets. You can create an empty list by using a pair of square brackets with nothing inside, or create a list with contents by separating the values with commas inside the brackets:
>>> aList = []
>>> another = [1,2,3]
>>> print another
[1, 2, 3]
We can access the individual elements using an index number, where the first element is 0, inside square brackets:
>>> print another[2]
3
We can also change the values of the elements of a list in a similar fashion:
>>> another[2] = 7
>>> print another
[1, 2, 7]
You can use negative index numbers to access members from the end of the list. This is most commonly done using -1 to get the last item:
>>> print another[-1]
7
We can also add new elements to the end of a list using the append() operator:
>>> aList.append(42)
>>> print aList
[42]
We can even hold one list inside another, thus if we append our second list to the first:
>>> aList.append(another)
>>> print aList
[42, [1, 2, 7]]
Notice how the result is a list of two elements but the second element is itself a list (as shown by the []’s around it). This is useful since it allows us to build up representations of tables or grids using a list of lists. We can then access the element 7 by using a double index:
>>> print aList[1][2]
7
The first index, 1, extracts the second element which is in turn a list. The second index, 2, extracts the third element of the sublist.
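The table or grid idea can be sketched like this, in Python 3 syntax. The name grid and its values are invented for the illustration.

```python
# A table of 2 rows by 3 columns held as a list of lists
grid = [[1, 2, 3],
        [4, 5, 6]]

print(grid[1][2])                # row 1, column 2 (counting from 0): 6

grid[0][1] = 20                  # change a single cell
print(grid)                      # [[1, 20, 3], [4, 5, 6]]
print(len(grid), len(grid[0]))   # number of rows and of columns: 2 3
```

The first index always picks the row (the inner list) and the second picks the item within that row.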
The opposite of adding elements is, of course, removing them and to do that we use the del command:
>>> del aList[1]
>>> print aList
[42]
If we want to join two lists together to make one we can use the same concatenation operator ‘+’ that we saw for strings:
>>> newList = aList + another
>>> print newList
[42, 1, 2, 7]
In the same way we can apply the repetition operator to populate a list with multiples of the same value:
>>> zeroList = [0] * 5
>>> print zeroList
[0, 0, 0, 0, 0]
Finally, we can determine the length of a list using the built-in len() function:
>>> print len(aList)
1
>>> print len(zeroList)
5
Tcl also has a built in list type and a variety of commands for operating on these lists. These commands are identifiable by the 'l' prefix, for example linsert, lappend, lindex, etc. An example of creating a simple Tcl list and accessing a member follows:
% set L [list 1 2 3]
% puts [lindex $L 2]
3
Not every language provides a tuple construct but in those that do it’s extremely useful. A tuple is really just an arbitrary collection of values which can be treated as a unit. In many ways a tuple is like a list, but with the significant difference that tuples are immutable which is to say that you can’t change them nor append to them once created. In Python, tuples are simply represented by parentheses containing a comma separated list of values, like so:
>>> aTuple = (1,3,5)
>>> print aTuple[1]    # use indexing like a list
3
>>> aTuple[2] = 7      # error, can’t change a tuple’s elements
Traceback (innermost last):
  File "", line 1, in ?
    aTuple[2] = 7
TypeError: object doesn't support item assignment
The main things to remember are that while parentheses are used to define the tuple, square brackets are used to index it and you can’t change a tuple once it's created. Otherwise most of the list operations also apply to tuples.
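A small sketch of which list operations carry over to tuples, in Python 3 syntax. The names changed and longer are invented for the illustration.

```python
aTuple = (1, 3, 5)
print(aTuple[1])                 # indexing works as for lists: 3

try:
    aTuple[2] = 7                # changing an element is not allowed
    changed = True
except TypeError:
    changed = False
print(changed)                   # False

longer = aTuple + (7,)           # concatenation builds a brand new tuple
print(longer, len(longer))       # (1, 3, 5, 7) 4
print(aTuple)                    # the original is untouched: (1, 3, 5)
```

Note the trailing comma in (7,): it is what tells Python that a single value in parentheses is meant to be a tuple.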
A dictionary, as the name suggests, contains a value associated with some key, in the same way that a literal dictionary associates a meaning with a word. The value can be retrieved by ‘indexing’ the dictionary with the key. Unlike a literal dictionary the key doesn’t need to be a character string (although it often is) but can be any immutable type including numbers and tuples. Similarly the values associated with the keys can be any kind of Python data type. Dictionaries are usually implemented internally using an advanced programming technique known as a hash table. For that reason a dictionary may sometimes be referred to as a hash. This has nothing to do with drugs!
Because access to the dictionary values is via the key you can only put in elements with unique keys. Dictionaries are immensely useful structures and are provided as a built-in type in Python although in many other languages you need to use a module or even build your own. We can use dictionaries in lots of ways and we'll see plenty of examples later, but for now, here's how to create a dictionary in Python, fill it with some entries and read them back:
>>> dict = {}
>>> dict['boolean'] = "A value which is either true or false"
>>> dict['integer'] = "A whole number"
>>> print dict['boolean']
A value which is either true or false
Notice that we initialise the dictionary with braces, then use square brackets to assign and read the values.
Due to their internal structure dictionaries do not support very many of the collection operators that we’ve seen so far. None of the concatenation, repetition or appending operations work. To assist us in accessing the dictionary keys there is a function that we can use, keys(), which returns a list of all the keys in a dictionary.
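Here is a sketch of keys() in use, in Python 3 syntax. Two assumptions are worth flagging: the dictionary is called glossary here (an invented name), and in Python 3 keys() returns a view of the keys rather than a plain list, so sorted() is used to get a list in a predictable order.

```python
glossary = {}
glossary['boolean'] = "A value which is either true or false"
glossary['integer'] = "A whole number"

names = sorted(glossary.keys())      # a list of the keys in a known order
print(names)                         # ['boolean', 'integer']

for name in names:                   # use each key to fetch its value
    print(name, "-", glossary[name])

print('real' in glossary)            # False: there is no such key yet
```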
If you're getting a bit fed up, you can jump to the next chapter at this point. Remember to come back and finish this one when you start to come across types of data we haven't mentioned so far.
An array is a list of items which are indexed for easy and fast retrieval. Usually you have to say up front how many items you want to store. Let's say I have an array called A, then I can extract the 3rd item in A by writing A[3]. Arrays are fundamental in BASIC, in fact they are the only built in collection type. In Python arrays are simulated using lists and in Tcl arrays are implemented using dictionaries.
An example of an array in BASIC follows:
DIM MyArray(20)    ' Create a 20 element array
MyArray(1) = 27
MyArray(2) = 50
FOR i = 1 TO 5
    PRINT MyArray(i)
NEXT i
Notice that the index starts at 1 in BASIC. This is unusual; in most languages the index will start at 0. There are no other operations on arrays: all you can do is create them, assign values and read values.
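As a rough Python counterpart to the BASIC example, a list created at a fixed size can play the part of an array. This is only a sketch, in Python 3 syntax; the name my_array is invented here, and note that Python counts from 0.

```python
my_array = [0] * 20          # a 20 element 'array', every slot set to 0

my_array[0] = 27             # Python indexes start at 0
my_array[1] = 50

for i in range(5):           # print the first five elements
    print(my_array[i])
```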
Think of a stack of trays in a restaurant. A member of staff puts a pile of clean trays on top and these are removed one by one by customers. The trays at the bottom of the stack get used last (and least!). Data stacks work the same way: you push an item onto the stack or pop one off. The item popped is always the last one pushed. This property of stacks is sometimes called Last In First Out or LIFO. One useful property of stacks is that you can reverse a list of items by pushing the list onto the stack then popping it off again. The result will be the reverse of the starting list. Stacks are not built in to Python, Tcl or BASIC. You have to write some program code to implement the behaviour. Lists are usually the best starting point since like stacks they can grow as needed.
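The reversing trick can be sketched with a plain Python list standing in for the stack (Python 3 syntax). The list's append() does the job of push and its pop() removes the last item added; the variable names are invented for the illustration.

```python
stack = []
for item in [1, 2, 3]:
    stack.append(item)                  # push each item onto the stack

reversed_items = []
while stack:                            # keep going until the stack is empty
    reversed_items.append(stack.pop())  # pop always gives the last item pushed

print(reversed_items)                   # [3, 2, 1]
```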
A bag is a collection of items with no specified order and it can contain duplicates. Bags usually have operators to enable you to add, find and remove items. In Python and Tcl bags are just lists. In BASIC you must build the bag from a large array.
A set has the property of only storing one of each item. You can usually test to see if an item is in a set (membership), add, remove and retrieve items, and join two sets together in various ways corresponding to set theory in math (eg union, intersect etc). None of our sample languages implement sets directly (although more recent versions of Python now include a built in set type) but they can be easily implemented in both Python and Tcl by using the built in dictionary type.
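The dictionary approach can be sketched like this, in Python 3 syntax. Dictionary keys are unique, so using the items as keys gives set behaviour for free. The helper names make_set, union and intersect are invented for the illustration, and the last line compares the result with the set type that recent versions of Python provide.

```python
def make_set(items):
    """Build a 'set' as a dictionary whose keys are the items."""
    result = {}
    for item in items:
        result[item] = True          # a duplicate just overwrites the same key
    return result

def union(a, b):
    """Every item that is in a, in b, or in both."""
    return make_set(list(a.keys()) + list(b.keys()))

def intersect(a, b):
    """Only the items found in both a and b."""
    return make_set([item for item in a if item in b])

s1 = make_set([1, 2, 2, 3])
s2 = make_set([3, 4])

print(sorted(s1))                    # [1, 2, 3]
print(2 in s1)                       # membership test: True
print(sorted(union(s1, s2)))         # [1, 2, 3, 4]
print(sorted(intersect(s1, s2)))     # [3]

# The built-in set type gives the same answer
print({1, 2, 2, 3} | {3, 4} == set(union(s1, s2)))   # True
```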
A queue is rather like a stack except that the first item into a queue is also the first item out. This is known as First In First Out or FIFO behaviour.
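One way to get FIFO behaviour in Python is the deque type from the standard collections module. This is a sketch in Python 3 syntax rather than the only way to build a queue, and the item names are made up.

```python
from collections import deque

queue = deque()
queue.append("first")        # items join at the back of the queue
queue.append("second")
queue.append("third")

served = queue.popleft()     # and leave from the front
print(served)                # first
print(list(queue))           # ['second', 'third']
```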
There's a whole bunch of other collection types but these are the main ones that you might see. (In fact we'll only be dealing with a few of these in this tutor!)
As a computer user you know all about files - the very basis of nearly everything we do with computers. It should be no surprise then, to discover that most programming languages provide a special file type of data. However files and the processing of them are so important that I will defer discussing them till later when they get a whole section to themselves.
Dates and times are often given dedicated types in programming. At other times they are simply represented as a large number (typically the number of seconds from some arbitrary date/time!). In other cases the data type is what is known as a complex type as described in the next section. This usually makes it easier to extract the month, day, hour etc.
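Both representations can be seen in Python's standard datetime module. The sketch below uses Python 3 syntax and an arbitrary made-up date. The structured form makes it easy to pick out the month or hour, while timestamp() gives the same moment as a number of seconds counted from 1 January 1970.

```python
from datetime import datetime, timezone

# A date/time as a dedicated type with named parts
moment = datetime(2001, 2, 3, 4, 5, 6, tzinfo=timezone.utc)
print(moment.month, moment.day, moment.hour)    # 2 3 4

# The same moment as a large number of seconds
seconds = moment.timestamp()
print(seconds)

# And back again from the number to the structured type
restored = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(restored == moment)                           # True
print(datetime.fromtimestamp(0, tz=timezone.utc))   # 1970-01-01 00:00:00+00:00
```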
Sometimes the basic types described above are inadequate even when combined in collections. Sometimes what we want to do is group several bits of data together then treat it as a single item. An example might be the description of an address: a house number, a street and a town. Finally there's the post code or zip code.
Most languages allow us to group such information together in a record or structure.
In BASIC such a record definition looks like:
Type Address
    HsNumber AS INTEGER
    Street AS STRING * 20
    Town AS STRING * 15
    ZipCode AS STRING * 7
End Type
The number after the STRING is simply the maximum length of the string.
In Python it's a little different:
>>> class Address:
...     def __init__(self, Hs, St, Town, Zip):
...         self.HsNumber = Hs
...         self.Street = St
...         self.Town = Town
...         self.ZipCode = Zip
...
That may look a little arcane but don't worry I’ll explain what the def __init__(...) and self bits mean in the section on object orientation. One thing to note is that there are two underscores at each end on __init__. This is a Python convention that we will discuss later. Some people have had problems trying to type this example in at the Python prompt. At the end of this chapter you will find a box with more explanation, but you can just wait till we get the full story later in the course if you prefer. If you do try typing it into Python then please make sure you copy the indentation shown. As you'll see later Python is very particular about indentation levels.
The main thing I want you to recognise in all of this is that we have gathered several pieces of data into a single structure.
We can assign a complex data type to a variable too, but to access the individual fields of the type we must use some special access mechanism (which will be defined by the language). Usually this is a dot.
To consider the case of the address type we defined above we would do this in BASIC:
DIM Addr AS Address
Addr.HsNumber = 7
Addr.Street = "High St"
Addr.Town = "Anytown"
Addr.ZipCode = "123 456"
PRINT Addr.HsNumber," ",Addr.Street
And in Python, assuming you have already typed in the class definition above:
Addr = Address(7,"High St","Anytown","123 456")
print Addr.HsNumber, Addr.Street
This creates an instance of our Address type and assigns it to the variable Addr. We then print out the HsNumber and Street fields of the newly created instance using the dot operator. You could, of course, create several new Address type variables each with their own individual values of house number, street etc.
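The point about several variables each holding their own values can be sketched as follows. This is the same Address class written as a script in Python 3 syntax; the second address and the names home and work are made up for the illustration.

```python
class Address:
    def __init__(self, Hs, St, Town, Zip):
        self.HsNumber = Hs
        self.Street = St
        self.Town = Town
        self.ZipCode = Zip

home = Address(7, "High St", "Anytown", "123 456")
work = Address(42, "Low Rd", "Othertown", "654 321")   # made-up values

print(home.HsNumber, home.Street)    # 7 High St
print(work.HsNumber, work.Street)    # 42 Low Rd

work.Town = "Newtown"                # fields can be changed through the dot too
print(home.Town, work.Town)          # Anytown Newtown
```

Changing a field of one instance leaves the other instance alone, because each one carries its own copy of the fields.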
In Tcl the nearest approximation to complex types is to simply store the fields in a list. You need to remember the sequence of the fields so that you can extract them again. This could be simplified a little by assigning the field numbers to variables, in this way the example above would look like:
set HsNum 0
set Street 1
set Town 2
set zip 3
set addr [list 7 "High St" "Anytown" "123 456"]
puts [format "%s %s" [lindex $addr $HsNum] [lindex $addr $Street]]
Note the use of the Tcl format string and the nested sets of '[]'s.
User defined types can, in some languages, have operations defined too. This is the basis of what is known as object oriented programming. We dedicate a whole section to this topic later but essentially an object is a collection of data elements and the operations associated with that data, wrapped up as a single unit. Python uses objects extensively in its standard library of modules and also allows us as programmers to create our own object types.
Object operations are accessed in the same way as data members of a user defined type, via the dot operator, but otherwise look like functions. These special functions are called methods. We have already seen this with the append() operation of a list. Recall that to use it we must tag the function call onto the variable name:
>>> listObject = []            # an empty list
>>> listObject.append(42)      # a method call of the list object
>>> print listObject
[42]
When an object type, known as a class, is provided in a module we must import the module (as we did with sys earlier), then prefix the object type with the module name to create an instance that we can store in a variable. We can then use the variable without using the module name.
We will illustrate this by considering a fictitious module meat which provides a Spam class. We import the module, create an instance of Spam and access its operations and data like so:
>>> import meat
>>> mySpam = meat.Spam()        # create an instance, use module name
>>> mySpam.slice()              # use a Spam operation
>>> print mySpam.ingredients    # access Spam data
{"Pork":"40%", "Ham":"45%", "Fat":"15%"}
Other than the need to create an instance, there’s no real difference between using objects provided within modules and functions found within modules. Think of the object name simply as a label which keeps related functions and variables grouped together.
Another way to look at it is that objects represent real world things, to which we as programmers can do things. That view is where the original idea of objects in programs came from: writing computer simulations of real world situations.
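Because the meat module is fictitious, the session shown earlier cannot be run as it stands. The sketch below (Python 3 syntax) is one guess at what such a Spam class might contain, defined directly in the script rather than imported. The behaviour of slice() and the slices counter are invented purely for illustration.

```python
class Spam:
    def __init__(self):
        # data belonging to each Spam instance
        self.ingredients = {"Pork": "40%", "Ham": "45%", "Fat": "15%"}
        self.slices = 0              # invented field for this sketch

    def slice(self):
        """A method: an operation wrapped up with the data it works on."""
        self.slices = self.slices + 1
        return self.slices

mySpam = Spam()                      # create an instance
mySpam.slice()                       # use a Spam operation
mySpam.slice()
print(mySpam.slices)                 # 2
print(mySpam.ingredients["Ham"])     # access Spam data: 45%
```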
Neither QBASIC nor Tcl provides facilities for adding operators to complex types. There are however add on libraries for Tcl which allow this, and the more modern Visual Basic dialect of BASIC does permit it.
In this tutor my primary objective is to teach you to program and although I use Python in the tutor there is no reason why, having read this, you couldn’t go out and read about another language and use that instead. Indeed that’s exactly what I expect you to do since no single programming language, even Python, can do everything. However because of that objective I do not teach all of the features of Python but focus on those which can generally be found in other languages too. As a result there are several Python specific features which, while they are quite powerful, I don’t describe at all, and that includes special operators. Most programming languages have operations which they support and other languages do not. It is often these 'unique' operators that bring new programming languages into being, and certainly are important factors in determining how popular the language becomes.
For example Python supports such relatively uncommon operations as list slicing ( spam[X:Y] ) and tuple assignment ( X, Y = 12, 34 ). It also has the facility to perform an operation on every member of a collection using its map() function. There are many more; it’s often said that "Python comes with the batteries included". For details of how these Python specific operations work you’ll need to consult the Python documentation.
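For a quick taste, the three operations just mentioned can be sketched as follows in Python 3 syntax. The list contents and the double() function are made up, and in Python 3 the result of map() has to be passed to list() before it can be printed as a list.

```python
spam = [10, 20, 30, 40, 50]

print(spam[1:3])                 # list slicing, items 1 and 2: [20, 30]

X, Y = 12, 34                    # tuple assignment
X, Y = Y, X                      # which also makes swapping two values easy
print(X, Y)                      # 34 12

def double(n):
    return n * 2

doubled = list(map(double, spam))    # apply double() to every member
print(doubled)                       # [20, 40, 60, 80, 100]
```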
Finally, it’s worth pointing out that although I say they are Python specific, that is not to say that they can’t be found in any other languages but rather that they will not all be found in every language. The operators that we cover in the main text are generally available in some form in virtually all modern programming languages.
That concludes our look at the raw materials of programming. Let’s move on to the more exciting topic of technique and see how we can put these materials to work.
More information on the Address example
Although, as I said earlier, the details of this example are explained later, some readers have found difficulty getting the example to work. This note gives a line by line explanation of the Python code.
The complete code for the example looks like this:
>>> class Address:
...     def __init__(self, Hs, St, Town, Zip):
...         self.HsNumber = Hs
...         self.Street = St
...         self.Town = Town
...         self.ZipCode = Zip
...
>>> Addr = Address(7,"High St","Anytown","123 456")
>>> print Addr.HsNumber, Addr.Street
Here is the explanation:
>>> class Address:
The class statement tells Python that we are about to define a new type called, in this case, Address. The colon indicates that any indented lines following will be part of the class definition. The definition will end at the next unindented line. If you are using IDLE you should find that the editor has indented the next line for you. If you are working at a command line Python prompt in an MS DOS window then you will need to manually indent the lines as shown. Python doesn't care how much you indent by, just so long as it is consistent.
...     def __init__(self, Hs, St, Town, Zip):
The first item within our class is what is known as a method definition. One very important detail is that the name has a double underscore at each end. This is a Python convention for names that it treats as having special significance. This particular method is called __init__ and is a special operation, performed by Python, when we create an instance of our new class. We'll see that shortly. The colon, as before, simply tells Python that the next set of indented lines will be the actual definition of the method.
...         self.HsNumber = Hs
This line, plus the next three, all assign values to the internal fields of our object. They are indented from the def statement to tell Python that they constitute the actual definition of the __init__ operation. The blank line tells the Python interpreter that the class definition is finished so that we get the >>> prompt back.
>>> Addr = Address(7,"High St","Anytown","123 456")
This creates a new instance of our Address type and Python uses the __init__ operation defined above to assign the values we provide to the internal fields. The instance is assigned to the Addr variable just like an instance of any other data type would be.
>>> print Addr.HsNumber, Addr.Street
Now we print out the values of two of the internal fields using the dot operator to access them. As I said we cover all of this in more detail later in the tutorial.
The key point to take away is that Python allows us to create our own data types and use them pretty much like the built in ones. |
Points to remember |
---|
|
If you have any questions or feedback on this page
send me mail at:
alan.gauld@btinternet.com