Data

In almost any creative activity we need three basic ingredients: tools, materials and techniques. For example, when I paint, the tools are my brushes, pencils and palettes. The techniques are things like ‘washes’, wet on wet, blending, spraying etc. Finally, the materials are the paints, paper and water. Similarly, when I program, my tools are the programming languages, operating systems and hardware. The techniques are the programming constructs that we discussed in the previous section, and the material is the data that I manipulate. In this chapter we look at the materials of programming.

This is quite a long section and by its nature you might find it a bit dry. The good news is that you don’t need to read it all at once. The chapter starts off by looking at the most basic data types available, then moves on to how we handle collections of items and finally looks at some more advanced material. It should be possible to drop out of the chapter after the collections material, cover a couple of the following chapters and then come back to this one as we start to use the more advanced bits.

Data

Data is one of those terms that everyone uses but few really understand. My dictionary defines it as:

That's not too much help but at least gives a starting point. Let’s see if we can clarify things by looking at how data is used in programming terms. Data is the “stuff”, the raw information, that your program manipulates. Without data a program cannot perform any useful function. Programs manipulate data in many ways, often depending on the type of the data. Each data type also has a number of operations - things that you can do to it. For example we’ve seen that we can add numbers together. Addition is an operation on the number type of data. Data comes in many types and we’ll look at each of the most common types and the operations available for that type.

Variables

Knowing what data looks like is fine, so far as it goes, but to manipulate it we need to be able to access it, and that’s what variables are used for. In programming terms we can create instances of data types and assign them to variables. An instance, in turn, is just an occurrence of a particular piece of data such as 3 or 5, which are instances of numbers. A variable is a reference to a specific area somewhere in the computer's memory. These areas hold the data. In some computer languages a variable must be declared to match the type of data that it points to. Any attempt to assign the wrong type of data to such a variable will cause an error. Some programmers prefer this type of system, known as static typing, because it can help to prevent some subtle bugs which are hard to detect. All of the languages we use are more flexible: a variable can be assigned any kind of data. This is known as dynamic typing.

Variable names follow certain rules dependent on the programming language. Every language has its own rules about which characters are allowed or not allowed. Some languages, including Python and JavaScript, take notice of the case of letters and are therefore called case sensitive languages; others, like VBScript, don't care. Case sensitive languages require a little bit more care from the programmer to avoid mistakes, but a consistent approach to naming variables will help a lot. One common style which we will use a lot is to start variable names with a lower case letter and use a capital letter for each first letter of subsequent words in the name, like this:
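A minimal sketch of both points; the variable names are made up purely for illustration. Because Python is case sensitive, myVariable and myvariable below are two separate variables:

```python
# Python is case sensitive: these are two different variables
myVariable = 1
myvariable = 2
print(myVariable, myvariable)    # prints: 1 2

# The naming style described above: lower case first letter,
# then a capital letter at the start of each following word
aVeryLongVariableName = "capitalised words"
# The alternative style, with underscores between the words
another_variable_name = "underscores"
```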

We won't discuss the specific rules about which characters are legal in our languages, but if you consistently use a style like that shown you shouldn't have too many problems. (Another commonly used naming style is to separate the words of the name with underscores, as in: another_variable_name.)

In Python a variable takes the type of the data assigned to it. It will keep that type and you will be warned if you try to mix data in strange ways - like trying to add a string to a number. (Recall the example error message? It was an example of just that kind of error.) We can change the type of data that a variable points to by reassigning the variable.
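Here is a short sketch of reassignment, using a variable called q that first refers to a number and then to a string:

```python
q = 7              # q refers to a number
print(q)           # prints: 7
q = "Seven"        # reassign q so that it refers to a string
print(q)           # prints: Seven
```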

Note that the variable q was set to point to the number 7 initially. It maintained that value until we made it point at the character string "Seven". Thus, Python variables maintain the type of whatever they point to, but we can change what they point to simply by reassigning the variable. This is the dynamic typing that we mentioned earlier in action. We can check the type of a variable by using the type() function:
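A minimal sketch of type() applied to q before and after reassignment:

```python
q = 7
print(type(q))     # prints: <class 'int'>
q = "Seven"
print(type(q))     # prints: <class 'str'>
```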

At the point of reassignment the original data is 'lost', and Python will erase it from memory (unless another variable points at it too). This erasing is known as garbage collection.

Garbage collection can be likened to the mail room clerk who comes round once in a while and removes any packets that are in boxes with no labels. If he can't find an owner or address on the packets he throws them in the garbage. Let’s take a look at some examples of data types and see how all of this fits together.

VBScript and JavaScript variables

Both JavaScript and VBScript introduce a subtle variation in the way we use variables. In both languages it is considered good practice that variables be declared before being used. This is a common feature of compiled languages and of statically typed languages. Even in a dynamically typed language, like the ones we use, there is a big advantage in doing this in that, if a spelling error is made when using a variable, the translator can detect that an unknown variable has been used and flag an error. The disadvantage is, of course, some extra typing required by the programmer.

VBScript

In VBScript the declaration of a variable is done via the Dim statement, which is short for Dimension. This is a throwback to VBScript's early roots in BASIC and in turn to Assembler languages before that. In those languages you had to tell the assembler how much memory a variable would use, its dimensions. The abbreviation has carried through from there.

Once a variable is declared we can assign values to it just like we did in Python. We can declare several variables in the one Dim statement by listing them separated by commas:

There is another keyword, Let, that you may occasionally see. This is another throwback to BASIC, and because it's not really needed you very rarely see it. In case you do, it's used like this:

JavaScript

In JavaScript you can pre-declare variables with the var keyword and, like VBScript, you can list several variables in a single var statement:

JavaScript also allows you to initialize (or define) the variables as part of the var statement. Like this:

This saves a little typing but otherwise is no different from VBScript's two-step approach to variables. You can also declare and initialize JavaScript variables without using var, in the same way that you do in Python:

But JavaScript aficionados consider it good practice to use the var statement, so I will do so in this tutor.

Hopefully this brief look at VBScript and JavaScript variables has demonstrated the difference between declaration and definition of variables. Python variables are declared by defining them.

Typing - Strict, dynamic, static, etc

You probably noticed the references to static and dynamic typing in the text above. Other terms that you may come across include strict (or occasionally strong) and loose (or weak) typing. This note just tries to clarify the differences in these terms.

Variables are created either as flexible data types or as fixed data types. A fixed type is said to be statically typed - the type never changes. A variable which can refer to multiple data types is dynamically typed. So static and dynamic refer to the kinds of data that can be stored in a variable.

When we are programming we can combine variables in different operations, and each operation requires its variables to have certain types, such as numbers or characters. If the language refuses to allow an operation to use the wrong types, typically by reporting an error, this is known as strict typing. Other languages will instead attempt to coerce the values into valid types, quietly converting them so that the operation can go ahead. This is loose typing. So strict and loose (or strong and weak) typing refer to the checking of types used in operations.

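All of our languages use dynamic typing. Python's typing is also strong: it will refuse, for example, to add a string to a number. VBScript and JavaScript will more readily coerce values from one type to another (as we will see with VBScript's variants below), so they are usually described as more loosely typed.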
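A minimal sketch of strong typing in Python. Adding a string to a number raises a TypeError, so we have to convert explicitly and decide which type we actually meant. (By contrast, JavaScript evaluates "1" + 2 to the string "12" without complaint.) The variable names are illustrative only:

```python
# Python refuses to combine a string and a number in an addition...
try:
    total = "1" + 2
except TypeError:
    print("TypeError: cannot add a str and an int")

# ...so we convert explicitly, choosing which type we meant
asNumber = int("1") + 2     # numeric addition
asText = "1" + str(2)       # string concatenation
print(asNumber)             # prints: 3
print(asText)               # prints: 12
```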

Primitive Data Types

Primitive data types are so called because they are the most basic types of data we can manipulate. More complex data types are really combinations of the primitive types. These are the building blocks upon which all the other types are built, the very foundation of computing. They include letters, numbers and something called a boolean type.

Character Strings

We've already seen these. They are literally any string or sequence of characters that can be printed on your screen. (In fact, there can even be non-printable control characters too.)

Strings are displayed as characters but internally the computer stores them as sequences of numbers. The mapping of numbers to characters is known as the character encoding. There are lots of different encodings in use, although nowadays most things use one of a set of 3 encodings defined by the Unicode standard. We discuss Unicode more in a later topic, but its purpose is to provide a single encoding scheme that can represent any letter, from any alphabet, anywhere in the world, as well as all the special characters needed on computers (things like escape, or control, and all the different types of bracket or parentheses, and so on). Most of the time you can ignore this, but occasionally you will need to convert between the numeric codes and the characters; e.g. recall the issue with quotes in the VBScript example in the previous topic?
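In Python the two built-in functions ord() and chr() convert between a character and its numeric Unicode code. A short sketch:

```python
# ord() gives the numeric (Unicode) code of a single character...
print(ord('A'))        # prints: 65
print(ord('"'))        # prints: 34  (the double quote character)
# ...and chr() converts a code back into a character
print(chr(65))         # prints: A
print(chr(8364))       # prints: €  (the euro sign)
```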

One special use of triple-quoted strings (strings enclosed in three double quotes at each end) is to build in documentation for Python functions that we create ourselves; we'll see this later. (You can use triple single quotes but I do not recommend that since it can become hard to tell whether it is triple single quotes or a double quote and a single quote together.)
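A sketch of the three ways of quoting a Python string, plus a triple-quoted documentation string (a "docstring") inside a function. The function greet is a hypothetical example, and functions themselves are covered later:

```python
single = 'Here is a string'
double = "Here is a very similar string"
triple = """Here is a very long string that can,
if we wish, span several lines and Python will
preserve the lines as we type them..."""

def greet(name):
    """Return a friendly greeting for name."""   # a docstring
    return "Hello " + name

print(triple)           # printed over three lines, as typed
print(greet.__doc__)    # prints: Return a friendly greeting for name.
```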

You can access the individual characters in a string by treating it as an array of characters (see arrays below). There are also usually some operations provided by the programming language to help you manipulate strings - find a sub string, join two strings, copy one to another etc.

It is worth pointing out that some languages have a separate type for characters themselves, that is for a single character. In this case strings are literally just collections of these character values. Python by contrast just uses a string of length 1 to store an individual character, no special syntax is required.
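A minimal sketch of indexing a string and of one of the built-in string operations, find(). Notice that a single character taken from a string is itself just a string of length 1:

```python
s = "Hello"
print(s[0])            # prints: H  (the first character is at index 0)
print(type(s[0]))      # prints: <class 'str'>
print(len(s[0]))       # prints: 1
print(s.find("llo"))   # prints: 2  (the position where the sub string starts)
```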

String Operators

There are a number of operations that can be performed on strings. Some of these are built into Python but many others are provided by modules that you must import (as we did with sys in the Simple Sequences section).

Operator	Description
S1 + S2	Concatenation of S1 and S2
S1 * N	N repetitions of S1
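The two operators in the table, sketched with two illustrative strings:

```python
s1 = "Hello "
s2 = "World"
print(s1 + s2)     # prints: Hello World  (concatenation)
print(s2 * 3)      # prints: WorldWorldWorld  (repetition)
```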

There are lots of other things we can do with strings but we'll look at those in more detail in a later topic after we've gained a bit more basic knowledge. One important thing to note about strings in Python is that they cannot be modified. That is, you can only create a new string with some of the characters changed but you cannot directly alter any of the characters within a string. A data type that cannot be altered is known as an immutable type.
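A short sketch of what immutability means in practice. Trying to change a character raises a TypeError, so instead we build a new string; s[1:] here means "every character from index 1 onwards", a slicing feature we have not covered yet:

```python
s = "Hello"
try:
    s[0] = "J"            # not allowed: strings are immutable
except TypeError:
    print("Strings cannot be changed in place")

s2 = "J" + s[1:]          # build a new string instead
print(s2)                 # prints: Jello
print(s)                  # prints: Hello  (the original is unchanged)
```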

VBScript String Variables

In VBScript all variables are called variants, that is, they can hold any type of data and VBScript tries to convert it to the appropriate type as needed. Thus you may assign a number to a variable, but if you use it as a string VBScript will try to convert it for you. In practice this is similar to what Python's print function does, but extended to any VBScript command. You can give VBScript a hint that you want a numeric value treated as a string by enclosing it in double quotes:

We can join VBScript strings together, a process known as concatenation, using the & operator:

JavaScript Strings

JavaScript strings are enclosed in either single or double quotes. In JavaScript you should declare variables before you use them. This is easily done using the var keyword. Thus to declare and define two string variables in JavaScript we do this:

Finally JavaScript also allows us to create String objects. We will discuss objects a little later in this topic but for now just think of String objects as being strings with some extra features. The main difference is that we create them slightly differently:

You are probably thinking that's an awful lot of extra typing to achieve the same as before. You would be right in this case, but string objects do offer some advantages in other situations, as we will see later.

Integers

Integers are whole numbers from a large negative value through to a large positive value. That’s an important point to remember. Normally we don’t think of numbers being restricted in size, but on a computer there are upper and lower limits. The size of this upper limit is known as MAXINT and depends on the number of bits used on your computer to represent a number. On most current computers and programming languages it's 64 bits, so MAXINT is around 10 to the power 19. In fact Python can tell us this value using the sys module: on a 64 bit PC the value of sys.maxsize is 9223372036854775807 (2 to the power 63, minus 1), which is a very big number. (However, for historic reasons, VBScript is limited to about +/-32000.)

Numbers with positive and negative values are known as signed integers. You can also get unsigned integers which are restricted to positive numbers, including zero. This means there is a bigger maximum number available of around 2 * MAXINT since we can use the space previously used for representing negative numbers to represent more positive numbers.

Because integers are restricted in size to MAXINT, adding two integers together where the total is greater than MAXINT causes the total to be wrong. On some systems/languages the wrong value is just returned as is (usually with some kind of secret flag raised that you can test if you think it might have been set). On others an error condition is raised and either your program can handle the error or the program will exit. VBScript and JavaScript both convert the number into a different format that they can handle, albeit with a small loss of accuracy. Python is a little different: its integers automatically grow as large as needed, limited only by the memory available. (Older versions of Python did this with a separate Long Integer type; in Python 3 the two have been merged into the single int type.)
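A minimal sketch of Python's unlimited-size integers: we multiply sys.maxsize by a million and Python simply produces the (much bigger) correct answer. The exact value of sys.maxsize depends on your machine:

```python
import sys

print(sys.maxsize)           # 9223372036854775807 on a typical 64 bit PC
x = sys.maxsize * 1000000    # far beyond MAXINT, but no error occurs
print(x)
print(type(x))               # prints: <class 'int'>
```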

Notice that the result, although considered an int type by Python is much bigger than the value you would normally expect from a computer. The equivalent code in VBScript or JavaScript results in the number being displayed in a different format to the integer we expect. We'll find out more about that in the section on Real Numbers below.

Arithmetic Operators

We've already seen most of the arithmetic operators that you need in the 'Simple Sequences' section; however, to recap:

Operator Example	Description
M + N	Addition of M and N
M - N	Subtraction of N from M
M * N	Multiplication of M and N
M / N	Division of M by N. The result will be a real number (see below)
M // N	Integer division of M by N. For integers the result is a whole number, rounded down
M % N	Modulo: find the remainder of M divided by N
M**N	Exponentiation: M to the power of N

We haven’t seen the last one before, so let’s look at an example of creating some integer variables and using the exponentiation operator:
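A minimal sketch that creates two integer variables (the names width and height are just for illustration) and applies each operator from the table, finishing with exponentiation:

```python
width = 7
height = 2
print(width + height)     # prints: 9
print(width - height)     # prints: 5
print(width * height)     # prints: 14
print(width / height)     # prints: 3.5  (always a real number)
print(width // height)    # prints: 3    (integer division)
print(width % height)     # prints: 1    (the remainder)
print(width ** height)    # prints: 49   (7 to the power 2)
print(2 ** 10)            # prints: 1024
```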

Python also has the math module that contains common math functions (such as sin, cos etc).
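A quick sketch of using the math module; note that its trigonometric functions work in radians:

```python
import math

root = math.sqrt(16)      # square root
cosine = math.cos(0)      # cosine of 0 radians
print(root)               # prints: 4.0
print(cosine)             # prints: 1.0
print(math.pi)            # prints: 3.141592653589793
```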

Shortcut operators

One very common operation that is carried out while programming is incrementing a variable's value. Thus if we have a variable called x with a value of 42 and we want to increase its value to 43 we can do it like this:

This is nonsense in mathematics, but in programming it makes perfect sense. What it means is that x takes on the previous value of x plus 1. If you have done a lot of math this might take a bit of getting used to, but basically the equals sign in this case could be read as 'becomes', so that it reads: x becomes x + 1.

Now it turns out that this type of operation is so common in practice that Python and JavaScript both provide a shortcut operator to save some typing:
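A minimal sketch showing the long form and the shortcut form of incrementing x side by side:

```python
x = 42
x = x + 1      # x "becomes" x + 1
print(x)       # prints: 43
x += 1         # shortcut form: means exactly the same as x = x + 1
print(x)       # prints: 44
```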

This means exactly the same as the previous assignment statement but is shorter. And for consistency similar shortcuts exist for the other arithmetic operators:

Operator Example	Description
M += N	M = M + N
M -= N	M = M - N
M *= N	M = M * N
M /= N	M = M / N
M //= N	M = M // N
M %= N	M = M % N

VBScript Integers

As I said earlier, VBScript integers are limited to a lower value of MAXINT than Python or JavaScript (corresponding to a 16 bit value), namely about +/-32000. If you need an integer bigger than that you can use a long integer, which is the same size as a standard 32 bit integer (about 2 billion). There is also a byte type, which is an 8 bit number with a maximum size of 255. In practice you will usually find the standard integer type sufficient. If the result of an operation is bigger than MAXINT then VBScript automatically converts the result to a floating point (or real) number (see below).

All the usual arithmetic operators are supported. Modulo is represented differently in VBScript, using the MOD operator. (We actually saw that in the Simple Sequences topic.) Exponentiation too is different with the caret (^) symbol being used instead of Python's **.

JavaScript Numbers

It will be no surprise to discover that JavaScript too has a numeric type. It too is an object, as we'll describe later, and it's called a Number. Original, eh? :-)

A JavaScript number can also be Not a Number, or NaN. This is a special value of the Number type which represents invalid numbers, usually the result of some operation which is mathematically impossible. (Python actually has a NaN too, written float('nan').) The point of NaN is that it allows us to check for certain kinds of error without actually breaking the program. JavaScript also has special number values to represent positive and negative infinity; Python's real numbers can represent infinity too, as float('inf'). JavaScript number objects can be either integers or real numbers, which we look at next.
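For comparison, a short sketch of the Python equivalents of NaN and infinity. The important quirk is that NaN is never equal to anything, not even itself, so we test for it with math.isnan():

```python
import math

notANumber = float('nan')       # Python's version of NaN
infinity = float('inf')         # positive infinity
print(math.isnan(notANumber))   # prints: True
print(notANumber == notANumber) # prints: False
print(infinity > 10 ** 300)     # prints: True
print(-infinity < 0)            # prints: True
```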

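JavaScript uses mostly the same operators as Python, but exponentiation has traditionally been done using a special JavaScript object called Math, via Math.pow(). (Newer versions of JavaScript, from ECMAScript 2016 onwards, also support the ** operator.) We will cover Math a bit later in the tutorial when we take a closer look at modules.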

Real Numbers

These include fractions. (I'm using the Oxford English Dictionary definition of fraction here. Some US correspondents tell me the US term fraction means something more specific. I simply mean any number that is not a whole number.) They can represent very large numbers, much bigger than MAXINT, but with less precision. That is to say that 2 real numbers which should be identical may not seem to be when compared by the computer. This is because the computer stores real numbers in binary and can only approximate many of them in the lowest details. Thus 0.1 is actually stored as a value very slightly different from 0.1, and adding 0.1 to 0.2 gives a result that is not exactly equal to 0.3. These approximations are close enough for most purposes but occasionally they become important! If you get a funny result when using real numbers, bear this in mind.
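A short sketch of the approximation problem, and of one common way around it: comparing real numbers with a small tolerance using math.isclose() rather than with ==:

```python
import math

total = 0.1 + 0.2
print(total)                       # prints: 0.30000000000000004
print(total == 0.3)                # prints: False
print(math.isclose(total, 0.3))    # prints: True
```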

Real numbers, also known as floating point numbers, have the same operations as integers with the addition of the capability to truncate the number to an integer value.
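In Python, truncation can be done with int() or math.trunc(), both of which simply drop the fractional part. round() is shown alongside for contrast, since it rounds to the nearest whole number instead:

```python
import math

x = 3.7
print(int(x))            # prints: 3   (fraction dropped)
print(math.trunc(-3.7))  # prints: -3  (truncation moves towards zero)
print(round(x))          # prints: 4   (rounds to the nearest whole number)
```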

Python, VBScript and JavaScript all support real numbers. In Python we create them by simply specifying a number with a decimal point in it, as we saw in the Simple Sequences topic. In VBScript and JavaScript there is no clear distinction between integers and real numbers: just use them and the language will mostly sort itself out.

Complex or Imaginary Numbers

If you have a scientific or mathematical background you may be wondering about complex numbers. If you haven't, you may not even have heard of complex numbers, in which case you can safely jump to the next heading because you don't need them! Anyhow, some programming languages, including Python, provide built in support for the complex type while others provide a library of functions which can operate on complex numbers. And before you ask, the same applies to matrices too.

Most of the arithmetic operations also apply to complex numbers, although integer division (//) and modulo (%) do not. There is also the cmath module that contains common math functions that work on complex number values.
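A minimal sketch of Python's built-in complex type. Python writes the imaginary part with a j suffix (the engineering convention) rather than i, and cmath.sqrt() happily takes the square root of a negative number:

```python
import cmath

z = 3 + 4j                 # j marks the imaginary part
print(z * 2)               # prints: (6+8j)
print(abs(z))              # prints: 5.0  (the magnitude)
print(z.real, z.imag)      # prints: 3.0 4.0
print(cmath.sqrt(-1))      # prints: 1j
```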

Boolean Values

This strange sounding type is named after a 19th century mathematician, George Boole, who studied logic. This type has only 2 values: either true or false. Some languages support Boolean values directly, others use a convention whereby some numeric value (often 0) represents false and another (often 1 or -1) represents true. Up until version 2.2 Python did this; however, since version 2.3 Python supports Boolean values directly, using the values True and False. (Remember Python is case sensitive!)

Boolean values are sometimes known as "truth values" because they are used to test whether something is true or not. For example, if you write a program to back up all the files in a directory you might back up each file, then ask the operating system for the name of the next file. If there are no more files to save it will return an empty string. You can then test to see if the name is an empty string and store the result as a Boolean value (True if it is empty, False if it isn't). You'll see how we would use that result later on in the tutorial.
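A minimal sketch of storing such a test result. The variable nextFile is hypothetical; here we simply pretend the operating system has just returned an empty name:

```python
# Hypothetical: pretend the operating system returned this file name
nextFile = ""
noMoreFiles = (nextFile == "")    # store the test result as a Boolean
print(noMoreFiles)                # prints: True
print(type(noMoreFiles))          # prints: <class 'bool'>
```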

Boolean (or Logical) Operators

Operator Example	Description	Effect
A and B	AND	True if A,B are both True, False otherwise.
A or B	OR	True if either or both of A,B are true. False if both A and B are false
A == B	Equality	True if A is equal to B
A != B	Inequality	True if A is NOT equal to B.
not B	Negation	True if B is not True

Note: the last one operates on a single value, the others all compare two values.
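Each operator from the table, sketched in Python with two illustrative Boolean variables:

```python
a = True
b = False
print(a and b)     # prints: False
print(a or b)      # prints: True
print(a == b)      # prints: False
print(a != b)      # prints: True
print(not b)       # prints: True
```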

JavaScript also supports a Boolean type but this time the values are true and false (note, with a lowercase first letter).

Finally the different languages have slightly different names for the Boolean type internally, in Python it is bool, in VBScript and JavaScript it is Boolean. Most of the time you won't need to worry about that because we tend not to create variables of Boolean types but simply use the results in tests.

Collections

Computer science has built a whole discipline around studying collections and their various behaviors. Sometimes collections are called containers or sequences. In this section we will look first of all at the collections supported in Python, VBScript and JavaScript, then we’ll conclude with a brief summary of some other collection types you might come across in other languages.

List

We are all familiar with lists in everyday life. A list is just a sequence of items. We can add items to a list or remove items from the list. Usually, where the list is written on paper we can't insert items in the middle of a list, only at the end. However, if the list is in electronic format - in a word processor say - then we can insert items anywhere in the list.

We can also search a list to check whether something is already in the list or not. But you have to find the item you need by stepping through the list from top to bottom checking each item to see if it's the item you want. Lists are a fundamental collection type found in many modern programming languages.

Python lists are built into the language. They can do all the basic list operations we discussed above and in addition have the ability to index the elements inside the list. By indexing I mean that we can refer to a list element by its sequence number (assuming the first element starts at zero).

In VBScript there are no lists as such but other collection types which we discuss later can simulate their features.

In JavaScript there are no lists as such but almost everything you need to do with a list can be done using a JavaScript array which is another collection type that we discuss a little later.

Python provides many operations on collections. Nearly all of them apply to Lists and a subset apply to other collection types, including strings which are just a special type of list - a list of characters. To create and access a list in Python we use square brackets. You can create an empty list by using a pair of square brackets with nothing inside, or create a list with contents by separating the values with commas inside the brackets:
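For example (the list names are just for illustration):

```python
aList = []                  # an empty list
anotherList = [1, 2, 3]     # a list with three elements
print(aList)                # prints: []
print(anotherList)          # prints: [1, 2, 3]
```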

We can access the individual elements using an index number, where the first element is 0, inside square brackets. For example to access the third element, which will be index number 2 since we start from zero, we do this:

You can use negative index numbers to access members from the end of the list. This is most commonly done using -1 to get the last item:
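A short sketch of indexing from both ends. It also shows that, unlike strings, lists can be changed by assigning to an index:

```python
anotherList = [1, 2, 3]
print(anotherList[2])      # prints: 3  (the third element, index 2)
anotherList[2] = 7         # lists can be modified in place
print(anotherList)         # prints: [1, 2, 7]
print(anotherList[-1])     # prints: 7  (the last element)
```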

We can add new elements to the end of a list using the append() operation. We can even hold one list inside another; thus if we append our second list to the first:

Notice how the result is a list of two elements but the second element is itself a list (as shown by the []s around it). We can now access the element 7 by using a double index:

The first index, 1, extracts the second element which is in turn a list. The second index, 2, extracts the third element of the sublist.
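A sketch of appending one list to another and then using a double index. The values are chosen so that 7 ends up as the third element of the sublist:

```python
aList = []
aList.append(42)              # add a single element to the end
anotherList = [1, 2, 7]
aList.append(anotherList)     # the whole list becomes ONE new element
print(aList)                  # prints: [42, [1, 2, 7]]
print(aList[1][2])            # prints: 7
```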

This nesting of lists one inside the other is extremely useful since it effectively allows us to build tables of data, like this:

We could use this to create an address book where each entry was a list of name and address details. For example, here is such an address book with two entries:
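A minimal sketch of both ideas: first a small table built from two rows, then an address book typed over four lines. The names, addresses and telephone numbers are invented, and each entry holds a name, street, town and telephone number in that order:

```python
row1 = [1, 2, 3]
row2 = ['a', 'b', 'c']
table = [row1, row2]
print(table[1][1])          # prints: b  (row 1, column 1)

addressBook = [
    ['Fred', '9 Some St', 'Anytown', '0123456789'],
    ['Rose', '11 Nother St', 'SomePlace', '0987654321']
]
print(addressBook[1][0])    # prints: Rose
```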

Notice that although we entered four lines of text Python treats it as a single line of input, as we can tell from the ... prompts. That is because Python sees that the number of opening and closing brackets don't match and keeps on reading input until they do. This can be a very effective way of quickly constructing complex data structures while making the overall structure - a list of lists in this case - clear to the reader. (If you are using IDLE you won't see the ... prompt, just a blank line.)

As an exercise try extracting Fred's telephone number - element 3, from the first row - remembering that the indexes start at zero. Also try adding a few new entries of your own using the append() operation described above.

Note that when you exit Python your data will be lost; however, you will find out how to preserve it once we reach the topic on files.

The opposite of adding elements is, of course, removing them and to do that we use the del command:

Notice that del does not require parentheses around the value, unlike the print function. This is because del is technically a statement (a command) rather than a function. The distinction is subtle, and you can put parentheses around the value for consistency if you prefer; it will still work OK.
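A minimal sketch of del removing the nested sublist from a two-element list:

```python
aList = [42, [1, 2, 7]]
del aList[1]           # remove the element at index 1
print(aList)           # prints: [42]
```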

If we want to join two lists together to make one we can use the same concatenation operator ‘+’ that we saw for strings:

Notice that this is slightly different to when we appended the two lists earlier. Then there were 2 elements, the second being a list; this time there are 4 elements because the elements of the second list have each, individually, been added to newList. This time if we access element 1, instead of getting a sublist as we did previously, we will only get 1 returned:
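A sketch of concatenation, with lists chosen so that the result has the 4 elements described above:

```python
aList = [42]
anotherList = [1, 2, 7]
newList = aList + anotherList    # join the elements of both lists
print(newList)                   # prints: [42, 1, 2, 7]
print(newList[1])                # prints: 1
```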

We can also apply the multiplication sign as a repetition operator to populate a list with multiples of the same value:
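For example, to build a list of five zeros:

```python
zeroList = [0] * 5
print(zeroList)        # prints: [0, 0, 0, 0, 0]
```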

We can find the index of a particular element in a list using the index() operation, like this:

Notice that trying to find the index of something that's not in the list results in an error. We can check whether something is in a collection using the in operator, like this:
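A short sketch of index() and in together. The failing index() call raises a ValueError, which we catch here so the example keeps running:

```python
newList = [42, 1, 2, 7]
print(newList.index(7))      # prints: 3
try:
    newList.index(99)        # 99 is not in the list
except ValueError:
    print("99 is not in the list")
print(7 in newList)          # prints: True
print(99 in newList)         # prints: False
```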

Notice that the results were Boolean values; either True or False. We will see how we can use these results in a later topic.

Finally, we can determine the length of a list using the built-in len() function:
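For example:

```python
newList = [42, 1, 2, 7]
print(len(newList))     # prints: 4
print(len([]))          # prints: 0  (an empty list)
```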

Neither JavaScript nor VBScript directly support a list type although as we will see later they do have an Array type that can do many of the things that Python's lists can do.

Tuple

Not every language provides a tuple construct but in those that do it’s extremely useful. A tuple is really just an arbitrary collection of values which can be treated as a unit. In many ways a tuple is like a list, but with the significant difference that tuples are immutable which, you may recall, means that you can’t change them nor append to them once created. In Python, tuples are simply represented by a comma separated list of values, like so:
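A minimal sketch of creating a tuple without parentheses, indexing it, and discovering that it cannot be changed:

```python
aTuple = 1, 3, 5           # a comma separated list of values
print(aTuple)              # prints: (1, 3, 5)
print(aTuple[1])           # prints: 3  (indexing uses square brackets)
try:
    aTuple[1] = 4          # tuples are immutable
except TypeError:
    print("Tuples cannot be changed")
```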

In practice it is usual to surround the sequence with parentheses. These help to ensure that the values are not taken out of context as well as providing a visual reminder that they are a single unit, a tuple. The following illustrates these points:
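A sketch of how parentheses change the meaning of a multiplication involving a tuple:

```python
withoutParens = 1, 2, 3 * 3     # * applies only to the 3
withParens = (1, 2, 3) * 3      # * repeats the whole tuple
print(withoutParens)            # prints: (1, 2, 9)
print(withParens)               # prints: (1, 2, 3, 1, 2, 3, 1, 2, 3)
```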

In the first case the multiplication applies only to the last element of the tuple; in the second the entire tuple is multiplied. To avoid any such ambiguity I will always use parentheses when describing tuples.

The main things to remember are that while parentheses are used to define the tuple, square brackets are used to index it and you can't change a tuple once it's created. Otherwise most of the list operations also apply to tuples.

Finally, although you cannot change a tuple you can effectively add members using the addition operator because this actually creates a new tuple. Like this:
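A minimal sketch of "adding" to a tuple by creating a new one. Note the trailing comma in (4,), and what happens without it:

```python
tup1 = (1, 2, 3)
tup2 = tup1 + (4,)       # the trailing comma makes (4,) a tuple
print(tup2)              # prints: (1, 2, 3, 4)
print(tup1)              # prints: (1, 2, 3)  (the original is unchanged)
try:
    tup1 + (4)           # (4) is just the integer 4
except TypeError:
    print("Cannot add an int to a tuple")
```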

If we didn't use the trailing comma after the 4, Python would have interpreted it as the integer 4 inside parentheses, not as a true tuple. Since you can't add integers to tuples, that would result in an error, so we add the comma to tell Python to treat the parentheses as a tuple. Any time you need to persuade Python that a single entry tuple really is a tuple, add a trailing comma as we did here.

Dictionary or Hash

In the same way that a literal dictionary associates a meaning with a word, a dictionary type contains a value associated with a key. The value can be retrieved by ‘indexing’ the dictionary with the key. Unlike a literal dictionary, the key doesn’t need to be a character string (although it often is) but can be any immutable type, including numbers and tuples. Similarly the values associated with the keys can be of any data type. Dictionaries are usually implemented internally using an advanced programming technique known as a hash table. For that reason a dictionary may sometimes be referred to as a hash. This has nothing to do with drugs! :-)

Because access to the dictionary values is via the key, you can only put in elements with unique keys. (Although a key can refer to a list of values.) Dictionaries are immensely useful structures and are provided as a built-in type in Python, although in many other languages you need to use a module or even build your own. We can use dictionaries in lots of ways and we'll see plenty of examples later, but for now, here's how to create a dictionary in Python, fill it with some entries and read them back:
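A minimal sketch; the dictionary name dct and its entries are illustrative only:

```python
dct = {}                   # an empty dictionary
dct['boolean'] = "A value which is either true or false"
dct['integer'] = "A whole number"
print(dct['boolean'])      # prints: A value which is either true or false
```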

Notice that we initialize the dictionary with braces, then use square brackets to assign and read the values. (We can also use the type function dict() to return an empty dictionary.)

Just as we did with lists we can initialize a dictionary as we create it using the following format:
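A sketch of an address book built this way: a dictionary keyed by name, whose values are lists holding name, street, town and telephone number. The entries are invented:

```python
addressBook = {
    'Fred': ['Fred', '9 Some St', 'Anytown', '0123456789'],
    'Rose': ['Rose', '11 Nother St', 'SomePlace', '0987654321']
}
print(addressBook['Fred'][2])    # prints: Anytown
```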

The key and value are separated by a colon and the pairs are separated by commas.

You can also create a dictionary by calling dict() with named arguments, which is a slightly different format. Which style you prefer is mainly a matter of taste!
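A sketch of the same invented address book created with dict() and named arguments:

```python
addressBook = dict(Fred=['Fred', '9 Some St', 'Anytown', '0123456789'],
                   Rose=['Rose', '11 Nother St', 'SomePlace', '0987654321'])
print(addressBook['Rose'][0])    # prints: Rose  (quotes ARE needed here)
```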

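Notice you don't need quotes around the keys in this format because Python treats each argument name as a string key (but you still need quotes to extract the values). The catch is that each key must then be a valid Python name, so keys containing spaces, or numeric keys, are impossible. In practice this limits its usefulness, so most programmers prefer the first version using braces.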

Either way we have made our address book out of a dictionary which is keyed by name and stores our lists as the values. Rather than work out the numerical index of the entry we want we can just use the name to retrieve all the information, like this:
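A minimal sketch of both kinds of retrieval, using the invented entries from before:

```python
addressBook = {
    'Fred': ['Fred', '9 Some St', 'Anytown', '0123456789'],
    'Rose': ['Rose', '11 Nother St', 'SomePlace', '0987654321']
}
print(addressBook['Fred'])       # prints: ['Fred', '9 Some St', 'Anytown', '0123456789']
print(addressBook['Rose'][3])    # prints: 0987654321
```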

In the second case we indexed the returned list to get only the telephone number. By creating some variables and assigning the appropriate index values we can make this much easier to use:
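A self-contained sketch with made-up data; the variable names name, street, town and tel are my own choices:

```python
addressBook = {
    'Fred': ['Fred', '9 Some St', 'Anytown', '0123456789'],
    'Rose': ['Rose', '11 Nother St', 'SomePlace', '0987654321'],
}

# Give each list position a readable name
name = 0
street = 1
town = 2
tel = 3

print(addressBook['Rose'][town])   # SomePlace
print(addressBook['Fred'][tel])    # 0123456789
```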

Notice that whereas 'Rose' was in quotes because the key is a string, the town is not because it is a variable name and Python will convert it to the index value we assigned, namely 2. At this point our Address Book is beginning to resemble a usable database application, thanks largely to the power of dictionaries. It won't take a lot of extra work to save and restore the data and add a query prompt to allow us to specify the data we want. We will do that as we progress through the other tutorial topics.

Of course we could use a dictionary to store the data too, then our address book would consist of a dictionary whose keys were the names and the values were dictionaries whose keys were the field names, like this:
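A sketch of the nested version, with made-up data; the field names ('name', 'street', 'town', 'tel') are my own choice:

```python
# Outer keys are names; each value is an inner dictionary keyed by field name
addressBook = {
    'Fred': {'name': 'Fred', 'street': '9 Some St',
             'town': 'Anytown', 'tel': '0123456789'},
    'Rose': {'name': 'Rose', 'street': '11 Nother St',
             'town': 'SomePlace', 'tel': '0987654321'},
}
```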

Notice that this is a very readable format, although it requires a lot more typing. Data stored in a way that combines its meaning and content in a human readable form is often referred to as self-documenting data. Also, when we include a data structure inside another structure of the same type (a dictionary inside a dictionary in this case), we call that nesting, and the inner dictionary is called the nested dictionary.

In practice we access this data in a very similar way to the list with named indexes:
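For example, assuming an inner dictionary keyed by field names such as 'town' (a sketch with made-up data, trimmed to a single entry):

```python
addressBook = {
    'Rose': {'name': 'Rose', 'street': '11 Nother St',
             'town': 'SomePlace', 'tel': '0987654321'},
}

# The first index selects the person, the second selects the field
print(addressBook['Rose']['town'])   # SomePlace
```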

Notice the extra quotes around town. Otherwise it's exactly the same. One advantage of using this approach is that we can insert new fields and the existing code will not break whereas with the named indexes we would need to go back and change all of the index values. If we used the same data in several programs that could be a lot of work. Thus a little bit of extra typing now could save us a lot of extra effort in the future.
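To see the difference, here is a sketch using made-up data and a hypothetical new 'email' field. Adding the field to a nested dictionary leaves the existing lookup untouched, while inserting it into the middle of a list silently shifts the named index:

```python
addressBook = {
    'Rose': {'name': 'Rose', 'street': '11 Nother St',
             'town': 'SomePlace', 'tel': '0987654321'},
}
# Add a new field to the dictionary version: 'town' still works
addressBook['Rose']['email'] = 'rose@example.com'
print(addressBook['Rose']['town'])   # SomePlace

# The list version, with a named index for the town
roseList = ['Rose', '11 Nother St', 'SomePlace', '0987654321']
town = 2
# Insert the new field after the name: everything after it moves along one
roseList.insert(1, 'rose@example.com')
print(roseList[town])                # 11 Nother St, not SomePlace!
```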

Due to their internal structure dictionaries do not support very many of the collection operators that we’ve seen so far. None of the concatenation, repetition or appending operations work. (Although you can of course assign new key/value pairs directly as we saw at the beginning of the section.) To assist us in accessing the dictionary keys there is an operation that we can use, keys(), which allows us to get a list of all the keys in a dictionary. For example to get a list of all the names in our address book we could do:
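A sketch using a made-up address book:

```python
addressBook = {
    'Fred': ['Fred', '9 Some St', 'Anytown', '0123456789'],
    'Rose': ['Rose', '11 Nother St', 'SomePlace', '0987654321'],
}

# keys() gives access to every key; list() turns that into an ordinary list
names = list(addressBook.keys())
print(names)
```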

Note that we had to use list() to get the actual key values. If you omit the list() you will get a slightly odd result which I won't discuss till later. Note too that versions of Python before 3.7 did not guarantee to keep the keys in the order in which they were inserted, so you may find the keys appear in a strange order. Current versions do preserve insertion order, but either way you can still use the keys to access your data and the right value will still come out OK. (Incidentally, you can get a list of all the values too using an operation called values(). Try that on the address book and see if you can get it to work, using the keys() example above as a pattern.) You can also use the in operator on a dictionary. It will tell you whether a particular value exists as a key of the dictionary.
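A quick sketch of the in operator with made-up data. Note that it only checks the keys, not the values:

```python
addressBook = {
    'Fred': ['Fred', '9 Some St', 'Anytown', '0123456789'],
    'Rose': ['Rose', '11 Nother St', 'SomePlace', '0987654321'],
}

print('Rose' in addressBook)        # True: 'Rose' is a key
print('Bob' in addressBook)         # False: no such key
print('SomePlace' in addressBook)   # False: in checks keys, not values
```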

VBScript Dictionaries

VBScript provides a dictionary object which offers similar facilities to the Python dictionary, but the usage is slightly different. To create a VBScript dictionary we have to declare a variable to hold the object, then create the object, and finally add entries to the new dictionary, like this:

Notice that the CreateObject function specifies that we are creating a "Scripting.Dictionary" object, that is, a Dictionary object from VBScript's Scripting module. Don't worry too much about that for now; we'll discuss it in more depth when we look at objects later in the tutor. Hopefully you will at least recognize the concept of using an object from a module from the simple sequences topic earlier. The other point to notice is that we must use the keyword Set when assigning an object to a variable in VBScript.

There are also operations to remove an item, get a list of all the keys, check that a key exists, etc. Also, you can store any kind of object in the dictionary, not just strings. That, in turn, means you can create nested dictionaries, since a dictionary is itself an object.

Here is a complete but simplified version of our address book example in VBScript:

This time, instead of using a list, we have stored all the data as a single string. (This of course makes it much harder to extract individual fields as we did with the list or dictionary.) We then access and print Rose's details in a message box.

JavaScript Dictionaries

JavaScript doesn't really have a dictionary object of its own, although if you are using Internet Explorer you can get access to the VBScript Scripting.Dictionary object discussed above, with all of the same facilities. But since it's really the same object I won't cover it further here. Finally JavaScript arrays can be used very much like dictionaries but we'll discuss that in the array section below.

If you're getting a bit fed up of all these data types and would prefer to write some more code, you can jump to the next topic at this point. Do remember to come back and finish this one when you start to come across types of data we haven't mentioned so far. They'll probably be lurking in the next few sections.

Other Collection Types

Array or Vector

The array is one of the earlier collection types in computing history. It is basically a list of items which are indexed for easy and fast retrieval. Usually you have to say up front how many items you want to store, and usually you can only store data of a single type. These fixed size and fixed type features are what distinguish arrays from the list data type discussed above. (Notice I said "usually" above. That's because different languages have such widely different ideas of what exactly constitutes an array that it is hard to make definite rules.)

Python supports arrays through a module (called array), but it is rarely needed except for high performance math computations, because the built-in list type can usually be used instead. VBScript and JavaScript both have arrays as a data type, so let's briefly look at how they are used:

VBScript Arrays

In VBScript an array is a fixed length collection of data accessed by a numerical index. It is declared and accessed like this:

Note the use of the Dim keyword. This dimensions the variable, which is a way of telling VBScript about it. If you start your script with OPTION EXPLICIT, VBScript will expect you to Dim any variables you use, which many programming experts believe is good practice and leads to more reliable programs. Also notice that we specify the last valid index, 42 in our example, which means the array actually has 43 elements because indexing starts at 0.

Notice also that in VBScript we use parentheses to dimension and index the array, not the square brackets used in Python and, as we'll soon see, JavaScript. Finally, recall that I said arrays usually only store one type of data? Well in VBScript there is only one official type of data: the Variant, which in turn can store any kind of VBScript value. So a VBScript array only stores Variants, which, in practice, means they can store anything! Confusing? It is if you think about it too much, so don't, just use them!

Notice too that with an array we can assign to any position, including the last, without filling in the intermediate positions. In VBScript the uninitialised elements will be set to the special value Empty.

As with Python lists, we can declare multiple dimensional arrays to model tables of data, for our address book example:

Unfortunately there is no easy way to populate the data all in one go as we did with Python's lists, we have to populate each field one by one. (VBScript does have an Array function that can populate an array but we would need to nest these and it quickly becomes clunky and hard to read.) If we combine VBScript's dictionary and array capability we get almost the same usability as we did with Python. It looks like this:

The final aspect of VBScript arrays that I want to consider is the fact that they don't need to be fixed in size at all! However, this does not mean we can just keep adding elements arbitrarily as we did with our lists; rather, we can explicitly resize an array. For this to happen we need to declare a Dynamic array, which we do, quite simply, by omitting the size, like this:

Notice that the default behaviour is to delete all of the array contents when you resize. To prevent that you must explicitly tell VBScript to Preserve the contents.

As you can see, this is not as convenient as a list, which adjusts its length automatically, but it does give the programmer more control over how the program behaves. This level of control can, amongst other things, improve security, since some viruses can exploit dynamically re-sizeable data stores.

JavaScript Arrays

Arrays in JavaScript are in many ways a misnomer. They are called arrays but are actually a curious mix of the features of lists, dictionaries and traditional arrays. At the simplest level we can declare a new Array of 10 items of some type, like so:

Notice the use of the keyword new to create the Array. This is similar in effect to the CreateObject() function we used in VBScript to create a dictionary. Also notice that we use parentheses to define the size of the array.

So once again we use square brackets to access the array elements. And once again the indexes start from zero.

However, JavaScript arrays are not limited to storing a single type of value; we can assign anything to an array element:

Another feature of JavaScript arrays is that we can determine the length through a special property called length. We access the length like this:

Notice that once again the syntax for this uses a name.property format, with a dot connecting the property to the name. It is very similar to calling a function in a Python module, but without any trailing parentheses.

As mentioned, JavaScript arrays start indexing at zero by default. However, JavaScript array indexes are not limited to numbers; we can use strings too, and in this case they become almost identical to dictionaries! We can also extend an array by simply assigning a value to an index beyond the current maximum, which means we don't really need to specify a size when we create one, even though it is considered good practice! We can see these features in use in the following code segment:

If you try running that in your browser you should see the values populated on the screen. See if you can follow the assignments in the code to confirm why the values are what you see in the browser window.

Finally, let's look at our address book example once more, this time using JavaScript arrays:

Notice that we can also access the key as if it were a property like length. JavaScript arrays really are quite remarkably flexible data structures!

Stack

Think of a stack of trays in a restaurant. A member of staff puts a pile of clean trays on top and these are removed one by one by customers. The trays at the bottom of the stack get used last (and least!). Data stacks work the same way: you push an item onto the stack or pop one off. The item popped is always the last one pushed. This property of stacks is sometimes called Last In First Out or LIFO. One useful property of stacks is that you can reverse a list of items by pushing the list onto the stack and then popping it off again. The result will be the reverse of the starting list. Stacks are not built in to Python, VBScript or JavaScript. You have to write some program code to implement the behavior. Lists are usually the best starting point since, like stacks, they can grow as needed.

Try writing a stack using a Python list. Remember that you can append() to the end of a list and remove the item at a given index with del. Also you can use -1 to index the last item in a list. Armed with that information you should be able to write a program that pushes 4 characters onto a list and then pops them off again, printing them as you go. Just watch which order you call print and del! If you get it right then they should print in the reverse of the order in which you pushed them on. (In fact Python lists even have a pop() operation that returns and removes the last item in one step to make it even easier!)
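If you'd like something to compare your attempt against, here is one possible sketch (the characters 'a' to 'd' are arbitrary). It prints the top item (index -1) before deleting it, then shows the pop() shortcut mentioned above:

```python
stack = []

# Push four characters: each new item goes on the end, the "top" of the stack
stack.append('a')
stack.append('b')
stack.append('c')
stack.append('d')

# Pop: print the top item FIRST, then delete it
print(stack[-1])   # d
del stack[-1]
print(stack[-1])   # c
del stack[-1]
print(stack[-1])   # b
del stack[-1]
print(stack[-1])   # a
del stack[-1]

# The same idea using pop(), which returns and removes the last item
letters = ['a', 'b', 'c', 'd']
print(letters.pop())   # d
print(letters)         # ['a', 'b', 'c']
```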

Bag

A bag is a collection of items with no specified order and it can contain duplicates. Bags usually have operators to enable you to add, find and remove items. In our languages bags are just lists.

Set

A set has the property of only storing one of each item. You can usually test to see if an item is in a set (membership), add or remove items, and join two sets together in various ways corresponding to set theory in math (e.g. union, intersection etc). Sets do not have any concept of order. VBScript does not implement sets directly (and JavaScript only gained a built-in Set type in more recent versions of the language), but you can approximate the behavior fairly easily using dictionaries that only have keys with empty values.
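A short sketch of these operations using Python's built-in set type; the values are arbitrary. Because sets have no order, the comments describe what each result contains rather than the exact printed layout:

```python
# Duplicates are discarded: A ends up holding just 1, 2 and 3
A = set([1, 2, 3, 2])
# Braces also create a set (but an empty {} creates a dictionary, not a set)
B = {3, 4, 5}

print(2 in A)   # membership test: True
print(A | B)    # union: every item in either set (1, 2, 3, 4, 5)
print(A & B)    # intersection: only items in both sets (3)
print(A - B)    # difference: items in A but not in B (1, 2)

A.add(4)        # add an item
A.remove(1)     # remove an item; A now holds 2, 3 and 4
```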

Notice that you can initialize a set using braces. However, this can look superficially like a dictionary, so I personally prefer to use the set([...]) notation even though it requires slightly more typing.

Queue

A queue is rather like a stack except that the first item into a queue is also the first item out. This is known as First In First Out or FIFO behavior. This is usually implemented using a list or array.

See if you can write a queue using a list. Remember you can add to a list with append() and delete from a given position using del. Try to add 4 characters to your queue and then get them out and print them. They should print in the same order that you inserted them.
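Again, one possible sketch for comparison. The only real change from a stack is that items are taken from the front of the list (index 0) instead of the end:

```python
queue = []

# Add four characters to the back (end) of the queue
queue.append('a')
queue.append('b')
queue.append('c')
queue.append('d')

# Remove from the front (index 0): first in, first out
print(queue[0])   # a
del queue[0]
print(queue[0])   # b
del queue[0]
print(queue[0])   # c
del queue[0]
print(queue[0])   # d
del queue[0]
```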

There's a whole bunch of other collection types but the ones we have covered are the main ones that you are likely to come across. (And in fact we'll only be using a few of the ones we've discussed in this tutor, but you will see the others mentioned in articles and in programming discussion groups!)

Files

As a computer user you should be very familiar with files - they form the very basis of nearly everything we do with computers. It should be no surprise, then, to discover that most programming languages provide a special file type of data. However, files and the processing of them are so important that I will put off discussing them till later, when they get a whole topic to themselves.

Dates and Times

Dates and times are often given dedicated types in programming. At other times they are simply represented as a large number (typically the number of seconds from some arbitrary date/time, such as when the operating system was written!). In other cases the data type is what is known as a complex type as described in the next section. This usually makes it easier to extract the month, day, hour etc. We will take a brief look at using the Python time module in a later topic. Both VBScript and JavaScript have their own mechanisms for handling time but I won't be discussing them further.
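As a brief preview, assuming Python's standard time module, here is the same moment held both ways: as one large number of seconds since a fixed starting point (the "epoch"), and as a structured value whose parts can be read separately:

```python
import time

# One large number: seconds elapsed since the epoch (a float)
seconds = time.time()
print(seconds)

# The same moment as a structured value with named fields
now = time.localtime(seconds)
print(now.tm_year, now.tm_mon, now.tm_mday, now.tm_hour)
```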

User Defined

Sometimes the basic types described above are inadequate even when combined in collections. Sometimes, what we want to do is group several bits of data together, then treat them as a single item. An example might be the description of an address: a house number, a street and a town, and finally the post code or zip code.

Most languages allow us to group such information together in a record or structure or with the more modern, object oriented version, a class.

VBScript

The Public keyword simply means that the data is accessible to the rest of the program. It's possible to have Private data too, but we'll discuss that later in the course.

Python
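The Python version is the same class that is listed, and explained line by line, at the end of this chapter. Written as it would appear in a script file rather than typed at the >>> prompt, it looks like this (the comment is mine):

```python
class Address:
    def __init__(self, Hs, St, Town, Zip):
        # Store each value passed in as a field of the new object
        self.HsNumber = Hs
        self.Street = St
        self.Town = Town
        self.Zip_Code = Zip
```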

That may look a little arcane, but don't worry: I’ll explain what the def __init__(...) and self bits mean in the section on object orientation. One thing to note is that there are two underscores at each end (4 in all) on __init__. This is a Python convention that we will discuss later. Also, you need to use the spacing shown above; as we'll explain later, Python is a bit picky about spacing. For now just make sure you copy the layout above.

Some people have had problems trying to type this example at the Python prompt. At the end of this chapter you will find a box with more explanation, but you can just wait till we get the full story later in the course if you prefer. If you do try typing it into Python then please make sure you copy the indentation shown. As you'll see later Python is very particular about indentation levels.

The main thing I want you to recognize in all of this is that, just as we did in VBScript, we have gathered several pieces of related data into a single structure called Address.

JavaScript

JavaScript provides a slightly strange name for its structure format, namely function! Functions are normally associated with operations rather than collections of data, but in JavaScript's case a function can cover either. To create our address object in JavaScript we do this:

Once again, ignore the syntax and use of the keyword this, the end result is a group of data items that we call Address and can treat as a single unit.

OK, so we can create these data structures, but what can we do with them once created? How do we access the data items inside? That's our next mission.

Accessing User Defined Types

We can assign a complex data type to a variable, but to access the individual fields of the type we must use some special access mechanism (which will be defined by the language). Usually this is a dot.

Using VBScript

To consider the case of the address class we defined above we would do this in VBScript:

Here we first of all dimension a new variable, Addr, using Dim then we use the Set and New keywords to create a new instance of the Address class. Next we assign values to the fields of the new address instance and finally, we print out the address in a Message Box.

And in Python
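A sketch using the Address class (repeated so the snippet runs on its own) and the same values as the listing at the end of the chapter:

```python
class Address:
    def __init__(self, Hs, St, Town, Zip):
        self.HsNumber = Hs
        self.Street = St
        self.Town = Town
        self.Zip_Code = Zip

# Create an instance; the values are passed on to __init__
Addr = Address(7, "High St", "Anytown", "123 456")

# Use the dot operator to read individual fields
print(Addr.HsNumber, Addr.Street)   # 7 High St
```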

This creates an instance of our Address type and assigns it to the variable Addr. In Python we can pass the field values to the new object when we create it. We then print out the HsNumber and Street fields of the newly created instance using the dot operator. You could, of course, create several new Address instances, each with its own individual values of house number, street etc. Why not experiment with this yourself? Can you think of how this could be used in our address book example from earlier in the topic?
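One possible answer to that last question, sketched with made-up data (the zip codes are invented): store Address instances as the values of a dictionary keyed by name.

```python
class Address:
    def __init__(self, Hs, St, Town, Zip):
        self.HsNumber = Hs
        self.Street = St
        self.Town = Town
        self.Zip_Code = Zip

# An address book keyed by name, with an Address instance for each person
addressBook = {
    'Fred': Address(9, "Some St", "Anytown", "123 456"),
    'Rose': Address(11, "Nother St", "SomePlace", "987 654"),
}

# Index by name, then use the dot operator to pick a field
print(addressBook['Rose'].Town)   # SomePlace
```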

JavaScript too

The JavaScript mechanism is very similar to the others but has a couple of twists, as we'll see in a moment. However, the basic mechanism is straightforward and the one I recommend you use:

One final mechanism that we can use in JavaScript is to treat the object like a dictionary and use the field name as a key:

I can't really think of any good reason to use this form other than if you were to be given the field name as a string, perhaps after reading a file or input from the user of your program (we'll see how to do that later too).

User Defined Operators

User defined types can, in some languages, have operations defined too. This is the basis of what is known as object oriented programming. We dedicate a whole section to this topic later but essentially an object is a collection of data elements and the operations associated with that data, wrapped up as a single unit. Python uses objects extensively in its standard library of modules and also allows us as programmers to create our own object types.

Object operations are accessed in the same way as data members of a user defined type, via the dot operator, but otherwise look like functions. These special functions are called methods. We have already seen this with, for example, the append() operation of a list. Recall that to use it we must tag the function call onto the variable name:
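For example (the variable name and value are arbitrary):

```python
myList = []
myList.append(42)   # the method is tagged onto the variable with a dot
print(myList)       # [42]
```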

When an object type, known as a class, is provided in a Python module we must import the module (as we did with sys earlier), then prefix the object type with the module name when creating an instance that we can store in a variable (while still using the parentheses, of course). We can then use the variable without using the module name.

We will illustrate this by considering the array module briefly mentioned earlier. It provides an array class. We import the module, create an instance of array, assigning it the name myArray and then use myArray to access its operations and data like so:
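A sketch matching the line-by-line description that follows. The text doesn't say which typecode to use; 'd' (double-precision floating point) is my choice, which is why 42 comes back as 42.0:

```python
import array
myArray = array.array('d')   # 'd': each item is stored as a floating point number
myArray.append(42)           # call one of the array object's operations
print(myArray[0])            # index it like a list: 42.0
print(myArray.typecode)      # read internal data: d
```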

In the first line we import the array module into the program. In the second line we use the module to create an instance of the array class - by calling it as if it were a function. (We need to provide a typecode string to indicate what type of data it should store - recall that pure arrays notionally only store a single kind of data.) In the third line we access one of the array class's operations, append(), treating the object (myArray) as if it were a module and the operation were in the module. Next we use indexing to retrieve the data that we just added. Finally, we access some internal data (the typecode stores the type that we passed in when creating the array) from within the myArray object using the same module like syntax. We will be looking at many more examples of this later in the course.

Other than the need to create an instance, there’s no real difference between using objects provided within modules and functions found within modules. Think of the object name simply as a label which keeps related functions and variables grouped together.

Another way to look at it is that objects represent real world things, and we as programmers can do things to them. That view is where the original idea of objects in programs came from: writing computer simulations of real world situations.

Both VBScript and JavaScript work with objects and in fact that's exactly what we have been using in each of the Address examples above. We have defined a class and then created an instance which we assigned to a variable so that we could access the instance's properties. Go back and review the previous sections in terms of what we've just said about classes and objects. Think about how classes provide a mechanism for creating new types of data in our programs by binding together the data and operations of the new type.

Python Specific Operators

In this tutor my primary objective is to teach you to program and, although I use Python in the tutor, there is no reason why, having read this, you couldn't go out and read about another language and use that instead. Indeed that's exactly what I expect you to do, since no single programming language, even Python, can do everything. However, because of that objective, I do not teach all of the features of Python but focus on those which can generally be found in other languages too. As a result there are several Python specific features which, while they are quite powerful, I don't describe at all, and that includes special operators. Most programming languages have operations which they support and other languages do not. It is often these 'unique' operators that bring new programming languages into being, and they are certainly important factors in determining how popular a language becomes.

For example Python supports such relatively uncommon operations as list slicing ( spam[X:Y] ) for extracting a section (or slice) out from the middle of a list (or string, or tuple) and tuple assignment ( X, Y = (12, 34) ) which allows us to assign multiple variable values at one time.
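A quick sketch of both features; the list contents and the string are arbitrary:

```python
spam = [0, 10, 20, 30, 40, 50]
print(spam[2:4])            # [20, 30]: from index 2 up to, but not including, 4
print("Hello World"[0:5])   # Hello: slicing works on strings too

X, Y = (12, 34)             # tuple assignment: X is 12, Y is 34
print(X, Y)                 # 12 34
```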

It also has the facility to perform an operation on every member of a collection using its map() function which we describe in the Functional Programming topic. There are many more and it’s often said that "Python comes with the batteries included". For details of how most of these Python specific operations work you’ll need to consult the Python documentation.
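As a small taste of map() (the function double() is just an illustrative example of my own):

```python
def double(n):
    return n * 2

# map() applies double() to every member of the list in turn;
# wrapping it in list() collects the results into a new list
result = list(map(double, [1, 2, 3]))
print(result)   # [2, 4, 6]
```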

Finally, it’s worth pointing out that although I say they are Python specific, that is not to say that they can’t be found in any other languages but rather that they will not all be found in every language. The operators that we cover in the main text are generally available in some form in virtually all modern programming languages.

That concludes our look at the raw materials of programming. Let’s move on to the more exciting topic of technique and see how we can put these materials to work.

More information on the Address example

Although, as I said earlier, the details of this example are explained later, some readers have found difficulty getting the Python example to work. This note gives a line by line explanation of the Python code. The complete code for the example looks like this:

>>> class Address:
...   def __init__(self, Hs, St, Town, Zip):
...     self.HsNumber = Hs
...     self.Street = St
...     self.Town = Town
...     self.Zip_Code = Zip
...
>>> addr = Address(7,"High St","Anytown","123 456")
>>> print( addr.HsNumber, addr.Street )

Here is the explanation:

>>> class Address:

The class statement tells Python that we are about to define a new type called, in this case, Address. The colon indicates that any indented lines following will be part of the class definition. The definition will end at the next unindented line. If you are using IDLE you should find that the editor has indented the next line for you; if you are working at a command line Python prompt in an MS DOS window then you will need to manually indent the lines as shown. Python doesn't care how much you indent by, just so long as it is consistent.

...   def __init__(self, Hs, St, Town, Zip):

The first item within our class is what is known as a method definition. One very important detail is that the name has a double underscore at each end; this is a Python convention for names that it treats as having special significance. This particular method is called __init__ and is a special operation, performed by Python when we create an instance of our new class, as we'll see shortly. The colon, as before, simply tells Python that the next set of indented lines will be the actual definition of the method.

...     self.HsNumber = Hs

This line plus the next three all assign values to the internal fields of our object. They are indented from the def statement to tell Python that they constitute the actual definition of the __init__ operation. The blank line tells the Python interpreter that the class definition is finished so that we get the >>> prompt back.

>>> addr = Address(7,"High St","Anytown","123 456")

This creates a new instance of our new Address type and Python uses the __init__ operation defined above to assign the values we provide to the internal fields. The instance is assigned to the addr variable just like an instance of any other data type would be.

>>> print( addr.HsNumber, addr.Street )

Now we print out the values of two of the internal fields using the dot operator to access them.

As I said we cover all of this in more detail later in the tutorial. The key point to take away is that Python allows us to create our own data types and use them pretty much like the built in ones.

Points to remember

Variables refer to data and may need to be declared before being defined.
Data comes in many types and the operations you can successfully perform will depend on the type of data you are using.
Simple data types include character strings, numbers, Boolean or 'truth' values.
Complex data types include collections, files, dates and user defined data types.
There are many operators in every programming language and part of learning a new language is becoming familiar with both its data types and the operators available for those types.
The same operator (e.g. addition) may be available for different types, but the results may not be identical, or even apparently related!

