What will we cover?
The earliest web servers simply delivered static HTML pages on demand. There was very little dynamic content involved. Some sites used batch processes to create the HTML pages from data sources thus ensuring that the information was updated regularly, but the users had no influence on this process.
Over time it became common for these programs to be incorporated into the Web Server so that the pages were generated dynamically whenever a user requested them. The mechanism used for this was called the Common Gateway Interface or CGI. This basic mechanism still sits behind the wealth of dynamic web sites that we see today.
As programmers became more adept at exploiting the potential of CGI in combination with the HTML <form> tag, more and more sophisticated web sites appeared, including shopping malls, games, technical support and all the other services we now take for granted.
The CGI concept is really quite simple. The input data is sent as part of the GET or POST HTTP message. The URL points at a special folder on the web server, which tells the server to execute the resource rather than simply return it. The resource is a program, written in any language, which writes its output to stdout in the form of an HTML page preceded by a short header. The server reads that output and forwards it to the requesting client.
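To make that flow concrete, here is a minimal sketch of the program side of the exchange. It assumes the server passes the request details in the standard CGI environment variables REQUEST_METHOD and QUERY_STRING (for a POST the data would arrive on stdin instead, which this sketch ignores). The function name cgi_response and the page it builds are purely illustrative.

```python
import html
import os
import sys


def cgi_response(environ):
    """Build the text a CGI program would write to stdout for one request."""
    # Escape the values so that any markup in them is displayed as plain text.
    method = html.escape(environ.get("REQUEST_METHOD", "GET"))
    query = html.escape(environ.get("QUERY_STRING", ""))
    body = (
        "<!doctype html>\n"
        "<html><head><title>CGI echo</title></head>\n"
        f"<body><p>Method: {method}</p><p>Query: {query}</p></body></html>\n"
    )
    # A header line, then a blank line, then the page itself.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server sets these variables before it launches the program.
    sys.stdout.write(cgi_response(os.environ))
```

Keeping the page-building code in a function that takes the environment as an argument means it can be tried out from the Python prompt with an ordinary dictionary, without a web server running.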
We will now look at this early stage of web server development by reproducing it on our local computer. First we start up a simple web server and get it to serve up a basic static "Hello World" style HTML page. Next we will extend the page to include a form to capture the user's name and submit it to the server. Finally we will create a CGI program to read that user name and return a personalised welcome message from the server.
Python provides a very simple web server module that we can use for test and development before moving our code to a more robust web hosting platform, either on a company network or the internet. Let's look at how to get it running and serving simple static web pages.
The first thing to do is to create a web page to serve! This tutorial will not teach you how to write HTML but we will present the bare bones, just enough to run the examples. Our "Hello World" web page looks like this:
<!doctype html>
<html>
<head>
<title>Hello world web page</title>
</head>
<body>
<h1>Hello World</h1>
</body>
</html>
HTML is composed of tags, which are simply a set of defined names enclosed in <> markers. Most tags come as a matching pair, with the closing tag name preceded by a /. Tags can also contain embedded data known as attributes. The example above has no attributes but we will see several of them as the examples become more complex. The main things to note are the compulsory elements. Every valid HTML page should have these elements as a minimum (although most browsers will do a pretty good job even if they are missing!). These are the <!doctype html> declaration on the first line, the <html> pair that wraps the whole page and, inside it, a <head> section containing a <title>, followed by a <body> section holding the visible content.
Those are the bare bones of a page. The only extra elements we have in our example are a pair of <h1> tags with a "Hello World" message in between. There are a whole set of "h" tags (for heading) numbered from 1 to 6 and each gets progressively less dominant in style (although you can change that via CSS settings if you wish).
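If you want to check a page for those bare bones without opening a browser, something like the following sketch might do. It uses the standard library's html.parser module. The class TagCollector, the function missing_elements and the list of required names are our own illustrative choices, not part of any standard, and the check is far less thorough than a real HTML validator.

```python
from html.parser import HTMLParser

PAGE = """<!doctype html>
<html>
<head>
<title>Hello world web page</title>
</head>
<body>
<h1>Hello World</h1>
</body>
</html>"""


class TagCollector(HTMLParser):
    """Record the name of every opening tag and whether a doctype was seen."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.has_doctype = False

    def handle_decl(self, decl):
        # Called for declarations such as <!doctype html>
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; it is empty for this page
        self.tags.append(tag)


def missing_elements(page):
    """Return the names of the bare-bones elements that the page lacks."""
    collector = TagCollector()
    collector.feed(page)
    collector.close()
    required = ["html", "head", "title", "body"]
    missing = [name for name in required if name not in collector.tags]
    if not collector.has_doctype:
        missing.insert(0, "doctype")
    return missing


print(missing_elements(PAGE))  # an empty list means nothing is missing
```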
If you save the HTML as a file called index.htm in a folder called hello we can check that it works by loading it into a web browser. If it all looks OK then we can set about serving it from our web server.
The Python web server is found in the http.server module and can be run without any modifications from an OS command line. Start an OS console and change the working directory to be your newly created hello folder. Type:
$ python -m http.server --cgi 8000
The -m option specifies a module for Python to run. The --cgi flag enables CGI operations which we'll need later. The 8000 specifies the network port for it to use.
You can now access the server by loading the following URL in the address bar of the browser:
http://localhost:8000
If you have the server running then that should result in your web page being loaded and displayed exactly as before, the only difference being that it is now being sent from the server, as is evident from the address displayed in the browser. Congratulations, you have just served your first web page.
By the way, the reason that this works without you specifying the file name is that web servers look for a default file when the URL names only a folder. Python's http.server looks for index.html and then index.htm, and serves the first one it finds. (Other servers may be configured to look for different default files, such as index.php)
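The default-page behaviour can also be checked from Python rather than the browser. The sketch below is one possible way to do it, under a few assumptions of our own: it starts the standard library's static file handler on whatever free port the operating system offers (instead of 8000), serves a throwaway temporary folder, and uses a function name, fetch_default_page, that we made up for the purpose. It does not enable CGI.

```python
import functools
import http.client
import http.server
import tempfile
import threading
from pathlib import Path


def fetch_default_page(directory):
    """Serve `directory` briefly and return what a request for "/" delivers."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Port 0 asks the operating system for any free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # Request the folder itself, with no file name in the URL.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        body = connection.getresponse().read().decode("utf-8")
        connection.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


with tempfile.TemporaryDirectory() as folder:
    Path(folder, "index.htm").write_text("<h1>Hello World</h1>")
    print(fetch_default_page(folder))
```

The request asks only for "/", yet the reply contains the contents of index.htm, which is the same lookup the browser benefited from above.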
We now want to extend our previous web page to capture the user's name and send it to the server. The server will respond by sending back a personalised welcome message. The first step is to modify our HTML file to include an entry field for the user's name:
<!doctype html>
<html>
<head>
<title>Hello user page</title>
</head>
<body>
<h1>Hello, welcome to our website!</h1>
<form name="hello" method="get" action="http://localhost:8000/cgi-bin/sayhello.py">
<p>Please tell us your name:</p>
<input type="text" id="username" name="username" size="30" required autofocus/>
<input type="submit" value="Submit" />
</form>
</body>
</html>
Just a few things to notice here. The first is that the <form> elements all have attributes in their tags. The name attribute of the text input forms part of the data sent to the server, paired with the value typed into the box. The method attribute of the form tag tells the browser which kind of HTTP message to send (GET in this case), and the action attribute tells it where to send it. The input attributes should be pretty self-explanatory, with the last two (required and autofocus) telling the browser not to submit the form unless the text field has a value and to put the cursor into this field ready to receive input. The input type of submit tells the browser to render this element as a button with the text from the value attribute displayed. When the button is pressed, the browser submits the form to the address in the action attribute of the form.
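It may help to see what the browser actually requests when the form is submitted with the GET method: it appends the field name and value to the action address as a query string. The following sketch reproduces that with the standard urllib.parse module. The helper name get_url and the sample name "Alice Smith" are just for illustration.

```python
from urllib.parse import urlencode

ACTION = "http://localhost:8000/cgi-bin/sayhello.py"


def get_url(action, fields):
    """Return the URL a browser requests when a GET form is submitted."""
    # urlencode joins name=value pairs and encodes awkward characters,
    # for example a space becomes a plus sign.
    return action + "?" + urlencode(fields)


print(get_url(ACTION, {"username": "Alice Smith"}))
# prints http://localhost:8000/cgi-bin/sayhello.py?username=Alice+Smith
```

You should see the same kind of address appear in the browser's address bar after pressing the button.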
Save that in the same folder as before but this time call it hello.htm. Test that it looks OK by entering the following in the browser address bar:
http://localhost:8000/hello.htm
You should see the welcome message with a prompt to enter your name, an entry field below it and a button below that. Pressing the button at this point produces only an error page from the server because we haven't yet written the program that the form's action points to.
Now we need to create the CGI program that will be called when the form is sent to the server by the button's press.
The CGI program is exactly like every other Python program that you have written so far apart from two things:
To meet the second criterion, that the program lives in a folder the server treats as executable, create a new cgi-bin folder under the one holding your HTML files. Inside that new directory create a file called sayhello.py containing the following code:
#!/usr/bin/env python3
import cgi   # note: removed from the standard library in Python 3.13
import html

# Extract the data fields
data = cgi.FieldStorage()
# Escape the value so that markup typed into the form is shown as text
username = html.escape(data["username"].value)

# send mandatory http header followed by a blank line
print("Content-Type: text/html\n")

# now send HTML content
print("<!doctype html>\n<html><head>")
print("<title>Hello %s</title></head>" % username)
print('''
<body>
<h1>Welcome %s</h1>
</body>
</html>''' % username)
All the really clever stuff happens in the call to FieldStorage. That's where the cgi module collects all the data from the http request and puts it into a dictionary so that we can easily access it using string based keys. The keys are just the name attributes from our form.
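The cgi module was deprecated in Python 3.11 and removed in Python 3.13, so on newer versions the import above will fail. As a rough sketch of what FieldStorage does for a simple GET request, the field dictionary can be built from the QUERY_STRING environment variable with urllib.parse.parse_qs. The function names read_fields and welcome_page, and the fallback name "stranger", are our own assumptions, and this version does not handle POST data or file uploads.

```python
import html
import os
from urllib.parse import parse_qs


def read_fields(environ):
    """Return the GET form fields as a plain dict of name -> first value."""
    parsed = parse_qs(environ.get("QUERY_STRING", ""))
    return {name: values[0] for name, values in parsed.items()}


def welcome_page(environ):
    """Build the header and HTML page for the personalised welcome."""
    fields = read_fields(environ)
    # Escape the value so that markup typed into the form is shown as text.
    username = html.escape(fields.get("username", "stranger"))
    return (
        "Content-Type: text/html\n\n"
        "<!doctype html>\n"
        f"<html><head><title>Hello {username}</title></head>\n"
        f"<body><h1>Welcome {username}</h1></body></html>"
    )


if __name__ == "__main__":
    # When run by the web server, the form data arrives in os.environ.
    print(welcome_page(os.environ))
```

As before, the keys of the dictionary are simply the name attributes from the form.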
Notice too how Python triple quotes enable us to structure the output HTML in a readable format (Compare the head section with the body).
Once saved, make sure you change the permissions so that it is executable by everyone. Now if you reload the hello.htm webpage and fill in the form you should get a friendly welcome from the server when you click the button.
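On Unix-like systems the permissions are usually changed from the command line with chmod, but the same effect can be sketched in Python with the os and stat modules. The helper name make_executable is hypothetical, and the path in the comment assumes you run it from the folder where the server was started. On Windows, file permissions work differently and this step is generally not needed.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for owner, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Example use, with the path relative to the server's folder:
# make_executable("cgi-bin/sayhello.py")
```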
For very simple applications like this CGI is still a viable technique and has the benefit that pretty much everything is transparent and therefore easy(ish) to debug. Unfortunately, as sites get bigger and the data gets more complex, CGI runs out of steam. That's where web frameworks come in. We'll look at them in the next topic.
Things to remember