What will we cover? |
---|
|
So far we have looked at the operating system and its ability to manage processes. We have also looked at how to get our scripts to execute existing programs and how to clone our own program and communicate between the two copies using pipes. In this topic we will look at how we can extend that to communicate with processes running on a completely separate computer over a network. Of course the process could be running on the same computer, with the network being a purely logical one, and we'll look at that case as our first example.
As an example consider a web server. It runs on a computer somewhere on the network and we can access it from our computer provided we have the right web address or Uniform Resource Locator (URL). The URL is just a specific type of network address which includes information about the location on the network, the particular language (or protocol) that it speaks, and the location on the server of the files we want to fetch.
I'll assume that if you are reading this tutorial you have at least a basic familiarity with the concept of the internet and the fact that computers connected to the internet have addresses. But how does it all hang together? We don't have time here to go into a full discussion of networking but a good place to get the details is here. The high level view is that when two computers connected to a network need to communicate they do so by sending a packet of data from one to the other. That packet is a lot like an envelope sent through the post with a note inside. The note represents the data and the envelope represents the packet header, which contains the sender's and receiver's addresses. A router or switch locates the part of the network where the destination computer lives and forwards the packet to the router machine in that area. Eventually the packet winds up on the same network segment as the destination computer, and the destination computer recognises its own address and opens the packet. It then sends an acknowledgement packet back to the sender to let it know that the message has been received.
Unlike the postal service, the packets in a computer network have a maximum amount of data they can carry. Think of it like being able to post a letter with only a single sheet inside. For a long message you need to post many letters with one sheet in each. At the other end the receiver has to assemble the parts into order, so a sequence number is added to each sheet (like a page number). If a page goes missing or does not turn up within a predetermined time of its predecessor and successor then an error message is sent by the receiver to the sender.
Most of the time we don't need to worry about all this because the computer, operating system and networking software handle it all for us. It's worth knowing that this is happening under the covers, though, since you cannot rely on data being transmitted reliably or constantly when you use a network. You should expect occasional errors and be prepared for data going missing, or arriving corrupted.
Let's leave the abstract theory behind and take a look at the specifics of how to program a networked application. The idea is simple: we need to create a server program that will run on one computer and a client program that can run on one or more computers attached to the same network as the server. To achieve that we need a mechanism to enable communication between two programs that works across the network. As we saw above, each computer on the network has an address. An IP address (in the common version 4 form) consists of four numbers, each in the range 0-255, separated by dots; you will probably have come across these from time to time in web addresses. A networked application adds an extra element to the IP address known as a port.
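The page-numbering idea can be illustrated in a few lines of Python. This is only a toy model: the 8 byte limit and the helper names split_message and reassemble are made up for illustration, and real network software does this work for us inside the operating system.

```python
import random

MAX_PAYLOAD = 8  # made-up limit for illustration only


def split_message(message, size=MAX_PAYLOAD):
    """Cut a byte string into (sequence_number, total, chunk) packets."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Sort packets by sequence number and rebuild the original message."""
    ordered = sorted(packets)
    total = ordered[0][1] if ordered else 0
    received = [seq for seq, _, _ in ordered]
    if received != list(range(total)):
        # A page is missing (or duplicated): report it rather than guess
        raise ValueError("expected %d packets, got numbers %s" % (total, received))
    return b"".join(chunk for _, _, chunk in ordered)


packets = split_message(b"A long message that will not fit in one packet")
random.shuffle(packets)      # packets may arrive in any order
print(reassemble(packets))
```

Each packet carries the total count as well as its own number so that the receiver can tell when the final sheet is the one that went missing.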
A port is specified by adding a colon followed by the port number to the normal IP address. Thus port 80 on IP address 127.0.0.1 is accessed as: 127.0.0.1:80.
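As a small illustration of that notation, the sketch below splits an "address:port" string into the (host, port) pair that Python's socket calls take for this kind of address. The helper name split_address is hypothetical, not part of any library.

```python
def split_address(text):
    """Split 'host:port' into a (host, port) tuple with an integer port."""
    host, separator, port = text.rpartition(":")
    if not separator or not host:
        raise ValueError("expected host:port, got %r" % text)
    return host, int(port)


print(split_address("127.0.0.1:80"))
```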
Some port numbers are reserved for special purposes, specifically for the different internet application protocols (a protocol is just a set of rules and message definitions that defines how a service will work). For example port 80 is the standard web server port for http traffic, port 25 is used for SMTP email and so on. Thus the same computer can act as a server for several different services at the same time simply by exposing those services through their various ports. We can demonstrate this very easily by adding port number 80 to a web server address such as http://www.google.com:80. The web page should open up as normal because the browser normally connects to port 80 anyway if no other port is provided. It's quite common practice for port 8080 to be used as a test port for new versions of web sites before they are publicly launched.
There are quite a few of these reserved port numbers but for bespoke applications we can generally use port numbers in the range 1000-60000 without conflict. However, because there is always a small chance that another program on the same computer has picked the same port number it is best to make the port number configurable, perhaps via a system environment variable or by a command line parameter. In this tutorial I won't bother with that but in a real world scenario where you don't have exclusive control of the server computer it's pretty much essential that you do so.
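One way to make the port configurable is sketched below. It is a minimal example under some assumptions: the environment variable name ADDRESS_PORT and the function name choose_port are invented for illustration, and the default of 2007 is simply the port used in the examples in this topic.

```python
DEFAULT_PORT = 2007  # the port used by the examples in this topic


def choose_port(argv, environ):
    """Take the port from the command line, else the environment, else the default."""
    if len(argv) > 1:
        value = argv[1]
    else:
        value = environ.get("ADDRESS_PORT", DEFAULT_PORT)  # hypothetical variable name
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port


# Explicit arguments are used here so that the example is repeatable
print(choose_port(["server.py", "8123"], {}))
print(choose_port(["server.py"], {"ADDRESS_PORT": "9000"}))
print(choose_port(["server.py"], {}))
```

In a real script the call would be choose_port(sys.argv, os.environ), after importing sys and os, and the result would be passed to bind() or connect().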
So having identified our mechanism the obvious question is: how do we get our code to connect to one of these ports?
The most basic communications mechanism is called a socket. A socket is exposed on the network as a port at an IP address. In Python sockets are created and used by importing the socket module.
To use a socket we must write a server that creates the socket, associating it (or binding it) with a port. We then listen to the socket for incoming requests. We must then write a client to connect to the socket on that port. When the client connects to the port and the server accepts the connection, the server creates a new temporary socket that is used for the actual send/recv communication with the client during the transaction. This frees the original socket to listen for more incoming connection requests. We can show that pictorially like so:
This raises the problem of testing this type of application since, without a client, we can't tell whether the server works, but a client without a server will not do anything, so we must have both client and server programs available. However, once a server has been written any number of other clients can be created provided they communicate with the server socket using the correct message protocol. We can see examples of that in the way that there are many different web browsers that can all connect to any http server. Similarly, once a protocol has been published many different servers can be written and any client should work with any server. This is one of the things that has made this type of networked application so popular: it is effectively an open and extensible environment where either end of the client/server pair can be enhanced without breaking the other end. OK, enough of that, let's write some code!
As a first example we will create a very simple server program that responds to each request by returning a welcome message and a count of the connections processed so far.

    import socket

    # create an InterNET, STREAMing socket (aka TCP/IP)
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Use localhost and port 2007
    serversocket.bind(('localhost', 2007))

    # Get ready to receive requests
    serversocket.listen(5)

    connections = 0
    while True:
        # process connections from clients
        (clientSocket, address) = serversocket.accept()
        connections += 1
        print "Connection %d using port %d" % (connections, address[1])
        # now do something with the clientsocket
        while True:
            req = clientSocket.recv(100)
            if not req: break  # client closed connection
            message = 'Thankyou!, processed connection number %d' % connections
            clientSocket.send(message)
        clientSocket.close()

Note 1: The combination of AF_INET and SOCK_STREAM indicates that we will be using the TCP/IP protocol. Other IP protocols are possible using other combinations of constants. TCP/IP is the most popular variety, however, so that's all I will cover in this tutorial.
Note 2: We pass a value of 5 to listen(). This represents the number of connections we will allow to build up in the port queue. This is normally adequate because we finish processing one request before too many others arrive. We can improve the efficiency of processing by spawning a separate process to do the actual processing (see Note 4 below), which allows the server to get back to pulling messages from the queue as quickly as possible. Only on very busy servers should you need to increase the size of the limit beyond 5.
Note 3: Clients establish a new connection for each transaction. When the transaction is over no new data will be available and we can terminate the inner while loop and go back to waiting for a connection.
Note 4: We have processed the client's request in our server code. This is OK because it's a trivial bit of processing, but in a real application the processing could take a significant amount of time. In that case we would spawn off another process (maybe using the subprocess module discussed in the OS topic) to handle the specific client transaction and let the server get back to pulling requests out of its queue.
Note 5: There is no way to terminate the server process; it runs forever unless there is an error. In practice we can shut it down using the operating system, for example by using Task Manager in Windows, or kill in Unix.
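The idea in Note 4 of handing each client to a separate worker can be sketched as follows. Note that this sketch swaps in threads from the standard threading module rather than the separate processes that Note 4 suggests, purely because that is shorter to show. The function names handle_client and serve are hypothetical, the server stops after a fixed number of connections so that the script terminates, and the code is written for Python 3 (hence the explicit byte strings), unlike the listing above.

```python
import socket
import threading

HOST = "127.0.0.1"   # the numeric form of localhost


def handle_client(client_socket, number):
    """Answer every request on one connection, then close it."""
    try:
        while True:
            request = client_socket.recv(100)
            if not request:
                break    # client closed the connection
            reply = "Thankyou!, processed connection number %d" % number
            client_socket.sendall(reply.encode("ascii"))
    finally:
        client_socket.close()


def serve(server_socket, max_connections):
    """Accept max_connections clients, giving each one its own thread."""
    workers = []
    for number in range(1, max_connections + 1):
        client_socket, address = server_socket.accept()
        worker = threading.Thread(target=handle_client,
                                  args=(client_socket, number))
        worker.start()   # the loop goes straight back to accept()
        workers.append(worker)
    for worker in workers:
        worker.join()


# Demonstration: the server and two clients in a single script
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind((HOST, 0))    # port 0 asks the OS for any free port
server_socket.listen(5)
port = server_socket.getsockname()[1]

server_thread = threading.Thread(target=serve, args=(server_socket, 2))
server_thread.start()

replies = []
for _ in range(2):
    client = socket.create_connection((HOST, port))
    client.sendall(b"dummy request\n")
    replies.append(client.recv(100).decode("ascii"))
    client.close()

server_thread.join()
server_socket.close()
print(replies)
```

Running the server in a background thread of the same script is also a convenient way to try out a client and server together without opening several console windows.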
So now we have a server ready to run and await client requests. But we don't yet have a client to send those requests. Let's build one now.
The client is just as simple: it sends repeated requests to the server at one second intervals and prints the responses.

    import socket, time

    # create socket
    serverAddress = ('localhost', 2007)

    # send some requests
    for n in range(5):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect(serverAddress)
        try:
            sock.send('dummy request\n')
            data = sock.recv(100)
            if not data: break   # no data from server
            print data
            time.sleep(1)
        finally:
            # now tidy up
            sock.close()

Note 1: We use connect to access the socket. Then we use the same send/recv interface used by the server, but because the client initiates the transaction the sequence is reversed.
Note 2: We could have sent and received more data in a single transaction; we simply did it this way to emphasise the different connection numbers coming back from the server. It is the client which decides when to terminate the transaction, not the server - unless there is an error!
Note 3: Notice the use of try/finally to ensure that the socket gets closed even in the event of an exception being raised. This is good housekeeping practice since some operating systems will keep open sockets alive for a long time, consuming system resources.
To run the programs we need to make sure we start the server running first. Once it is happy we can start up one or more client programs which will connect to it. Since we limited the server to being on localhost we can't run this across the network so we need to start a number of console sessions on our local computer.
A screen capture of my PC running the server (on the right) and 2 clients (on the left) is shown below:
Notice that the output of the two clients shows the sequence of messages received from the server, and the server messages show the connections and the temporary port numbers used at the client end of each connection.
In the IPC topic we built a server version of our address book and called it address_srv.py. We are going to use that same module in our socket based version. The big difference between the original IPC based model and this version is that we can have more than a single client accessing the address book and indeed the clients can all be running on different computers.
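Recall that the functions that we made available in address_srv included readBook, saveBook, addEntry, removeEntry and findEntry, all of which are used in the server code below.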
Although we have turned our address book into server style functions we still need to write a server program that handles receiving the requests from clients and calling the appropriate function. This mechanism is sometimes called dispatching the messages. The code is very similar to the simple examples above.
The main program looks like this:

    import socket, address_srv

    addresses = address_srv.readBook()

    # set up the socket
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serversocket.bind(('localhost', 2007))
    serversocket.listen(5)
    print 'Server started and listening on port 2007...'

    # process connections from clients
    while True:
        (clientSocket, address) = serversocket.accept()

        # now process address book commands
        while True:
            s = clientSocket.recv(1024)
            # split on the first colon only, so the data may contain colons
            try: cmd,data = s.split(':', 1)
            except ValueError: break   # empty or incomplete request
            print 'received request: ', cmd
            if cmd == "add":
                details = data.split(',')
                name = details[0]
                entry = ','.join(details[1:])
                s = address_srv.addEntry(addresses, name, entry)
                address_srv.saveBook(addresses)
            elif cmd == "rem":
                s = address_srv.removeEntry(addresses, data)
                address_srv.saveBook(addresses)
            elif cmd == "fnd":
                s = address_srv.findEntry(addresses, data)
            else: s = "ERROR: Unrecognised command: " + cmd

            clientSocket.send(s)
        clientSocket.close()

The main things to note here are that the main socket handling code is exactly the same as above. The processing of the request data has been put in a try/except construct to catch incomplete data from the client. Otherwise you should find this is virtually identical to the IPC based version in the previous topic.
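The dispatching step can also be written as a lookup table instead of an if/elif chain. The following is a minimal sketch of that alternative, not the approach used in the listing above. Because address_srv is not reproduced in this topic, it uses hypothetical stand-in functions (add_entry, remove_entry, find_entry) that work on a plain dictionary, and their reply strings are invented for illustration.

```python
def add_entry(book, data):
    """Stand-in for adding an entry: data is 'name,rest of the details'."""
    name, _, entry = data.partition(",")
    book[name] = entry
    return "Added " + name


def remove_entry(book, data):
    """Stand-in for removing the entry whose name is data."""
    if data in book:
        del book[data]
        return "Removed " + data
    return "ERROR: no entry for " + data


def find_entry(book, data):
    """Stand-in for looking up the entry whose name is data."""
    return book.get(data, "ERROR: no entry for " + data)


# The dispatch table maps each command word to the function that handles it
HANDLERS = {"add": add_entry, "rem": remove_entry, "fnd": find_entry}


def dispatch(book, message):
    """Split a 'cmd:data' message and call the handler registered for cmd."""
    cmd, separator, data = message.partition(":")
    if not separator:
        return "ERROR: malformed request: " + message
    handler = HANDLERS.get(cmd)
    if handler is None:
        return "ERROR: Unrecognised command: " + cmd
    return handler(book, data)


book = {}
print(dispatch(book, "add:Fred,9 Some St, Anytown, 0123456789"))
print(dispatch(book, "fnd:Fred"))
print(dispatch(book, "xyz:Fred"))
```

With this layout, supporting a new command means writing one function and adding one entry to HANDLERS; the socket handling loop does not need to change.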
Now that we have a server running in the background we need to write a client program that can talk to it. This will be very similar to our IPC version but now it is located in its own script and we will be able to run several instances at the same time.

    import socket

    serverAddress = ('localhost', 2007)

    menu = '''
    1) Add Entry
    2) Delete Entry
    3) Find Entry
    4) Quit
    '''

    while True:
        print menu
        # only catch bad numbers, so that Ctrl-C can still stop the program
        try: choice = int(raw_input('Choose an option[1-4] '))
        except ValueError: continue

        if choice == 1:
            name = raw_input('Enter the name: ')
            num = raw_input('Enter the House number: ')
            street = raw_input('Enter the Street name: ')
            town = raw_input('Enter the Town: ')
            phone = raw_input('Enter the Phone number: ')
            data = "%s,%s %s, %s, %s" % (name,num,street,town,phone)
            cmd = "add:%s" % data
        elif choice == 2:
            name = raw_input('Enter the name: ')
            cmd = 'rem:%s' % name
        elif choice == 3:
            name = raw_input('Enter the name: ')
            cmd = 'fnd:%s' % name
        elif choice == 4:
            break
        else:
            print "Invalid choice, must be between 1 and 4."
            continue

        # do all the socket stuff
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect(serverAddress)
        try:
            sock.send(cmd)
            data = sock.recv(250)
            if not data: break   # no data from server
            print data
        finally:
            sock.close()

Again, we see that the menu processing is exactly as it was before and all the comms stuff is at the bottom in a few lines. It is exactly like the client example above.
So far our sockets have been on a local machine. What do we need to do to move it onto a real network and have true client server operation? In principle it's incredibly easy: we just change the addresses used in the bind() call in the server and the connect() call in the client. Simply replace the reference to 'localhost' with the IP address (either the name or number version) of the computer that the server program is running on and it should just work. If you have multiple machines on the same network you can run versions of the client on each machine, at the same time, and the server will process the requests.
In practice we sometimes need to do a little bit more work handling DNS name resolution etc. Also because real networks are inherently less reliable we should add some more error checks and a timeout mechanism so that the server doesn't get locked up. But these are rather more advanced topics that I'm not going to cover in this tutorial, just be aware that you may need to think about these kinds of issues.
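To give a flavour of what those extra checks might look like, here is a minimal client-side sketch using a timeout. The function name send_request and the 2 second default are assumptions made for illustration, and the code is written for Python 3 (requests and replies are byte strings). The same settimeout() idea can be applied on the server to the sockets returned by accept().

```python
import socket


def send_request(address, request, timeout=2.0):
    """Send one request and return the reply, or None if the network fails us."""
    try:
        # create_connection resolves names and applies the timeout to the socket
        sock = socket.create_connection(address, timeout=timeout)
    except socket.error as err:   # refused, unreachable, bad name or timed out
        print("Could not connect to %s:%d: %s" % (address[0], address[1], err))
        return None
    try:
        sock.sendall(request)
        return sock.recv(250)     # raises a timeout error if no reply arrives
    except socket.error as err:
        print("Communication failed: %s" % err)
        return None
    finally:
        sock.close()


# Demonstration: a listener that never answers, so the request times out
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))     # port 0 asks the OS for any free port
silent.listen(1)
reply = send_request(silent.getsockname(), b"fnd:Fred", timeout=0.5)
print(reply)
silent.close()
```

Returning None keeps the example short; a real client would probably want to tell the caller why the request failed, or retry.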
There is lots more information on socket programming available. In particular there is the Socket How-To by Gordon McMillan which covers many of the pitfalls of socket programming and suggests ways of dealing with them. And, of course, the Python socket module documentation is essential reading.
A number of books include sections on network programming with sockets too. Particularly noteworthy is the book Python Network Programming by John Goerzen which is all about network programming with extensive coverage of sockets.
The good news is that a lot of common network programming tasks can be done at a higher level if we are using one of the standard internet protocols such as http, smtp, ftp, telnet etc. This is because Python includes modules which implement those protocols at the socket layer so that we don't have to. If you do a lot of network programming the language rebol makes something of a speciality of the task and builds in support for several network tasks in the language itself.
In the next few topics we will look at how web programming using http can be simplified using these higher level modules.
Things to remember |
---|
|
If you have any questions or feedback on this page
send me mail at:
alan.gauld@yahoo.co.uk