Receive full data with the recv socket function in Python

In an earlier article we saw how to send and receive data in Python using sockets. Let's take a quick example:

#Socket client example in python (Python 3)

import socket   #for sockets
import sys      #for exit

#create an INET, STREAMing socket
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error:
    print('Failed to create socket')
    sys.exit()
print('Socket Created')

host = 'www.google.com'
port = 80

try:
    remote_ip = socket.gethostbyname(host)
except socket.gaierror:
    #could not resolve
    print('Hostname could not be resolved. Exiting')
    sys.exit()

#Connect to remote server
s.connect((remote_ip, port))
print('Socket Connected to ' + host + ' on ip ' + remote_ip)

#Send some data to remote server (HTTP/1.1 requires a Host header)
message = 'GET / HTTP/1.1\r\nHost: ' + host + '\r\n\r\n'
try:
    #Send the whole string (sockets send bytes, so encode it first)
    s.sendall(message.encode())
except socket.error:
    #Send failed
    print('Send failed')
    sys.exit()
print('Message sent successfully')

#Now receive data: a single recv call returns at most 4096 bytes
reply = s.recv(4096)
print(reply.decode('utf-8', errors='replace'))

#Close the socket
s.close()

The output of the above code might look something like this:

$ python3 simple_client.py
Socket Created
Socket Connected to www.google.com on ip 209.85.175.99
Message sent successfully
HTTP/1.1 302 Found
Location: http://www.google.co.in/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; 
$ 

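The problem? The output is not complete; some data has been left out. Communication like the above takes place over the TCP/IP protocol, and in this protocol data is transferred in chunks. Let's say a web page is 500KB in size, but an IP packet can carry at most 64KB (on Ethernet a TCP segment typically carries only about 1460 bytes). Hence the web page is transferred in parts or chunks, not all at once. A single call to recv returns only the data that has arrived so far, and never more than the buffer size passed to it (4096 bytes in the code above), so one call is rarely enough to read a whole response.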
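The chunking can be observed without a network. The following minimal sketch is an illustration and not part of the original article. It uses socket.socketpair() to create two connected local stream sockets, which stand in for the client and the web server. The variable names client and server are only illustrative. A single recv call hands back at most the number of bytes requested, and any remaining data stays buffered until the next call.

```python
import socket

# Two connected local stream sockets stand in for the client and the web server.
client, server = socket.socketpair()
server.sendall(b"0123456789")   # the "server" sends 10 bytes

first = client.recv(4)          # one call returns at most 4 bytes
rest = client.recv(4096)        # the remaining bytes are still waiting in the buffer
print(first, rest)

client.close()
server.close()
```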

This is where the problem comes in. The recv function can be made to wait until it receives the full data, but only if it knows the total size of that data beforehand. s.recv(4096, socket.MSG_WAITALL) will wait until it has received the full 4096 bytes. If the actual response is shorter than that, the call blocks until the server closes the connection or an error occurs. On an HTTP/1.1 connection, which the server keeps open by default, that can take a long time. This is definitely not the behaviour we want.
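In some cases the size of the data is known in advance, for example because the protocol sends a length first. A portable alternative to MSG_WAITALL is then a small loop that keeps calling recv until exactly that many bytes have arrived. The sketch below assumes a blocking socket. The helper name recv_exactly is hypothetical and is not part of the socket module.

```python
import socket

def recv_exactly(sock, size):
    """Read exactly `size` bytes from a blocking socket (hypothetical helper)."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(min(remaining, 8192))
        if not chunk:
            # An empty result means the peer closed before sending everything.
            raise ConnectionError(f"connection closed with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

if __name__ == "__main__":
    left, right = socket.socketpair()
    right.sendall(b"hello, world")
    print(recv_exactly(left, 5))
    print(recv_exactly(left, 7))
    left.close()
    right.close()
```

This only helps when the size is known. An HTTP response of unknown length still needs a different strategy.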

Solution

The solution is to keep reading data until a reasonable timeout passes with nothing new arriving. The next code example does exactly that.

Quick example

#Socket client example in python (Python 3)

import socket   #for sockets
import sys      #for exit
import time     #for the receive timeout

#create an INET, STREAMing socket
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error:
    print('Failed to create socket')
    sys.exit()

print('Socket Created')

host = 'www.google.com'
port = 80

try:
    remote_ip = socket.gethostbyname(host)
except socket.gaierror:
    #could not resolve
    print('Hostname could not be resolved. Exiting')
    sys.exit()

#Connect to remote server
s.connect((remote_ip, port))

print('Socket Connected to ' + host + ' on ip ' + remote_ip)

#Send some data to remote server (HTTP/1.1 requires a Host header)
message = 'GET / HTTP/1.1\r\nHost: ' + host + '\r\n\r\n'

try:
    #Send the whole string (sockets send bytes, so encode it first)
    s.sendall(message.encode())
except socket.error:
    #Send failed
    print('Send failed')
    sys.exit()

print('Message sent successfully')

def recv_timeout(the_socket, timeout=2):
    #make socket non blocking
    the_socket.setblocking(False)

    #total data partwise in a list
    total_data = []

    #beginning time
    begin = time.time()
    while True:
        #if you got some data, then break after timeout
        if total_data and time.time() - begin > timeout:
            break

        #if you got no data at all, wait a little longer, twice the timeout
        elif time.time() - begin > timeout * 2:
            break

        #recv something
        try:
            data = the_socket.recv(8192)
        except BlockingIOError:
            #no data available yet: sleep a little instead of spinning
            time.sleep(0.1)
            continue

        if not data:
            #an empty result means the server closed the connection
            break

        total_data.append(data)
        #change the beginning time for measurement
        begin = time.time()

    #join all parts to make the final bytes object
    return b''.join(total_data)

#get reply and print
print(recv_timeout(s).decode('utf-8', errors='replace'))

#Close the socket
s.close()

The above code will produce output similar to this:

$ python3 smart_client.py
Socket Created
Socket Connected to www.google.com on ip 209.85.175.104
Message sent successfully
HTTP/1.1 302 Found
Location: http://www.google.co.in/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: PREF=ID=45c0849c6f176c00:FF=0:TM=1344322385:LM=1344322385:S=Wot1k5lfbnb3H9sK; expires=Thu, 07-Aug-2014 06:53:05 GMT; path=/; domain=.google.com
Set-Cookie: NID=62=1d1x-iiXm8589m8djPowq2kE3SINeGtMtOKd67SmGw2bc1FJXw6IsqAo6O-gzySTxVqdmZyhOgquJekViHibN4Gf3VZSs42zGYJ8KTpoEgXwTjqeiwyHb3RPzxXjp-37; expires=Wed, 06-Feb-2013 06:53:05 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Date: Tue, 07 Aug 2012 06:53:05 GMT
Server: gws
Content-Length: 221
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.co.in/">here</A>.
</BODY></HTML>

See the closing HTML tag at the end? That's the complete response. A browser receiving this 302 response would not display it. Instead it would follow the Location header to http://www.google.co.in/. The whole magic takes place inside the function recv_timeout, so let's have a look at how it works.

def recv_timeout(the_socket, timeout=2):
    #make socket non blocking
    the_socket.setblocking(False)

    #total data partwise in a list
    total_data = []

    #beginning time
    begin = time.time()
    while True:
        #if you got some data, then break after timeout
        if total_data and time.time() - begin > timeout:
            break

        #if you got no data at all, wait a little longer, twice the timeout
        elif time.time() - begin > timeout * 2:
            break

        #recv something
        try:
            data = the_socket.recv(8192)
        except BlockingIOError:
            #no data available yet: sleep a little instead of spinning
            time.sleep(0.1)
            continue

        if not data:
            #an empty result means the server closed the connection
            break

        total_data.append(data)
        #change the beginning time for measurement
        begin = time.time()

    #join all parts to make the final bytes object
    return b''.join(total_data)

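The steps are:

1. Make the socket non-blocking. With a non-blocking socket, a recv call does not wait when no data is available. It raises an exception immediately (BlockingIOError in Python 3), and the loop simply tries again.

2. In a loop, keep calling recv until a timeout occurs or the server closes the connection. The timeout is reached after timeout seconds without a new chunk, or after twice that if nothing has arrived at all.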
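Here is a variation on the same idea. It is offered as an alternative sketch and is not the article's method. Instead of a non-blocking socket plus time.sleep, the sketch gives the socket a timeout with settimeout. Each recv then blocks for at most that long and raises socket.timeout when the line goes quiet. The function name recv_until_idle is hypothetical. Unlike recv_timeout, this sketch does not wait twice as long when no data has arrived at all.

```python
import socket

def recv_until_idle(sock, idle_timeout=2.0, bufsize=8192):
    """Collect data until the peer closes or idle_timeout seconds pass with no new data."""
    sock.settimeout(idle_timeout)   # each recv() now waits at most idle_timeout seconds
    chunks = []
    while True:
        try:
            chunk = sock.recv(bufsize)
        except socket.timeout:
            break                   # the line went quiet: assume the reply is complete
        if not chunk:
            break                   # an empty result means the peer closed the connection
        chunks.append(chunk)
    return b"".join(chunks)
```

Because recv itself does the waiting, the loop never spins, and the timeout restarts automatically with every call.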

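This is a very simple approach that demonstrates how recv ought to be used in real applications. The same function can be developed further according to the protocol it works with. For HTTP, for example, the Content-Length header of a response states exactly how many bytes of body follow the headers. A client can therefore stop reading as soon as it has them instead of waiting for a timeout.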
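As a rough sketch of such a protocol-aware version (an assumption-laden illustration, not a complete HTTP client), the function below reads until the blank line that ends the headers. It then looks up Content-Length and reads exactly that many body bytes. The function name recv_http_response is hypothetical. Chunked transfer encoding is not handled, and a response without Content-Length is treated as having an empty body.

```python
import socket

def recv_http_response(sock, bufsize=4096):
    """Read one HTTP/1.1 response using Content-Length (hypothetical sketch)."""
    data = b""
    # 1. Read until the blank line ("\r\n\r\n") that ends the headers.
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before headers were complete")
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")

    # 2. Find Content-Length; header names are case-insensitive.
    #    A missing header leaves length at 0 (empty body assumed).
    length = 0
    for line in head.split(b"\r\n")[1:]:   # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())

    # 3. Keep reading until the whole body has arrived.
    while len(body) < length:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before body was complete")
        body += chunk
    return head, body[:length]
```

In the client above, a call like head, body = recv_http_response(s) could take the place of recv_timeout(s), with the socket left in its default blocking mode. It would return as soon as the last body byte arrives, such as the 221 bytes announced for the 302 response shown earlier.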

Last Updated On : 7th August 2012


7 Comments

  • Thank you.
    Thank you so much.

  • fell in love with your code

  • I love you. You just perfectly solved the problem I had with this explanation. Thanks!

  • if you get a 302 redirect just make another request. the redirect will contain the url you need to be redirected to. run the same thing, but have it point to the redirect url.

  • This is very interesting… How do I get full data like that to be displayed in a browser?

  • First of all a lot of Thanks for this beautiful and lucid post. Helps in clarifying lot of stuff so quickly! Great Job!

    I have one question on the response returned from the server:

    If you see the response returned is a 302:
    …………………………………………
    302 Moved
    The document has moved
    …………………………………………

    Now browsers will do an auto forward and urllib2 library also has classes to handle the redirects, but I want to know how can this be natively handled via Sockets. I am getting 302 for almost all the sites.

    Any help is much appreciated!

  • have no idea why, but it works. Thanks.
