Python – How to Receive Full Data with the recv() Socket function

By | August 11, 2020

Socket function - recv()

If you are writing a socket program in Python that communicates with a remote machine or server to receive data, you will use the recv() function to receive data on a socket.

recv() returns at most the number of bytes you ask for. If the incoming data is larger than that, a single call returns only part of the data, and the rest stays in the socket's receive buffer until recv() is called again.

A program that calls recv() only once can therefore end up with incomplete data.

Let's take a quick example to understand this:

Code

# Socket client example in python (Python 3)

import socket  # for sockets
import sys     # for exit

# create an INET, STREAMing socket
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error:
    print('Failed to create socket')
    sys.exit()
print('Socket Created')

host = 'www.google.com'
port = 80

try:
    remote_ip = socket.gethostbyname(host)
except socket.gaierror:
    # could not resolve
    print('Hostname could not be resolved. Exiting')
    sys.exit()

# Connect to remote server
s.connect((remote_ip, port))
print('Socket Connected to ' + host + ' on ip ' + remote_ip)

# Send some data to remote server (sockets carry bytes, hence the b prefix)
message = b"GET / HTTP/1.1\r\n\r\n"
try:
    # Send the whole string
    s.sendall(message)
except socket.error:
    # Send failed
    print('Send failed')
    sys.exit()
print('Message send successfully')

# Now receive data: a single call that returns at most 4096 bytes
reply = s.recv(4096)
print(reply.decode('utf-8', errors='replace'))

The output of the above code might be something like this:

$ python simple_client.py 
Socket Created
Socket Connected to www.google.com on ip 209.85.175.99
Message send successfully
HTTP/1.1 302 Found
Location: http://www.google.co.in/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; 
$

The problem? The output is not complete. Some data has been left out. Communication like the above takes place over TCP/IP.

TCP delivers data as a stream of bytes that arrives in chunks. Let's say a webpage is 500KB in size, but the maximum packet size is only 64KB.

Hence the web page is transferred in parts or chunks and not all at once, and a single recv call returns only what has arrived so far.
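The chunked behaviour can be reproduced locally without a remote server. The following is a minimal sketch that assumes socket.socketpair() is available on your platform; the 10-byte payload and the 4-byte read size are arbitrary values chosen for illustration. It shows that bytes not returned by the first recv call are still waiting for the next call.

```python
import socket

# socketpair() returns two connected stream sockets inside one process
sender, receiver = socket.socketpair()

sender.sendall(b"0123456789")   # 10 bytes are now waiting for the receiver
sender.close()                  # closing marks the end of the stream

first = receiver.recv(4)        # asks for at most 4 bytes

rest = []
while True:
    chunk = receiver.recv(4)
    if not chunk:               # b"" means the sender closed the connection
        break
    rest.append(chunk)
receiver.close()

print(first)             # b'0123'
print(b"".join(rest))    # b'456789'
```

Nothing was lost: the remaining six bytes came back from the later recv calls.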

Now this is where the problem comes in. The recv function can be made to wait until it receives the full data, but for this it must know the total size of the data beforehand. s.recv(4096, socket.MSG_WAITALL) will wait until it gets the full 4096 bytes, or until the connection is closed or an error occurs.

Now if the actual response is smaller than that size, the function will block for a long time before it returns. This is definitely not the behaviour we are looking for.
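When the total size really is known in advance, for example from a length field defined by the protocol, a plain loop can collect exactly that many bytes. This is only a sketch: recv_exact is a hypothetical helper name, not part of the socket module, and the 11-byte size is an assumption made for the demonstration.

```python
import socket

def recv_exact(sock, size):
    """Read exactly `size` bytes, or raise if the peer closes early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # recv() returned b"": the other side closed the connection
            raise ConnectionError("closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# Local demonstration: the data is sent in two pieces
a, b = socket.socketpair()
a.sendall(b"hello ")
a.sendall(b"world")
payload = recv_exact(b, 11)
print(payload)   # b'hello world'
a.close()
b.close()
```

The loop has the same limitation as MSG_WAITALL: if the size passed in is larger than what the peer actually sends, it keeps waiting until the connection is closed.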

recv() in chunks - The Solution

The solution is to keep calling recv until no new data has arrived for a reasonable timeout. The next code example does exactly that.

Quick example

# Socket client example in python (Python 3)

import socket  # for sockets
import sys     # for exit
import time    # for timeouts and sleeping

# create an INET, STREAMing socket
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error:
    print('Failed to create socket')
    sys.exit()

print('Socket Created')

host = 'www.google.com'
port = 80

try:
    remote_ip = socket.gethostbyname(host)
except socket.gaierror:
    # could not resolve
    print('Hostname could not be resolved. Exiting')
    sys.exit()

# Connect to remote server
s.connect((remote_ip, port))

print('Socket Connected to ' + host + ' on ip ' + remote_ip)

# Send some data to remote server (sockets carry bytes, hence the b prefix)
message = b"GET / HTTP/1.1\r\n\r\n"

try:
    # Send the whole string
    s.sendall(message)
except socket.error:
    # Send failed
    print('Send failed')
    sys.exit()

print('Message send successfully')

def recv_timeout(the_socket, timeout=2):
    # make socket non blocking, so recv returns immediately
    the_socket.setblocking(False)

    # total data partwise in a list
    total_data = []

    # beginning time
    begin = time.time()
    while True:
        # if you got some data, then break after timeout
        if total_data and time.time() - begin > timeout:
            break

        # if you got no data at all, wait a little longer, twice the timeout
        elif time.time() - begin > timeout * 2:
            break

        # recv something
        try:
            data = the_socket.recv(8192)
        except BlockingIOError:
            # nothing has arrived yet; sleep briefly so the loop does not spin the CPU
            time.sleep(0.1)
            continue

        if not data:
            # an empty result means the server closed the connection
            break

        total_data.append(data)
        # change the beginning time for measurement
        begin = time.time()

    # join all parts to make the final bytes object
    return b''.join(total_data)

# get reply and print
print(recv_timeout(s).decode('utf-8', errors='replace'))

# Close the socket
s.close()

The above code will have an output similar to this:

$ python smart_client.py 
Socket Created
Socket Connected to www.google.com on ip 209.85.175.104
Message send successfully
HTTP/1.1 302 Found
Location: http://www.google.co.in/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com
Set-Cookie: PREF=ID=45c0849c6f176c00:FF=0:TM=1344322385:LM=1344322385:S=Wot1k5lfbnb3H9sK; expires=Thu, 07-Aug-2014 06:53:05 GMT; path=/; domain=.google.com
Set-Cookie: NID=62=1d1x-iiXm8589m8djPowq2kE3SINeGtMtOKd67SmGw2bc1FJXw6IsqAo6O-gzySTxVqdmZyhOgquJekViHibN4Gf3VZSs42zGYJ8KTpoEgXwTjqeiwyHb3RPzxXjp-37; expires=Wed, 06-Feb-2013 06:53:05 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Date: Tue, 07 Aug 2012 06:53:05 GMT
Server: gws
Content-Length: 221
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.co.in/">here</A>.
</BODY></HTML>

Did you notice the closing html tag at the end? Now that's the complete data, and in fact it is the response a browser receives when google.com is opened. The whole magic takes place inside the function recv_timeout. So let's have a look at how it works.

def recv_timeout(the_socket, timeout=2):
    # make socket non blocking, so recv returns immediately
    the_socket.setblocking(False)

    # total data partwise in a list
    total_data = []

    # beginning time
    begin = time.time()
    while True:
        # if you got some data, then break after timeout
        if total_data and time.time() - begin > timeout:
            break

        # if you got no data at all, wait a little longer, twice the timeout
        elif time.time() - begin > timeout * 2:
            break

        # recv something
        try:
            data = the_socket.recv(8192)
        except BlockingIOError:
            # nothing has arrived yet; sleep briefly so the loop does not spin the CPU
            time.sleep(0.1)
            continue

        if not data:
            # an empty result means the server closed the connection
            break

        total_data.append(data)
        # change the beginning time for measurement
        begin = time.time()

    # join all parts to make the final bytes object
    return b''.join(total_data)

The steps pointwise are:

1. Make the socket non-blocking. By doing this, recv won't wait when there is no data available. Instead of blocking, it raises an error straight away, which the function catches so that the loop can continue.

2. Do the following in a loop: keep calling recv until a timeout occurs or recv finishes on its own.
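The two steps above can also be sketched with a blocking socket and settimeout() in place of a non-blocking socket and sleep(). This is a variation and not the function from this article: recv_until_idle and its first_timeout and idle_timeout parameters are hypothetical names, and the short timeout in the demonstration is only there to keep it quick.

```python
import socket

def recv_until_idle(sock, idle_timeout=2.0, first_timeout=None, bufsize=8192):
    """Collect data until the peer goes quiet or closes the connection."""
    if first_timeout is None:
        # wait a little longer for the very first chunk, as recv_timeout does
        first_timeout = idle_timeout * 2

    chunks = []
    sock.settimeout(first_timeout)
    while True:
        try:
            data = sock.recv(bufsize)
        except socket.timeout:
            break                       # no data arrived within the timeout
        if not data:
            break                       # the peer closed the connection
        chunks.append(data)
        sock.settimeout(idle_timeout)   # later chunks get the shorter wait
    return b"".join(chunks)

# Local demonstration: the sender stays connected but stops sending
a, b = socket.socketpair()
a.sendall(b"part one, ")
a.sendall(b"part two")
result = recv_until_idle(b, idle_timeout=0.2)
print(result)   # b'part one, part two'
a.close()
b.close()
```

Because recv blocks inside the operating system until data arrives or the timeout expires, this version needs no sleep call to keep CPU usage low.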

This is a very simple approach that demonstrates how the recv function ought to be used in real applications. The same function can be developed further to follow the protocol it is working with. In HTTP, for example, the Content-Length header seen in the output above tells the client how many bytes of body to expect.
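As one possible direction for HTTP, the sketch below reads the headers first and then uses Content-Length to decide when the body is complete, so it does not have to wait for a timeout. It is a simplified illustration under stated assumptions: recv_http_response is a hypothetical name, the response is assumed to carry a Content-Length header, and chunked transfer encoding is not handled.

```python
import socket

def recv_http_response(sock, bufsize=4096):
    """Read one HTTP response that carries a Content-Length header."""
    data = b""
    # 1. read until the blank line that ends the headers
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)
        if not chunk:
            return data                 # closed early; return what arrived
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")

    # 2. look for Content-Length (header names are case-insensitive)
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())

    # 3. keep reading until the whole body has arrived
    while len(body) < length:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        body += chunk
    return head + b"\r\n\r\n" + body

# Local demonstration with a canned response sent in two pieces
a, b = socket.socketpair()
a.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhe")
a.sendall(b"llo")
response = recv_http_response(b)
print(response.decode())
a.close()
b.close()
```

With the client from this article, the call would be recv_http_response(s) on the connected socket, left in its default blocking mode.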

If you have any feedback or questions let us know in the comments below.

About Silver Moon

A Tech Enthusiast, Blogger, Linux Fan and a Software Developer. Writes about Computer hardware, Linux and Open Source software and coding in Python, Php and Javascript.

19 Comments

  1. Anibal

    In Python 3.8 this line doesn’t work:

    return ''.join(total_data)

    It throws:

    return ''.join(total_data)
    TypeError: sequence item 0: expected str instance, bytes found

    My data is a bunch of characters (not binary data), but I have control characters in the stream.

    And if I replace it with:

    return ''.join(str(total_data))

    I’m basically changing the control character \x0b to a literal string with a backslash, followed by x, followed by 0 and then b. That’s not what I want. Not sure how to address this.

  2. cyberthereaper

    I solved the problem with a simple method. i am using python3.8

    After extracting the data with s.recv (4096), do the following.

    results = s.recv(4096)

    while (len(results) > 0):
        print(results)
        results = s.recv(4096)

    the above command will give you all the data

  3. Rufus V. Smith

    I’m not sure the comment on the time.sleep(0.1) between read requests is really appropriate. I don’t believe it is to “indicate a gap”, it is probably to reduce load on the CPU caused by spinning for new data (I’ve written a lot of code like this). If you remove that time, you’ll see the CPU load on your computer skyrocket, perhaps to 100% for no reason. The code is quite good however. The only other comment I have is the “Wait for any data” and “wait for end of data” should probably each have their own values, not for one to be twice the other. Usually the “Wait for any data” can be quite long, and the “wait for end of data” is quite short.

  4. Ben

    Hello, i have a question. Is there a reason 8192 is used as the number in recv? What’s its advantage to putting any other number there?

  5. Shuvo

    What modification can be done in this code in order to download a webpage pointed by a URL and all the image objects associated with the base html?

  6. VectorEQ

    If you get a 302 redirect, just make another request. The redirect will contain the URL you need to be redirected to. Run the same thing, but have it point to the redirect URL.

  7. Vishal

    First of all a lot of Thanks for this beautiful and lucid post. Helps in clarifying lot of stuff so quickly! Great Job!

    I have one question on the response returned from the server:

    If you see the response returned is a 302:
    …………………………………………
    302 Moved
    The document has moved
    …………………………………………

    Now browsers will do an auto forward and urllib2 library also has classes to handle the redirects, but I want to know how can this be natively handled via Sockets. I am getting 302 for almost all the sites.

    Any help is much appreciated!
