Parse Apache Logs

Let's create a simple script that parses Apache logs and prints the 20 most frequently visited URLs. I think it is a very useful script that can easily be adapted to other log formats:




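 import collections
 import glob
 import gzip
 import heapq
 import re
 from pprint import pprint

 counter = collections.Counter()
 for filename in glob.glob('/var/log/apache2/*.gz'):
   # Open in text mode so the regex runs on str; undecodable bytes are replaced
   with gzip.open(filename, 'rt', errors='replace') as f:
     for line in f:
       mo = re.search(r'GET (.*) HTTP/1', line)
       if mo is not None:
         counter[mo.group(1)] += 1
 # Keep the 20 (url, count) pairs with the highest counts
 result = heapq.nlargest(20, counter.items(), key=lambda item: item[1])
 pprint(result)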
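
To port the script to other log sources, the counting step can be separated from the file handling. The following is only a minimal sketch under a few assumptions: the function name `top_urls` and the sample log lines are made up for illustration, and `Counter.most_common` is used here in place of `heapq.nlargest` because it returns the same kind of (url, count) pairs.

```python
import collections
import re

# Same pattern as in the script above
REQUEST_RE = re.compile(r'GET (.*) HTTP/1')

def top_urls(lines, n=20):
    """Return the n most requested URLs as (url, count) pairs.

    `lines` can be any iterable of text lines (hypothetical helper,
    not part of the original script).
    """
    counter = collections.Counter()
    for line in lines:
        mo = REQUEST_RE.search(line)
        if mo is not None:
            counter[mo.group(1)] += 1
    return counter.most_common(n)

# Invented sample lines in Apache's common log format
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /about.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:56:05 -0700] "POST /login HTTP/1.1" 302 0',
]
print(top_urls(sample, 2))
```

Since the function only needs an iterable of lines, it can be fed an open plain-text file, a gzip file opened in text mode, or a list as in the sample above.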
