Archive

Posts Tagged ‘python’

Hidoku Solver in Python: branch cutting, intelligent successor generation and some simplifications

In this post we are implementing a Hidoku solver (Hidoku is yet another fine number puzzle) that uses a depth first search, branch cutting, limited (intelligent) successor generation and some automatic simplification. Usually, a Hidoku is a quadratic board consisting of n x n fields – but rectangular or other forms would be possible as well. With each Hidoku, some fields are already pre-filled with numbers at the beginning.

Hidoku Sample

Hidoku Sample

The game goal is to fill in all other numbers so that an ascending number queue is built: each number has to be adjacent to it’s successor, with adjacent meaning in an adjacent horizontal, vertical or diagonal field. Those adjacent fields are what we call a field neighborhood. The first and last number (e.g. with a 10×10 board they will be 1 and 100) are usually amongst those pre-filled into the board – for the player knowing where the queue starts and ends. See e.g. here or here to try out some online Hidoku examples for yourself.

Within our solver we use a similar puzzle representation which we read from a text file. E.g. the sample from above would look like this:

__,__,93,__,__,__,___,__,__,57
__,__,__,22,__,53, 54,40,__,__
__,__,18,23,52,__,100,__,__,__
__,__,__,__,__,99, 33,__,__,__
__,__,__,__,__,25,___,__,__,__
__,__,__,__,__,__,___,__,__,29
14,__, 7,__,__,__,___,__,__,__
__,__,__,65,__,__, 46,72,77,__
__, 9,__,__,__,__,___,__,80,76
 1,__,__,11,__,68, 82,__,__,__

Solving a Hidoku

With our solver we want to obtain all possible solutions to a Hidoku (although it should ideally only have 1). Therefore we utilize a depth first search with branch cutting, intelligent successor generation and some automatic simplification. Besides the simple game board representation (positions with numbers) we also use some additional variables for speeding up the search (see Board.__init__() for details):

  • self.nparray: the standard n x m numpy array representation of the game board, with the numbers present on the board written to the y/x position of the array.
  • self.nrs_yx_dict: dictionary of numbers filled to the board and their corresponding tuple of y/x positions. This allows fast lookup of the y/x position a number is filled to.
  • self.nrs_left_to_place: list of numbers that still need to be placed on the board. Allows for slightly faster lookup of still missing numbers.
  • self.nrs_placement_possibilities: list of possible placements as numbers and their corresponding y/x positions (representing positions this numbers could be filled to, given the current state of the board). This list enables much faster access to where numbers could get placed on the board. It is created initially (for the first node in the search tree) and for all successor nodes in the search tree possibilities that became impossible just get removed.
  • self.field_connection_status: lists how many connections each field in the board currently has (0 to 2). A connection exists if the number preceding or succeeding to the number in the field is present in the field neighborhood (with the neighborhood including all adjacent fields: vertical, horizontal and diagonal). These connection stats enable quick checks if the board is already invalid (more on that later). The connection stats get handed over to preceding nodes in the search tree by updating the neighborhood and the field itself the number was filled to.

We now look closer at how some of these are created and later used inside branch cutting.

Determining those fields that a number could be filled to

With the first node in the search tree, self.nrs_placement_possibilities gets created and filled. Succeeding nodes obtain a copy of it with those possibilities removed that became invalid by placing preceding number to the board (by doing the distance checks described below again).

Given the numbers pre-filled to the board, all other numbers can be filled to those fields only that lie within the allowed distance of numbers already filled to the board. An example: if nr. 2 if filled to the top left field (position 0/0), nr. 2 can only be placed within vertical, horizontal and diagonal distance 1 of field 0/0, nr. 3 within distance 2 etc. Hence, this boils down to a simple (but pretty effective in reducing computation complexity) series of distance checks, in which each number X gets checked against each number Y already filled to the board. These checks are only performed if abs(X-D) < max(board_height, board_width) - 2 (otherwise the numbers can be placed on the board without any restrictions caused by each other, as the board is too small for such). The core of such a distance check is rather simple: nr. X can be placed to a certain y/x position on the board, if the horizontal and the vertical distance to the position of nr. Y is at max. abs(X-Y).

Branch cutting and its requirements

In our search tree we need to detect dead ends early (boards that cannot yield a valid solution anymore) in order to exclude them from further search, hence speeding up the complete search.

With keeping track of possible placements of numbers to fields on the board using self.nrs_placement_possibilities, we can do 2 types of checks:

  • for each number left to be placed on the board: is there still at least one field left that number can be placed on?
  • for each field still empty: is there at least one number left that can be placed on that field?

If at least one is not fulfilled: the board is invalid an can be excluded from further search.

We also use field connection stats (self.field_connection_status) within branch cutting. Fields initially have 0 connections. If the field is filled with a number, for each of the preceding and succeeding number present in its neighborhood (the adjacend fields, maximum 8), the number of connections goes up by one – so the maximum amount of connections per field is 2. Using these field connection stats we can check:

  • if the field is filled: does it have its preceding and succeeding numbers placed in its neighborhood, or enough free fields in the neighborhood for them to be placed there in the future? If not the board is invalid.
  • if the field is not filled: does it allow to be a connection for any pair of numbers in its neighborhood? Can it still be reached (entered and exited) from the outside? E.g. an empty field with fully filled neighborhood can only be reached if there can a number be placed on that field that has its preceding and succeeding number already filled to the neighborhood (or a free field in the neighborhood in its place). This essentially boils down to neighborhood number permutation check: if we don’t find a valid (still possible) permutation of numbers in the neighborhood for which we can fill a still available and valid number to that field, the board is invalid.

Successor generation

For the solved board the order of how fields got filling does not matter. Therefore, within our depth first search, we do not care for the search path, but only for the obtained board state (we do not care how a board could have evolved to it’s current state).

The question is: how to generate the required subset of successors (as this will influence the future search complexity)? We only generate successors from filling a single field on the board (with all numbers that can be filled to it) – as we can generate all possible board states from this, without searching multiple search paths (which all lead to the same results). As field to be filled we chose the one with the least possible amount N of numbers that can be filled to this field – and generate one successor per number possible. If there is a number that can be placed only on an amount A of fields that is smaller than N, we instead fill this one number to all those fields and generate one successor per filled field (as we only obtain A instead of N successor nodes then). This way we minimize the amount of branches and keep the overall branching factor low – something a rational human player would strive for as well. Note that we do not miss any possible solutions as a result of this limited successor generation. As a side effect, this also results in basic board simplification: if there is a field that only one number can be filled to, it will be filled next – and generate a single successor only, which is usually called simplification. The same is true if we have a number that can only be filled to a single field.

Implementation

# Hidoku solver implementation using depth first search, branch cutting, intelligent successor generation and a little simplification.
# Rainhard Findling
# 2015/04
#
# possible improvements:
# - use a path finding algorithm, e.g. A* + manhattan distance instead of using max(y_dist,x_dist) to determine possible minimum distances.
#
import numpy
import itertools
from copy import copy

# STATICS
EMPTY = -1

def main():
    """gets executed if in __main__ (see bottom of file)."""

    # parameters
    board_file = '../boards/10x10_2.txt'

    # load board, initialize it, print some statistics
    board = Board(read_board(board_file))
    board.initialize()
    print board
    print 'nrs left', board.nrs_left_to_place
    print 'nrs_placement_possibilities', len(board.nrs_placement_possibilities)
    tmp = board.placements_splitted_per_nr()
    print 'placements per nr', map(lambda t:(t,tmp[t][0]), tmp.keys()), '\n'
    board.is_valid()
       
    # depth first search
    list_open = [board]
    solutions = []
    nodes_opened = 0
    nr_of_successors = []
    while(len(list_open) > 0):
        cur = list_open.pop()
        nodes_opened = nodes_opened + 1
        print 'looking at', cur
        print 'list_open', len(list_open)
        successors = cur.successors()
        if(len(successors)==0):
            print 'all successors invalid, reverting.'
        for s in successors:
            nr_of_successors = nr_of_successors + [len(successors)]
            if s.is_solved():
                solutions = solutions + [s]
                #break # enable to quit search after the first solution, disable to search the complete search space
            else:
                list_open = list_open + [s]
        else:
            continue
        break
    print 'done.'
    if len(solutions) == 0:
        print 'no solutions found.'
    else:
        print 'solution paths:', map(lambda s:s.solution_path(), solutions)
        print 'solutions:', solutions
        print 'solutions', len(solutions)
        print 'nodes_opened', nodes_opened
        print 'nr_of_successors', sum(nr_of_successors)
        print 'avg branching (not including reduction by simplification-drive-successor-generation)', sum(nr_of_successors)/float(len(nr_of_successors))
    
class Board:
    
    def __init__(self, nparray = None):
        self.nparray = nparray # the gameboard as board
        self.nrs_yx_dict = None # the gameboard as dict of nrs and corresponding y,x positions
        self.nr_range = None # range of numbers to be contained in board
        self.nrs_left_to_place = None # numbers left to be placed on the board
        self.nrs_placement_possibilities = None # list of possible placement possibilities
        self.field_connection_status = None # how many connections fields have left [0,2]
        self.parent = None
    
    def __str__(self):
        """str representation of nparray board."""
        sep = '   '
        top1 = '===' + ''.join(['====' for _ in range(len(self.nparray[0]))])
        top2 = '==='  + ''.join(['=='  for _ in range(len(self.nparray[0]))])
        s = '\n' + top1 + sep + top2 + '\n'
        for row_nr in range(len(self.nparray)):
            row = self.nparray[row_nr,:]
            s = s + '| ' + ' '.join(map(lambda x:'___' if x == EMPTY
                                        else '{:3.0f}'.format(x), row)) + ' |'
            fillings = self.field_connection_status[row_nr,:]
            s = s + sep + '| ' + ' '.join(map(lambda x:'x' if x == 0 else
                                              '.' if x == 1 else
                                              ' ', fillings)) + ' |\n'
        s = s + top1 + sep + top2 + '\n'
        return s
    
    def __copy__(self):
         other = Board()
         other.nparray = copy(self.nparray)
         other.nrs_yx_dict = copy(self.nrs_yx_dict)
         other.nr_range = self.nr_range # no need to copy
         other.nrs_left_to_place = copy(self.nrs_left_to_place)
         other.nrs_placement_possibilities = copy(self.nrs_placement_possibilities)
         other.field_connection_status = copy(self.field_connection_status)
         return other
     
    def __repr__(self):
        return self.__str__()

    def __eq__(self, other):
        return (self.nparray == other.nparray).all()
        
    def __ne__(self, other):
        return not self.__eq__(other)
    
    def initialize(self):
        """initialize the board. meant to be called on first board in search tree only."""
        self.nr_range = range(1, len(self.nparray) * len(self.nparray[0])+1)
        self.nrs_left_to_place = self.generate_nrs_left_to_place()
        self.nrs_yx_dict = self.generate_nrs_yx_dict()
        self.nrs_placement_possibilities = self.generate_nrs_placement_possibilities()
        self.field_connection_status = self.generate_all_field_connection_status()
        
    def generate_nrs_yx_dict(self):
        """generate yx position index of nrs."""
        nrs_yx_dict = {}
        for y in range(len(self.nparray)):
            for x in range(len(self.nparray[0])):
                if not self.nparray[y,x] == EMPTY:
                    nrs_yx_dict[self.nparray[y,x]] = (y,x)
        return nrs_yx_dict
        
    def generate_nrs_placement_possibilities(self):
        """1 huge list containing tuples about which nrs are placeable on which fields."""
        nr_placements = []
        for nr in self.nrs_left_to_place:
            for y in range(len(self.nparray)):
                for x in range(len(self.nparray[0])):
                   if(self.can_nr_be_placed_on_field(nr,y,x)):
                       nr_placements = nr_placements + [(nr,y,x)]
        return nr_placements
    
    def generate_all_field_connection_status(self):
        """for all filled fields: generate field connection status."""
        field_connection_status = numpy.ones([len(self.nparray), len(self.nparray[0])], int) * 2
        for y in range(len(self.nparray)):
            for x in range(len(self.nparray[0])):
                field_connection_status[y,x] = self.generate_field_connection_status(y,x)
        return field_connection_status
    
    def generate_field_connection_status(self,y,x):
        """generate field connection status for the given nr on y and x pos."""
        neighborhood = self.get_neighborhood_nrs(y,x)
        nr = self.nparray[y,x]
        dist = map(lambda n:abs(nr-n), neighborhood)
        field_connection_status = 2 - dist.count(1)
        if nr == self.nr_range[0] or nr == self.nr_range[-1]:
            field_connection_status = field_connection_status - 1
        return field_connection_status
    
    def fill_nr_to_field(self, nr, y, x):
        """fill given nr to given field."""
        if not nr in self.nrs_left_to_place:
            raise Exception('cannot place a nr not left to place.')
        self.nrs_left_to_place.remove(nr)
        if self.nparray[y,x] != EMPTY:
            raise Exception('cannot place a nr on a non empty field.')
        self.nparray[y,x] = nr
        self.nrs_yx_dict[nr] = (y,x)
        self.field_connection_status[y,x] = self.generate_field_connection_status(y,x)
        for (iy,ix) in self.get_neighborhood_pos(y,x):
            self.field_connection_status[iy,ix] = self.generate_field_connection_status(iy,ix)
        self.nrs_placement_possibilities = filter(lambda p:self.can_nr_be_placed_on_field(*p), self.nrs_placement_possibilities)
    
    def placements_splitted_per_nr(self):
        """split placement possibilities per nr."""
        splitted = {}
        for nr in self.nrs_left_to_place:
            for_nr = filter(lambda p:p[0]==nr, self.nrs_placement_possibilities)
            splitted[nr] = (len(for_nr), for_nr)
        return splitted
       
    def can_nr_be_placed_on_field(self,nr,y,x):
        """check if given nr is possible on y,x pos in field."""
        # only possible if empty
        if not self.nparray[y,x] == EMPTY:
            return False
        # only possible if nr still available
        if not nr in self.nrs_left_to_place:
            return False
        # check that nr is within range of other nrs on board
        return self.check_distance_to_other_nrs_ok(nr,y,x)

    def check_distance_to_other_nrs_ok(self,nr,y,x):
        """check if given nr on the given y,x is within range of other nrs on the board."""
        # -2 because -1 is max possible distance, therefore no cutoff effect
        d_max = max(len(self.nparray), len(self.nparray[0])) - 2 
        for nr2 in range(nr - d_max, nr + d_max + 1):
            if nr == nr2 or not nr2 in self.nrs_yx_dict:
                continue
            pos2 = self.nrs_yx_dict[nr2]
            # ensure that nr and nr2 are within range
            d = self.distance((y,x),pos2)
            if d > abs(nr-nr2):
                # the distance between the nrs is bigger than the difference between nrs - impossible to place here
                #print 'dist too big (nr, nr2, y, x, dist):', nr, nr2, y, x, abs(nr-nr2)
                return False
        return True

    def generate_nrs_left_to_place(self):
        """generate list of nrs still to placed on the field."""
        nrs = []
        for nr in self.nr_range:
            if not nr in self.nparray:
                nrs = nrs + [nr]
        return nrs
    
    def get_neighborhood_pos(self,y,x):
        """tuples of (y,x) pos of neighborhood."""
        yx = []
        for iy in range(max(y-1,0), min(y+2, len(self.nparray))):
            for ix in range(max(x-1,0), min(x+2, len(self.nparray[0]))):
                if not (y == iy and x == ix):
                    yx = yx + [(iy,ix)]
        return yx
    
    def get_neighborhood_nrs(self,y,x):
        """get all fields that surround the specified field."""
        n = []
        for (iy,ix) in self.get_neighborhood_pos(y,x):
            n = n + [self.nparray[iy,ix]]
        return n
    
    def neigbors_not_fully_connected(self, y, x):
        """get neighbors to the given pos that are not yet fully connected."""
        neighborhood = self.get_neighborhood_nrs(y,x)
        not_fully_connected = []
        for n in neighborhood:
            if n == EMPTY:
                not_fully_connected = not_fully_connected +  [n]
            else:
                (ny,nx) = self.nrs_yx_dict[n]
                if self.field_connection_status[ny,nx] > 0:
                    not_fully_connected = not_fully_connected + [n]
        return not_fully_connected
    
    def check_empty_field_neighbor_permutations_ok(self, y, x):
        """check if the field can be reached from it's neighbors permutations."""
        # attention: this does not handle first and last nr if they are not initially placed on the board
        if(self.nparray[y,x] != EMPTY):
            raise Exception('trying to check a non empty field with the exhausting empty-field check...')
        not_fully_connected = self.neigbors_not_fully_connected(y, x)
        amount_empty = not_fully_connected.count(-1)
        if(amount_empty >= 2):
            # we still have empty entrance and exit, too much effort to check
            return True
        if(amount_empty == 1):
            # if only 1 possibility: plugin + redo check
            possible_fillings1 = set(reduce(lambda a,b:a if b == EMPTY else a + [b-1,b+1], not_fully_connected, []))
            possible_fillings2 = filter(lambda x:x in self.nrs_left_to_place, possible_fillings1)
            if(len(possible_fillings2) > 1):
                # more than 1 possibility, still valid
                return True
            if(len(possible_fillings2)==1):
                # only this one possibility left! plugin, recheck
                self.fill_nr_to_field(possible_fillings2[0], y, x)
                print 'only one possibility left for field (y,x,nr)', y, x, possible_fillings2[0]
                return self.is_valid()
        else:
            if len(not_fully_connected) < 2:
                # impossible to connect if we have only 1 connectable field
                #print 'invalid, as permutations invalid', y, x                
                return False
            permutations = list(itertools.combinations(not_fully_connected, 2))
            diff2 = map(lambda t:(abs(t[0]-t[1]), (t[0]+t[1])/2), permutations)
            # permutations where diff2 is 2 are valid if the nr between them is still placeable. if there is only one: only possibility for that field. if there is none board is invalid
            diff2_valid = filter(lambda t:t[0] == 2, diff2)
            # check which nrs are still available to place
            diff2_available = filter(lambda t:t[1] in self.nrs_left_to_place, diff2_valid)
            if(len(diff2_available) > 1):
                # more than 1 possibility, still valid
                return True
            if(len(diff2_available) == 1):
                # only this one possibility left! plugin, recheck
                print self 
                print 'only one possibility left for field (y,x,nr)', y, x, diff2_available[0][1]
                self.fill_nr_to_field(diff2_available[0][1], y, x)
                return self.is_valid()
        #print 'invalid, as permutations invalid', y, x
        return False
    
    def check_filled_field_valid_connections(self,y,x):
        """check that the given field can still be reached by means of predecessing and successing nr."""
        nr = self.nparray[y,x]
        if(nr == EMPTY):
            raise Exception('trying to check an empty field with too inexact filled field check...')
        neighborhood = self.get_neighborhood_nrs(y,x)
        nr_frees = neighborhood.count(EMPTY)
        #print 'frees', nr_frees
        connected_to_me = [abs(n-nr) for n in neighborhood].count(1) # empty do not count as the will not result in 1 distance as nr 0 does not exist
        #print 'connected_to_me', connected_to_me
        need = 2
        if(nr == self.nr_range[0] or nr == self.nr_range[-1]):
            need = 1
        if(nr_frees + connected_to_me >= need):
            return True
        #print 'invalid, as too less connections filled field (nr, y, x)', nr, y, x
        return False
    
    def successors(self):
        """generate successors of."""
        # nr with least placement possibilities
        placements_splitted_per_nr = self.placements_splitted_per_nr()
        p_amount = map(lambda t:placements_splitted_per_nr[t][0], placements_splitted_per_nr.keys())
        #print 'p_amount', p_amount
        i_min = p_amount.index(min(p_amount))
        #print 'i_min', i_min
        nr = placements_splitted_per_nr.keys()[i_min]
        print 'placing nr', nr
        
        # placement possibilities for nr
        # (nr,y,x)
        nr_possibilities = filter(lambda t:t[0] == nr, self.nrs_placement_possibilities)
        print 'possibilities to place (nr, len(p), p)', nr, len(nr_possibilities), nr_possibilities
        
        # sanity check: would there be a field that has only 1 nr to be filled with?
        yx_amount = {}
        for p in self.nrs_placement_possibilities:
            key = (p[1],p[2])
            if not p in yx_amount:
                yx_amount[key] = [p]
            else:
                yx_amount[key] = yx_amount[key] + [p]
        yx_ones = filter(lambda t:yx_amount[t] == 1, yx_amount.keys())
        if(len(nr_possibilities) > 1 and len(yx_ones) > 1):
            raise Exception('there is a field to fill with one nr instead. implement generating that successor therefore.')
                    
        successors = []
        for p in nr_possibilities:
            suc = copy(self)
            suc.parent = self
            suc.fill_nr_to_field(*p)
            if suc.is_valid():
                successors = successors + [suc] 
        return successors
    
    def solution_path(self):
        """get solution path up to this point."""
        l = [self]
        cur = self
        while(not cur.parent is None):
            l = [cur.parent] + l
            cur = cur.parent
        return l
          
    def distance_nrs(self, nr1, nr2):
        """get distance between nr1 and nr2 on board or None if at least one of the nrs is not present."""
        if not nr1 in self.nrs_yx_dict or not nr2 in self.nrs_yx_dict:
            return None
        pos1 = self.nrs_yx_dict[nr1]
        pos2 = self.nrs_yx_dict[nr2]
        return self.distance(pos1, pos2)
    
    def distance(self, pos1, pos2):
        """distance between pos1 and pos2."""
        return max(abs(pos1[0]-pos2[0]), abs(pos1[1]-pos2[1]))
               
    def is_valid(self):
        """check if board is still valid."""
        #check if each nr can be placed
        for nr in self.nrs_left_to_place:
            if len(filter(lambda t:t[0] == nr, self.nrs_placement_possibilities)) == 0:
                #print 'invalid as nr', nr, 'cannot be placed.'
                return False
        # check that each field if filled is connected or connectable and if not set has permutations of neighbors that allow it to be filled
        for y in range(len(self.nparray)):
            for x in range(len(self.nparray[0])):
                if self.nparray[y,x] == EMPTY:
                    if not self.check_empty_field_neighbor_permutations_ok(y,x):
                        return False
                else:
                    if not self.check_filled_field_valid_connections(y,x):
                        return False
        return True
    
    def is_solved(self):
        """checks if the field is solved."""
        if len(self.nrs_left_to_place) == 0:
            # sanity checks
            if EMPTY in self.nparray:
                raise Exception('no numbers left to place, but still empty fields?')
            return True
        return False
            
def read_board(filepath):
    """read board from a textfile."""
    f = open(filepath, 'r')
    rows = []
    w = -1 # same width for all lines
    for line in f.read().splitlines():
        values = line.split(',')
        if w == -1:
            w = len(values)
        elif w != len(values):
            raise Exception('nr. of columns differ amongst rows.')
        values = map(lambda x:EMPTY if '_' in x else int(x), values)
        rows = rows = rows + [values]
    f.close()
    return numpy.array(rows)
    
if __name__ == "__main__":
    main()

When calling the solver on the sample board from above, it finds and prints the single possible solution (as well as the path how to get there, which we don’t show here for space reasons):

[...]
solutions: [
===========================================   =======================
|  91  92  93  20  21  36  37  38  39  57 |   | x x x x x x x x x x |
|  90  94  19  22  35  53  54  40  56  58 |   | x x x x x x x x x x |
|  89  95  18  23  52  34 100  55  41  59 |   | x x x x x x x x x x |
|  88  17  96  51  24  99  33  61  60  42 |   | x x x x x x x x x x |
|  16  87  50  97  98  25  62  32  43  30 |   | x x x x x x x x x x |
|  15   6  86  49  48  63  26  44  31  29 |   | x x x x x x x x x x |
|  14   5   7  85  64  47  45  27  28  78 |   | x x x x x x x x x x |
|   4  13   8  65  84  70  46  72  77  79 |   | x x x x x x x x x x |
|   3   9  12  66  69  83  71  73  80  76 |   | x x x x x x x x x x |
|   1   2  10  11  67  68  82  81  74  75 |   | x x x x x x x x x x |
===========================================   =======================
]
solutions 1
nodes_opened 1220
nr_of_successors 1646
avg branching (not including reduction by simplification-driven successor generation) 1.3491803

Branching factor and search speed

Limited successor generation and branch cutting greatly influence the branching factor, therefore the overall runtime. With enabling both, you can see that the branching factor is around 1.35, and that the complete search space can successfully be searched. For me, disabling both resulted in the solver not even being able to find the solution in feasible time – but just try and see for yourself! ūüėČ

Cracking RC4 messages that use weak RC4 reinitialization for each message part

April 12, 2015 Leave a comment

This post deals with a short excerpt of a security and hacking related exercise at the University of Applied Sciences Upper Austria, Campus Hagenberg: we obtain the plaintext from an intercepted RC4 encoded message, from which we know that it uses an easy to crack re-initialization of the RC4 cipher for each message part.

The problem

Imagine we have intercepted a secret message split into 11 message parts, each (but the last) consisting of 40 1-byte characters:

0xd9, 0xef, 0x7b, 0x6e, 0xca, 0xb5, 0x12, 0xa0, 0x4f, 0x4b, 0x56, 0xb4, 0x94, 0xdf, 0x27, 0xed, 0xf8, 0xcc, 0x64, 0x94, 0xeb, 0x5a, 0x77, 0xea, 0x9b, 0x76, 0xdf, 0xe9, 0x18, 0x02, 0xc6, 0x36, 0x19, 0x1f, 0xc8, 0xf2, 0x5a, 0xa8, 0xda, 0xe7,
0xb7, 0xd3, 0x5f, 0x4e, 0x89, 0x9f, 0x01, 0xa9, 0x0e, 0x4c, 0x5c, 0xb6, 0xc0, 0xdc, 0x2e, 0xfa, 0xbd, 0xbf, 0x63, 0xb4, 0xa4, 0x42, 0x77, 0xb3, 0x98, 0x62, 0xd5, 0xf0, 0x5d, 0x07, 0xdc, 0x62, 0x17, 0x4d, 0xca, 0xe6, 0x5a, 0xa8, 0xdd, 0xec,
0xf9, 0xd3, 0x5c, 0x47, 0x85, 0x86, 0x05, 0xab, 0x5d, 0x4c, 0x5b, 0xf0, 0xc7, 0xc2, 0x22, 0xf8, 0xb4, 0xe6, 0x17, 0xb8, 0xa4, 0x5f, 0x66, 0xb3, 0x80, 0x79, 0xd9, 0xec, 0x18, 0x1d, 0xc0, 0x35, 0x51, 0x4b, 0xca, 0xab, 0x19, 0xaf, 0xc2, 0xe6,
0xe4, 0xc5, 0x0f, 0x5f, 0x82, 0x93, 0x40, 0xb7, 0x47, 0x58, 0x5b, 0xa4, 0x94, 0xdb, 0x2e, 0xfa, 0xb9, 0xf2, 0x52, 0xa8, 0xae, 0x43, 0x61, 0xb3, 0xbc, 0x7e, 0xc2, 0xf3, 0x18, 0x18, 0xd6, 0x62, 0x1c, 0x50, 0xc1, 0xe2, 0x1c, 0xae, 0xce, 0xe8,
0xe3, 0xc9, 0x40, 0x45, 0x99, 0xd6, 0x32, 0x96, 0x6f, 0x1f, 0x56, 0xbe, 0xd7, 0xd9, 0x36, 0xf8, 0xac, 0xf6, 0x58, 0xb2, 0xeb, 0x46, 0x7d, 0xe1, 0x80, 0x64, 0x96, 0xec, 0x59, 0x0c, 0x8f, 0x24, 0x10, 0x4c, 0xd1, 0xee, 0x08, 0xe7, 0x8d, 0xa9,
0xc3, 0xef, 0x6b, 0x64, 0xca, 0xb3, 0x0e, 0xa6, 0x5c, 0x46, 0x43, 0xa4, 0x94, 0xc9, 0x26, 0xfc, 0xbb, 0xf0, 0x5e, 0xb2, 0xeb, 0x41, 0x60, 0xfa, 0x9d, 0x76, 0xc2, 0xfe, 0x18, 0x1e, 0xca, 0x3b, 0x02, 0x1f, 0xd2, 0xe2, 0x0e, 0xaf, 0x8d, 0xce,
0xc7, 0xe7, 0x0f, 0x41, 0x9f, 0x85, 0x14, 0xe5, 0x4c, 0x5a, 0x13, 0xb2, 0xd1, 0x8b, 0x3c, 0xfd, 0xaa, 0xfa, 0x17, 0xbe, 0xae, 0x52, 0x73, 0xe6, 0x98, 0x72, 0x96, 0xef, 0x50, 0x10, 0xd6, 0x62, 0x19, 0x5e, 0xd3, 0xee, 0x5a, 0xa5, 0xc8, 0xec,
0xf9, 0x80, 0x4a, 0x53, 0x9a, 0x99, 0x12, 0xb1, 0x4b, 0x5b, 0x13, 0xa5, 0xda, 0xdb, 0x3d, 0xe7, 0xac, 0xfa, 0x54, 0xa8, 0xae, 0x55, 0x32, 0xc4, 0x82, 0x7b, 0xda, 0xbb, 0x4b, 0x10, 0xc1, 0x26, 0x51, 0x4b, 0xcd, 0xee, 0x17, 0xe7, 0xf9, 0xc5,
0xc4, 0x80, 0x5f, 0x59, 0x85, 0x82, 0x05, 0xa6, 0x5a, 0x5a, 0x57, 0xf0, 0xd6, 0xde, 0x3b, 0xa8, 0xbb, 0xfe, 0x59, 0xfc, 0xb2, 0x5e, 0x67, 0xb3, 0x9f, 0x65, 0xc3, 0xe8, 0x4c, 0x55, 0xfb, 0x0e, 0x22, 0x1f, 0xf5, 0xea, 0x09, 0xb4, 0xda, 0xe6,
0xe5, 0xc4, 0x0f, 0x4d, 0x85, 0x84, 0x40, 0xa1, 0x4f, 0x49, 0x5a, 0xb4, 0xd0, 0xde, 0x21, 0xe6, 0xb6, 0xfa, 0x43, 0xfc, 0xa0, 0x54, 0x6b, 0xb3, 0x82, 0x64, 0x96, 0xc2, 0x48, 0x33, 0xcd, 0x26, 0x25, 0x73, 0xce, 0xe8, 0x2e, 0x8a, 0xc8, 0xcb,
0xff, 0xf6, 0x5a, 0x0b, 0xca, 0xb2, 0x01, 0xb3, 0x47, 0x5b

We know all these message parts have been encrypted using RC4 with a known to be easy to crack reinitialization of the RC4 stream for each message part. We also know the message plaintext only contains letters and whitespaces. The task is: obtain the plaintext. How to do so?

The solution

We know all 11 message parts have been encrypted using the same key stream (as the RC4 cipher has been re-initialized for each message using the same secret key). To avoid confusion here: as stream cipher, RC4 gets initialized from a secret key (e.g. the user’s password) and generates a continuous key stream from it. Each time a RC4 cipher is initialized this way, the generated key stream is exactly the same. Therefore, the RC4 reinitialization for each package causes the key stream to be exactly the same for all messages. RC4 does an XOR on the key stream and the plaintext to obtain the ciphertext. Hence, an XOR with the key stream used for all 11 message ciphertexts causes these to become correct plaintext. We further know that valid plaintext only allows certain characters. Therefore it’s possible to try through all 255 possibilities per keystream character: XOR each one with all corresponding characters of all ciphertexts and keep only those, where all resulting plaintext characters are actually allowed ones. By doing so we generate a list of valid keystreams in an effective way (which by doing an XOR on the ciphertext all result in valid plaintext). Trying through these keys (XOR with plaintext) until we see a message that “makes sense” is now very easy – and could even be done by hand (not posting the actual solution here).

# Cracking RC4 messages that use weak RC4 reinitialization for each message part
#
# Rainhard Findling
# University of Applied Sciences Upper Austria, Campus Hagenberg
# 2015/03
# 
# all ciphertext message parts
c1 = [0xd9, 0xef, 0x7b, 0x6e, 0xca, 0xb5, 0x12, 0xa0, 0x4f, 0x4b, 0x56, 0xb4, 0x94, 0xdf, 0x27, 0xed, 0xf8, 0xcc, 0x64, 0x94, 0xeb, 0x5a, 0x77, 0xea, 0x9b, 0x76, 0xdf, 0xe9, 0x18, 0x02, 0xc6, 0x36, 0x19, 0x1f, 0xc8, 0xf2, 0x5a, 0xa8, 0xda, 0xe7]

c2 = [0xb7, 0xd3, 0x5f, 0x4e, 0x89, 0x9f, 0x01, 0xa9, 0x0e, 0x4c, 0x5c, 0xb6, 0xc0, 0xdc, 0x2e, 0xfa, 0xbd, 0xbf, 0x63, 0xb4, 0xa4, 0x42, 0x77, 0xb3, 0x98, 0x62, 0xd5, 0xf0, 0x5d, 0x07, 0xdc, 0x62, 0x17, 0x4d, 0xca, 0xe6, 0x5a, 0xa8, 0xdd, 0xec]

c3 = [0xf9, 0xd3, 0x5c, 0x47, 0x85, 0x86, 0x05, 0xab, 0x5d, 0x4c, 0x5b, 0xf0, 0xc7, 0xc2, 0x22, 0xf8, 0xb4, 0xe6, 0x17, 0xb8, 0xa4, 0x5f, 0x66, 0xb3, 0x80, 0x79, 0xd9, 0xec, 0x18, 0x1d, 0xc0, 0x35, 0x51, 0x4b, 0xca, 0xab, 0x19, 0xaf, 0xc2, 0xe6]

c4 = [0xe4, 0xc5, 0x0f, 0x5f, 0x82, 0x93, 0x40, 0xb7, 0x47, 0x58, 0x5b, 0xa4, 0x94, 0xdb, 0x2e, 0xfa, 0xb9, 0xf2, 0x52, 0xa8, 0xae, 0x43, 0x61, 0xb3, 0xbc, 0x7e, 0xc2, 0xf3, 0x18, 0x18, 0xd6, 0x62, 0x1c, 0x50, 0xc1, 0xe2, 0x1c, 0xae, 0xce, 0xe8]

c5 = [0xe3, 0xc9, 0x40, 0x45, 0x99, 0xd6, 0x32, 0x96, 0x6f, 0x1f, 0x56, 0xbe, 0xd7, 0xd9, 0x36, 0xf8, 0xac, 0xf6, 0x58, 0xb2, 0xeb, 0x46, 0x7d, 0xe1, 0x80, 0x64, 0x96, 0xec, 0x59, 0x0c, 0x8f, 0x24, 0x10, 0x4c, 0xd1, 0xee, 0x08, 0xe7, 0x8d, 0xa9]

c6 = [0xc3, 0xef, 0x6b, 0x64, 0xca, 0xb3, 0x0e, 0xa6, 0x5c, 0x46, 0x43, 0xa4, 0x94, 0xc9, 0x26, 0xfc, 0xbb, 0xf0, 0x5e, 0xb2, 0xeb, 0x41, 0x60, 0xfa, 0x9d, 0x76, 0xc2, 0xfe, 0x18, 0x1e, 0xca, 0x3b, 0x02, 0x1f, 0xd2, 0xe2, 0x0e, 0xaf, 0x8d, 0xce]

c7 = [0xc7, 0xe7, 0x0f, 0x41, 0x9f, 0x85, 0x14, 0xe5, 0x4c, 0x5a, 0x13, 0xb2, 0xd1, 0x8b, 0x3c, 0xfd, 0xaa, 0xfa, 0x17, 0xbe, 0xae, 0x52, 0x73, 0xe6, 0x98, 0x72, 0x96, 0xef, 0x50, 0x10, 0xd6, 0x62, 0x19, 0x5e, 0xd3, 0xee, 0x5a, 0xa5, 0xc8, 0xec]

c8 = [0xf9, 0x80, 0x4a, 0x53, 0x9a, 0x99, 0x12, 0xb1, 0x4b, 0x5b, 0x13, 0xa5, 0xda, 0xdb, 0x3d, 0xe7, 0xac, 0xfa, 0x54, 0xa8, 0xae, 0x55, 0x32, 0xc4, 0x82, 0x7b, 0xda, 0xbb, 0x4b, 0x10, 0xc1, 0x26, 0x51, 0x4b, 0xcd, 0xee, 0x17, 0xe7, 0xf9, 0xc5]

c9 = [0xc4, 0x80, 0x5f, 0x59, 0x85, 0x82, 0x05, 0xa6, 0x5a, 0x5a, 0x57, 0xf0, 0xd6, 0xde, 0x3b, 0xa8, 0xbb, 0xfe, 0x59, 0xfc, 0xb2, 0x5e, 0x67, 0xb3, 0x9f, 0x65, 0xc3, 0xe8, 0x4c, 0x55, 0xfb, 0x0e, 0x22, 0x1f, 0xf5, 0xea, 0x09, 0xb4, 0xda, 0xe6]

c10 = [0xe5, 0xc4, 0x0f, 0x4d, 0x85, 0x84, 0x40, 0xa1, 0x4f, 0x49, 0x5a, 0xb4, 0xd0, 0xde, 0x21, 0xe6, 0xb6, 0xfa, 0x43, 0xfc, 0xa0, 0x54, 0x6b, 0xb3, 0x82, 0x64, 0x96, 0xc2, 0x48, 0x33, 0xcd, 0x26, 0x25, 0x73, 0xce, 0xe8, 0x2e, 0x8a, 0xc8, 0xcb]

c11 = [0xff, 0xf6, 0x5a, 0x0b, 0xca, 0xb2, 0x01, 0xb3, 0x47, 0x5b]

# concatenate all ciphertext parts (except last) 
ciphertext = [c1, c2, c3, c4, c5, c6, c7, c8, c9, c10]

# our brute forcing method
def try_keys(c):
    # each key is one byte, so 255 possibilities here
    k_ok = []
    for k in range(255):
        plaintext = [x^k for x in c] 
        # all chars of one position have to translate to chars allowed in plaintext: 41-5a, 61-7a
        ok = [p_chr == 0x20 or p_chr >= 0x41 and p_chr <= 0x5a or p_chr >= 0x61 and p_chr <= 0x7a for p_chr in plaintext]
        if(all(ok)):
            k_ok += [k]
    return k_ok

# brute force all keystream positions
xor = []
for i in range(40):
    c = [x[i] for x in ciphertext]
    xor += [try_keys(c)]

# see amount of possibilities left per keystream char position
print filter(lambda x:x[1] > 1, [(i, len(xor[i])) for i in range(len(xor))])

# use "first" key
key = [x[0] for x in xor]
# adapt specific keystream positions per hand so that all plaintext is correct (could be automated)
key[3] = xor[3][7]
key[14] = xor[14][1]
key[21] = xor[21][1]
key[24] = xor[24][3]
key[25] = xor[25][5]
key[34] = xor[34][14]

# decode (now include last message part)
plaintext = []
for c in (ciphertext + [c11]):
    t = []
    for i in range(len(c)):
        t += [chr(c[i]^key[i])]
    plaintext += t

print ''.join(plaintext)

Note: the more plaintext messages encoded by the same, re-initialized RC4 cipher we have, the less possible keystreams will remain, and the faster we obtain the plaintext. Btw: this is pretty much the problem WiFi WEP had – this is the core problem of why it can be cracked so easily, and why it is considered very insecure by now.

Draught board puzzle / checkerboard puzzle solver in Python

January 20, 2015 Leave a comment

The checkerboard puzzle or draught board puzzle (also called Krazee Checkerboard Puzzle, Banzee Island checkerboard puzzle, Zebas puzzle, etc.) is a mutilated chessboard problem, which further is a tiling puzzle/dissection puzzle. Hence, the core problem is similar to the one of solving the well known Tangram, which some of you might be familiar with. The checkerboard puzzle we are looking at in this post is a 8×8 chess board imitation with black and white fields. It is split into 12 stones (left figure). The puzzle goal is to obtain at least one valid solution (right figure) – which is done by our solver. Although we are discussion this specific 8×8 board, the solver is designed to work with any board size (even with unequal width and height).

Checkerboard puzzle representation

The stones the board consists of are stated in a text file, which the solver loads at the start. In the text file, stones are represented as an 2D array:

  • B represents a black field
  • W represents a white field
  • _ represents an “empty” field (the stone does not cover that field)
  • === represents separator between stones
B,_,_
W,B,W
_,_,B
============
W,B,W,B
_,W,_,_
============
_,W,B,_
W,B,W,B
_,W,B,_
============
B,W,_,_
_,B,W,B
============
B,W,B,W
W,_,_,_
============
W,B,W,B
B,_,_,_
============
B,W
_,B
_,W
============
W,B,W
B,W,_
W,_,_
============
B,W,B,W
_,B,_,_
============
W,B,_,_
_,W,B,W
============
B,W,B,W
_,B,W,_
============
B,_,_
W,B,W
_,_,B
============

Within the solver, stones are represented as an 2D array as well:

  • -1 represents a black field
  • 1 represent a white field
  • 0 represents an empty (not covered) field

As each stone can be rotated and flipped (“mirrored”), that stone can be put to the board in one of at maximum 8 versions (which we call “rotations” from now on). When loading stones, the solver generates and stores all different rotations for each stone for speedup.

The gameboard is represented as an 2D array as well:

  • -1 is a black field, hence to fill with the black part of a stone
  • 1 is a white white, hence needs a white part of a stone

This representation gives us the advantage of putting stones to the board fast (and checking, if that is possible in the first place). To check if a stone can be put to a specific xy-position on the board, we use a pointwise array sum. Putting a black part of a stone to a black field on the board results in a sum of -2, putting a white one to a white field results in 2. Instead, putting a black one to a white field or a white one to a black field results in a sum of 0. If a stone does not cover a certain field, the sum will be -1 or 1 (depending on if the field on the board is black or white). Hence, when checking if a stone can be placed on the board, -2 and 2 as well as -1 and 1 are valid sums, while 0 is an invalid sum.

We also have to account for other stones already placed on the board. When we put a stone to the board we change newly covered black fields to -10 and newly covered white fields to 10. This way we ensure that putting a stone ontop another one results in different sums. We consequently have to extend valid sums to also include -10 and 10, and invalid sums -11, 11, -9, 9 (would be the result of trying to put a stone ontop another).

Solving the checkerboard puzzle

As we want to obtain all possible solutions our solver is designed as depth first search with branch cutting and limited successor generation. If a solution is found it gets appended to a solution file – and the search goes on. The search terminates if we have checked all possible states. We further do some inter-state caching of data for speedup.

Successor generation

We do not generate all possible successors, but only those of placing a single stones to the field, in all rotations and to all possible positions. This avoids arriving at same successor states by putting stone A and B in different order, hence reduces branching. Consequently, we have to decide on the one stone to put to the board. Imagine that for one stone there is only one placement possibility left on the board. A human player would always put that stone to the board immediately – which is a good idea, as it further reduces branching. Our search does the same by always putting the stone to the board with least placement possibilities left.

Branch cutting

We do branch cutting using a validity check. A failed validity check indicates that a board cannot yield a valid solution any more, hence it gets excluded from further search (which means none of it successors get generated or looked at, which cuts the complete branch). This check is the main cause for decreasing the branching factor – hence it’s OK if a noteworthy part of computations is done for it (anyway the implementation is done in a way that the validity check itself is quick). The validity check ensures:

  • Each not yet placed stone must have at least one position on the board left where it could be placed
  • Each field must have at least one stone that could still cover that field (based on stone placements still possible with the board)

Caching for speedup

Naturally, each board contains information about:

  • Which stones have been placed where so far (as array for faster access to the board, as well as list of placed stones, their rotation and position)
  • Which stones are left to be placed

Additionally, each board also contains:

  • For all stones left to be placed on the board: where they could possibly be placed
  • For all yet empty fields on the board: all stones left to be placed that could possibly fill this field

For the latter two, all posibilities are generated at the game start. Afterwards, it is only necessary to update them, depending on which stones are put where to the board.

Implementation

The implementation source code is available in this repository: https://github.com/pirius/draught-board-puzzle-aka-checkerboard-puzzle-solver

When called on our sample board, the solver finds multiple solutions for our 8×8 board, e.g. the following one:

===================================
|  7B 11W 11B 11W  2B  3W  3B  5W |
|  7W 11B  2W  2B  2W  3B  5W  5B |
|  7B  7W  2B 10W  3B  3W  5B  5W |
|  4W  7B  6W 10B  1W  1B  1W  5B |
|  4B  6W  6B 10W 10B  1W  1B  8W |
|  4W  4B  6W 10B  0W  0B  1W  8B |
|  9B  4W  6B  0W  0B  0W  0B  8W |
|  9W  9B  9W  9B  0W  0B  8W  8B |
===================================

Findings

It’s not easy to estimate the branching factor for this checkerboard puzzle. Depending on which stone is chosen to be the one to place on the board, the initial branching can range up to above 300 possibilities (8×8 field and 2×3 stone). Which each successive stone, the branching factor gets smaller, as less possibilities to place stones remain. For an 8×8 board, without immediately excluding successors eliminated by the branch cutting validity check, the measured average approximation of the branching factor is in the range of 6.5. This includes all stones placed to the board within search as well as invalid successor states, which would be removed by validity checks immediately. If excluding successors failing validity checks, the measured average approximation of the branching factor is reduced to about 1.2 – which is a huge gain (hence, the validity check is really worth its time).

Jodici solver: Python vs Prolog

November 8, 2014 1 comment
In the equation puzzle solver and the flower disk rotation puzzle we took a look at differences of solving problems in both Python and Prolog. In this followup post we look at Python vs Prolog again, but deal with a different problem: a Jodici solver.

Jodici

Jodici is a fun and intuitive number placement puzzle from Austria (sadly, there is only a homepage in German language available by 10/2014). Jodici is comparable to Sudoku by filling numbers into a gamefield so that certain rules are satisfied. Jodici differs from Sudoku by a) including calculation by addition and b) spanning a noteworthy smaller solution space (hence it’s easier to solve using brute force).
Jodici exampleA Jodici gamefield consists of a circle which a) contains 3 nested rings and b) is divided into 6 cake-piece-like sectors. Each intersection of sector and ring is a field – therefore each Jodici¬† gamfield contains 3×6=18 such fields in total. As with Sudoku, the goal is to fill in all numbers, while satisfying certain rules: each field must contain an integer [1,9], with each such integer being used twice in total. Further, each sector sums up to 15 and each ring to 30.

Jodici solver (brute force)

Given the maximum of 18 initially empty fields (which of course defeats the purpose, but can be used as upper limit of effort required to solve Jodici), the solution space is 9^18 (corresponds to 1.5*10^17 solutions and ~57bit entropy). Given a more typical setup of 6 initially filled fields (6 initially known numbers), the solution space is reduced to 9^12 (corresponds to 2.8*10^11 solutions and ~38bit entropy). This solution space could be brute forced even without using heuristics or branch cutting during search (at the cost of arguable runtime). But as for the previous equation puzzle solving problem, branch cutting can be used to notably reduce search space and therefore runtime (we’re not making use of order-of-variables-heuristics as we did for the equation puzzle solver for reasons of simplicity, although this would of course further speed up search).

Python approach

With Python we at first state the Jodici gamefield representation as a 6×3 array. For reasons of easiness empty fields are represented by the number 0 (does not influence the sums we’re going to calculate later). For brute forcing the game, we then implement a depth first search with branch cutting. Branch cutting is caused by doing validity check after each added number, and aborting further search with the current configuration if this configuration turned out to be invalid already. The validity check consists of several rules, which all must be satisfied for the gamefield to (still) be valid:
  1. each number must at maximum be used twice
  2. each ring must sum to < 30 if it’s not fully filled yet, and sum to 30 if it’s already fully filled
  3. each sector must sum to < 15 if it’s not fully filled yet and sum to 15 if it’s already fully filled.

Finally, for ease of use, the gamefield initially gets loaded from a .csv file like the one shown below (which corresponds to the Jodici sample from above):

3,7,_,_,_,_
_,_,1,5,9,_
6,_,_,_,_,_

Loading the gamefield and solving the game:

#!/usr/local/bin/python2.7
# Rainhard Findling
# 11/2014
# 
import copy
import csv

# generate gamefield
field = []
for i in range(6):
  field.append([0, 0, 0])

def load_gamefield(f):
    print('loading gamefield...')
    f = open(f, 'r')
    reader = csv.reader(f)
    rows = [row for row in reader]
    field = []
    for inner in range(6):
        # reorder to "cake pieces" and replace '_' with 0
        field.append([int(rows[outer][inner]) if rows[outer][inner] != '_' else 0 for outer in range(3)])
    return field

# load gamefield
field = load_gamefield('gamefields/1.csv')
print field

print('searching for solution...')

def field_valid(suc):
    """check if field is valid. valid != solved, field must be filled and valid to be solved."""
    valid = True
    # all sectors <= 15 or == 15 if they are already set
    for i in range(6):
        if sum(suc[i]) > 15 or not 0 in suc[i] and sum(suc[i]) != 15:
            return False
    # all circles <= 30 or == 30 if they are already set
    for inner in range(3):
        circ = [suc[x][inner] for x in range(6)]
        if sum(circ) > 30 or not 0 in circ and sum(circ) != 30:
            return False 
    # each nr used twice at max
    for nr in range(1,10):
        if(sum([row.count(nr) for row in suc]) > 2):
            return False
    return True

def field_filled(suc):
    """check if field is fully filled. filled != solved, field must be filled and valid to be solved."""
    if(sum(0 in tmp for tmp in suc) == 0):
        return True
    return False

# list of fields to try
l = [field]
# bruteforce (depth first) all 0 positions
while(len(l) > 0):
  cur_field = l.pop(0)
  # find position to fill in
  found = False
  for s in range(6):
    for f in range(3):
      if cur_field[s][f] == 0:
        found = True
        for nr in reversed(range(1,10)):
          # creade successor
          suc = copy.deepcopy(cur_field)
          suc[s][f] = nr
          # check if successor is valid
          if not field_valid(suc):
              continue
          # if successor is not filled add to list
          if not field_filled(suc):
            l.insert(0, suc)
          elif field_valid(suc):
                # we found solution
                print 'solution:', suc
      if found:
        break
    if found:
      break

print 'done.' 

The python implementation finds the (single possible) solution:

loading gamefield...
[[3, 0, 6], [7, 0, 0], [0, 1, 0], [0, 5, 0], [0, 9, 0], [0, 0, 0]]
searching for solution...
solution: [[3, 6, 6], [7, 1, 7], [5, 1, 9], [8, 5, 2], [4, 9, 2], [3, 8, 4]]
done.

Prolog approach

In contrast to the Python implementation, with the Prolog implementation we even leave out branch cutting during search (we would simply need to state rules in specific orders to cause branch cutting). In our knowledge database we state some rules: one for the valid number range, two for valid rows and sectors, another for counting frequency of an element in a list, and a last one for a valid Jodici gamefield. The last rule defines how many variables we’re searching for (correspond to the fields in a Jodici) and checks for correct sector and ring sums, as well as for each number being used twice exactly.

% Rainhard Findling 
% 11/2014
% Written to run in swipl
% 
% 1. load using ['jodici.pl'].
% 2. try a jodici using gamefield, e.g.
% 
%  R1=[3,7,_,_,_,_],R2=[_,_,1,5,9,_],R3=[6,_,_,_,_,_],gamefield([R1,R2,R3]).
% 
% define which numbers are allowed and what rows and circles have to look like to be valid
nr(X) :- between(1,9,X). % member(X,[1,2,3,4,5,6,7,8,9]).
row(A,B,C) :- nr(A), nr(B), nr(C), sum_list([A,B,C],15).
circle(A,B,C,D,E,F) :- nr(A), nr(B), nr(C), nr(D), nr(E), nr(F), sum_list([A,B,C,D,E,F],30).

% count nr of occurences of X in list
count([],_,0).
count([X|T],X,Y) :- count(T,X,Z), Y is 1+Z.
count([H|T],X,Y) :- H\=X, count(T,X,Y).

% definition of valid gamefield
gamefield([R1,R2,R3]) :-
    % check: correct nr of elements in gamefield
    R1=[X11,X21,X31,X41,X51,X61],
    R2=[X12,X22,X32,X42,X52,X62],
    R3=[X13,X23,X33,X43,X53,X63],
    append(R1,R2,Tmp),
    append(Tmp,R3,All),
    % check: correct rows
    row(X11,X12,X13),
    row(X21,X22,X23),
    row(X31,X32,X33),
    row(X41,X42,X43),
    row(X51,X52,X53),
    row(X61,X62,X63),
    % check: correct circles
    circle(X11,X21,X31,X41,X51,X61),
    circle(X12,X22,X32,X42,X52,X62),
    circle(X13,X23,X33,X43,X53,X63),
    % check: each nr used twice
    count(All, 1, 2),
    count(All, 2, 2),
    count(All, 3, 2),
    count(All, 4, 2),
    count(All, 5, 2),
    count(All, 6, 2),
    count(All, 7, 2),
    count(All, 8, 2),
    count(All, 9, 2).

After loading this knowledge database, we can ask for solutions to a given Jodici (like for the Jodici example from above) and Prolog presents us the same, single possible solution:

?- R1=[3,7,_,_,_,_],R2=[_,_,1,5,9,_],R3=[6,_,_,_,_,_],gamefield([R1,R2,R3]).
R1 = [3, 7, 5, 8, 4, 3],
R2 = [6, 1, 1, 5, 9, 8],
R3 = [6, 7, 9, 2, 2, 4] ;
false.

Conclusion

With Python, we specify the gamefield representation, search algorithm, validity checks and order of operations for speedup (the last one is optional). Similarly, with Prolog we specify the gamefield representation and validity checks – but leave out the search algorithm. Therefore, as for the equation puzzle solver, the main conceptual difference between implementations is that with Prolog, the search algorithm needs not be implemented explicitly, while with Python it must be stated explicitly.

Flower disk rotation puzzle solver: Python vs Prolog

November 2, 2014 2 comments

This post is a followup to the Equation puzzle solver: Python vs Prolog. We again compare Python and Prolog, but look at a different problem: a disk rotation puzzle.

The flower disk rotation puzzle

The flower disk rotation puzzle consists of 4 wooden, stacked disks. The disks are connected at their center via a pole, so that they can be rotated. Each disk contains holes that are arranged around the disk center in the form of a flower. The holes are uniformly spread, so that there is space for 12 holes – but each disk only has 9 of these 12 possible holes (the position of holes differ per disk). The remaining 3 areas are instead made of solid wood. The goal is to rotate the disks so that all holes are covered by at least one of the disks (as we have a total amount of 4*3=12 solid areas for a total of 12 holes, each solid area must cover exactly one hole).

For purpose of purchasing, this puzzle can be found online in appropriate stores, and there exist online descriptions looking at the “hardware” puzzle in more detail (e.g. here).

Solving the flower disk rotation puzzle

There exist 12 possible rotations per disk and 4 disks in total, which leads to a solution space of 12^4 (20.7K solutions and 14.3 bit entropy). But as one disk can be thought of being fixed (because rotation of the other disks is relative to this disk), the relevant solution space is 12^3 (1728 solutions and 10.8 bit entropy). As the solution space is so small, a solver using pure, uninformed brute force search is sufficient (even a breath first search will not cause memory troubles). When adding branch cutting to the search, the solution space gets even smaller: in fact so small, that even a human player can perform an exhaustive search. For our puzzle, branch cutting is done via skipping search building on top of an already invalid solution, e.g. when one hole is covered by more than one solid area.

Of course different “harware” versions of the puzzle exist (holes in the disks are arranged differently). However, the puzzle I purchased was special: when playing around with it I noticed I couldn’t come up with a solution, although the solution space is very small. My puzzle has the following disk setup (each disk is represented by one line, holes are represented by 0, solid areas by 1, and all disks are rotated randomly):

[0,0,0,0,0,1,0,1,0,0,1,0]
[1,0,0,1,0,0,0,0,0,0,0,1]
[0,1,0,0,0,0,0,1,0,1,0,0]
[0,0,0,1,0,0,1,0,0,0,0,1]

Why is that puzzle special? Later we’re going to see: it’s because it has no solution. My guess would be that the first/last disk (depending on the side of the puzzle you’re looking at) was flipped before it got attached to the other disks. When flipping the first/last disk, we later on see that the puzzle has exactly one solution:

[0,0,0,0,0,1,0,1,0,0,1,0]
[1,0,0,1,0,0,0,0,0,0,0,1]
[0,1,0,0,0,0,0,1,0,1,0,0]
[1,0,0,0,0,1,0,0,1,0,0,0]

Python approach

The Python script implements a breadth first search to explore the complete solution space. During search, the implementation counts the nodes in the search tree it visits. The final amount of nodes can be used to evaluate the implementation, as we know there are exactly 12^3 leave nodes (if we keep one of the disks fixed) and 12^0+12^1+12^2 branching nodes on the way (node counting is only implemented in Python and left out in Prolog). When commenting in the line in the script that sorts the “states list” by the state heuristic, the search becomes a best first search – which is implemented in a way that it effectively is a depth first search with branch cutting. If we do branch cutting we of course visit and count less nodes during search.

# Implementation of the flower disk rotation puzzle solver: 4 disks stacked on top of each other, each disk containing 12 fields (3 of them are solid and 9 are holes). The task is to rotate the disks so that all holes are covered by at least one layer of wood. As the disks contain 12 fields, and the sum of all solid fields on all disks is 12 too, each field will be coverd by a single solid field. This scripts solves for possible rotations so that all fields are covered. Solid disk fields are represented by 1, holes by 0.
# Besides searching for solutions this script counts the nodes visited in the search three for evaluation purposes
# 
# Rainhard Findling
# 2014/10
# 
class State():
    &amp;quot;&amp;quot;&amp;quot;represents a state in the state search tree&amp;quot;&amp;quot;&amp;quot;
    def __init__(self, gamefield, rotations=None):
        # sanity checks: gamefield not empty and all rows are of same length
        if(len(gamefield)==0):
            raise ValueError('gamefield empty.')
        l = len(gamefield[0])
        if(l == 0):
            raise ValueError('row length is 0.')
        for row in gamefield:
            if(len(row) != l):
                raise ValueError('rows differ in length.')
        self.gamefield = gamefield
        self.rotations = rotations
        self.heuristics_cache = None
        if(rotations!=None and (len(rotations) != len(gamefield))):
            raise ValueError('rotations and gamefield differ in length.')
        if(rotations==None):
            rotations = []
            for _ in gamefield:
                rotations+=[None]
            # fix first rotation to one to not get redundant solutions
            rotations[0] = 0
            self.__init__(gamefield, rotations)
        
    def shift(self, seq, n):
        n = n % len(seq)
        return seq[n:] + seq[:n]
        
    def __str__(self):
        s = '\n====================================\n'
        for row in self.gamefield:
            s+=str(row)
            s+='\n'
        s+='rotations='
        s+=str(self.rotations)
        s+='\n'
        rows = self.rows_rotated()
        for row in rows:
            s+=str(row)
            s+='\n'
        s+='\n===================================='
        return s
    
    def rows_rotated(self):
        &amp;quot;&amp;quot;&amp;quot;get rows = gamefield in the rotation caused by this state&amp;quot;&amp;quot;&amp;quot;
        rows = []
        for row_index in range(0, len(self.gamefield)):
            row = self.gamefield[row_index]
            rot = self.rotations[row_index]
            if rot == None:
                rot = 0
            rows+=[self.shift(row, rot)]
        return rows
        
    def __repr__(self):
        return self.__str__()
    
    def successors(self):
        &amp;quot;&amp;quot;&amp;quot;derive all successors of this state by shifting the next row to all possible positions. does not shift the first row, as this would be redundant&amp;quot;&amp;quot;&amp;quot;
        # get next row to shift
        row_index = -1
        for r in range(0, len(self.gamefield)):
            if self.rotations[r] == None:
                row_index = r
                break
        if row_index == -1:
            # no more rows to shift, return empty = no successors
            return []
        # generate all successors
        successors = []
        for rot in range(0,len(gamefield[row_index])):
            # generate new rotation set
            rotations = self.rotations[:]
            rotations[row_index] = rot
            suc = State(self.gamefield, rotations) 
            successors+=[suc]
        return successors
    
    def amount_covered(self, col_nr):
        &amp;quot;&amp;quot;&amp;quot;count how often a certain position is covered by the levels [0,N]. only takes into account levels that have been rotated already.&amp;quot;&amp;quot;&amp;quot;
        amount = 0
        rows = self.rows_rotated()
        for row_nr in range(0, len(rows)):
            # only count if this level has already been rotated
            if self.rotations[row_nr] != None:
                row = rows[row_nr]
                if row[col_nr] == 1:
                    amount += 1
        return amount
    
    def amount_opened_total(self):
        &amp;quot;&amp;quot;&amp;quot;amount of fields still openend with current lock position&amp;quot;&amp;quot;&amp;quot;
        amount = 0
        for col_nr in range(0,len(self.gamefield[0])):
            if self.amount_covered(col_nr) == 0:
                # this col does not contain a stopper
                amount += 1
        return amount
    
    def is_leave(self):
        &amp;quot;&amp;quot;&amp;quot;check if the position of all levels is set, so this is a leave state&amp;quot;&amp;quot;&amp;quot;
        for r in self.rotations:
            if r == None:
                return False
        return True
    
    def is_solved(self):
        &amp;quot;&amp;quot;&amp;quot;check if lock is closed so that all positions are covered&amp;quot;&amp;quot;&amp;quot;
        return self.amount_opened_total()==0
    
    def is_valid(self):
        &amp;quot;&amp;quot;&amp;quot;heuristics: check if a solution makes sense. e.g. if a part is covered by two levels the solution is invalid, no matter how much levels have already been aligned.&amp;quot;&amp;quot;&amp;quot;
        for col_nr in range(0, len(self.gamefield[0])):
            if self.amount_covered(col_nr) &amp;gt; 1:
                # at least one position is covered by multiple layers, solution must be invalid
                return False
        return True
    
    def heuristics(self):
        &amp;quot;&amp;quot;&amp;quot; heuristics of how good this solution is [-1,1], with the lowest value being the worst solution and the highest value being the best solution. neutral value = starting value = 0. value is cached after first calculation for performance reasons.&amp;quot;&amp;quot;&amp;quot;
        if self.heuristics_cache == None:
            # calculate and cache
            if not self.is_valid():
                # cannot be good as it is invalid
                return -1
            h = 0
            # the deeper the better
            h += (self.level() / len(self.gamefield))
            # cache and return
            self.heuristics_cache = h
        return self.heuristics_cache
    
    def level(self):
        &amp;quot;&amp;quot;&amp;quot;return the level a state is on. if no level is rotated yet, the level is 0, if 1 level is rotated, the level is 1. if all levels are rotated, the level is the amount of levels in total.&amp;quot;&amp;quot;&amp;quot;
        for row_nr in range(0, len(self.gamefield)):
            if self.rotations[row_nr] == None:
                return row_nr
        return len(self.gamefield)

## original gamefield : no solution
#gamefield = [[0,0,0,0,0,1,0,1,0,0,1,0],
             #[1,0,0,1,0,0,0,0,0,0,0,1],
             #[0,1,0,0,0,0,0,1,0,1,0,0],
             #[0,0,0,1,0,0,1,0,0,0,0,1]] 
# modified gamefield: has one solution
gamefield = [[0,0,0,0,0,1,0,1,0,0,1,0],
             [1,0,0,1,0,0,0,0,0,0,0,1],
             [0,1,0,0,0,0,0,1,0,1,0,0],
             [1,0,0,0,0,1,0,0,1,0,0,0]] 
# measure some stuff
histogram_amount = [0] * (len(gamefield[0]) + 1)
histogram_rotation = []
histogram_tried = [0] * (len(gamefield) + 1) 
for row in gamefield:
    histogram_rotation += [[0]*len(row)]
# do best first search
init_state = State(gamefield)
states = [init_state]
while len(states) &amp;gt; 0:
#     states.sort(key=lambda s:s.heuristics(), reverse=True) # sort states by heuristics. comment out to perform a breadth first instead of a best first search
    # process best solution
    s = states.pop(0)
    histogram_tried[s.level()] += 1
    # do measurements for leave nodes
    if s.is_leave():
        histogram_amount[s.amount_opened_total()] += 1
        rot = s.rotations
        for r_index in range(0, len(rot)):
            r = rot[r_index]
            histogram_rotation[r_index][r] += 1
    # check if this is a leave state - and if, if it solved the problem
    if s.is_leave() and s.is_solved():
        print 'found solution:'
        print s
#         break
    successors = s.successors()
    for suc in successors:
#         if suc.is_valid(): # only produce children of valid solutions. commend out to process full state tree and see statistics for all solutions
        states += [suc]
print 'done, histogram (tried solutions per level):', histogram_tried, ', total=', sum(histogram_tried)
print 'histogram (amount opened): ', histogram_amount
print 'histogram (rotations per row)', histogram_rotation

When we run the Python implementation with our initial disk setup we don’t find a solution:

done, histogram (tried solutions per level): [0, 1, 12, 144, 1728] , total= 1885
histogram (amount opened):  [0, 16, 126, 515, 672, 336, 61, 2, 0, 0, 0, 0, 0]
histogram (rotations per row) [[1728, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144], [144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144], [144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144]]

… but when we use the modified disk setup we find one solution:

found solution:

====================================
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
[1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
[0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0]
[1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
rotations=[0, 0, 5, 11]
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
[1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
[0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
[0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0]

====================================
done, histogram (tried solutions per level): [0, 1, 12, 144, 1728] , total= 1885
histogram (amount opened):  [1, 9, 146, 485, 697, 325, 63, 2, 0, 0, 0, 0, 0]
histogram (rotations per row) [[1728, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144], [144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144], [144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144]]

Prolog approach

With Prolog, we state what a valid solution – given 4 disks – needs to look like: a) the solution’s first disk will be equal to the initial first disk, all other of the solution’s disks will be rotated versions of the initial disks. b) each hole must be covered by one of the solution’s stacked 4 disks.

% Implementation of the flower disk rotation puzzle solver: 4 disks stacked on top of each other, each disk containing 12 fields (3 of them are solid and 9 are holes). The task is to rotate the disks so that all holes are covered by at least one layer of wood. As the disks contain 12 fields, and the sum of all solid fields on all disks is 12 too, each field will be coverd by a single solid field. This scripts solves for possible rotations so that all fields are covered. Solid disk fields are represented by 1, holes by 0.

% Rainhard Findling
% 10/2014
%
% sample call with no solution:
% solve([0,0,0,0,0,1,0,1,0,0,1,0],[1,0,0,1,0,0,0,0,0,0,0,1],[0,1,0,0,0,0,0,1,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,0,1],R1,R2,R3,R4).
%
% sample call with one solution:
% solve([0,0,0,0,0,1,0,1,0,0,1,0],[1,0,0,1,0,0,0,0,0,0,0,1],[0,1,0,0,0,0,0,1,0,1,0,0],[1,0,0,0,0,1,0,0,1,0,0,0],R1,R2,R3,R4).
% 
solve(L1,L2,L3,L4,R1,R2,R3,R4) :-
  % solution needs to be a rotated version of the original levels (we're not interested in amount of rotation)
  L1=R1,
  rotate(L2,R2,_),
  rotate(L3,R3,_),
  rotate(L4,R4,_),
  % each field needs to be covered by at least one solid field of 1 of the 4 disks (exactly one &amp;quot;1&amp;quot; in our case)
  solid(R1,R2,R3,R4).

% check that each field is coverd by at least one solid part of a disk (exactly one in our case)
solid([],[],[],[]).
solid(H1,H2,H3,H4) :- count([H1,H2,H3,H4],1,1).
solid([H1|D1],[H2|D2],[H3|D3],[H4|D4]) :- solid(H1,H2,H3,H4),
					  solid(D1,D2,D3,D4).

% rotate(+List, +N, -RotatedList)
% True when RotatedList is List rotated N positions to the right
rotate(List, RotatedList, N) :-
    length(List, Length),      % length of list
    append(Front, Back, List), % split L
    length(Back, N),	       % create a list of variables of length N
    Length &amp;gt; N,		       % ensure we don't rotate more than necessary
    append(Back, Front, RotatedList).

% count how often an element occurs in a list
count([],_,0).
count([X|T],X,Y):- count(T,X,Z), Y is 1+Z.
count([X1|T],X,Z):- X1\=X,count(T,X,Z).

As for the Python implementation, the Prolog implementation cannot find a solution to the original disk setup:

?- solve([0,0,0,0,0,1,0,1,0,0,1,0],[1,0,0,1,0,0,0,0,0,0,0,1],[0,1,0,0,0,0,0,1,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,0,1],R1,R2,R3,R4).
false.

… but for the modified disk setup, we find the same single solution as with python:

?- solve([0,0,0,0,0,1,0,1,0,0,1,0],[1,0,0,1,0,0,0,0,0,0,0,1],[0,1,0,0,0,0,0,1,0,1,0,0],[1,0,0,0,0,1,0,0,1,0,0,0],R1,R2,R3,R4).
R1 = [0, 0, 0, 0, 0, 1, 0, 1, 0|...],
R2 = [1, 0, 0, 1, 0, 0, 0, 0, 0|...],
R3 = [0, 0, 1, 0, 1, 0, 0, 0, 1|...],
R4 = [0, 1, 0, 0, 0, 0, 1, 0, 0|...] ;
false.

Conlcusion

As for the last problem, the problem can be solved using both Python and Prolog. And again, Prolog saves us from explicitly implementing the search algorithm, which we need to state explicitly in Python.

Equation puzzle solver: Python vs Prolog

October 25, 2014 2 comments

Recently some of my friends and colleagues asked me: how is¬†the logic-declarative language Prolog (I’ve been playing around with it) comparable – or not comparable – to typical, more well known programming languages? This question made me curious how similar implementations in Prolog typically are to implementations in other, more well known programming languages. Hence, I decided to do some small, comparative AI toy problems (actually search problems) in Python and Prolog. The first is a very simple equation puzzle (this post), the second a flower disk rotation puzzle and the third a Jodici solver.

Python vs Prolog

Python is a well known and widely used programming language (and it’s not only used for scripting). Python supports object oriented as well as aspect oriented programming paradigms – and to some extent functional programming. So Python represents the “typical” programming¬† language, where¬† most programmers will be familiar with it’s concepts.

In contrast to Python, Prolog is a logic and declarative programming language, in which developers state facts and rules to derive new facts by logic inference and a recursive backtracking search. These paradigms are less widespread, therefore typically feel less familiar.

The equation puzzle

equation_puzzleOur equation puzzle (sometimes equation crossword puzzle, although many different forms of such puzzles exist) consists of 3 horizontal and 3 vertical equations. Each equation adds/multiplies 3 numbers between -9 and 99 to a single result number. The equations are arranged so that each such number is used in exactly two of these equations. Not all of these numbers are given at the start: only the top left number and the equation results are known. The now obvious task is to find all numbers for the still unset variables (the free fields marked in blue), so that all equations are correct. We’re going to solve that problem programmatically using search (plugging numbers into variables and checking if equations are still correct) and some heuristics/branch cutting for speedup.

Heuristics: plug into equation with least free variables

It’s easier to plug numbers into equations with less free variables (e.g. the top horizontal or leftmost vertical at the start, as they’re the only equations with two instead of three free variables) and checking affected equations for correctness immediately afterwards than to plug numbers into equations which more free variables. Why does this help? Because of branch cutting in our search tree: the sooner we find out that certain numbers won’t allow for a valid equation X correct, the sooner we can stop trying solutions that would have these numbers in equation X. Therefore we look at less solutions during search, which speeds up the search process.

Heuristics: plug into multiplication before addition

Given addition and multiplication as mathematical operations in the equations, it’s easier to first solve equations containing multiplications. Why does this help? Again to cut down the branching factor of the search tree. Equations involving multiplications have less valid solutions than equations only involving additions (given our number range). Again, the sooner we skip certain parts of the search tree, the faster the total search will become.

Given these two heuristics we’re going to use the following order for plugging numbers into variables and checking for validity of corresponding equations (for both the Python and the Prolog implementation):

  1. Plug number into variable: row 1, column 2
  2. Plug number into variable: row 1, column 3
  3. Evaluate equation: row 1
  4. Plug number into variable: row 2, column 3
  5. Plug number into variable: row 3, column 3
  6. Evaluate equation: column 3
  7. Plug number into variable: row 2, column 1
  8. Plug number into variable: row 2, column 2
  9. Evaluate equation: row 2
  10. Plug number into variable: row 3, column 1
  11. Evaluate equation: column 1
  12. Plug number into variable: row 3, column 2
  13. Evaluate equation: row 3
  14. Evaluate equation: column 2

The Python approach

We define a) the order of variables to plug numbers into and equations to evaluate and b) the starting condition (top left number, equations and their results). We then implement a search ourselves: in our case a depth first search. For fully searching the search tree, a breadth first search would do as well in our case – but assuming we only search for one solution if multiple solutions are possible, using a depth first search is the better idea (note that we’re actually using an informed search by hardcoding the order of variables/equations, which could be automated as well – then more informed algorithms like A* would do the job).

#!/usr/bin/python
#
# python demo: solving an arithmetic game with [-9,99] being allowed as unknowns in equations and 6 equations in total.
# Rainhard Findling
# 2014/10
# call using e.g.
#
# python arithmetic_game.py
#
rules = {
	 # order of when to fill variables and evaluate equations
	 'order': [(12,[]), (13,['row1']), (23, []), (33, ['col3']), (21, []), (22, ['row2']), (31, ['col1']), (32, ['col2','row3'])],
	 # known top left variable
	 'state' : {11:26},
	 # equations and their results
	 'row1' : lambda s:s[11] - s[12] * s[13] == -278,
	 'row2' : lambda s:s[21] * s[22] + s[23] == 216,
	 'row3' : lambda s:s[31] * s[32] + s[33] == 11,
	 'col1' : lambda s:s[11] + s[21] - s[31] == 36,
	 'col2' : lambda s:s[12] + s[22] + s[32] == 27,
	 'col3' : lambda s:s[13] * s[23] - s[33] == 245}

def state_str(s):
	"""graphical representation of solution."""
	st =   '----------------\n'
	st += '|%4d|%4d|%4d|' % (s[11],s[12],s[13])
	st += '\n----------------\n'
	st += '|%4d|%4d|%4d|' % (s[21],s[22],s[23])
	st += '\n----------------\n'
	st += '|%4d|%4d|%4d|' % (s[31],s[32],s[33])
	st += '\n----------------'
	return st

# do search
state_init = rules['state']
list_open = [state_init]
while len(list_open) > 0:
	state = list_open.pop(len(list_open) - 1) # depth first search
	#print 'expanding', str(state)
	# generate successors in given number range
	for nr in range(-9,99)[::-1]: # start with bigger numbers - will more likely cut branches with multiplications
		s = state.copy()
		for tup in rules['order']:
			if not tup[0] in s:
				# slot is still free, set nr there
				s[tup[0]] = nr
				valid = True
				for rule in tup[1]:
					if not rules[rule](s):
						valid = False
						break
				if not valid:
					break
				if len(s) == 9:
					# solved!
					print 'solution:\n', state_str(s)
				else:
					list_open += [s]
				break
print 'done.'

The Python script finds the (single possible) solution:

solution:
----------------
|  26|  19|  16|
----------------
|  25|   8|  16|
----------------
|  15|   0|  11|
----------------
done.

The Prolog approach

In contrast to the Python approach, Prolog uses a built-in recursive backtracking search internally, therefore saves us from implementing the search algorithm ourselves. This of course makes the script notably shorter. Anyway, the equations with their results and the variable known at the start has to be defined. As for the Python script, in Prolog the order of rules plays a major role (it’s how we do branch cutting in our case). Hence, we use the same ordering as for the Python script.

% prolog demo: solving an arithmetic game with [-9,99] being allowed as unknowns in equations and 6 equations in total.
% Rainhard Findling
% 2014/10
% call using e.g.
%
% ['my_database.pl'].
% gamefield(26,X12,X13,X21,X22,X23,X31,X32,X33,-278,216,11,36,27,245).
%
nr(X) :- between(-9,99,X).
gamefield(X11,X12,X13,X21,X22,X23,X31,X32,X33,R1,R2,R3,C1,C2,C3) :-
	nr(X12),
	nr(X13),
	R1 is X11 - X12 * X13,
	nr(X23),
	nr(X33),
	C3 is X13 * X23 - X33,
	nr(X21),
	nr(X22),
	R2 is X21 * X22 + X23,
	nr(X31),
	C1 is X11 + X21 - X31,
	nr(X32),
	R3 is X31 * X32 + X33,
	C2 is X12 + X22 + X32.

As for the Python script, the Prolog script also finds the single possible solution. The variable numbers refer to the grid indexes, e.g. X12 refers to the free variable in row 1, column 2.

?- gamefield(26,X12,X13,X21,X22,X23,X31,X32,X33,-278,216,11,36,27,245).
X12 = 19,
X13 = X23, X23 = 16,
X21 = 25,
X22 = 8,
X31 = 15,
X32 = 0,
X33 = 11 ;
false.

Conclusion

Besides using different programming paradigms, the main conceptual difference between implementations is that with the Python implementation the search is explicitly stated in code, while with the Prolog implementation search is a built-in feature. Search order (which action to do next) matters for both implementations. With both Python and Prolog, the order is stated explicitly (with Python as a list of actions, with Prolog as order or rules to evaluate) – which cause the same behaviour in the end.

convertconditional: convert an image if it fulfills certain conditions

August 28, 2013 Leave a comment

Recently I needed a script to batch convert only those images amongst a large amount of images which fulfil certain criteria, namely of being exactly of a stated size. The script is based on ImageMagick’s convert and basically takes an arbitrary amount of convert parameters. I personally use this script to automatically reduce size and quality of photos taken with a specific camera in order to reduce their hard disk space coverage.

The script

#! /usr/bin/python
#
# convertconditional: Convert an image if it fulfills certain conditions, e.g. is of a certain size. Requires ImageMagick's convert.
# Rainhard Findling 2013/08
#
# example to convert all *JPG within the current directory with are of certain size to a reduced size and quality:
# find . -iname "*JPG" -exec ./convertconditional {} -filtersize 3888x2592 -convertparams "-resize 3500x3000 +repage -quality 80" \;
#
import argparse # check http://docs.python.org/2/howto/argparse.html for help
import subprocess
import sys
# specify arguments
parser = argparse.ArgumentParser()
parser.add_argument('inputfile', help='path to the file that will be converted.')
parser.add_argument('-convertparams', required=True, help='parameters handed to the convert command used internally, e.g. resize, repage, reduce quality etc. Example: "-resize 300x200 +replage -quality 92"')
parser.add_argument('-filtersize', help='only convert if original image is of this size, stated as WIDTHxHEIGHT, e.g. 3500x3200.')
parser.add_argument('-o', '--outputfile', help='path to where the converted image will be stored. if not specified, the original file will be overwritten.')
parser.add_argument('-v','--verbose',help='print verbose output.',action='store_true')
args = parser.parse_args()
# make sure we can process names with spaces
args.inputfile = '"' + args.inputfile + '"'
# check for correct arguments
if not args.outputfile:
args.outputfile = args.inputfile
if args.filtersize:
filter_x = int(args.filtersize.split('x')[0])
filter_y = int(args.filtersize.split('x')[1])
if args.verbose:
print 'inputfile=' + args.inputfile
print 'outputfile=' + args.outputfile
if args.filtersize:
print 'resizing only', str(filter_x) + 'x' + str(filter_y), 'images.'
print 'convertparams=' + args.convertparams
# get size of image
imagesize = subprocess.check_output(["identify -format '%wx%h' '" + args.inputfile + "'"],
stderr=subprocess.STDOUT,
shell=True)
imagesize_x = int(imagesize.split('x')[0])
imagesize_y = int(imagesize.split('x')[1])
# condition: filter for images of certain size
if args.filtersize:
if args.verbose:
print 'size of', args.inputfile, 'is', str(imagesize_x) + "x" + str(imagesize_y)
# check filter criteria
if not imagesize_x == filter_x or not imagesize_y == filter_y:
print 'leaving out ' + args.inputfile + ' as it is of size ' +  str(imagesize_x) + "x" + str(imagesize_y) + " (required: " + args.filtersize + ")"
sys.exit(0)
# passed all conditions: convert image
print 'converting ' + args.inputfile + ' (size: ' +  str(imagesize_x) + "x" + str(imagesize_y) + ')'
# convert image
command="convert '" + args.inputfile + "' " + args.convertparams + " '" + args.outputfile + "'"
if args.verbose:
print 'command:', command
imagesize = subprocess.check_output([command],
stderr=subprocess.STDOUT,
shell=True)

The script is written in Python, so all you need to do is save it (e.g. in a file called “convertconditional”) and make it executable:

chmod +x convertconditional

Then you can either call it with stating it’s path (e.g. “./convertconditional [parameters]”), or add it to your systems PATH to call it from everyhwere.

Script execution

In order to convert input.jpg to output.jpg, you can try

convertconditional input.jpg -filtersize 3888x2592 -convertparams "-resize 3500x3000 +repage -quality 85" -o output.jpg
  • -filtersize optional filtering: only convert the image¬† in case it is exactly of the stated size
  • -convertparams states parameters which should be handed to ImageMagick’s convert
  • -o states where to store the converted image (original image gets overwritten if omitted)

In case you want to conditionally convert multiple files (as with my usecase) you can combine convertconditional with find and overwrite the original files:

find . -iname "*JPG" -exec convertconditional {} -filtersize 3888x2592 -convertparams "-resize 3800x +repage -quality 85" \;