# Designing a product to maximize visibility.

INTRODUCTIONProduct design based on user input is a problem that has attracted recent research interest. The successful creation and introduction of new products into the market is crucial for a company's growth and survival. The search for new product ideas has become an important issue, and the task is to create or design new products according to customer's preference. The purpose of new product design can varies for different companies or organizations such as maximizing sales, maximizing profit, mass customization, high quality, etc.

The research in this study assumes that a product can be designed by selecting a subset of desirable features (or attributes) to add from a large set of possible features. These attributes could be of any type such as Boolean, text, and numeric. For example, in designing a new cellular phone, the manufacturer may have to decide on features such as ring tone, speed dialing, weight, size, camera, etc. Similar applications extend to cameras, electronic products, cars, homes, restaurants and so on.

In particular, the research focuses on a specific and novel problem formulation where a query log (or workload) of user preferences for features, and possibly a profitability constraint are given, and asked to design a product that satisfies the maximum number of queries in the workload. Thus the primary goal is to satisfy as many customers or buyers as possible, i.e., wish to design a product that is widely visible to the pool of potential buyers--mhence the phrase maximize visibility. Because of the vast use of the Internet nowadays, it is very easy to collect such query logs of user preferences, and the new product can be designed based on real users' perception on desirable product features.

This study specifically considers products with Boolean attributes. For example, a car can have Boolean features such as AC, Auto Trans, Power Brakes, Power Doors, Turbo, etc.; a cell phone can have Boolean features such as Bluetooth, Wi-Fi, Speaker Phone, Camera, Video Streaming, etc., and so on. For a Boolean feature or attribute, the value of the attribute is either 0 (feature is not present) or 1 (the feature is present).

To understand the problem better, consider the following scenario: assume an auto manufacturer wants to build a new car such that it is desirable to as many potential customers as possible. Let us assume that auto manufacturer has access to a user query log, such as previous auto search queries posed by past users on its website. Some users might have searched for cars that must have Power Doors, some perhaps did not want Power Doors as they might be concerned about higher price, and some might have "not cared" whether their cars have Power Doors or not. Assuming that future customers will have the same preference patterns as such past users, the manufacturer would like to design the new car (i.e., select features for the new car) such a way that it can satisfy as many queries as possible in the query log. Moreover, since different users might have expressed different preferences for the same feature (such as some want Power Doors some others do not, while some don't care), it is clear that it is not advantageous for the manufacturer to include all possible features in the new car. Instead the manufacturer can design the new car based on trying to satisfy as many users as possible. In addition to the query log, if the manufacturer also has a profitability constraint, then the new car should be designed in a way that it meets the profitability constraint.

To define the problem more formally, a few abstractions need to be developed. Let D be the database of products already being built and advertised in the marketplace. Let A be the set of all possible Boolean attributes of D. In addition, let Q be the set of search queries that have been executed against this database in the recent past--thus Q is the "workload" or "query log". Each query is a conjunction of preferences for a certain subset of features, e.g., a query such as "[a.sub.3] = 0 AND [a.sub.4] = 1 AND [a.sub.6] = 1" implies that the user did not want attribute [a.sub.3] but wanted attributes [a.sub.4] and [a.sub.6], and did not care about the rest. The query log is our primary model of what past potential buyers have been interested in. Consider a new product p that needs to be designed and inserted into this database. The problem can now be defined as follows:

Given a database D, a query log Q, design a new tuple p (assign value [0, 1] for each attribute in p) such that if p is inserted into the database, the number of queries of Q that retrieve p is maximized.

This research thoroughly investigates this unexplored instance of the product design problem. The main contributions are summarized below:

1. The research investigates the problem of designing a product to maximize visibility that benefits a certain class of manufacturers interested in designing and marketing their products. It considers a specific instance of the problem, which is still unexplored as well as extend to other important variants.

2. The research presents fast approximation algorithms based on heuristics that work well in practice for large problem instances and compare the results with Optimal Naive algorithm.

3. The research performs detailed performance evaluations on synthetic data to demonstrate the effectiveness of the developed algorithms.

FORMAL PROBLEM DEFINITION

This section first develops, with a detailed example, a formal definition of the problem. Some useful definitions and notations are given here.

Database: Let A = ([a.sub.1] ... [a.sub.M]) be the set of attributes of a database of tuples (products), where each tuple p is a bit-vector where a 0 implies the absence of a feature and a 1 implies the presence of a feature.

Query (with negation): The study views each query as a subset of attributes and/or negation of attributes. The retrieval semantics is Conjunctive Boolean Retrieval, e.g., a query such as {[a.sub.1], [a.sub.3]} is equivalent to "return all tuples such that [a.sub.1] = 1 and [a.sub.3] = 1". It also consider queries with negations, e.g., {[a.sub.1], ~[a.sub.2]} is equivalent to "return all tuples such that [a.sub.1] = 1 and [a.sub.2] = 0". The remaining attributes for which values are not mentioned in the query are assumed to be "don't care" attributes, i.e., the value can be either 0 or 1. Note that the negation on attribute conditions complicates the problem and precludes solutions based on frequent itemsets, which was used by Miah, Das, Hristidis, & Mannila (2008) for product advertising applications.

Query Log: Let Q = {[q.sub.1] ... [q.sub.S]} be a collection of queries where each query q defines a subset of attributes with or without negation.

The problem is formally defined as follows:

Product Design (PD) Problem: Given a query log Q with Conjunctive Boolean Retrieval semantics, where a query can have negations, design a new tuple p (assign value [0,1] for each attribute for the new tuple) such that the number of queries that retrieve p is maximized.

Thus, for a product designer/manufacturer, the study wishes to ensure that the new product is visible to as many buyers as possible. To understand the problem better, let us consider the Example 1 (our running example).

Example 1: Consider Table 1 which shows a query log for auto search, containing S=6 queries and M=6 attributes where each tuple represents a query asked by a user to search for a car. The Boolean attributes describe features of the car, such as AC, Four Door, etc. Each query represents a user's preferences (either positive or negative) for a certain subset of attributes. Thus a query can be modeled as a row in the query log table with values 1, 0, or ?, where 1 means the attribute must be present, 0 means the attribute must not be present, and "?" means "don't care". For this specific example, it is not hard to see that if a new car is designed with AC = 1, Auto Trans = 0, Four Door = 0, Power Brakes = 1, Power Doors = 1, Turbo = 0 (i.e., new tuple p = [1, 0, 0, 1, 1, 0]), it can satisfy a maximum of 3 queries (q2, q4 and q6). No other selection of attribute values for the new tuple will satisfy more queries.

ALGORITHMS

Optimal Naive Algorithm

It is easy to see that a Naive brute-force optimal approach to design a new tuple p can be designed as follows:

(1) For each of the possible [2.sup.M] combination of attribute values, find the number of queries in Q that will be satisfied by this assignment.

(2) Return the assignment (i.e., tuple) with the highest count.

Clearly, while the naive algorithm is polynomial in S, it is unfortunately exponential in M. Thus it is not feasible when the number of attributes is large since the algorithm has to generate an exponential number of possible combinations of attribute values.

The rest of this section discusses the approximation algorithms which are much more efficient than the Naive algorithm, and work very well for large problem instances as well.

Approximation Algorithm: Forward Selection (FS)

Forward selection heuristic starts with an empty set of attributes, A and adds one attribute with corresponding assigned value (0 or 1) at a time until A has M attributes. Given any ordering of the variables, the forward selection greedy heuristic sequentially selects an assignment for each variable to satisfy the highest number of additional clauses (clauses in Q in PD). Figure 1 displays the pseudocode of the algorithm.

Figure 1 Approximation Algorithm: Forward Selection (FS) Approx Algorithm: FS Let Q be the query log, A ([a.sub.1] ... [a.sub.M]) be the attributes in Q For (int i = 1 to M) If Q not empty Count # of queries satisfied both for [a.sub.i] = 1 and [a.sub.i] = 0. Assign the value of [a.sup.i] that gives the maximum counts Remove queries from Q satisfied by the value of [a.sub.i] Return the attributes assignment

The algorithm FS described above assigning attribute values for the new product or tuple until all queries in the query log are satisfied. Here a threshold of how many queries or what percentage of queries in the query log the new product need to be satisfied can also be given; and the algorithm can stop once it reaches that threshold.

Approximation Algorithm: Backward Elimination (BE)

The research proposes a backward elimination greedy heuristic where all single attributes are considered first to have value as 1 that means considering all features are present in the new product and removing one at a time. In this case the algorithm actually do not remove an attribute, instead it checks which value (either 0 or 1) is better for an attribute and assign correct value (0 or 1) one at a time. The summary of the approach is shown in Figure 2 below.

Figure 2 Approximation Algorithm: Backward Elimination (BE) Approx Algorithm: BE Let Q be the query log, A ([a.sub.1] ... [a.sub.M]) be the attributes in Q For (int i = 1 to M) Assign [a.sub.i] = 1 For (int i = 1 to M) If Q not empty Let num_current = # of queries satisfied with current attributes assignment //with [a.sub.i] = 1 Let num_new = # of queries satisfied if value of at is changed as [a.sub.i] = 0. If num_new > numcurrent Assign [a.sub.i] = 0 //else leave i = 1 Remove queries from Q satisfied by the value of at Return the attributes assignment

Like algorithm FS, the algorithm BE described above assigning attribute values for the new product or tuple until all queries in the query log are satisfied. Here also a threshold of how many queries or what percentage of queries in the query log the new product need to be satisfied can be given; and the algorithm can stop once it reaches that threshold.

Approximation Algorithm: Combination of Forward Selection and Backward Elimination (FSBE)

Now the study proposes another heuristic which combines both the algorithms FS and BE considering bidirectional hill climbing techniques. Hill climbing is one of the best known procedures for sequential attribute selection (Russel & Norvig, 2003). Greedy algorithms such as FS and BE implement so called unidirectional hill climbing, i.e., attributes once added (removed) cannot be later deleted (added). The advantage of bidirectional hill climbing compared to either FS or BE is that one or several previously deleted (added) attributes can be brought back to (removed from) the subset if the accuracy of the algorithm increases. But this technique can be time consuming as both the FS and BE has to perform completely and then somehow combine the results. So, the study proposes a new technique to improve the performance of the algorithm, described as follows:

Once an attribute is removed by BE it is not considered to be added by FS anymore. Similarly, once an attribute is added by FS it is not considered to be removed by BE. So, at every step BE eliminates one attribute and FS adds one. The procedure is repeated until all the queries satisfied in the query log or the threshold (number of queries or percentage of queries need to be satisfied) is met.

Figure 3 Approximation Algorithm: FSBE Approx Algorithm: FSBE Let [A.sub.o] = set of all original attributes present in query log Let A = an empty set For (i = 1 to M) Perform BE on [A.sub.o]//every step assign correct value for one attribute from [A.sub.o] Perform FS on [A.sub.o]//every step adds one attribute to A Remove the attribute from [A.sub.o] which is added by FS to A in previous step End Return A

If there is also a profitability constraint present that means a product designer or manufacturer wants to design a new product such a way that it satisfy as many customers as possible (satisfy as many queries as possible in the query log) and at the same time wants to make a certain profit. The approximation algorithms discussed above can be easily modified to meet the profitability constraints. At each step when the algorithms check the number of satisfied queries in the query log, it also has to check whether the current assignment of attribute values also meet the constraints. The algorithms assign a value to an attribute once it meets both the conditions.

EXPERIMENTS

This section describes the experimental setting and the results. The main performance indicators are (a) the time cost of the proposed approximation algorithms, and (b) the approximation quality of the approximation algorithms.

System Configuration: This study used Microsoft SQL Server 2000 RDBMS on an Intel Core2 Duo 2.93-GHZ PC with 3 GB of RAM and 160 GB HDD for the experiments. Algorithms are implemented in C#, and connected to RDBMS through ADO.

Datasets: The study used synthetic data. In specific, it used synthetic query log based on uniform distribution.

Synthetic query log based on uniform distribution: The study generates synthetic dataset for cars based on uniform distribution. There are 32 Boolean attributes, such as AC, Power Locks, etc. for a car. In the synthetic query log, each query specifies 1 to 5 positive attributes (users like to have these attributes on the cars they search) chosen randomly distributed as follows: 1 attribute--20%, 2 attributes--20%, 3 attributes--20%, 4 attributes--20%, 5 attributes --20%. In addition, each query also specifies 1 negative attribute (users do not like to have this attribute on the cars they search) chosen randomly from rest of the attributes. The study assumes that most of the users specify small number of attributes. It generates a small dataset of 200 queries with 10 Boolean attributes to measure the quality of approximation algorithms comparing with the Naive approach as the naive approach is not feasible for large datasets. It also generates large datasets for varying number of queries with 32 Boolean attributes to measure the time performance of the approximation algorithms.

[FIGURE 4 OMITTED]

[FIGURE 5 OMITTED]

[FIGURE 6 OMITTED]

Quality

Figure 4 shows the quality of the approximation algorithms for a small query log of 200 queries with 10 Boolean attributes. As described in Section 3, the Naive algorithm has to generate exponential number of attributes sets based on the number of attributes, the algorithm is not feasible for larger data set. So the study measures the quality performance of the approximation algorithms on the smaller data set with 200 queries and 10 Boolean attributes. As Figure 4 shows that the approximation algorithms provide good approximation results and the algorithm FSBE provides the best quality results among the proposed three approximation algorithms.

Time performance

Figure 5 shows the performance of the algorithms for the small dataset of 200 queries. As graph shows, approximation algorithms are much more efficient than optimal Naive algorithm.

Figure 6 shows the performance of the approximation algorithms for large datasets with varying query log sizes and 32 Boolean attributes. As graphs show, the approximation algorithm FS performs best in terms of time measurement followed by the algorithm FSBE. The algorithm BE is slower than the other two algorithms as expected.

RELATED WORK

Product design for skyline query semantics

Recent works on dominant relationship analysis (Li, Ooi, Tung, & Wang, 2006) and dominating neighborhood profitably (Li, Tung, Jin, & Ester, 2007) assume that attributes are min/max, that is, all users have the same preference for an attribute (e.g., 2 doors is always better than 4 doors). They use skyline query semantics, which is appropriate given this assumption. Further, they assume there is a profitability plane which simplifies the algorithm given that the optimal solution is a point on the profitability plane. In contrast, in this research users may have opposite preferences for the same attribute, and the algorithms can be used with or without a profitability plane. Li, Tung, Jin, & Ester (2007) also consider spatial, non-preference attributes. The algorithms proposed in this study can be modified to support skyline semantics; however, more efficient algorithms may be possible for this problem variant given its restrictive nature.

Product design in Operations Research and Marketing

Optimal product design or positioning is a very well studied problem in Operations Research and Marketing. The origins of the analytical research on product positioning are traced back to Shocker & Srinivasan (1974) who first represented products and consumer preferences as points in a joint attribute space. After that, several approaches and algorithms such as (Albers & Brockhoff, 1977 & 1980, Albritton & McMullen, 2007, Gavish, Horsky, & Srikanth, 1983, Gruca & Klemz, 2003, Kohli, & Krishnamurti, 1989, Shocker & Srinivasan, 1974, Zufryden, 1979) have been developed to design or position a new product. Works in this domain require direct involvement of users or consumers. Some works like (Albers & Brockhoff, 1977 & 1980, Gruca, & Klemz, 2003, Shocker & Srinivasan, 1974, Zufryden, 1979) require one step user involvement where users are shown a set of existing alternative products (predesigned) to choose or set preferences, and then manufacturer analyzes the user preferences to locate the optimal position for the new product. In this case, users in fact do not select the attributes or features they like and do not like as in our work.

Others like (Gavish, Horsky, & Srikanth, 1983) require the consumer and the producer to be involved in a two-stage decision process where the consumer first decides on his budget for the product class. He then evaluates, within the product class, the subset of competing objects which have prices approximately equal to his budget constraint. The producer is assumed to first identify, in the attribute space which contains ideal points and competing objects, promising product positions which would attract a large number of consumers. He then evaluates these positions in terms of costs and resulting profits. Involving users directly in the product designing process is a difficult task and all the works mentioned above usually involve a small number of users that might not lead to an optimal design of the new product which a large number of users can do. In this work, users are not involved directly in the process of designing of the new product. It uses previous user search queries for the same product and it is easy to collect the preferences (search queries) for large number of users nowadays. To best of the author's knowledge none of the works mentioned above considers a very large query log to design the new product. Also, as mentioned earlier, in the proposed model, the study allow users to express their preferences in attribute or feature level in terms of positive and negative interest as well as "don't care".

Maximizing visibility of existing object

Miah, Das, Hristidis, & Mannila (2008) tackled a related problem of maximizing the visibility of an existing object by selecting a subset of its attributes to be advertised. The main problem was: given a query log with conjunctive query semantics and a new tuple, select a subset of attributes to retain for the new tuple so that it will be retrieved by the maximum number of queries. They did not consider negated conditions. The current research considers designing a product (a new tuple), that is, assign values for all attributes instead of selecting subset of attributes. Also, a query may have negations.

CONCLUSIONS

In this work the author investigated the problem of designing a product, such that, given a query log, this product will be returned by the maximum number of queries in the query log. The study presents approximation algorithms, which are experimentally shown to produce good approximation ratios for large databases. A future direction is to extend the problem to other data types, such as categorical, text and numeric and different query semantics like top-k and skyline retrieval.

REFERENCES

Albers, S., & Brockhoff, K. (July 1977). A procedure for new product positioning in an attribute space, European Journal of Operational Research, 1 (4): 230-238.

Albers, S., & Brockhoff, K. (1980). Optimal Product Attributes in Single Choice Models. Journal of the Operational Research Society, 31, 647-655.

David M. Albritton, D. M., & McMullen, P. R. (January 2007). Optimal product design using a colony of virtual ants. European Journal of Operational Research, 176(1): 498-520.

Gavish, B., Horsky, D., & Srikanth, K. (November 1983). An Approach to the Optimal Positioning of a New Product. Management Science, 29 (11): 1277-1297.

Gruca, T. S., & Klemz, B. R. (May 2003). Optimal new product positioning: A genetic algorithm approach. European Journal of Operational Research, 146 (3): 621-633.

Kohli, R., & Krishnamurti, R. (May 1989). Optimal product design using conjoint analysis: Computational complexity and algorithms. European Journal of Operational Research, 40 (2): 186-195.

Li, C., Tung, A. K. H., Jin, W., & Ester, M. (2007). On Dominating Your Neighborhood Profitably", VLDB, 818-829.

Li, C., Ooi, B. C., Tung, A. K. H, & Wang, S. (2006). DADA: a Data Cube for Dominant Relationship Analysis. SIGMOD, 659-670.

Miah, M., Das, G., Hristidis, V., & Mannila, H. (2008). Standing Out in a Crowd: Selecting Attributes for Maximum Visibility, ICDE, 356-365.

Russell, S. J., & Norvig, P. (2003). Artificial Intelligence: A Modern Approach (2nd ed.). Upper Saddle River, New Jersey: Prentice Hall, pp. 111-114, ISBN 0-13-790395-2, http://aima.cs.berkeley.edu/

Shocker, A. D., & Shrinivasan, V. (February 1974). A consumer-based methodology for the identification of new product ideas. Management Science, 20 (6): 921-937.

Zufryden, F. S. (January 1979). ZIPMAP--A Zero-One Integer Programming Model for Market Segmentation and Product Positioning. The Journal of the Operational Research Society, 30 (1): 63-70.

Muhammed Miah

Southern University at New Orleans

Muhammed Miah (Ph.D. in Computer Science) is the Assistant Professor at the Management Information Systems Department, Southern University at New Orleans, Louisiana. Dr. Miah's research interests include databases, data mining, information retrieval, e-commerce and e-business, maximizing visibility, bioinformatics, GIS, etc.[kappa]

Table 1 Query log Q for Running Example Query ID AC Auto Four Power Power Turbo Trans Door Brakes Doors [q.sub.1] 1 0 1 ? ? ? [q.sub.2] 1 ? 0 ? 1 ? [q.sub.3] 0 ? 1 ? 1 ? [q.sub.4] ? 0 0 1 1 ? [q.sub.5] ? 1 ? 0 ? 1 [q.sub.6] 1 0 0 ? ? 0

Printer friendly Cite/link Email Feedback | |

Author: | Miah, Muhammed |
---|---|

Publication: | International Journal of Business, Marketing, and Decision Sciences (IJBMDS) |

Geographic Code: | 1USA |

Date: | Sep 22, 2010 |

Words: | 4048 |

Previous Article: | Market orientation and the resource-based model of above average returns: a framework for community economic development. |

Next Article: | Preface. |

Topics: |