The Set Covering Machine

Marchand, Mario; Shawe-Taylor, John; Brodley, Carla E.; Danyluk, Andrea
May 2003
Journal of Machine Learning Research;5/15/2003, Vol. 3 Issue 4/5, p723
Academic Journal
We extend the classical algorithms of Valiant and Haussler for learning compact conjunctions and disjunctions of Boolean attributes to allow features that are constructed from the data and to allow a trade-off between accuracy and complexity. The result is a general-purpose learning machine, suitable for practical learning tasks, that we call the set covering machine. We present a version of the set covering machine that uses data-dependent balls for its set of features and compare its performance with the support vector machine. By extending a technique pioneered by Littlestone and Warmuth, we bound its generalization error as a function of the amount of data compression it achieves during training. In experiments with real-world learning tasks, the bound is shown to be extremely tight and to provide an effective guide for model selection.


Related Articles

  • OPTIMIZATION OF LZW (LEMPEL-ZIV-WELCH) ALGORITHM TO REDUCE TIME COMPLEXITY FOR DICTIONARY CREATION IN ENCODING AND DECODING. Nishad, P. M.; Chezian, R. Manicka // Asian Journal of Computer Science & Information Technology;May2012, Vol. 2 Issue 5, p114 

    LZW (Lempel-Ziv-Welch) is a universal lossless data compression algorithm, which takes linear time in encoding and decoding. This paper discusses a methodology to reduce time complexity by combining binary search with LZW. The existing LZW compression algorithm takes large time for dictionary...

  • A Unique Perspective on Data Coding and Decoding. Wen-Yan Wang // Entropy;Jan2011, Vol. 13 Issue 1, p53 

    The concept of a loss-less data compression coding method is proposed, and a detailed description of each of its steps follows. Using the Calgary Corpus and Wikipedia data as the experimental samples and compared with existing algorithms, like PAQ or PPMstr, the new coding method could not only...

  • On Online Learning of Decision Lists. Nevo, Ziv; El-Yaniv, Ran // Journal of Machine Learning Research;Spring2003, Vol. 3 Issue 2, p271 

    A fundamental open problem in computational learning theory is whether there is an attribute efficient learning algorithm for the concept class of decision lists (Rivest, 1987; Blum, 1996). We consider a weaker problem, where the concept class is restricted to decision lists with D alternations....

  • A Random Linear Coding Algorithm for Cognitive Wireless Mesh Networks. Jin Qi; Shunyi Zhang; Shujing Li; Lu Cao // Journal of Convergence Information Technology;Apr2012, Vol. 7 Issue 6, p112 

    In viewing of the problem that network coding can improve network throughput as well as increase network complexity, this paper proposed a special cognitive wireless mesh network topology while constructed a novel coding algorithm. In the multicast model based on wireless channel, the algorithm...

  • Data distribution schemes of sparse arrays on distributed memory multicomputers. Lin, Chun-Yuan; Chung, Yeh-Ching // Journal of Supercomputing;Jul2007, Vol. 41 Issue 1, p63 

    A data distribution scheme of sparse arrays on a distributed memory multicomputer, in general, is composed of three phases, data partition, data distribution, and data compression. To implement the data distribution scheme, many methods proposed in the literature first perform the data partition...

  • Compression and Filtering of Random Signals under Constraint of Variable Memory. Torokhti, Anatoli; Miklavcic, Stan // Proceedings of World Academy of Science: Engineering & Technolog;Jun2008, Vol. 42, p768 

    We study a new technique for optimal data compression subject to conditions of causality and different types of memory. The technique is based on the assumption that some information about compressed data can be obtained from a solution of the associated problem without constraints of causality...

  • Simple Data Compression by Differential Analysis using Bit Reduction and Number System Theory. Chakraborty, Debashish; Bera, Sandipan; Gupta, Anil Kumar; Mondal, Soujit // International Journal on Information Technology;Dec2011, Vol. 1 Issue 3, p16 

    This is a simple algorithm which is based on number theory system and file differential technique. It employs a technique which unlike other is independent of repetition and frequency of character.In the algorithm original file is broken into some small files using differential techniques and...

  • Quantum Circuit Simulation Using the Density Matrix Renormalization Group. Kawaguchi, A.; Shimizu, K.; Tokura, Y.; Imoto, N. // AIP Conference Proceedings;2004, Vol. 734 Issue 1, p187 

    Simulating large quantum circuits with a classical computer requires enormous computational power, because the number of quantum states increases exponentially as a function of number of qubits. The density matrix renormalization group (DMRG) was introduced by White to study the properties of...

  • Data Compression in Scientific and Statistical Databases. Bassiouni, M. A. // IEEE Transactions on Software Engineering;Oct85, Vol. 11 Issue 10, p1047 

    Scientific and statistical database systems heavily depend on data compression techniques to make possible the management and storage of their large databases. The efficiency of data compression methods has a significant impact on the overall performance of these systems. The purpose of this...


Read the Article


Sorry, but this item is not currently available from your library.

Try another library?
Sign out of this library

Other Topics