Posts
Paper Review: Hoard: A Scalable Memory Allocator for Multithreaded Applications
Introduces Hoard, a memory allocator design that tackles false sharing of cache lines and memory blowup to improve scalability, memory efficiency, and speed. Hoard's key feature is that it combines a global heap with per-processor heaps.
Paper Review: Multicore Desktop Programming with Intel Threading Building Blocks
Discusses Intel's Threading Building Blocks (TBB), a C++ template library built on a task-based, modular programming model that uses a work-stealing algorithm to provide composable execution environments for desktop applications.
Paper Review: Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Introduces a heterogeneous multi-core architecture with a single instruction set, using cores with different properties to increase the energy efficiency of chips. Results generally show major reductions in energy consumption with only minor performance sacrifices.
Paper Review: Reducers and other Cilk++ hyper-objects
Introduces hyper-objects as a mechanism to improve parallel performance. Results show improvements over locking and standard C++.
Paper Review: The Implementation of the Cilk-5 Multithreaded Language
Presents the language redesign and runtime system modifications introduced in Cilk-5 to simplify the language and increase its efficiency. Also presents a novel 'two-clone' compilation strategy for compiling Cilk to C. Results show that the implementation of the work-first principle is...
Paper Review: Foundations of the C++ Concurrency Memory Model
Introduces the C++ memory model, which adds constructs for parallel programming to the C++ language standard. A key takeaway from this paper is the discussion of atomic memory operations at the hardware level, their cost, issues in synchronisation, as well as differences from software coun...
Paper Review: Java Memory Model
Introduces Java 5.0's new Memory Model with explicit rules for synchronized programs and guarantees against out-of-thin-air values. Presents both a simplified 'Happens-before' model and formal specifications.
Paper Review: Experimenting with Low-Overhead OpenMP Runtime on BG/Q
Proposes optimizations to the OpenMP runtime for better scalability on BlueGene/Q by caching thread group allocations. Achieves up to a 4.9x reduction in overhead for 64 threads.
Paper Review: x86-TSO: A Rigorous and Usable Programmer's Model for x86 Multiprocessors
Proposes x86-TSO, a mathematically rigorous yet accessible programmer's memory model for x86 processors. Addresses issues in previous Intel and AMD specifications.
Paper Review: Memory Models: A Case for Rethinking Parallel Languages and Hardware
Examines sequential consistency under data-race-free conditions across hardware and software implementations. Proposes that future memory models leverage both programming languages and hardware.
Paper Review: Cache Coherence Protocols - II
Introduces Victim Replication, a new cache management policy extending shared cache schemes to reduce on-chip communication delays. Shows a 16% latency reduction compared to the L2 Shared scheme.
Paper Review: Cache Coherence Protocols - I
Provides a comprehensive overview of cache coherence protocols, explaining controller networks and protocol structures. Details the key components: states, transactions, and protocol design options.
Paper Review: Future of Microprocessors
Analyzes past scaling methods (transistor-speed, micro-architecture, cache) to predict future microprocessor trends. Suggests multi-core and custom hardware as key growth drivers.
Paper Review: Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms
Evaluates SpMV optimization methods across multicore architectures. Shows in-order, fine-grained multicores offer better scalability and energy efficiency despite single-core latency costs.
Paper Review: Simultaneous Multithreading: Maximizing On-Chip Parallelism
Studies SMT in superscalar processors to reduce instruction cycle waste and bottlenecks. Shows SMT processors have strong performance potential while managing shared resource contention.
Does Latent Semantic Analysis (LSA) work for conversations?
What is Latent Semantic Analysis (LSA)? It works well on structured documents, but will it work for analyzing human conversations?