MPI Message Passing


Parallel Computing 2: The Hardware Story
Ondřej Jakl, Institute of Geonics, Academy of Sciences of the CR

Motto: "The history of parallel programming is the history of a few brave programmers trying to get good performance out of whatever the engineers have just inflicted on them!" [Wilson 1995, p. 479]

Outline of the lecture
- classification of parallel hardware
- SIMD, vector computers

- MIMD, multiprocessors and multicomputers
- plus ten history lessons
- supercomputers and TOP500

Classifications of parallel hardware

Flynn's classification (1966, 1972?): SISD, SIMD, MISD, MIMD
- the oldest and most popular nomenclature
- short for combinations of Single/Multiple Instruction, Single/Multiple Data
- e.g. a SIMD machine, at every instant, performs the same single instruction on multiple data streams

Memory arrangement:
- shared vs. disjoint (distributed)
- tightly vs. loosely coupled systems

System types:
- sequential/serial computer
- array processor, vector computer

- systolic array
- multicore computer, (symmetric) multiprocessor, (CC-)UMA, (CC-)NUMA, COMA, DSM
- multicomputer, massively parallel processor
- cluster, Beowulf, network of workstations
- distributed system

Some relations

Flynn   System type                          Memory           Coupling
SISD    sequential computer                  shared           -
SISD?   vector computer?                     shared           tightly
SIMD    array processor                      shared/disjoint  tightly
MISD    systolic array?                      shared           tightly
MIMD    multicore computer                   shared           tightly
MIMD    symmetric multiprocessor, UMA        shared           tightly
MIMD    NUMA                                 shared           tightly
MIMD    DSM                                  shared (virt.)   loosely
MIMD    massively parallel processor         disjoint         loosely
MIMD    cluster, Beowulf                     disjoint         loosely
MIMD    network of workstations              disjoint         loosely
MIMD    distributed system                   disjoint         loosely

SISD = sequential computer
- sequential (serial) machine: one instruction processes one piece of data at a time
- one processor, with a control unit and one processing (arithmetic, logic) unit interfaced with memory: von Neumann's model
- pipelining and superscalar processing (several processing units) are possible as low-level parallelism; as long as everything can be regarded as a single processor, the architecture remains SISD
- parallel architectures are generalizations of this model

[Figure: control unit and processing unit forming one processor, interfaced with memory]

SIMD = array processor
- a computer with many identical interconnected processing elements (PE) under the supervision of a single control unit (CU)

- the CU transmits the same instruction to all PEs; the PEs operate synchronously
- each PE works on data from its own memory (on distinct data streams); some systems also provide a shared global memory for communication
- each PE must be allowed to complete its instruction before the next instruction is taken for execution
- designed for the data-parallel model [next lecture]
- not attractive as a general-purpose parallel architecture nowadays [next slides]

[Figure: control unit driving an array of processing elements, each with its own memory]
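The lockstep idea above can be sketched in a few lines. This is a minimal illustration, not a model of any real machine: the function `simd_step` plays the role of the CU broadcasting one instruction, and each dictionary stands in for a PE's private memory (all names are made up for the sketch).

```python
# Sketch of the SIMD idea: one control unit broadcasts the same
# instruction; every processing element applies it to its own data.
# All names here are illustrative.

def simd_step(instruction, pe_memories):
    """Apply one broadcast instruction to each PE's private memory."""
    for mem in pe_memories:          # conceptually simultaneous
        instruction(mem)

# Each PE holds its own operands (distinct data streams).
pes = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]

# The CU broadcasts a single "add" instruction to all PEs.
simd_step(lambda m: m.update(c=m["a"] + m["b"]), pes)

print([m["c"] for m in pes])   # -> [11, 22, 33]
```

The point of the sketch is that there is exactly one instruction stream; parallelism comes only from the number of memories it is applied to.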

History lesson I: ILLIAC-IV
- 1964: ARPA contracted the University of Illinois to build a parallel supercomputer (building on experience with research machines)
- 1975: full operation; an experimental setup, just one exemplar built (31 mil. USD)
- 1982: decommissioned
- one control unit (CU) controlled 64 processing elements (PE), a quarter of the planned machine
- small memory: 1 MB (64 × 16 KB)
- actual performance 15 MFLOPS; planned 1 GFLOPS
- global bus & 2D mesh between PEs

- a super-fast disk system (500 Mbit/s) compensated for the small memory and made ILLIAC-IV the fastest machine, until the mid-1980s, at problems with large data processed
- software: low-level Fortran-like CFD

ILLIAC-IV: more pictures

History lesson II: Connection Machine
- 1981: first description of the CM architecture by Danny Hillis (MIT)
- 1983: Thinking Machines Corp. founded, with DARPA support; "cornered the market on sex appeal in high-performance computing"
- 1986: CM-1 introduced; Richard Feynman played a critical role in developing the CM; about 80 CM installations
- 1996: TMC abandoned hardware development

CM-1
- designed to provide massive parallelism for the solution of AI problems (simulation of intelligence and life)

- (ILLIAC-IV, by contrast, was designed primarily for the solution of highly numeric problems)
- importance placed not so much on the processors themselves, but rather on the nature and mutability of the connections between them
- 65536 very simple 1-bit processing elements (PE), each with private 4K memory
- single-bit ADD, AND, OR, MOVE and SWAP operations to and from memory or one of the available single-bit flag registers
- every PE was connected to a central control unit, called the microcontroller, which issues identical nanoinstructions to all of them to do the work
- Data Vault disk array

CM-1's interconnect
- processors connected to form a 256 × 256 mesh
- for faster routing between distant processors, clumps of 16 processors were also interconnected by a packet-switched network configured as a 12-dimensional hypercube
- each processor within a clump is linked to two others in a linear array
- extremely fast and flexible

(Thiel94)

CM-2, etc.
- great fixed-point speed: 32-bit integer addition at a peak rate close to 2000 MOPS
- CM Fortran, *Lisp and C* with parallel constructs
- CM-1 was not very efficient in floating-point calculations, which mattered for commercial success
- CM-2 (1987), CM-200: faster versions of the same computer architecture
  - memory increased to 64K or 256K per processor
  - one floating-point accelerator added for each 32 1-bit processors, corresponding to the 32-bit width of one floating-point variable
- 1989: Gordon Bell Prize for absolute performance (6 GFLOPS)
- CM-5 (1993): custom-built processors abandoned in favour of standard RISC microprocessors (SPARC); the SIMD principle abandoned; appeared in the movie Jurassic Park

Decline of the SIMD/array processors
- no longer considered promising for general-purpose parallel computers
- most problems do not map onto the strict data-parallel solution: inefficient parallel execution of (nested) IF-THEN-ELSE or CASE statements
- most naturally single-user systems only
- entry-level systems too expensive: difficult to scale down the price/performance ratio of the necessary high-bandwidth interconnects
- built using custom processors, which are not competitive (in price or performance) with commodity CPUs
- the original motivation, the relatively high cost of control units, is no longer valid

Vector computer = SIMD?

[Figure: naive conception of a vector computer adding two real arrays A, B, with one PE per element computing A[0]+B[0] … A[N]+B[N] under a single CU]

Pipeline processing
- a single data stream feeds the processor
- the processor itself provides multi-stage processing: at each stage the data is operated upon by a different part of the computation required for one complex machine-code instruction
- the total time taken to process the data, although constant for one data item, is reduced for a number of data items
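The saving can be made concrete with a toy cycle count. Assuming a 4-stage floating-point adder (the stage count is an assumption for the sketch; real pipelines vary), the sequential cost is stages × items, while the pipelined cost is stages + items − 1, since a new pair enters every cycle once the pipeline is full:

```python
# Toy model of a 4-stage floating-point add pipeline.
# The cycle counting is the standard pipeline formula,
# not a model of any particular CPU.

STAGES = 4

def sequential_cycles(n):
    """Each of n additions passes through all stages before the next starts."""
    return STAGES * n

def pipelined_cycles(n):
    """A new operand pair enters the pipeline every cycle once it is full."""
    return STAGES + n - 1

for n in (1, 4, 100):
    print(n, sequential_cycles(n), pipelined_cycles(n))
# For one item the latency is unchanged (4 cycles); for 100 items
# the pipeline needs 103 cycles instead of 400.
```

This is why pipelining pays off only on long runs of homogeneous work, which is exactly what vector operations supply.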

Example: adding two real arrays A[], B[]
- sequential processing: for each pair in turn, compare exponents, shift mantissa, add numbers, normalize result
- pipeline processing: A[0]+B[0], A[1]+B[1], A[2]+B[2], A[3]+B[3] occupy successive stages simultaneously

Vector computer
- built to handle large scientific and engineering calculations
- heavily pipelined architecture for efficient operations on vectors and matrices
- vector registers: FIFO queues capable of holding ~100 FP values
- special instructions for operations on vectors of numbers, e.g.

  - load a vector register from memory
  - perform an operation (FP ALUs) on the elements in the vector registers
  - store data from the vector registers back into memory
- vectorization transparent thanks to parallelizing (Fortran) compilers (Morris98)

Vector computer = SIMD?
- "Vector computers are parallel, in the sense that they execute many instructions at the same time, but each instruction on any piece of data is performed in sequence in respect of the piece of data concerned." (Barry 1996)
- "A problem arises when Flynn's taxonomy is applied to vector supercomputers like the Cray-1. In Hockney and Jesshope's book [4] the Cray-1 is categorised as a SIMD machine because it has vector units. However, Hwang and Briggs in their book [5] categorise the Cray-1 as a SISD because there are no multiple processing elements. Which of these classifications is correct comes down to

whether the vector units are regarded as processing a single or multiple data stream. This is open to interpretation, and leads to problems." (Wasel 1994)
- (Mazke 2004) considers the pipelined vector processor to be MISD.
- "The vector processors fit the term array processor in its general sense. They are, however, such an important sub-category that they retain their own identity and are referred to as vector computers rather than being lumped in with array processors." (Wasel 1994)

History lesson III: Cray-1
- the first Cray computer (1976)

- the first supercomputer; "the world's most expensive loveseat"
- by Seymour Cray: "In all of the machines that I've designed, cost has been very much a secondary consideration. Figure out how to build it as fast as possible, completely disregarding the cost of construction."
- besides being a vector computer, it was the fastest scalar machine of its period: 133 (160?) MFLOPS peak
- a hand-crafted machine; took months to build; at least 16 systems produced

Cray-1
- vector registers: FIFO queues capable of holding 64 single-precision (64-bit) elements
- vector pipelines filled from the elements in the vector registers
  - reduces the time to fill the pipelines for vector arithmetic operations
  - vector registers can even be filled while the pipelines are performing some other operation

- 12 different pipelines (functional units):
  - for integer or logical operations on vectors
  - for floating-point operations using scalars or vectors
  - for integer or logical operations on scalars
  - for address calculations
- the first machine to use chaining: vector results may be put back into a vector register, or they may be piped directly into another pipeline for an additional vector operation
- limited to one memory read and one write operation per clock cycle

Cray-1 inside [pictures]

Crays from outside
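The load/operate/store pattern and chaining described above can be sketched as follows. The register length of 64 echoes the Cray-1; the function names (`vload`, `vadd`, `vscale`) are invented for the sketch and do not correspond to real Cray instructions.

```python
# Sketch of the vector load/operate/store pattern, with "chaining":
# the result of one vector operation feeds the next functional unit
# directly, without a round-trip to memory. Illustrative only.

VLEN = 64   # vector register length, as on the Cray-1

def vload(memory, base):
    """Fill a vector register from memory."""
    return memory[base:base + VLEN]

def vadd(v1, v2):
    """One elementwise vector operation (an FP-add pipeline)."""
    return [a + b for a, b in zip(v1, v2)]

def vscale(v, s):
    """A second functional unit (a multiply pipeline)."""
    return [s * x for x in v]

mem = list(range(200))
v0 = vload(mem, 0)              # A[0..63]
v1 = vload(mem, 64)             # B[0..63]
v2 = vscale(vadd(v0, v1), 2)    # chained: add results flow straight into the multiplier
mem[128:128 + VLEN] = v2        # store the register back to memory
```

In hardware, chaining means the multiply pipeline starts consuming add results element by element as they emerge, rather than after the whole add completes; the sketch only captures the data flow, not that overlap.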

History lesson IV: Earth Simulator

ES as a vector processor
- the fastest supercomputer in the world from 2002 to 2004

  [Wiki]: 35.86 TFLOPS
- developed for running global climate models and problems in solid-earth geophysics
- built by NEC, based on their SX-6 architecture
- 640 nodes, each with 8 vector processors and 16 GB of operating memory
- SUPER-UX operating system
- interconnection network: full crossbar, 12.3 GB/s × 2
- 2009: replaced by Earth Simulator 2 (122.4 TFLOPS)

[Figure (Amano): nodes 0 to 639, each with 8 vector processors and 16 GB of shared memory, connected by the crossbar interconnection network]

ES arithmetic processor [picture]

MISD?
- a collection of processing elements, all of which execute independent streams of instructions on the same single data stream
- there are two ways in which this can be done:
  - the same data item can be fed to many processing elements, each executing its own stream of instructions
  - the first processing element can pass its results on to the second, and so on, thus forming a macro-pipeline
- no literal architectural implementation
- some authors identify systolic arrays as a possible example of this form: a mesh-like network of processors that rhythmically compute and pass (pump) data through the system
- realization: iWarp machines (1990, Carnegie Mellon and Intel)
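The second, macro-pipeline reading of MISD can be sketched directly: one data stream flows through a chain of processing elements, each running its own (here trivially small) instruction stream. The stage functions are arbitrary placeholders.

```python
# Sketch of the "macro-pipeline" reading of MISD: a single data stream
# passes through a chain of processing elements, each executing its own
# instruction stream. Purely illustrative.

stages = [
    lambda x: x + 1,     # PE 1: its own little program
    lambda x: x * 2,     # PE 2
    lambda x: x - 3,     # PE 3
]

def misd_chain(data):
    """Push one data item through the whole chain of PEs."""
    for pe in stages:
        data = pe(data)
    return data

print([misd_chain(x) for x in (0, 1, 2)])   # -> [-1, 1, 3]
```

As with the hardware, the interesting case is a stream of items: while PE 3 finishes item k, PE 2 can already work on item k+1, which is exactly the systolic "pumping" mentioned above.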

History lesson V: C.mmp
- C.mmp (Carnegie multi-mini-processor): 1970-1977, William Wulf et al., Carnegie Mellon University
- an experimental setup for research on parallel computer architectures, built out of off-the-shelf components
- 16 DEC PDP-11s connected through a 16 × 16 crossbar switch to 16 memory modules, allowing 16 memory references to take place at once (on different ports)
- could be extensively reconfigured:

  - MIMD mode: the normal mode of operation
  - SIMD mode: all the processors are coordinated by a single master controller

[Figure: PDP-11 processors P1-P16, each with local memory (LM), connected by the crossbar to memory modules M1-M16]

  - MISD mode: the processors are arranged in a chain, with a single stream of data passing through all of them
- novel HYDRA operating system

MIMD = Multiple Instruction Multiple Data
- several independent processors capable of executing separate programs (asynchronously)
- avoids most problems of SIMD, e.g.:
  - can use commodity processors
  - naturally supports multiple users
  - is efficient in conditionally executed parallel code
- almost all current interest in parallel computers centres on the MIMD concept
- here Flynn's taxonomy is too coarse: akin to dividing all contemporary computers into just the two categories parallel and sequential
- coarse subdivision based on memory organization:
  - shared/centralized: MIMD-SM, tightly coupled, "multiprocessor" for short
  - disjoint/distributed: MIMD-DM, loosely coupled,

    "multicomputer" for short

MIMD-SM = multiprocessors
- multiple-processor/core computer with shared memory
- single address space: the same address on different CPUs refers to the same memory location
  - data sharing (shared-variables programming model) possible [next lecture]
- single copy of the operating system
- shared-memory hardware is becoming commonplace
  - typically: workstation 1-4 processors, server 4-64 processors, all processors multicore (2-8 cores)
- supported by all modern operating systems, of course Unix/Linux and Windows
- excellent at providing high throughput for a multiprocessing load; within limits, scales almost linearly with the number of processors
- basic division: symmetric/asymmetric processors; uniform/non-uniform memory access
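The shared-variables model above can be sketched with ordinary threads: several workers update one counter that lives at a single address visible to all of them. The lock is what coordinated "communication via writes/reads to memory" requires in practice (without it, the concurrent read-modify-write would lose updates).

```python
# Minimal shared-variables sketch for an MIMD-SM machine: several
# threads update one counter in a single address space.
import threading

counter = 0                  # one location, visible to every thread
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:           # serialize the read-modify-write
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)               # -> 40000
```

Contrast this with the message-passing model later in the lecture, where no such commonly addressable location exists.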

Symmetric multiprocessor (SMP)
- identical (equivalent) processors that:
  - share memory
  - share the workload equally
  - access the memory equally, at equal speeds
  - operate independently
  - communicate via writes/reads to memory

[Figure: processors P1 … Pp and memories M1 … Mm connected by a bus]

- interconnecting subsystem:
  - bus: less expensive, but restricted throughput
  - crossbar: better performance at higher cost

[Figure: processors P1 … Pp and memories M1 … Mm connected by a crossbar]

SMP = (CC-)UMA
- UMA: Uniform Memory Access
- in practice there is a bandwidth bottleneck at the interconnect: scalability limited to hundreds of processors at maximum, tens in the case of bus-based systems
- processors may have some local memory (cache): technically difficult to maintain cache consistency

- CC-UMA: Cache Coherent UMA
  - if one processor updates a location in shared memory, all the other processors learn about the update
  - accomplished at the hardware level; expensive

[Figure: processors with caches connected through the interconnection subsystem to memory]

Multiprocessor vs. multicore

[Figure: in a multiprocessor, each chip carries one processing element (processor) with its own cache, connected to memory through the interconnection subsystem; a multicore chip places several processing elements (cores) on one chip, each with a cache*)]

*) Depending on the design, the (L2) cache may also be private to a core (AMD).

- practically no difference from the point of view of parallel computing

History lesson VI: Sequent
- 1984: Sequent Balance 8000, an SMP; the first commercially successful parallel machine
  - up to 20 National Semiconductor NS32016 processors, each with a small cache, connected to a common memory
  - a modified version of BSD Unix they called DYNIX
  - each of their inexpensive processors could be dedicated to a particular process
  - a series of libraries that could be used to develop applications using more than one processor at a time

  - designed to compete with the DEC VAX 11/780
  - sold well to banks, the government, other commercial enterprises, and universities interested in parallel computing
- 1987: Sequent Symmetry: Intel 80386-based, 2-30 processors
- other MIMD pioneers: Pyramid, Encore, Alliant, AT&T

NUMA & CC-NUMA
- NUMA: Non-Uniform Memory Access
- aims at surpassing the scalability limits of the UMA architecture due to the memory bandwidth bottleneck
- memory physically shared, but access to different portions of the memory may require significantly different times
  - direct access via a global address space
- many different ways to realize it, often by physically linking two or more SMPs (= nodes)
  - local memory access is the fastest; access across a link is slower
  - the hardware includes support circuitry to deal with remote accesses

- cache coherency with NUMA (CC-NUMA) is the de facto standard
  - directory-based protocols for cache coherency (no snooping possible)

Distributed shared memory (DSM)
- memory physically distributed among the processors, but the system gives the illusion that it is shared: the concept of virtual shared memory
- structure close to MIMD-DM systems; message passing hidden in the remote memory access
- adds more hardware scalability for the shared-variables programming model

[Figure: processors with local memories connected by the interconnection subsystem, presenting one shared memory]

History lesson VII: KSR
- Kendall Square Research Corp.: a start-up founded in 1986 by Henry Burkhardt and Steve Frank
- 1992: KSR-1 [next slide]; rank 168 in TOP500 (256 CPUs)

- KSR-2: rank 87 in TOP500
- 1995: KSR stopped production

KSR-1
- proprietary 64-bit processors, 40 MFLOPS per node
- 32 MB local memory (called a "local cache")
- up to 1088 processors in a two-level unidirectional communication ring (34 × 32 CPUs); 43.8 GFLOPS peak
- KSR-2: 5000 processors, 80 MFLOPS per node
- ALLCACHE engine: a unique implementation of virtual shared memory

  - data not found in the local cache is routed automatically from the node that has it
  - classified also as COMA (Cache Only Memory Architecture): each address becomes a name without direct physical relevance
  - cache coherency automatically maintained
  - ideas developed at the Swedish Institute of Computer Science (SICS)

MIMD-DM = multicomputers
- multiple-processor computer with distributed memory
- disjoint local address spaces: the same address on different CPUs refers to different memory locations
  - no cache-coherence problems
  - message passing necessary for the processors to interact
  - message-passing programming model [next lecture] the primary choice
- nodes: more or less autonomous computers with a separate copy of the operating system; can even be MIMD-SM machines ("constellations")
- hardware scaling much easier than with MIMD-SM

[Figure: nodes, each a processor with its own memory, connected by the interconnection subsystem]

- up to hundreds of thousands of nodes with specialized interconnects

MIMD-DM types
- basic types: massively parallel processor, cluster, network of workstations
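The interaction style forced by disjoint address spaces can be sketched with two operating-system processes that share nothing and communicate only by explicit send/receive, which is the model MPI standardizes. The `Pipe` channel and the `node` function are illustrative stand-ins for a real interconnect and node program.

```python
# Sketch of message passing between two "nodes" with disjoint address
# spaces: the processes share no variables and interact only via
# explicit send/receive over a channel.
from multiprocessing import Process, Pipe

def node(conn):
    """A node program: receive work, send back a result."""
    data = conn.recv()           # blocking receive
    conn.send(sum(data))         # send the result back
    conn.close()

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=node, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4])    # an explicit message, no shared memory
    print(parent.recv())         # -> 10
    p.join()
```

Note the contrast with the shared-counter sketch earlier: here there is no lock, because there is no shared location to protect; all coordination is in the messages themselves.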

Massively parallel processor (MPP)
- the most tightly coupled MIMD-DM
- flagships of the leading computer vendors: exterior design, publicity, support, etc.; the price corresponds to the uniqueness of MPPs
- up to thousands of processor nodes; commodity microprocessors killed off custom CPUs (ASCI White / IBM SP)
- custom switching networks provide low-latency, high-bandwidth access between processor nodes: a good balance between the speed of the processors and the speed of the interconnection subsystem
- an ideal environment for parallel processing: a homogeneous collection of powerful processor nodes, very fast interprocess communication, shielded from external impacts

History lesson VIII: Cray T3D/E
- Cray T3D: Cray's (CRI, Cray Research, Inc.) first MPP (1993)
  - captured MPP market leadership from early MPP companies such as Thinking Machines and MasPar
  - exceptionally robust, reliable, sharable and easy to administer
- Cray T3E: its successor (1995); the world's best-selling MPP system
  - Cray T3E-1200: the first supercomputer to sustain one TFLOPS on a real-world application
- Cray XT3: third generation (2004), with AMD Opteron processors

Cray T3D
- 2 DEC Alpha 21064 commodity CPUs per node; 150 MFLOPS peak, 64 MB local memory per node
- systems up to 2048 CPUs (never built)
- interconnect: 3D torus (hence the name T3D); each computing node interconnects in 3 bi-directional dimensions with its nearest neighbours; 300 MB/s, very low latency
- although the memory of the T3D is physically distributed, it is one

globally addressable address space (virtual shared memory); considered NUMA by some authors (Amano)
- no I/O capability: attached to and hosted by a Y-MP or C90 front-end

Network of workstations (NOW)
- a set of computers connected by a (local-area) network; a very loosely coupled system
- sometimes referred to as a "distributed system"
- often heterogeneous nodes: different hardware, operating systems, etc.

[Figure: computers connected by a LAN/WAN]

- uses LANs/WANs for communication
- with special middleware (e.g. PVM) it can simulate parallel hardware; many features are then similar to a massively parallel processor
- issues: reliability, security, etc.

Cluster (COW)
A specialized, tightly coupled NOW:
- high-performance interconnect(s): Gbit Ethernet, InfiniBand, Myrinet, etc.
- interactive access restricted/excluded
- identical/homogeneous nodes
- nodes intended to cooperate
- nodes without peripheral units (e.g. displays)
- OS tuned to optimize throughput
- specialized parallelization middleware

Possible roles: high availability, load balancing, high performance.
Commodity clusters: assembled from commodity, off-the-shelf (COTS) components.
Today's high-performance clusters and MPPs converge ...

History lesson IX: Beowulf
- a PC-based cluster system designed as a cost-effective alternative to large supercomputers
- Donald Becker and Thomas Sterling, CESDIS*), 1994
- 16 personal computers (Intel 486DX4 processors)

- channel-bonded Ethernet, 10 Mbit/s (drivers by Becker): network traffic striped across two or more Ethernets, because the processors were too fast for a single Ethernet
- an instant success, recognized as a new genre within the HPC community:
  - prevalence of computers for home & office, new cost-effective components

  - availability of fully assembled subsystems (processors, motherboards, disks, NICs)
  - mass-market competition: prices down, reliability up
  - open-source software (Linux OS, GNU compilers, MPI, PVM)
  - obtaining high performance, even from vendor-provided parallel platforms, is hard work and requires researchers to adopt a do-it-yourself attitude
  - increased reliance on computational science, which demands HPC

*) Center of Excellence in Space Data and Information Sciences, a NASA contractor

Beowulf-class cluster computers
- dedicated nodes and networks, serving no other purpose
  - usually identical computing nodes, plus one special front-end node
- commodity, relatively inexpensive computers as nodes
- networks also commodity entities; at least they must interconnect through a standard bus (e.g. PCI), to differentiate them from MPPs, where the network and CPUs are custom-integrated at very high cost
- the nodes all run open-source software, usually Linux as the OS
- the resulting cluster is used for HPC (a computational cluster), usually just one computation at a time

History lesson X: Blue Gene/L
- a very-massively-parallel processor
- result of the Blue Gene initiative (1999-): produce supercomputers with operating speeds in the petaFLOPS range
  - large-scale simulation of protein folding
  - novel ideas in massively parallel architecture and software, e.g. energy efficiency
  - IBM, Lawrence Livermore National Laboratory and others
- No. 1 in TOP500 from 2004 to 2007 (70.7 → 478 TFLOPS)

- employs both shared and distributed memory
- relatively modest power & cooling requirements ("green computing"): PowerPC 400 processor family; 1/100 the physical size of the ES, 1/28 the power per computation
- commercially available, at an attractive cost
- Blue Gene/P: BG's second generation (2007)
  - JUGENE (Forschungszentrum Juelich, Germany): 294 912 PE (4 per shared-memory node), 825 TFLOPS

BlueGene/L nodes
- dual PowerPC 440 at 700 MHz (modified PPC400)
- 0.5-1 GB memory shared by the 2 processing elements
- 5.6 GFLOPS peak performance per node

- a compact, low-power building block: the speed of the processor was traded in favour of very dense packaging and low power consumption, more adequate to the memory speed
- a complete system-on-a-chip: a modest amount of fast on-chip memory, an on-chip memory controller for access to larger external memory chips, and 5 network interfaces [next slide]

BlueGene/L interconnects
- five interconnecting subsystems: complementary high-speed, low-latency networks, two of them of interest for inter-processor communication
- 3D torus network: a simple 3-dimensional nearest-neighbour interconnect for the most general point-to-point communication patterns

  - hardware bandwidth 175 MB/s per link
- tree network: for fast global operations (collective communication patterns like broadcasting, reduction operations, etc.)
  - hardware bandwidth 350 MB/s per link
- other networks: a global barrier and interrupt network, a Gigabit Ethernet network for connection to other systems, and a Gigabit Ethernet network for machine control

Current supercomputers
- TOP500: the supercomputers' "premier league" of the 500 most powerful (known) computer systems in the world
  - since 1993, updated twice a year
  - based on the High-Performance LINPACK benchmark: Rmax, Rpeak [TFLOPS]
- since 1993, the statistics have agreed with Moore's law: the performance of the #1 ranked position doubles roughly every 14 months

- current leader: a Cray XT5 system called Jaguar; 224 256 Opteron processor cores, Rmax 1750 TFLOPS
- parallel machines only: uniprocessors disappeared in the mid-90s
- dominated by clusters and MPPs [next slides]; only one machine declared as a vector computer
- almost all machines use multicore processors: combined shared and distributed memory architectures

TOP500 performance development [chart: BGL, ES]

TOP500 architectures development [chart]

Conclusions
- the parallel-hardware scene is pretty stabilized nowadays; one has to return to the past to learn about the full variety of possible designs
- new impulses for HPC:
  - GPU (graphics processing unit) accelerators
  - Cell Broadband Engine (Cell BE) architecture: the IBM Roadrunner supercomputer, the first petaflops system
  - FPGA (field-programmable gate array) and multi/many-core processors

Further study
- the treatment of parallel architectures is always one of the first chapters in all parallel textbooks
- [Lin 2009] Principles of Parallel Programming: new enough to mention the advent of multicore chips and its consequences

- tons of material from computer history on the Internet: TOP500, …

Acknowledgements
Material presented in this lecture comes from plenty of resources, mostly on the Internet, including Wikipedia, TOP500, Lawrence Livermore National Laboratory and many others.
