Multi-objective Design Space Exploration of Multi-Process...

  • Project Name:
    Multi-objective Design Space Exploration of Multi-Processor-SoC Architectures for Embedded Multimedia Applications (MULTICUBE FP7-216693)
  • Project Code:
    FP7-216693
  • Publish Department:
  • Start Time:
    2008-01-01
  • End Time:
    2010-06-30
  • Project Description
    This project is supported by the EU Seventh Framework Programme (FP7) as a small or medium-scale focused research project (STREP) under ICT Call 1, and started in January 2008. The MULTICUBE project focuses on defining an automatic multi-objective Design Space Exploration (DSE) framework that tunes the System-on-Chip architecture for a target application by evaluating a set of metrics (e.g. energy, latency, throughput, bandwidth, QoS) for next-generation embedded multimedia platforms. On the one hand, the framework has to find the design alternatives that best meet the system constraints and cost criteria, which depend strongly on the target application, while restricting the search space to the principal parameters so that exploration stays efficient. The output of the framework is a set of Pareto configurations with respect to the selected metrics, from which the best system configuration satisfying the constraints can be chosen. On the other hand, the MULTICUBE project will define a run-time DSE framework that applies the results of the static multi-objective design exploration to optimize the run-time allocation and scheduling of application tasks. The design exploration flow results in a Pareto-optimal set of design alternatives with different speed, energy, memory and communication bandwidth parameters. The operating system can use this information at run time to make an informed decision about how resources should be distributed over the tasks running on the multi-processor system-on-chip.
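    As an illustration of the two stages described above, the following minimal Python sketch filters a set of candidate configurations down to its Pareto front and then picks one point under a run-time constraint. It is not the MULTICUBE implementation: the metric names (latency_ms, energy_mj), the sample values, and the functions dominates, pareto_front and pick_at_runtime are all hypothetical, and every metric is assumed to be "lower is better".

```python
from typing import Dict, List, Optional

Config = Dict[str, float]

# Hypothetical design points; values are invented for illustration only.
CANDIDATES: List[Config] = [
    {"cores": 1, "latency_ms": 40.0, "energy_mj": 10.0},
    {"cores": 2, "latency_ms": 22.0, "energy_mj": 14.0},
    {"cores": 4, "latency_ms": 12.0, "energy_mj": 25.0},
    {"cores": 4, "latency_ms": 24.0, "energy_mj": 16.0},  # dominated by the 2-core point
    {"cores": 8, "latency_ms": 12.0, "energy_mj": 30.0},  # dominated by the first 4-core point
]
METRICS = ["latency_ms", "energy_mj"]  # assumption: lower is better for both


def dominates(a: Config, b: Config, metrics: List[str]) -> bool:
    """a dominates b if it is no worse on every metric and strictly better on one."""
    no_worse = all(a[m] <= b[m] for m in metrics)
    strictly_better = any(a[m] < b[m] for m in metrics)
    return no_worse and strictly_better


def pareto_front(configs: List[Config], metrics: List[str]) -> List[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [
        c for c in configs
        if not any(dominates(o, c, metrics) for o in configs if o is not c)
    ]


def pick_at_runtime(front: List[Config], latency_budget_ms: float) -> Optional[Config]:
    """Among Pareto points that meet the latency budget, choose the lowest energy."""
    feasible = [c for c in front if c["latency_ms"] <= latency_budget_ms]
    return min(feasible, key=lambda c: c["energy_mj"]) if feasible else None


FRONT = pareto_front(CANDIDATES, METRICS)
```

    With these sample values, FRONT keeps the 1-core, 2-core and first 4-core points, and pick_at_runtime(FRONT, 25.0) selects the 2-core point. The quadratic filter is enough for a small design space; a large one would need a smarter search, which is the problem the exploration framework addresses.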
 

Scalable Simulation Platform with Reusable Logic

  • Project Name:
    Scalable Simulation Platform with Reusable Logic
  • Project Code:
    20096050
  • Publish Department:
  • Start Time:
    2009-01-01
  • End Time:
    2010-12-31
  • Project Description
    This project is supported by the ICT Foundation under Grant No. 20096050. On-chip multi-processors have become a growing research field. As the number of processing cores increases, simulation and verification become a challenge. The first challenge is that pure software simulation cannot deliver adequate performance and its time cost keeps growing, which motivates FPGA-based simulation platforms. The second challenge follows from the first: FPGA-based prototype simulation is expensive. This project focuses on simulation methods for on-chip multi-processors. Its purpose is to design an efficient, low-cost FPGA-based simulation platform for large-scale chip multi-processors.
 

Research on runtimes for many-core systems

  • Project Name:
    Research on runtimes for many-core systems
  • Project Code:
    2005CB321600
  • Publish Department:
  • Start Time:
    2009-01-01
  • End Time:
    2010-12-31
  • Project Description
    This project is supported by the National High-Tech Research and Development Plan of China under Grant No. 2009AA01Z103. Runtime systems for many-core systems face three challenging problems. Firstly, productive parallel programming on many-core systems is difficult. Secondly, previous work on runtime systems did not take the topology of the on-chip network into consideration in the context of many-core architectures, which leads to bandwidth imbalance and hampers application performance. Finally, data must be prefetched dynamically to improve memory throughput, because the limited pin count causes serious bandwidth contention among the growing number of processing cores. This project focuses on these problems. It aims to release the programmer from painful parallel program tuning and to schedule work so that load balance is achieved.
 

Research on many core system design exploiting the behavi...

  • Project Name:
    Research on many core system design exploiting the behavior of Bioinformatics applications
  • Project Code:
    4092044
  • Publish Department:
  • Start Time:
    2009-01-01
  • End Time:
    2012-12-31
  • Project Description
    This project is supported by the Beijing Natural Science Foundation under Grant No. 4092044. Bioinformatics is an important field, but its development is limited by the available computing power. The project covers three aspects of design methodology for on-chip many-core architectures. Firstly, since bioinformatics involves a great deal of both regular and irregular computation, the project studies schemes that support both kinds of computation to improve the efficiency of bioinformatics workloads. Secondly, since bioinformatics workloads contain many dependences at different granularities, it studies fast on-chip synchronization schemes to speed up communication between processing elements. Thirdly, it studies runtime systems that manage on-chip resources efficiently and adapt to the diversity of bioinformatics algorithms. The purpose of this project is to provide an efficient design methodology for on-chip architectures for bioinformatics computation.
 

High Performance On-Chip Memory Systems (Grant No. 60736012)

  • Project Name:
    High Performance On-Chip Memory Systems (Grant No. 60736012)
  • Project Code:
    60736012
  • Publish Department:
  • Start Time:
    2008-01-01
  • End Time:
    2012-12-31
  • Project Description
    This project is supported by the National Natural Science Foundation of China under Grant No. 60736012 and started in 2007. The era of on-chip multi-processors brings an urgent demand for high-performance memory systems. The memory wall is a critical obstacle to the overall performance of a processor chip. This project aims to bridge the speed gap between processing cores and memory, focusing on the key technologies of high-performance memory systems. The main contents include the organization and protocols of the on-chip memory hierarchy, the organization and protocols of on-chip data access and communication systems, and data distribution involving the operating system and the compiler. The purpose of this project is to solve crucial problems in these three areas, such as data locality distribution, cache hit rate, cache coherence in distributed shared memory systems, and the power efficiency of on-chip memory systems.
 

SimICT framework

  • Project Name:
    SimICT framework
  • Project Code:
  • Publish Department:
  • Start Time:
  • End Time:
  • Project Description
    The SimICT simulation framework is component-based. Its goal is to simulate 1,000+ cores while easing the development process. SimICT offers users an automatic parallel simulation environment, which greatly reduces the workload of developing a parallel simulator. The main features of SimICT are:
      • Event-driven and cycle-accurate
      • Reconfigurable topology
      • Automatic parallel simulation environment
    The following figure shows the architecture of SimICT. The SimICT parallel simulation framework offers topology division, communication and synchronization services. The basic element of SimICT is the FS (framework service). Each component belongs to one FS, and each FS runs on one thread or process. Message transmission between components is handled by the FS. The FS architecture is shown in the following figure. Every FS has its own message queue, synchronization unit and topology unit. The message queue buffers messages in transit, the synchronization unit synchronizes the different FSs, and the topology unit finds the destination of a message.
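    The FS structure described above can be sketched in a few lines of Python. This is only an illustration and not SimICT code: the class FrameworkService, its methods, the component names core0 and mem0, and the barrier-style synchronization are all assumptions. For simplicity the sketch steps every FS from a single thread, whereas in SimICT each FS runs on its own thread or process.

```python
from collections import deque


class FrameworkService:
    """Hypothetical FS: owns components, a message queue and a topology map."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()   # message queue: buffers messages in transit
        self.components = {}   # components that belong to this FS
        self.topology = {}     # topology unit: component name -> owning FS
        self.cycle = 0         # local simulated time

    def add_component(self, comp_name, handler):
        self.components[comp_name] = handler

    def send(self, dest, payload):
        # The topology unit finds the FS that owns the destination component.
        self.topology[dest].queue.append((dest, payload))

    def drain(self):
        batch = list(self.queue)
        self.queue.clear()
        return batch

    def deliver(self, batch):
        for dest, payload in batch:
            self.components[dest](self, payload)
        self.cycle += 1


def build_topology(services):
    topology = {c: fs for fs in services for c in fs.components}
    for fs in services:
        fs.topology = topology


def run(services, cycles):
    # Assumed synchronization scheme: a barrier per cycle. All queues are
    # drained first, so a message sent in cycle n is delivered in cycle n + 1.
    for _ in range(cycles):
        batches = [fs.drain() for fs in services]
        for fs, batch in zip(services, batches):
            fs.deliver(batch)


log = []


def core0(fs, payload):
    log.append((fs.cycle, "core0", payload))


def mem0(fs, payload):
    log.append((fs.cycle, "mem0", payload))
    fs.send("core0", "data for " + payload)


fs_a, fs_b = FrameworkService("fs_a"), FrameworkService("fs_b")
fs_a.add_component("core0", core0)
fs_b.add_component("mem0", mem0)
build_topology([fs_a, fs_b])
fs_a.send("mem0", "load 0x10")
run([fs_a, fs_b], cycles=3)
```

    After the run, log holds the request arriving at mem0 in cycle 0 and the reply arriving at core0 in cycle 1. Draining every queue before delivering anything keeps the result independent of the order in which the FSs are stepped, which is the property a real synchronization unit has to preserve across threads.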
 

256-Core HGJ-MPU Simulator

  • Project Name:
    256-Core HGJ-MPU Simulator
  • Project Code:
  • Publish Department:
  • Start Time:
  • End Time:
  • Project Description
    HGJ-MPU is a novel many-core processor designed for high-performance computing. It has a peak performance of 2 TFLOPS when running at 1 GHz, and it has many features aimed at the demands of high performance. For example, HGJ-MPU's processor cores support three modes: thread mode, vector mode, and thread-vector mode. This makes the processor suitable for different kinds of applications. HGJ-MPU also includes many architectural improvements in both the on-chip network and the on-chip memory system. According to our simulator-based tests, the real performance of HGJ-MPU exceeds 1 TFLOPS. The HGJ-MPU simulator is a cycle-accurate, parallel many-core simulator used for evaluating and verifying the HGJ-MPU processor. The main features of the simulator are as follows:
      • Cycle-accurate: It models the exact timing of the main components of HGJ-MPU, such as cores, routers and caches.
      • Large-scale simulation: It simulates a 256-core architecture and can be extended to the 1000-core scale. The target system can be a cluster-on-chip system with a hybrid programming mechanism combining message passing and shared memory.
      • Highly configurable: The main architecture parameters can be configured by the user. Fault injection is also supported in our simulation platform.
      • Fast parallel simulation: The HGJ-MPU simulator can be deployed on a parallel host platform, which gives a much higher speed than sequential execution.
    The HGJ-MPU simulator has a visual interface for performance monitoring and parameter configuration, shown in the figure below. It displays real-time information such as each core's IPC and the hit rates of the I-Cache and D-Cache. When fault injection is enabled, a faulted core is marked in red, as the picture shows.
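    The per-core figures shown by such a monitor follow from simple counter ratios. The Python sketch below shows how they could be derived. The CoreCounters record, its field names and the monitor_row function are hypothetical and do not come from the HGJ-MPU simulator; only the standard definitions of IPC (instructions per cycle) and hit rate (hits divided by total accesses) are used.

```python
from dataclasses import dataclass


@dataclass
class CoreCounters:
    """Hypothetical per-core counters sampled from a simulator."""
    cycles: int
    instructions: int
    icache_hits: int
    icache_misses: int
    dcache_hits: int
    dcache_misses: int
    faulted: bool = False


def ratio(numerator: int, denominator: int) -> float:
    # Guard against idle cores that have not yet counted anything.
    return numerator / denominator if denominator else 0.0


def monitor_row(core_id: int, c: CoreCounters) -> dict:
    """Build one display row: IPC, cache hit rates and the fault marker."""
    return {
        "core": core_id,
        "ipc": ratio(c.instructions, c.cycles),
        "icache_hit_rate": ratio(c.icache_hits, c.icache_hits + c.icache_misses),
        "dcache_hit_rate": ratio(c.dcache_hits, c.dcache_hits + c.dcache_misses),
        # A faulted core is drawn in red, as in the interface described above.
        "color": "red" if c.faulted else "normal",
    }


# Invented sample values for a healthy core and a faulted, idle core.
healthy = monitor_row(0, CoreCounters(1000, 750, 90, 10, 60, 40))
faulty = monitor_row(1, CoreCounters(0, 0, 0, 0, 0, 0, faulted=True))
```

    With these invented counters, the healthy core reports an IPC of 0.75 with I-Cache and D-Cache hit rates of 0.9 and 0.6, while the faulted core reports zeros and is marked red.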