[1] Su Y, Wang Z, Fan Z, et al. HyperFatTree: A Large-Scale Tree-Based Network with Low-Radix Switches[J]. International Journal of Parallel Programming, 2017, 45(1): 172-184.
[2] 臧大伟, 曹政, 王展, 等. 基于 AWGR 的 OCS/EPS 数据中心光电混合网络[J]. 计算机学报, 2016, 39(9): 1868-1882.
[3] Zang D, Chen M, Sun N, et al. OpticV: An energy-efficient datacenter network architecture by MEMS-based all-optical bypassing[C]//IEEE Optical Interconnects Conference (OI), 2016. IEEE, 2016: 70-71.
[4] Yuan L, Liu J, Luo Y, et al. Locality of Computation for Stencil Optimization[M]//Algorithms and Architectures for Parallel Processing. Springer International Publishing, 2016: 449-456.
[5] Xie Z, Cao Z, Wang Z, et al. Modeling Traffic of Big Data Platform for Large Scale Datacenter Networks[C]//Parallel and Distributed Systems (ICPADS), 2016 IEEE 22nd International Conference on. IEEE, 2016: 224-231.
[6] Li X, Tan G, Zhang C, et al. Accelerating large-scale genomic analysis with Spark[C]//Bioinformatics and Biomedicine (BIBM), 2016 IEEE International Conference on. IEEE, 2016: 747-751.
[9] Yuan G, Proietti R, Liu X, et al. ARON: Application-Driven Reconfigurable Optical Networking for HPC Data Centers[C]//ECOC 2016; 42nd European Conference on Optical Communication; Proceedings of. VDE, 2016: 1-3.
[11] Tan G, Zhang C, Tang W, et al. Accelerating irregular computation in massive short reads mapping on FPGA co-processor[J]. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(5): 1253-1264.
[12] Yan J, Tan G, Mo Z, et al. Graphine: Programming Graph-Parallel Computation of Large Natural Graphs for Multicore Clusters[J]. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(6): 1647-1659.
[13] Proietti R, Cao Z, Nitta C J, et al. A scalable, low-latency, high-throughput, optical interconnect architecture based on arrayed waveguide grating routers[J]. Journal of Lightwave Technology, 2015, 33(4): 911-920.
[14] 王展, 曹政, 刘小丽, 等. 基于单根 I/O 虚拟化的多根 I/O 资源池化方法[J]. 计算机研究与发展, 2015, 52(1): 83-93.
[15] Yao E, Zhang J, Chen M, et al. Detection of soft errors in LU decomposition with partial pivoting using algorithm-based fault tolerance[J]. The International Journal of High Performance Computing Applications, 2015, 29(4): 422-436.
[16] Tan G, Zhang C, Wang W, et al. SuperDragon: A Heterogeneous Parallel System for Accelerating 3D Reconstruction of Cryo-Electron Microscopy Images[J]. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2015, 8(4): 25.
[17] Yan J, Tan G, Sun N. Study on partitioning real-world directed graphs of skewed degree distribution[C]//Parallel Processing (ICPP), 2015 44th International Conference on. IEEE, 2015: 699-708.
[18] Wang Y, Li Q, Tan G. Application Taxonomy via Algorithmic Commonality for Domain-Specific Architecture Design[C]//High Performance Computing (HiPC), 2015 IEEE 22nd International Conference on. IEEE, 2015: 21-29.
[19] Zhang X, Tan G, Chen M. A Reliable Distributed Convolutional Neural Network for Biology Image Segmentation[C]//Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on. IEEE, 2015: 777-780.
[20] Zhao X, Liu C, Tan G. Implementation of Short Read Alignment Algorithm in OpenCL on Xeon Phi Coprocessor[C]//IEEE International Conference on High Performance Computing and Communications, 2015 IEEE International Symposium on Cyberspace Safety and Security, and 2015 IEEE International Conference on Embedded Software and Systems. IEEE Computer Society, 2015: 1633-1636.
[21] Yao E, Tan G. Bit Flipping Errors in High Performance Linpack at Exascale and Beyond[C]//Parallel Processing (ICPP), 2015 44th International Conference on. IEEE, 2015: 420-429.
[22] Luo Y, Tan G, Mo Z, et al. FAST: A fast stencil autotuning framework based on an optimal-solution space model[C]//Proceedings of the 29th ACM on International Conference on Supercomputing. ACM, 2015: 187-196.
[23] Zhang C, Tang W, Tan G. Accelerating massive short reads mapping for next generation sequencing[C]//Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays. ACM, 2014: 246-246.
[24] Lu H, Tan G, Chen M, et al. Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems[C]//Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on. IEEE, 2014: 1261-1268.
[25] Luo Y L, Tan G M. Optimizing stencil code via locality of computation[C]//Proceedings of the 23rd international conference on Parallel architectures and compilation. ACM, 2014: 477-478.
[26] Su Y, Cao Z, Fan Z, et al. Building a large-scale direct network with low-radix routers[C]//Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on. IEEE, 2014: 368-375.
[27] Fan Z, Cao Z, Su Y, et al. HiNetSim: A parallel simulator for large-scale hierarchical direct networks[C]//IFIP International Conference on Network and Parallel Computing. Springer Berlin Heidelberg, 2014: 120-131.
[28] Cao Z, Chen F, An X, et al. Accelerating synchronization communications for high-density blade enclosure[C]//Proceedings of the 11th ACM Conference on Computing Frontiers. ACM, 2014: 14.
[29] Yan J, Tan G, Sun N. Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture[J]. The Journal of Supercomputing, 2014, 69(3): 1462-1490.
[30] Zang D, Cao Z, Wang Z, et al. Decentralized NIC-Switching Architecture Using SR-IOV PCI Express Network Device[J]. IEEE Micro, 2014, 34(5): 42-50.
[31] Cao Z, Liu X L, Li Q, et al. An intra-server interconnect fabric for heterogeneous computing[J]. Journal of Computer Science and Technology, 2014, 29(6): 976-988.
[32] Sun N, Tan G, Zhang X, et al. Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms[C]//Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE Computer Society, 2013: 1-10.
[33] 严林, 邢晶, 霍志刚, 等. 面向海量数据存储的 Erasure-Code 分布式文件系统 I/O 优化方法[J]. 计算机工程与科学, 2013, 35(5): 20-27.
[34] 刘兴奎, 邵宗有, 刘新春, 等. 面向深度包检测的 DFA 细粒度并行匹配方法[J]. 计算机研究与发展, 2014, 51(5): 1061-1070.
[35] 王迎瑞, 任江勇, 田荣. 基于 GPU 的高性能稀疏矩阵向量乘及 CG 求解器优化[J]. 计算机科学, 2013, 40(3): 46-49.
[36] 邵宗有, 刘兴奎, 刘新春, 等. 面向骨干网 NIDS 的细粒度并行多模式匹配方法[J]. 计算机科学, 2013, 40(3): 68-73.
[37] 邢晶, 熊劲, 孙凝晖, 等. 一种支持 EB 级存储的可扩展存储空间管理方法[J]. 计算机研究与发展, 2013, 50(8): 1573-1582.
[38] Lv H, Tan G, Chen M, et al. Understanding parallelism in graph traversal on multi-core clusters[J]. Computer Science-Research and Development, 2013, 28(2-3): 193-201.
[39] 李强, 孙凝晖, 霍志刚, 等. MPI Alltoall 通信在多核机群中的优化[J]. 计算机研究与发展, 2013, 50(8): 1744-1754.
[40] Li J, Tan G, Chen M, et al. SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication[C]//ACM SIGPLAN Notices. ACM, 2013, 48(6): 117-126.
[42] Yan J, Tan G M, Sun N H. Optimizing parallel Sn sweeps on unstructured grids for multi-core clusters[J]. Journal of Computer Science and Technology, 2013, 28(4): 657-670.
[43] Su Y, Liu F, Cao Z, et al. cHPP controller: a high performance hyper-node hardware accelerator[C]//Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2013 International Conference on. IEEE, 2013: 117-123.
[45] 苏勇, 刘飞龙, 曹政, 等. 一种低开销的面向节点内互连的网络接口控制器[C]//2013 全国高性能计算学术年会. 2013.
[46] 刘小兵, 苑鲁峰, 刘勇, 等. 基于 UVM 验证方法学的高密度路由器功能验证平台的设计与实现[C]//2013 全国高性能计算学术年会. 2013.
[47] Yan L, Xing J, Wang T, et al. Write bandwidth optimization of online Erasure Code based cluster file system[C]//Cluster Computing (CLUSTER), 2013 IEEE International Conference on. IEEE, 2013: 1-8.
[48] 刘佩, 邢晶, 霍志刚, 等. 面向分布式文件系统的可扩展数据快照技术[J]. 电子技术 (上海), 2015 (006): 97-102.
[49] 刘厚贵, 邢晶, 霍志刚, 等. 一种支持海量数据备份的可扩展分布式重复数据删除系统[J]. 计算机研究与发展, 2013, 2.
[50] 黎雷生, 田荣. 高可扩展可容错的无网格/粒子程序 petaPar 及其测试[J]. 科研信息化技术与应用, 2013, 4(5): 3-9.
[51] 黎雷生, 田荣. 千万亿次级无网格粒子模拟程序: 层次化并行与超线性加速[C]//2013 全国高性能计算学术年会. 2013.
[54] 戴福鑫, 谭光明, 张佩珩, 等. 基于 CUDA 的 RTM 算法并行优化[C]//2012 全国高性能计算学术年会. 2012.
[55] 游定山, 杨佳, 沈华, 等. 一款高性能计算机系统中互联芯片的物理实现[C]//2012 全国高性能计算学术年会. 2012.
[56] 闫洁, 谭光明, 孙凝晖, 等. GRE: 针对大规模图处理的一种新型算法框架[C]//2012 全国高性能计算学术年会. 2012.
[57] 骆裕龙, 谭光明, 闫洁, 等. 基于多核集群的 JASMIN 下并行 Sn 扫描算法的优化[C]//2012 全国高性能计算学术年会. 2012.
[58] Tang W, Wang W, Duan B, et al. Accelerating millions of short reads mapping on a heterogeneous architecture with FPGA accelerator[C]//Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium on. IEEE, 2012: 184-187.
[59] Wang W, Tang W, Li L, et al. Investigating memory optimization of hash-index for next generation sequencing on multi-core architecture[C]//Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International. IEEE, 2012: 665-674.
[60] Yang J, Shen H, Liu L, et al. Multi-mode timing closure of D6000 Collective Communication Chip[C]//Solid-State and Integrated Circuit Technology (ICSICT), 2012 IEEE 11th International Conference on. IEEE, 2012: 1-3.
[61] Wang W, Duan B, Tang W, et al. A coarse-grained stream architecture for cryo-electron microscopy images 3D reconstruction[C]//Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays. ACM, 2012: 143-152.
[62] Wang Z, Cao Z, Liu X, et al. Design of Hardware-Based Communication Performance Measurement Tool[C]//Cluster Computing (CLUSTER), 2012 IEEE International Conference on. IEEE, 2012: 580-583.
[63] Li J, Li X, Tan G, et al. An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs[C]//Proceedings of the 26th ACM international conference on Supercomputing. ACM, 2012: 377-386.
[64] 陈飞, 曹政, 王凯, 等. 高性能计算节点中的同步操作加速引擎设计[J]. 电子科技大学学报, 2012, 41(1): 92-97.

Author: gxnzx
