Kaby Lake vs Sandy Bridge
Posted: Sun Mar 26, 2017 7:58 am by KDE
Hello,
I have noticed high kernel CPU usage in many cases on a Sandy Bridge CPU.
For example, reading a small file that is already cached in memory takes at least 0.1 ms of kernel CPU time. I haven't tested with a non-grsec kernel.
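One way to put a number on the kernel-side cost of reading a small cached file is to compare the process's system CPU time (`ru_stime` from `getrusage`) before and after many open/read/close cycles. This is a minimal sketch under stated assumptions, not the poster's actual test: the 4 KiB file size, the iteration count and the helper name `kernel_time_per_read` are arbitrary, hypothetical choices. Running the same script on the same machine under a grsec and a non-grsec kernel would provide the missing comparison.

```python
import os
import resource
import tempfile


def kernel_time_per_read(path: str, iterations: int = 10_000) -> float:
    """Average system (kernel) CPU seconds per open/read/close cycle on `path`."""
    # Read once first so the timed reads are served from the page cache.
    with open(path, "rb") as f:
        f.read()

    before = resource.getrusage(resource.RUSAGE_SELF).ru_stime
    for _ in range(iterations):
        # Raw os-level calls keep the userspace work per iteration small.
        fd = os.open(path, os.O_RDONLY)
        try:
            os.read(fd, 64 * 1024)
        finally:
            os.close(fd)
    after = resource.getrusage(resource.RUSAGE_SELF).ru_stime
    return (after - before) / iterations


if __name__ == "__main__":
    # Hypothetical 4 KiB test file created just for this measurement.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
        path = tmp.name
    try:
        per_read = kernel_time_per_read(path)
        print(f"kernel CPU time per read: {per_read * 1e3:.4f} ms")
    finally:
        os.unlink(path)
```

Averaging over many iterations matters here: the kernel's split between user and system time is often derived from periodic tick sampling, so a single short read is far below its resolution.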
Would newer features supported by Kaby Lake CPUs, such as SMAP and SMEP, lower this CPU usage?
Is the Kaby Lake GPU affected by the higher CPU usage of the grsec kernel? The Kaby Lake GPU is almost the same as the Skylake GPU.
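Whether a CPU exposes SMEP and SMAP can be checked from the feature flags the kernel lists in `/proc/cpuinfo`. A minimal sketch, assuming an x86 Linux system where that line is named `flags`; `cpu_flags` and `has_features` are hypothetical helper names, and other flag names (for example `pcid` or `invpcid`) can be passed in the same way. A listed flag only shows that the CPU supports the feature and the kernel detected it; it says nothing about whether grsecurity/PaX code paths actually use it.

```python
from __future__ import annotations


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no x86-style 'flags' line found


def has_features(cpuinfo_text: str, wanted=("smep", "smap")) -> dict[str, bool]:
    """Map each wanted flag name to whether the kernel reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(has_features(f.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

On a Sandy Bridge machine neither flag should appear, since SMEP arrived with Ivy Bridge and SMAP with Broadwell; a Kaby Lake machine should report both unless the kernel has disabled them.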
Re: Kaby Lake vs Sandy Bridge
Posted: Mon Mar 27, 2017 2:39 pm by KDE
oprofile output from a disk-read test with a 4.9.18 kernel:
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
144 23.5679 vmlinux __pax_close_userland
134 21.9313 vmlinux __pax_open_userland
54 8.8380 vmlinux copy_user_generic_string
21 3.4370 vmlinux __get_user_1
15 2.4550 ld-2.23.so do_lookup_x
15 2.4550 vmlinux pax_erase_kstack
9 1.4730 ld-2.23.so _dl_lookup_symbol_x
8 1.3093 vmlinux page_fault
7 1.1457 vmlinux __radix_tree_lookup
6 0.9820 vmlinux find_get_entry
4 0.6547 ld-2.23.so _dl_relocate_object
4 0.6547 vmlinux __access_ok
4 0.6547 vmlinux __put_user_1
4 0.6547 vmlinux filemap_map_pages
4 0.6547 vmlinux patch_pax_enter_kernel_user
4 0.6547 vmlinux patch_pax_exit_kernel_user
4 0.6547 vmlinux strncpy_from_user
3 0.4910 ld-2.23.so check_match
3 0.4910 libc-2.23.so _int_malloc
3 0.4910 libc-2.23.so getenv
3 0.4910 vmlinux __list_del_entry
3 0.4910 vmlinux __mod_node_page_state
3 0.4910 vmlinux clear_page
3 0.4910 vmlinux unmap_page_range
2 0.3273 ld-2.23.so strcmp
2 0.3273 libQt5Core.so.5.7.1 QFSFileEngine::write(char const*, long long)
2 0.3273 libQt5Core.so.5.7.1 QUtf8::convertToUnicode(char const*, int, QTextCodec::ConverterState*)
2 0.3273 libc-2.23.so _IO_file_xsputn@@GLIBC_2.2.5
2 0.3273 libc-2.23.so __tzstring_len
2 0.3273 vmlinux __d_lookup_rcu
2 0.3273 vmlinux __virt_addr_valid
2 0.3273 vmlinux _cond_resched
2 0.3273 vmlinux dnotify_flush
2 0.3273 vmlinux expand_files
2 0.3273 vmlinux filldir
2 0.3273 vmlinux link_path_walk
2 0.3273 vmlinux lookup_fast
2 0.3273 vmlinux page_add_file_rmap
2 0.3273 vmlinux pax_enter_kernel
2 0.3273 vmlinux pax_randomize_kstack
2 0.3273 vmlinux pax_track_stack
2 0.3273 vmlinux release_pages
2 0.3273 vmlinux vfs_fstat
2 0.3273 vmlinux vfs_getattr_nosec
2 0.3273 vmlinux vmacache_find
1 0.1637 disk-read-benchmark main
1 0.1637 ld-2.23.so _dl_check_map_versions
1 0.1637 libQt5Core.so.5.7.1 QByteArray::isNull() const
1 0.1637 libQt5Core.so.5.7.1 QFSFileEnginePrivate::flushFh()
1 0.1637 libQt5Core.so.5.7.1 QFSFileEnginePrivate::nativeOpen(QFlags<QIODevice::OpenModeFlag>)
1 0.1637 libQt5Core.so.5.7.1 QFileDevice::flush()
1 0.1637 libQt5Core.so.5.7.1 QFileDevice::readData(char*, long long)
1 0.1637 libQt5Core.so.5.7.1 QFileDevicePrivate::QFileDevicePrivate()
1 0.1637 libQt5Core.so.5.7.1 QFileInfo::isFile() const
1 0.1637 libQt5Core.so.5.7.1 QFileSystemEntry::nativeFilePath() const
1 0.1637 libQt5Core.so.5.7.1 QFileSystemEntry::resolveFilePath() const
1 0.1637 libQt5Core.so.5.7.1 QHashData::rehash(int)
1 0.1637 libQt5Core.so.5.7.1 QIODevice::openMode() const
1 0.1637 libQt5Core.so.5.7.1 QIODevicePrivate::setWriteChannelCount(int)
1 0.1637 libQt5Core.so.5.7.1 QLocale::QLocale(QLocale::Language, QLocale::Country)
1 0.1637 libQt5Core.so.5.7.1 QLocale::~QLocale()
1 0.1637 libQt5Core.so.5.7.1 QObjectPrivate::~QObjectPrivate()
1 0.1637 libQt5Core.so.5.7.1 QString::fromLatin1_helper(char const*, int)
1 0.1637 libQt5Core.so.5.7.1 QString::lastIndexOf(QChar, int, Qt::CaseSensitivity) const
1 0.1637 libQt5Core.so.5.7.1 QString::operator=(QString const&)
1 0.1637 libQt5Core.so.5.7.1 QString::~QString()
1 0.1637 libQt5Core.so.5.7.1 QTextStream::operator<<(QChar)
1 0.1637 libQt5Core.so.5.7.1 QTextStreamPrivate::putString(QLatin1String, bool)
1 0.1637 libQt5Core.so.5.7.1 QTextStreamPrivate::write(QLatin1String)
1 0.1637 libQt5Core.so.5.7.1 QTime::isValid() const
1 0.1637 libQt5Core.so.5.7.1 QTime::msec() const
1 0.1637 libQt5Core.so.5.7.1 QTime::second() const
1 0.1637 libQt5Core.so.5.7.1 QTime::setHMS(int, int, int, int)
1 0.1637 libQt5Core.so.5.7.1 QUtf8::convertFromUnicode(QChar const*, int, QTextCodec::ConverterState*)
1 0.1637 libQt5Core.so.5.7.1 QVector<QLoggingRule>::~QVector()
1 0.1637 libQt5Core.so.5.7.1 QVector<QRingBuffer>::reallocData(int, int, QFlags<QArrayData::AllocationOption>)
1 0.1637 libQt5Core.so.5.7.1 doubleToAscii(double, QLocaleData::DoubleForm, int, char*, int, bool&, int&, int&)
1 0.1637 libc-2.23.so __GI___mempcpy
1 0.1637 libc-2.23.so __memcmp_sse2
1 0.1637 libc-2.23.so __memmove_ssse3
1 0.1637 libc-2.23.so __tzfile_compute
1 0.1637 libc-2.23.so _dl_addr
1 0.1637 libc-2.23.so _int_free
1 0.1637 libc-2.23.so _int_realloc
1 0.1637 libc-2.23.so fflush
1 0.1637 libc-2.23.so free
1 0.1637 libc-2.23.so malloc_consolidate
1 0.1637 libc-2.23.so readdir_r
1 0.1637 libc-2.23.so realloc
1 0.1637 libc-2.23.so sysconf
1 0.1637 libdouble-conversion.so.1.0.0 double_conversion::FastDtoa(double, double_conversion::FastDtoaMode, int, double_conversion::Vector<char>, int*, int*)
1 0.1637 libdouble-conversion.so.1.0.0 double_conversion::PowersOfTenCache::GetCachedPowerForBinaryExponentRange(int, int, double_conversion::DiyFp*, int*)
1 0.1637 libpcre.so.1.2.8 get_ucp.constprop.6
1 0.1637 libstdc++.so.6.0.21 /usr/lib64/gcc/x86_64-pc-linux-gnu/5.4.0/libstdc++.so.6.0.21
1 0.1637 vmlinux __atime_needs_update
1 0.1637 vmlinux __call_rcu.constprop.68
1 0.1637 vmlinux __check_object_size
1 0.1637 vmlinux __fget_light
1 0.1637 vmlinux __fsnotify_parent
1 0.1637 vmlinux __list_add
1 0.1637 vmlinux __mod_zone_page_state
1 0.1637 vmlinux __native_flush_tlb_single
1 0.1637 vmlinux __rmqueue
1 0.1637 vmlinux __slab_free
1 0.1637 vmlinux __tlb_remove_page_size.part.96
1 0.1637 vmlinux __vma_adjust
1 0.1637 vmlinux __vma_rb_erase
1 0.1637 vmlinux _raw_spin_lock
1 0.1637 vmlinux check_stack_object
1 0.1637 vmlinux copy_page
1 0.1637 vmlinux copy_page_to_iter
1 0.1637 vmlinux dec_zone_page_state
1 0.1637 vmlinux do_dentry_open.isra.22
1 0.1637 vmlinux do_sys_open
1 0.1637 vmlinux down_write
1 0.1637 vmlinux dput
1 0.1637 vmlinux fd_install
1 0.1637 vmlinux find_next_bit
1 0.1637 vmlinux find_vma
1 0.1637 vmlinux fsnotify
1 0.1637 vmlinux generic_file_read_iter
1 0.1637 vmlinux generic_fillattr
1 0.1637 vmlinux gr_chroot_pathat
1 0.1637 vmlinux gr_set_proc_label
1 0.1637 vmlinux kmem_cache_alloc
1 0.1637 vmlinux kmem_cache_free
1 0.1637 vmlinux ldsem_down_read
1 0.1637 vmlinux ldsem_up_read
1 0.1637 vmlinux legitimize_mnt
1 0.1637 vmlinux lockref_put_return
1 0.1637 vmlinux lru_cache_add_active_or_unevictable
1 0.1637 vmlinux mark_page_accessed
1 0.1637 vmlinux memcpy
1 0.1637 vmlinux path_lookupat
1 0.1637 vmlinux perf_event_mmap
1 0.1637 vmlinux perf_lock_task_context
1 0.1637 vmlinux queued_spin_unlock_wait
1 0.1637 vmlinux radix_tree_lookup_slot
1 0.1637 vmlinux rap_sys_getrlimit
1 0.1637 vmlinux rb_erase
1 0.1637 vmlinux restore_nameidata
1 0.1637 vmlinux set_root
1 0.1637 vmlinux task_tick_fair
1 0.1637 vmlinux task_work_add
1 0.1637 vmlinux try_to_wake_up
1 0.1637 vmlinux unlink_file_vma
1 0.1637 vmlinux up_read
1 0.1637 vmlinux up_write
1 0.1637 vmlinux vfs_read
1 0.1637 vmlinux vma_set_page_prot
1 0.1637 vmlinux vsnprintf
1 0.1637 vmlinux wp_page_copy
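A listing this long is easier to read once the samples are rolled up. The sketch below assumes the plain `samples % image symbol` layout shown above has been saved to a text file; `parse_opreport`, `share_by_image` and `share_of_symbols` are hypothetical helper names, not part of oprofile. It totals samples per image and computes the share of a chosen set of symbols. Applied to the listing above, `__pax_close_userland` and `__pax_open_userland` together account for 278 samples, about 45.5% of the total (23.5679% + 21.9313%).

```python
from __future__ import annotations

import sys
from collections import defaultdict


def parse_opreport(text: str) -> list[tuple[int, float, str, str]]:
    """Parse 'samples % image symbol' rows, skipping headers and other lines."""
    rows = []
    for line in text.split("\n"):
        # Symbols such as C++ signatures contain spaces, so split at most 3 times.
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue
        samples, percent, image, symbol = parts
        try:
            rows.append((int(samples), float(percent), image, symbol.strip()))
        except ValueError:
            continue  # header line ("samples % image name ...") or other text
    return rows


def share_by_image(rows) -> dict[str, float]:
    """Percentage of all samples attributed to each image (vmlinux, libc, ...)."""
    totals = defaultdict(int)
    for samples, _pct, image, _sym in rows:
        totals[image] += samples
    grand_total = sum(totals.values())
    return {image: 100.0 * n / grand_total for image, n in totals.items()}


def share_of_symbols(rows, symbols: set[str]) -> float:
    """Percentage of all samples that fall in the given symbol names."""
    grand_total = sum(r[0] for r in rows)
    if grand_total == 0:
        return 0.0
    hits = sum(r[0] for r in rows if r[3] in symbols)
    return 100.0 * hits / grand_total


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            rows = parse_opreport(f.read())
        for image, pct in sorted(share_by_image(rows).items(), key=lambda kv: -kv[1]):
            print(f"{pct:6.2f}%  {image}")
        pax_userland = {"__pax_open_userland", "__pax_close_userland"}
        print(f"PaX userland open/close: {share_of_symbols(rows, pax_userland):.2f}%")
```

Recomputing shares from raw sample counts, rather than summing the printed percentages, avoids accumulating the rounding in opreport's four-decimal output.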
Re: Kaby Lake vs Sandy Bridge
Posted: Thu Mar 30, 2017 1:22 pm by KDE
According to viewtopic.php?f=7&t=3046, it looks like the function __pax_open_userland is related to UDEREF, which in turn is related to SMAP. Would that mean Kaby Lake's SMAP support improves performance?
Is that correct?
Re: Kaby Lake vs Sandy Bridge
Posted: Fri Apr 28, 2017 8:52 am by spender
UDEREF doesn't currently make use of SMAP. Moving from Sandy Bridge to Kaby Lake may still improve UDEREF performance a bit, though, since UDEREF uses the invpcid instruction when it is available, and invpcid was added around Haswell or so.
-Brad