Authored by Jonny Svärd
Add comments noting that ethosu_flush_dcache() is deprecated and that implementing it is not recommended. Cache coherency for regions shared by the CPU and NPU is to be handled by the application before an inference is invoked; otherwise the driver does it on every invocation, which hurts performance.

Remove the cache flush/clean and invalidation calls for all base pointers and instead add base pointer masks for cache flush/clean and for invalidation. By default, the mask marks only the scratch base pointer (tensor arena) for both flush/clean and invalidation. The scratch base pointer is the only one containing RW data shared between the CPU and NPU.

For the typical case, cache invalidation is only required on the scratch/tensor arena base pointer, as that contains the OFM data. All other base pointers are either read-only or, when dedicated SRAM mode is used, refer to fast memory that is only meant to be used by the NPU, so no cache coherency issues exist.

Add a helper function to allow the cache masks to be modified for advanced use cases. The flush and invalidate cache masks are both 8-bit masks where bit 0 corresponds to base pointer 0, bit 1 corresponds to base pointer 1, and so on.

Correct the documentation, which previously stated that the addresses passed to the cache functions need to be 16-byte aligned; they need to be 32-byte aligned (or aligned to the cache line size of the CPU).

Invalidation of the complete cache is no longer supported, as it is potentially dangerous, especially in async use cases where the CPU might be doing other things while the NPU is running.

base_addr_size is now required to be set for all invoke calls; otherwise an assert will trigger.

Change-Id: Ica665ebfb84329ec5e56c224859516036fc08d2c
Signed-off-by: Jonny Svärd <jonny.svaerd@arm.com>
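The base pointer mask scheme described in the commit message can be illustrated with a small, self-contained C sketch. It is not the driver's code: every name in it (`example_flush_masked`, `example_flush_region`, `EXAMPLE_DEFAULT_MASK`) is hypothetical, the scratch base pointer index of 1 is an assumption, and the cache operation is replaced by counters so the logic can be run on a host. It shows only how an 8-bit mask selects base pointers and how each selected region is widened to 32-byte cache line boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_CACHE_LINE 32u        /* assumed CPU cache line size in bytes */
#define EXAMPLE_NUM_BASEP 8           /* one mask bit per base pointer */
#define EXAMPLE_SCRATCH_INDEX 1       /* assumption: scratch/tensor arena index */
#define EXAMPLE_DEFAULT_MASK ((uint8_t)(1u << EXAMPLE_SCRATCH_INDEX))

/* Counters standing in for a real cache clean or invalidate operation. */
static unsigned flushed_regions;
static size_t flushed_bytes;

/* Round an address down to the start of its cache line. */
static uintptr_t example_align_down(uintptr_t addr)
{
    return addr & ~(uintptr_t)(EXAMPLE_CACHE_LINE - 1u);
}

/* Round an address up to the next cache line boundary. */
static uintptr_t example_align_up(uintptr_t addr)
{
    return example_align_down(addr + EXAMPLE_CACHE_LINE - 1u);
}

/* Hypothetical stand-in: a real port would clean or invalidate by address. */
static void example_flush_region(uintptr_t start, size_t len)
{
    (void)start;
    flushed_regions++;
    flushed_bytes += len;
}

/* Flush only the base pointers whose bit is set in the mask. */
static void example_flush_masked(uint8_t mask,
                                 const uintptr_t *base_addr,
                                 const size_t *base_addr_size,
                                 int num_base_addr)
{
    /* Sizes are mandatory: a range cannot be flushed without a length. */
    assert(base_addr != NULL && base_addr_size != NULL);

    for (int i = 0; i < num_base_addr && i < EXAMPLE_NUM_BASEP; i++) {
        if ((mask & (1u << i)) == 0) {
            continue; /* bit i clear: base pointer i is skipped */
        }
        uintptr_t start = example_align_down(base_addr[i]);
        uintptr_t end = example_align_up(base_addr[i] + base_addr_size[i]);
        example_flush_region(start, (size_t)(end - start));
    }
}
```

With the default mask, only the region at index 1 is touched. An application that shares another region with the NPU could set an additional bit, for example `EXAMPLE_DEFAULT_MASK | (1u << 2)`, at the cost of extra cache maintenance per inference.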
50ddffca