[Snippets][CPU] Added external repacking via BrgemmCopyB #28179
base: master
```diff
@@ -118,7 +118,23 @@ void RuntimeConfigurator::init_data_info(const lowered::LinearIRCPtr& linear_ir)
     // input->shape changing ops->load
     PortDescriptorPtr desc = nullptr;
     const auto& shape_infer_seq = utils::get_first_child_shape_infer_expr_seq(param);
-    const auto& mem_desc_expr = shape_infer_seq.empty() ? param : shape_infer_seq.back();
+    ExpressionPtr mem_desc_expr = param;
+    if (!shape_infer_seq.empty()) {
+        // If there is a ReshapeWithOrder, we should take its descriptor, since it affects the shape via the target order
+        const auto& reordered_reshape_it = std::find_if(shape_infer_seq.cbegin(), shape_infer_seq.cend(),
+                                                        [](const ExpressionPtr& expr) {
+                                                            return ov::is_type<op::ReshapeWithOrder>(expr->get_node());
+                                                        });
+        if (reordered_reshape_it != shape_infer_seq.cend()) {
+            const auto& reshape = *reordered_reshape_it;
+            const auto& etype = reshape->get_node()->get_output_element_type(0);
+            update_io_parameters(reshape->get_input_port_descriptor(0), etype);
+            continue;
+        }
```

**Review comment on lines +124 to +133:** Why do we check only

```diff
+        mem_desc_expr = shape_infer_seq.back();
+    }
+
     auto consumer_inputs = mem_desc_expr->get_output_port_connector(0)->get_consumers();
     for (const auto& child_input : consumer_inputs) {
         const auto ma = std::dynamic_pointer_cast<snippets::modifier::MemoryAccess>(child_input.get_expr()->get_node());

@@ -127,6 +143,7 @@ void RuntimeConfigurator::init_data_info(const lowered::LinearIRCPtr& linear_ir)
             break;
         }
     }
+    OPENVINO_ASSERT(desc, "Descriptor is missed!");
     const auto& etype = mem_desc_expr->get_node()->get_output_element_type(0);
     update_io_parameters(desc, etype);
 }
```
```diff
@@ -5,6 +5,12 @@
 #pragma once

+#include "emitters/snippets/jit_snippets_call_args.hpp"
+
+#ifndef OPENVINO_ARCH_ARM64
+#    include "emitters/snippets/x64/kernel_executors/brgemm_copy_b.hpp"
+#endif
+
 #include "cache/multi_cache.h"
 #include "memory_desc/cpu_blocked_memory_desc.h"
 #include "snippets/lowered/port_descriptor.hpp"
 #include "snippets/runtime_configurator.hpp"

@@ -21,27 +27,61 @@ class CPURuntimeConfig : public ov::snippets::RuntimeConfig {
     std::string to_string() const override;
 #endif

+#ifndef OPENVINO_ARCH_ARM64
```
**Review comment:** Maybe we can move this class to a separate file? In that case, we could move all the ifdefs there and avoid them in the configurator's code.

**Reply:** That's one option. Alternatively, we can create a base class for repacking kernels, so `RepackedInput` could be reused as-is on ARM. I think it would allow us to create a better interface.
```diff
+    struct RepackedInput {
+        RepackedInput() = default;
+        RepackedInput(std::shared_ptr<const BrgemmCopyBKernel> kernel,
+                      CpuBlockedMemoryDescPtr desc,
+                      VectorDims in_offsets,
+                      VectorDims out_offsets);
+
+        const std::shared_ptr<const BrgemmCopyBKernel>& kernel() const;
+        const CpuBlockedMemoryDescPtr& desc() const;
+        const VectorDims& in_offsets() const;
+        const VectorDims& out_offsets() const;
+
+    private:
+        std::shared_ptr<const BrgemmCopyBKernel> m_kernel{nullptr};
+        CpuBlockedMemoryDescPtr m_desc{nullptr};
+        VectorDims m_in_offsets{};
+        VectorDims m_out_offsets{};
+    };
+    std::unordered_map<size_t, RepackedInput> repacked_inputs = {};
+
+    enum class RepackingImplType {
+        NONE,         // no kernel-outside repacking
+        IN_PARALLEL,  // should be executed in parallel_nt by each thread
+        SEPARATE,     // should be executed separately from the kernel
+    };
+    RepackingImplType repacking_impl_type = RepackingImplType::NONE;
+#endif  // OPENVINO_ARCH_ARM64
```
```diff

     std::vector<jit_snippets_call_args::loop_args_t> loop_args = {};
+    std::unordered_map<size_t, CpuBlockedMemoryDescPtr> m_in_requested_descs = {};
 };

 class CPURuntimeConfigurator : public ov::snippets::RuntimeConfigurator {
 public:
-    CPURuntimeConfigurator();
+    CPURuntimeConfigurator(ov::intel_cpu::MultiCacheWeakPtr cache = {});
```
**Review comment:** Minor: do we need a default argument here? Wouldn't it be safer to force the user to always provide a cache pointer?
```diff

+    /**
+     * @brief Calculate Loop parameters of Loop emitters and update these values in CPURuntimeConfig
+     * @param linear_ir LinearIR
+     */
+    void update_loop_args(const ov::snippets::lowered::LinearIRCPtr& linear_ir) const;
+
+    const ov::intel_cpu::MultiCacheWeakPtr& get_cache() const {
+        return compiled_kernel_cache;
+    }
```
**Review comment on lines +72 to +74:** Do I understand correctly that this cache argument is needed solely for
```diff
 protected:
     void update(const ov::snippets::lowered::LinearIRCPtr& linear_ir) override;
     void update_tensor_rank(const ov::snippets::VectorDims& master_shape) const override;
     void init_tensor_rank(const ov::snippets::lowered::LinearIRCPtr& linear_ir) const override;
+    void initialization(const ov::snippets::lowered::LinearIRCPtr& linear_ir) override;

     static const size_t rank6D;

+    ov::intel_cpu::MultiCacheWeakPtr compiled_kernel_cache;
 };

 }  // namespace intel_cpu
```
**Review comment:** Some minor thoughts regarding naming: a reshape usually doesn't change the order. In OV op semantics this is usually done by `Transpose`, and in the plugin graph by `Reorder`. Maybe it is better to refer to one of these ops in the naming? To point out that the tensor data is not updated, we could use a "Fake" prefix in the name. What do you think?

**Reply:** I like the idea with `FakeReorder`. However, during our offline discussion with Ivan, I understood from him that it should be `Reshape`, since it just changes the shape. Also, `Reorder` is a more CPU-specific op: in that case we would implement it like a CPU op, and then we would have problems with `Subgraph::is_shape_infer_op` (`Subgraph` knows nothing about backend-specific ops). @IvanNovoselov, could you please share your opinion here?

**Reply:** I think that the highest priority is self-consistency inside Snippets. Alignment with OV is also important but has a lower priority. That said, we already have `LoadReshape` in Snippets that can change the layout, and we also have `Reshape` that changes the shape without actually moving data. So it seems logical to allow `Reshape` to update a shape in accordance with a desired layout.

@a-sidorova, I was thinking about just introducing a new constructor for `Reshape` instead of creating a whole new op. Do I understand correctly that you've created a new op because it's easier to implement shape infer this way? If so, I think it's time to introduce a base class for shape_infer ops. What do you think?

@v-Golubev, getting back to your question: I don't mind renaming `ReshapeWithOrder` to `Reorder` as long as we introduce a base class for shape infer ops (all derived ops are "fake" by definition) and rename `LoadReshape`. How about this option?