author Stephan Bergmann <sbergman@redhat.com> 2020-04-23 16:49:17 +0200
committer Stephan Bergmann <sbergman@redhat.com> 2020-04-23 20:36:26 +0200
commit 92b7e0fd668f580ca573284e8f36794c72ba62df
tree 3123b3ff9ba386522f084a754b0cb18200bd6fe0 /external
parent c59cf3246b5cf7bc2b8108e7824f9076f6d32cf9
external/clucene: Avoid heap-buffer-overflow
...as seen during a --with-lang=ALL build with ASan on Linux:

> [XHC] nlpsolver ja
> =================================================================
> ==51396==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000ed00 at pc 0x7fe425640f53 bp 0x7ffd6a0cc900 sp 0x7ffd6a0cc8f8
> READ of size 4 at 0x62100000ed00 thread T0
> #0 in lucene::analysis::cjk::CJKTokenizer::next(lucene::analysis::Token*) at workdir/UnpackedTarball/clucene/src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp:70:19
> #1 in lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:901:32
> #2 in lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:798:9
> #3 in lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:557:24
> #4 in lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriter.cpp:946:16
> #5 in lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriter.cpp:930:10
> #6 in lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/IndexWriter.cpp:681:28
> #7 in HelpIndexer::indexDocuments() at helpcompiler/source/HelpIndexer.cxx:66:20
> #8 in main at helpcompiler/source/HelpIndexer_main.cxx:79:22
> 0x62100000ed00 is located 0 bytes to the right of 4096-byte region [0x62100000dd00,0x62100000ed00)
> allocated by thread T0 here:
> #0 in realloc at /data/sbergman/github.com/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
> #1 in lucene::util::StreamBuffer<wchar_t>::setSize(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_streambuffer.h:114:17
> #2 in lucene::util::StreamBuffer<wchar_t>::makeSpace(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_streambuffer.h:150:5
> #3 in lucene::util::BufferedStreamImpl<wchar_t>::setMinBufSize(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_bufferedstream.h:69:16
> #4 in lucene::util::SimpleInputStreamReader::Internal::JStreamsBuffer::JStreamsBuffer(lucene::util::CLStream<signed char>*, int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/Reader.cpp:375:6

Note that this is not a proper fix, which would need to properly detect surrogate pairs split across buffer boundaries. But, for one, the comment says "however, gunichartables doesn't seem to classify any of the surrogates as alpha, so they are skipped anyway", and, for another, the behavior until now was to replace the high surrogate with something that was likely garbage and leave the low surrogate at the start of the next buffer (if any) alone, so leaving both surrogates alone is likely at least no worse behavior.

Change-Id: Ib6f6f1bc20ef8efe0418bf2e715783c8555068de
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/92792
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
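The message says a proper fix would detect surrogate pairs split across buffer boundaries. Below is a minimal sketch of that idea, not CLucene code: it assumes the input arrives as UTF-16 code units (held in `std::uint32_t`) in successive buffers, and the class `SplitSurrogateDecoder`, its `feed`/`finish` methods, and the policy of emitting U+FFFD for unpaired surrogates are all hypothetical. A high surrogate that ends one buffer is carried over until the next buffer shows whether its low surrogate follows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not CLucene API. Code units are UTF-16 values in uint32_t.
constexpr std::uint32_t kReplacementChar = 0xFFFD;  // assumed policy for unpaired halves

inline bool isHighSurrogate(std::uint32_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogate(std::uint32_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combines a valid high/low pair into one code point (U+10000..U+10FFFF).
inline std::uint32_t combineSurrogates(std::uint32_t hi, std::uint32_t lo) {
    assert(isHighSurrogate(hi) && isLowSurrogate(lo));
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

class SplitSurrogateDecoder {
public:
    // Decodes units[0, len) into code points, appending them to `out`.
    // Never reads outside the given buffer.
    void feed(const std::uint32_t* units, std::size_t len,
              std::vector<std::uint32_t>& out) {
        std::size_t i = 0;
        // Resolve a high surrogate left over from the previous buffer.
        if (pending_ != 0 && len > 0) {
            if (isLowSurrogate(units[0])) {
                out.push_back(combineSurrogates(pending_, units[0]));
                i = 1;  // low half consumed
            } else {
                out.push_back(kReplacementChar);  // high half had no partner
            }
            pending_ = 0;
        }
        for (; i < len; ++i) {
            const std::uint32_t c = units[i];
            if (isHighSurrogate(c)) {
                if (i + 1 == len) {
                    pending_ = c;  // partner, if any, is in the next buffer
                } else if (isLowSurrogate(units[i + 1])) {
                    out.push_back(combineSurrogates(c, units[i + 1]));
                    ++i;
                } else {
                    out.push_back(kReplacementChar);
                }
            } else if (isLowSurrogate(c)) {
                out.push_back(kReplacementChar);  // stray low half
            } else {
                out.push_back(c);
            }
        }
    }

    // Call once at end of input: a still-pending high surrogate is unpaired.
    void finish(std::vector<std::uint32_t>& out) {
        if (pending_ != 0) {
            out.push_back(kReplacementChar);
            pending_ = 0;
        }
    }

private:
    std::uint32_t pending_ = 0;  // 0 means "no high surrogate carried over"
};
```

For example, feeding `{0x0041, 0xD83D}` and then `{0xDE00, 0x0042}` yields U+0041, U+1F600, U+0042, with the pair reassembled across the boundary. Carrying one unit of state avoids having to look back into a previous buffer, which a refilling stream buffer may no longer hold.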
Diffstat (limited to 'external')
-rw-r--r-- external/clucene/UnpackedTarball_clucene.mk 1
-rw-r--r-- external/clucene/patches/heap-buffer-overflow.patch 11
2 files changed, 12 insertions, 0 deletions
diff --git a/external/clucene/UnpackedTarball_clucene.mk b/external/clucene/UnpackedTarball_clucene.mk
index a4036d72c0bc..cb6efabd1d5d 100644
--- a/external/clucene/UnpackedTarball_clucene.mk
+++ b/external/clucene/UnpackedTarball_clucene.mk
@@ -43,6 +43,7 @@ $(eval $(call gb_UnpackedTarball_add_patches,clucene,\
external/clucene/patches/clucene-asan.patch \
external/clucene/patches/clucene-mixes-uptemplate-parameter-msvc-14.patch \
external/clucene/patches/ostream-wchar_t.patch \
+ external/clucene/patches/heap-buffer-overflow.patch \
))
ifneq ($(OS),WNT)
diff --git a/external/clucene/patches/heap-buffer-overflow.patch b/external/clucene/patches/heap-buffer-overflow.patch
new file mode 100644
index 000000000000..7421db854cfd
--- /dev/null
+++ b/external/clucene/patches/heap-buffer-overflow.patch
@@ -0,0 +1,11 @@
+--- src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp
++++ src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp
+@@ -66,7 +66,7 @@
+ //ucs4(c variable). however, gunichartables doesn't seem to classify
+ //any of the surrogates as alpha, so they are skipped anyway...
+ //so for now we just convert to ucs4 so that we dont corrupt the input.
+- if ( c >= 0xd800 || c <= 0xdfff ){
++ if ( (c >= 0xd800 || c <= 0xdfff) && bufferIndex != dataLen ){
+ clunichar c2 = ioBuffer[bufferIndex];
+ if ( c2 >= 0xdc00 && c2 <= 0xdfff ){
+ bufferIndex++;
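The patched guard can be illustrated in isolation. This is a hedged sketch, not the CLucene source: `readCodePoint` is a hypothetical helper, and it assumes (as the `bufferIndex != dataLen` guard suggests) that the index already points past `c` when the lookahead reads `ioBuffer[bufferIndex]`. Note that the test `c >= 0xd800 || c <= 0xdfff` holds for every value of `c`, so whenever that line is reached the lookahead runs, not only for surrogates; the sketch spells the high-surrogate range with `&&`. A high surrogate that is the last unit in the buffer is returned unpaired, matching the "leave both surrogates alone" behavior described in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the patched tokenizer step: reads buf[index],
// advances index, and only peeks at buf[index] for a low surrogate while
// index is still inside [0, len).
std::uint32_t readCodePoint(const std::uint32_t* buf, std::size_t len,
                            std::size_t& index) {
    assert(index < len);
    const std::uint32_t c = buf[index++];
    // High-surrogate range written with && plus the bounds guard from the patch.
    if (c >= 0xD800 && c <= 0xDBFF && index != len) {
        const std::uint32_t c2 = buf[index];  // in bounds: index < len here
        if (c2 >= 0xDC00 && c2 <= 0xDFFF) {
            ++index;
            return 0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00);
        }
    }
    return c;  // ordinary unit, or a surrogate left unpaired at the buffer end
}
```

With the buffer `{0x0041, 0xD83D, 0xDE00, 0xD83D}`, successive calls return 0x41, 0x1F600, and finally the trailing 0xD83D unchanged, without ever reading `buf[4]`.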