Application of NOLOAD in Custom Sections - CH573 HardFault Debugging Journal

Preface

I wanted to build a low-power composite macro keyboard and mouse, and I settled on the CH571 chip. I quickly got hold of a development board to work with.

According to the datasheet, the data RAM is split into a 16K block and a 2K block, which can be powered down separately in low-power modes. Because the RV32 stack grows downward, the original linker script places the top of the stack at the very end of RAM, so the stack occupies the tail of RAM where I wanted to keep my own data. That didn't meet my requirements.

Starting the Modification

First, let's look at the original linker script:


ENTRY( _start )

MEMORY
{
	FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 448K
	RAM (xrw) : ORIGIN = 0x20003800, LENGTH = 18K
}

SECTIONS
{

    /* omitted */

	.stack ORIGIN(RAM)+LENGTH(RAM) :
	{
		. = ALIGN(4);
		PROVIDE(_eusrstack = . );
	} >RAM  

}

Let's modify

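Since we need to place specific data inside the last 2K of RAM, the natural approach is to define a custom section, which I'll call .lram, and pin it to a fixed address. I reserved 256 bytes for it at the very end of RAM and moved the top of the stack down to just below it.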

Modified linker script:


ENTRY( _start )

__lram_size = 256;

MEMORY
{
	FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 448K
	RAM (xrw) : ORIGIN = 0x20003800, LENGTH = 18K
}

SECTIONS
{

    /* omitted */

	.stack ORIGIN(RAM)+LENGTH(RAM) - __lram_size :
	{
		. = ALIGN(4);
		PROVIDE(_eusrstack = . );
	} >RAM  
	
	.lram ORIGIN(RAM)+LENGTH(RAM) - __lram_size  :
	{
		. = ALIGN(4);
		*(.lram);
		*(.lram.*);
		. = ALIGN(4);
	} > RAM 

}

With the section defined, declare a structure in the source code and tag the variable with GCC's section attribute, like this:


typedef struct {
    uint32_t passCode;
    uint8_t rsv[236];

} globalVar_t, *pglobalVar_t; // 256 Bytes max

__attribute__((section(".lram")))
globalVar_t globalVar;
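
As a sanity check, the address expression in the linker script and the size of the structure can be worked out on the host. This is only a minimal sketch: the constants are copied from the MEMORY block and `__lram_size` above, while the macro names and `lram_start()` are made up for illustration and are not part of the WCH SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the linker script above. */
#define RAM_ORIGIN 0x20003800u
#define RAM_LENGTH (18u * 1024u)
#define LRAM_SIZE  256u /* __lram_size */

typedef struct {
    uint32_t passCode;
    uint8_t rsv[236];
} globalVar_t;

/* Fail the build if the structure outgrows the reserved area. */
_Static_assert(sizeof(globalVar_t) <= LRAM_SIZE,
               "globalVar_t does not fit in .lram");

/* Same expression as in the script: ORIGIN(RAM)+LENGTH(RAM) - __lram_size */
static inline uint32_t lram_start(void)
{
    return RAM_ORIGIN + RAM_LENGTH - LRAM_SIZE;
}
```

With these numbers, .lram starts at 0x20007F00 and the structure takes 240 of the 256 reserved bytes.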

It compiled fine. I flashed it to the board using WCHISPStudio 3.40 and got a board that kept rebooting, with the reset reason reported as RPOR (real power-on reset).

Debugging the Issue

Searching High and Low

At first I suspected stack corruption, and moving the stack address forward by 4 bytes did make the problem go away. However, something still felt wrong: the original stack top was at the very end of RAM, and a write to that address would definitely have caused a HardFault, so the stack position itself shouldn't be the cause.

After multiple unsuccessful attempts, I posted on WCH's technical forum for help and was pointed to a blog post on debugging methods. Following that method, I modified the HardFault handler to retrieve the exception location.

After flashing, it indicated that the HardFault interrupt was triggered. mepc was a value between 0x7000 - 0x7e00 (different each time after recompilation), mcause indicated an illegal instruction (0x2), and mtval was 0.
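
For reading the mcause value, a small lookup helper can save a trip to the manual. The sketch below uses the exception codes from the standard RISC-V privileged specification; it assumes the chip follows them for synchronous exceptions (WCH cores may define additional vendor-specific codes), and `mcause_name()` is a hypothetical helper, not an SDK function.

```c
#include <stdint.h>

/* Map a 32-bit mcause value to a short description.
 * Codes are the standard RISC-V ones; anything else is reported as "other". */
static const char *mcause_name(uint32_t mcause)
{
    if (mcause & 0x80000000u) {
        return "interrupt"; /* top bit set: asynchronous interrupt */
    }
    switch (mcause) {
    case 0:  return "instruction address misaligned";
    case 1:  return "instruction access fault";
    case 2:  return "illegal instruction";
    case 3:  return "breakpoint";
    case 4:  return "load address misaligned";
    case 5:  return "load access fault";
    case 6:  return "store/AMO address misaligned";
    case 7:  return "store/AMO access fault";
    case 8:  return "environment call from U-mode";
    case 11: return "environment call from M-mode";
    default: return "other/reserved";
    }
}
```

In the handler, the string could be sent over the serial port next to the raw value, so that a code of 0x2 shows up directly as an illegal instruction.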

So I reviewed the listing and found the address to be a function inside the BLE library. The erroneous instruction was a simple li instruction, which loads an immediate value into a register and shouldn't be an illegal instruction.

Then I suspected a flashing problem and ran a flash verification, which reported everything as normal. (BTW: without code protection enabled it wouldn't even flash, so it's strange that verification succeeded in this case.)

I also suspected that the hex-to-bin conversion had gone wrong, so I changed the output format directly to bin, which generated a 512M file (lol). I opened it in WinHex and checked the data at the corresponding location, which matched the listing. I then tried to flash this bin file, but the tool said the length was invalid (naturally, since about 500M in the middle is reserved address space).

Finding the Clues

Still suspecting that the Flash content had problems, I modified the HardFault interrupt code to print 512 bytes before and after mepc through the serial port, like this:


__attribute__((interrupt("WCH-Interrupt-fast")))
__attribute__((section(".highcode")))
void HardFault_Handler(void) {

    SetSysClock(CLK_SOURCE_PLL_60MHz);
    DelayMs(1);
    uint32_t v_mepc, v_mcause, v_mtval;

    __asm volatile("csrr %0,"
            "mepc"
            : "=r"(v_mepc));

    __asm volatile("csrr %0,"
            "mcause"
            : "=r"(v_mcause));

    __asm volatile("csrr %0,"
            "mtval"
            : "=r"(v_mtval));

    uart1_send_uint32(v_mepc);
    uart1_send_newline();

    // Added: dump the flash contents 512 bytes either side of mepc
    uint32_t val = 0;
    for(uint32_t i=v_mepc - 512;i<v_mepc+512;i+=4){
        FLASH_ROM_READ(i,&val, 4);
        uart1_send_uint32(val);
        uart1_send_newline();
    }

    uart1_send_newline();

    uart1_send_uint32(v_mcause);
    uart1_send_newline();
    uart1_send_uint32(v_mtval);
    uart1_send_newline();

    while(1);

}

After compiling and flashing, I got output like this:

sys_rst: 1
Lib Version: CH57x_BLE_LIB_V2.10
UST
00007fd8
94018793

/* omitted */

f7178082
47031fff
444126e7
fef714e3
fff50793
0ff7f793
00000000
00000000
00000000

/* 60 rows of 00000000 in total */

00000000
00000000
1ffff717
16f71723

/* omitted */

45dca819
0007d783

00000002
00000000

I noticed there were 60 rows of 0x00000000 in the middle, which are undoubtedly illegal instructions. That is exactly 60 * 4 = 240 bytes, matching the size of the structure I defined (4 bytes of passCode plus 236 reserved bytes).

Following the Thread

So the flash contents really were wrong. But was it a problem with the hex file or with the flashing software? I opened the hex file, found the corresponding address, and saw that the data there was correct:


:107F0000011126CA4AC806CE22CCAE84328936C6B2
:107F10003AC43EC242C0EF904FF1630E0510938504
:107F2000A4FFC2050566C1819308A6C72A844945F6
:107F300063E2B808130606C8636E2607636C9906E9
:107F4000B2469305301F63E7D50622476364E60611
:107F5000138616003306960293153700635CB60449
:107F60000346E400058A39E68325440913F66500D3
:107F700031E2924702482318E404034774002316B1
:107F8000F406231494062315240723170407231744
:107F9000D4048347A401014551E389EF8347840D4D
:107FA000898B99CF93E52500232AB408C947230D6F
:107FB000F4000145F2406244D244424905618280A6
:107FC000638B2405B305990085812685EFD0CE8A81
:107FD0002316A4049305C4044A85EF30204197F783
:107FE000FE1F83D767358D46114763E5F600850789
:107FF00013F7F70FA304E4042285EF80EFBF835744

Tip

You can use VSCode + Intel Hex plugin for syntax highlighting

However, later in the file, another block of records with the same offsets appeared, all zeros, 240 bytes in total:


:020000020000FC
:020000042000DA
:107F00000000000000000000000000000000000071
:107F10000000000000000000000000000000000061
:107F20000000000000000000000000000000000051
:107F30000000000000000000000000000000000041
:107F40000000000000000000000000000000000031
:107F50000000000000000000000000000000000021
:107F60000000000000000000000000000000000011
:107F70000000000000000000000000000000000001
:107F800000000000000000000000000000000000F1
:107F900000000000000000000000000000000000E1
:107FA00000000000000000000000000000000000D1
:107FB00000000000000000000000000000000000C1
:107FC00000000000000000000000000000000000B1
:107FD00000000000000000000000000000000000A1
:107FE0000000000000000000000000000000000091

Just before this block there are two more records (see the Intel HEX format): an extended segment address record (type 02) with value 0x0000 and an extended linear address record (type 04) with value 0x2000, which sets the base address to 0x20000000. That is exactly 512M, which also explains why the raw bin file was so huge.

It's easy to see that if only the 16-bit offset in each data record were used and the upper 0x2000 were ignored, these zeros would be written to 0x7F00 in flash, right on top of the code, causing the illegal instruction errors. It was indeed the ISP software's fault!
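
To make the address arithmetic concrete, here is a minimal host-side sketch of how a data record's absolute address is derived from the preceding type 02/04 records. It is only an illustration of the Intel HEX rules, not a reconstruction of how WCHISPStudio works internally; `hex_byte()` and `hex_feed()` are hypothetical names.

```c
#include <stdint.h>

/* Convert two hex digits to a byte value, or return -1 if malformed. */
static int hex_byte(const char *p)
{
    int v = 0;
    for (int i = 0; i < 2; i++) {
        char c = p[i];
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else return -1; /* also stops at the terminating NUL */
        v = v * 16 + d;
    }
    return v;
}

/* Feed one record line. Type 02/04 records update *base; a data record
 * (type 00) stores its absolute start address in *addr.
 * Returns the record type, or -1 on a malformed line or bad checksum. */
static int hex_feed(const char *line, uint32_t *base, uint32_t *addr)
{
    uint8_t rec[4 + 255 + 1]; /* length, offset hi/lo, type, data, checksum */
    unsigned sum = 0;

    if (line[0] != ':') return -1;
    int len = hex_byte(line + 1);
    if (len < 0) return -1;

    unsigned total = 4u + (unsigned)len + 1u;
    for (unsigned i = 0; i < total; i++) {
        int b = hex_byte(line + 1 + 2 * i);
        if (b < 0) return -1;
        rec[i] = (uint8_t)b;
        sum += (unsigned)b;
    }
    if ((sum & 0xFFu) != 0) return -1; /* all bytes must sum to 0 mod 256 */

    uint32_t offset = ((uint32_t)rec[1] << 8) | rec[2];
    int type = rec[3];
    if (type == 0x04 && len == 2) {
        *base = (((uint32_t)rec[4] << 8) | rec[5]) << 16; /* upper 16 bits */
    } else if (type == 0x02 && len == 2) {
        *base = (((uint32_t)rec[4] << 8) | rec[5]) << 4;  /* segment * 16 */
    } else if (type == 0x00) {
        *addr = *base + offset;
    }
    return type;
}
```

Feeding the two address records and then the first all-zero data record gives an absolute address of 0x20007F00, the start of .lram. Masking that with 0xFFFF leaves 0x7F00, the flash location where the zeros ended up.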

This killed me! It took exactly 2 days!

Verification and Fix

After removing this block from the hex file and reflashing, the program ran normally. I then modified the linker script to mark the section as NOLOAD, which tells the linker to allocate memory for the section but not to load its contents into the target image. Like this:


	.lram ORIGIN(RAM)+LENGTH(RAM) - __lram_size (NOLOAD) :
	{
		. = ALIGN(4);
		*(.lram);
		*(.lram.*);
		. = ALIGN(4);
	} > RAM

After recompiling and reflashing with this change, the program ran normally, and the hex file no longer contained the 240-byte block of zeros.
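
One consequence worth keeping in mind: with NOLOAD the image carries no initial contents for .lram, so the variable should be treated as undefined after a cold power-on. A common pattern is to validate a marker field before trusting the rest. The sketch below assumes passCode is meant to serve as such a marker, which the original code does not confirm; `LRAM_MAGIC` and `lram_init_if_needed()` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define LRAM_MAGIC 0x5A5AA5A5u /* hypothetical marker value */

typedef struct {
    uint32_t passCode;
    uint8_t rsv[236];
} globalVar_t;

/* Initialise the block only when the marker is missing.
 * Returns 1 if the block was (re)initialised, 0 if its contents were kept. */
static int lram_init_if_needed(globalVar_t *g)
{
    if (g->passCode == LRAM_MAGIC) {
        return 0; /* marker present: assume contents survived */
    }
    memset(g, 0, sizeof *g);
    g->passCode = LRAM_MAGIC;
    return 1;
}
```

Called early in main() with the address of globalVar, this would clear the block on first power-up and leave it alone whenever the marker is still intact.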

Follow-up

I upgraded MounRiver Studio, and WCHISPStudio was upgraded to version 3.50 along with it. Testing showed that this version refuses to flash hex files built without NOLOAD and reports that hex-to-bin conversion failed, which I suppose counts as patching the bug (lol).

References

WCH BBS Post

CH57x/CH58x/CH32V wch risc-v chip hardfault issue tracking & program freeze tracking - iot-fan - Blog Park (cnblogs.com)

Detailed explanation of hex file format